Reformatting XML to standardize tabs/indents

Question

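I have an xml file that needs to be pretty-printed for human use. Over the years we have made changes using xmlspy and used its gridview function to standardize the indentation before checking in to git. I don't want to lock users into that program, so I plan to add a python script that runs on check-in, reads in the xml file, reformats it with standard indents, and writes it out to a file that will be checked in. I used the code below, which is cited in many answers to similar questions. It may work fine if your xml file has no tabs and carriage returns, but it doesn't seem to touch formatting that already exists. For example, if my xml file looks like the one below, I'd expect <Grape> to line up with its siblings, but that doesn't happen: <Grape> still has the extra indent in the output file.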

Example

<Fruit>
  <Apple/>
     <Grape/>
  <Pear/>
</Fruit>

Code

import lxml.etree as etree
output_file = open("output.txt", "w")
parsed_file = etree.parse("input.xml")
parsed_bytes = (etree.tostring(parsed_file, pretty_print=True, encoding="unicode"))
output_file.write(parsed_bytes)

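More information: I think part of the problem is that pretty_print doesn't seem to take effect if there are any tabs/spaces in my xml. If my source file is pre-stripped onto a single line, as below, pretty printing works fine, but if I break it across lines, it is not reformatted.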

 <Fruit><Apple/><Grape/><Pear/></Fruit>

Answer 1

<2> doesn't line up because you haven't provided well-formed XML code. To make your "XML" code well-formed, you should convert it to something like this:

<Numbers>
  <1 />
  <2 />
  <3 />
</Numbers>

or

<Numbers>
  <1>
     <2 />
  </1>
  <3 />
</Numbers>

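Finally:
According to the XML specification v1.1, a Name cannot start with a digit. It must start with a NameStartChar (but I guess your naming scheme was for illustration purposes only).

With all of that taken into account, the result should be as expected.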
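
The well-formedness point can be checked quickly. This is only a minimal sketch: it uses the standard library's xml.etree.ElementTree instead of lxml to stay dependency-free (an assumption, not something from the question), and the helper name is_well_formed and the renamed elements n1, n2, n3 are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Element names that start with a digit are rejected by the parser
digit_names = is_well_formed("<Numbers><1/><2/><3/></Numbers>")

# Hypothetical renamed elements that start with a letter are accepted
letter_names = is_well_formed("<Numbers><n1/><n2/><n3/></Numbers>")

print(digit_names, letter_names)
```

A parser that refuses the input never gets as far as pretty-printing, so a check like this is worth running before looking at indentation.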

Answer 2

You need the remove_blank_text parser setting:

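import lxml.etree as etree

# Drop whitespace-only text nodes so pretty_print is free to re-indent
parser = etree.XMLParser(remove_blank_text=True)
parsed_file = etree.parse("input.xml", parser)

# tostring() returns bytes here, so decode before writing in text mode
parsed_bytes = etree.tostring(parsed_file, pretty_print=True)
parsed_string = str(parsed_bytes, "utf-8")

# The with block closes the output file even if the write fails
with open("output.xml", "w", encoding="utf-8") as output_file:
    output_file.write(parsed_string)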
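
To see the effect of remove_blank_text without touching any files, here is a small in-memory sketch. The sample bytes mirror the Fruit example from the question, and the helper name reformat and its strip_blank_text parameter are hypothetical, introduced only for this illustration.

```python
import lxml.etree as etree

# Sample input mirroring the question's unevenly indented file
messy = b"<Fruit>\n  <Apple/>\n     <Grape/>\n  <Pear/>\n</Fruit>"


def reformat(xml_bytes, strip_blank_text):
    """Parse xml_bytes and return the pretty-printed document as a str."""
    parser = etree.XMLParser(remove_blank_text=strip_blank_text)
    root = etree.fromstring(xml_bytes, parser)
    return etree.tostring(root, pretty_print=True, encoding="unicode")


# Existing whitespace is kept as text nodes, so the old indentation survives
kept = reformat(messy, strip_blank_text=False)

# Whitespace-only text nodes are dropped, so pretty_print re-indents evenly
normalized = reformat(messy, strip_blank_text=True)

print(normalized)
```

With the setting off, the extra indent before <Grape/> should still be there in kept, which matches what the question describes. With it on, every child of <Fruit> should come out at the same two-space indent.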

python

xml

lxml

elementtree