Why is LXML Write not pretty printing to a new file?

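I want to load an XML template from a file, modify it, and save the result to a new file with formatting. However, pretty printing does not add the desired formatting. Other solutions on Stack Overflow cover the case where the tree is written back to the same file rather than to a new one. For example: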

from lxml import etree as ET 

parser = ET.XMLParser(remove_blank_text=True) 
tree = ET.parse("template.xml", parser) 
root = tree.getroot() 
A = ET.SubElement(root, "A") 
ET.SubElement(A, "a") 
B = ET.SubElement(root, "B") 
ET.SubElement(B, "b") 
tree.write("output.xml", pretty_print=True)

template.xml

<document>
</document>

output.xml is written without formatting:

<document>
<A><a/></A><B><b/></B></document>

Edit the contents of template.xml so that it looks like this:

<document></document>

Then run your code again, and you will get:

<document>
  <A>
    <a/>
  </A>
  <B>
    <b/>
  </B>
</document>

But the important question is: why?

The answer can be found in the official lxml documentation, which states:

Pretty printing (or formatting) an XML document means adding white space to the content. These modifications are harmless if they only impact elements in the document that do not carry (text) data. They corrupt your data if they impact elements that contain data. If lxml cannot distinguish between whitespace and data, it will not alter your data. Whitespace is therefore only added between nodes that do not contain data. This is always the case for trees constructed element-by-element.
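If editing the template by hand is not an option, the same effect can probably be achieved in code by clearing whitespace-only text before writing. The sketch below is one way to do that, not the library's prescribed fix. The helper name `strip_blank_text` is hypothetical, the template is parsed from a string so that the example is self-contained, and it assumes that whitespace-only text in your document carries no meaning.

```python
from lxml import etree as ET


def strip_blank_text(root):
    """Hypothetical helper: drop whitespace-only text and tails.

    Assumes whitespace-only content is never real data in this document.
    """
    for element in root.iter("*"):  # "*" visits elements only, not comments
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None


parser = ET.XMLParser(remove_blank_text=True)
# Same content as the original template.xml, including the line break
root = ET.fromstring("<document>\n</document>", parser)

# Inspect what the parser kept as the root's text
print(repr(root.text))

strip_blank_text(root)

A = ET.SubElement(root, "A")
ET.SubElement(A, "a")
B = ET.SubElement(root, "B")
ET.SubElement(B, "b")

output = ET.tostring(root, pretty_print=True, encoding="unicode")
print(output)
```

With the leftover line break removed, `<document>` no longer looks like it contains data, so lxml is free to insert indentation. When working with files, the same helper could be called on `tree.getroot()` before `tree.write("output.xml", pretty_print=True)`.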