Remove all comments in Python 3 lxml

Question

I have an XML file in which I previously commented out some elements, and now I want to uncomment them.

I have this structure:

<parent parId="22" attr="Alpha">
 <!--<reg regId="1">
  <cont>There is some content</cont><cont2 attr1="val">Another content</cont2>
 </reg>
--></parent>
<parent parId="23" attr="Alpha">
 <reg regId="1">
  <cont>There is more content</cont><cont2 attr1="noval">Morecont</cont2>
 </reg>
</parent>
<parent parId="24" attr="Alpha">
 <!--<reg regId="1">
  <cont>There is some content</cont><cont2 attr1="val">Another content</cont2>
 </reg>
--></parent>

I want to uncomment every comment in the file, so that the commented-out elements become real elements again.

I can find the commented elements with XPath. Here is my code snippet:

def unhide_element():
    path = r'path_to_file\file.xml'
    xml_parser = et.parse(path)
    comments = root.xpath('//comment')
    for c in comments:
       print('Comment: ', c)
       parent_comment = c.getparent()
       parent_comment.replace(c,'')
       tree = et.ElementTree(root)
       tree.write(new_file)

However, the replace call doesn't work, because it expects another element as the replacement, not a string.

How can I solve this?

Answer 1

Since you want to uncomment everything, all you really need to do is remove every "<!--" and "-->":

import re

# xml is the whole document, read in as a string
new_xml = ''.join(re.split('<!--|-->', xml))

Or:

new_xml = xml.replace('<!--', '').replace('-->', '')
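Both one-liners assume xml already holds the file contents as a string. A minimal sketch, assuming the sample from the question is wrapped in a hypothetical <root> element so it is well-formed, runs both variants and checks that the result still parses with the standard library:

```python
import re
import xml.etree.ElementTree as ET

# Question's sample, wrapped in a hypothetical <root> element so it is well-formed
xml = """<root>
<parent parId="22" attr="Alpha">
 <!--<reg regId="1">
  <cont>There is some content</cont><cont2 attr1="val">Another content</cont2>
 </reg>
--></parent>
<parent parId="23" attr="Alpha">
 <reg regId="1">
  <cont>There is more content</cont><cont2 attr1="noval">Morecont</cont2>
 </reg>
</parent>
</root>"""

# Both variants strip the comment delimiters and leave everything else untouched
via_split = ''.join(re.split('<!--|-->', xml))
via_replace = xml.replace('<!--', '').replace('-->', '')

# The result must still parse; each <parent> now holds a real <reg> element
root = ET.fromstring(via_replace)
print(via_split == via_replace)               # True
print(len(root.findall('./parent/reg')))      # 2
```

Because this is plain string surgery, it also strips the delimiters of any ordinary comment (for example <!-- generated file -->), leaving its words behind as text content, so it only fits files where every comment is commented-out markup.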

Answer 2

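Your code is missing the key part: creating a new XML element from the comment text. There are also a few other mistakes: the XPath query is incorrect, and the output file is saved repeatedly inside the loop.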
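On the XPath mistake: //comment is a name test that selects elements literally called <comment>, while //comment() is a node-type test that selects comment nodes. A quick sketch with lxml on a made-up document that contains both shows the difference; the file in the question has no <comment> elements, so the original query returns an empty list.

```python
from lxml import etree as ET

# Made-up document: one comment node and one element literally named <comment>
root = ET.fromstring('<parent><!--<reg regId="1"/>--><comment>an element</comment></parent>')

# '//comment' is a name test: it selects elements whose tag is "comment"
elements = root.xpath('//comment')
# '//comment()' is a node-type test: it selects comment nodes
comment_nodes = root.xpath('//comment()')

print([e.tag for e in elements])          # ['comment']
print([c.text for c in comment_nodes])    # ['<reg regId="1"/>']
```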

Also, you seem to be mixing xml.etree with lxml.etree. According to the documentation, the former's default parser skips comments when the XML file is parsed, so the best way to go is to use lxml.
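To illustrate that point with the standard library, here is a small sketch on a made-up one-line document. The default parser drops the comment; since Python 3.8, a TreeBuilder created with insert_comments=True keeps comments inside the root element as elements whose tag is the ET.Comment factory. It only demonstrates the parsing behaviour; the answer itself sticks with lxml.

```python
import xml.etree.ElementTree as ET

doc = '<parent><!--<reg regId="1"/>--></parent>'

# Default parser: the comment is skipped while parsing, so <parent> has no children
default_root = ET.fromstring(doc)
print(len(default_root))  # 0

# Python 3.8+: a TreeBuilder with insert_comments=True keeps comments inside the root
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept_root = ET.fromstring(doc, parser=parser)
print(kept_root[0].tag is ET.Comment)  # True
print(kept_root[0].text)               # <reg regId="1"/>
```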

Fixing all of the above gives something like this:

import lxml.etree as ET


def unhide_element():
    path = r'test.xml'
    tree = ET.parse(path)
    # comment() matches comment nodes; '//comment' would look for <comment> elements
    comments = tree.xpath('//comment()')
    for c in comments:
        print('Comment: ', c)
        parent_comment = c.getparent()
        new_elem = ET.XML(c.text)  # this bit creates the new element from comment text
        c.addprevious(new_elem)  # put the new element where the comment was, inside the same parent
        parent_comment.remove(c)  # skip this if you want to retain the comment

    # write once, after every comment has been processed
    tree.write(r'new_file.xml')
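The same idea can be tried in memory without a file. This sketch, assuming the question's sample wrapped in a hypothetical <root> element, swaps in lxml's replace() instead of addprevious() plus remove(), since replace() is what the question originally tried: it works once it is handed an element parsed from the comment text rather than a string.

```python
from lxml import etree as ET

# Question's sample, wrapped in a hypothetical <root> element so it is well-formed
xml = """<root>
<parent parId="22" attr="Alpha">
 <!--<reg regId="1">
  <cont>There is some content</cont><cont2 attr1="val">Another content</cont2>
 </reg>
--></parent>
<parent parId="23" attr="Alpha">
 <reg regId="1">
  <cont>There is more content</cont><cont2 attr1="noval">Morecont</cont2>
 </reg>
</parent>
</root>"""

root = ET.fromstring(xml)
# xpath() returns a list, so the tree can be modified while looping over it
for c in root.xpath('//comment()'):
    # replace() needs an element, so parse the comment text into one first
    c.getparent().replace(c, ET.fromstring(c.text))

print(len(root.xpath('//comment()')))       # 0
print(len(root.xpath('/root/parent/reg')))  # 2
print(root.xpath('/root/parent[@parId="22"]/reg/cont/text()'))  # ['There is some content']
```

If a comment holds text that is not well-formed XML (an ordinary prose comment, say), the parsing call raises XMLSyntaxError, so a file with mixed comments may need a try/except around it.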
