How to comment out an entire block and a single tag of XML in Python?
How do I comment out an entire specific block, and specific tags, of an XML file in Python?
The XML below contains many <list> tags.
1) The entire <list> {some_data}</list> block must be commented out where the tag is <list name="list_name1">.
2) In <list name="list_name3">, some <item> elements contain two <p> tags:
<p name="address1">some/address-3</p><p name="address1_1">some/address-1_1</p>
Here the second <p> tag, i.e. <p name="address1_1">some/address-1_1</p>, must be commented out, in every such instance.
How can we achieve this in Python?
Which is the best XML module in Python?
sample_file.xml
<raml xmlns="abcd.xsd" version="0.1">
<newData type="hw">
<header>
<log action="create" dateTime="2020-01-15T16:45:12.001Z" />
</header>
<sampleObject class="com.abcd.efgh:VASDF" distName="some_unique_name" operation="update" version="HDGEKB_8363_845">
<p name="p_name1">true</p>
<list name="list_name1">
<item>
<p name="address1">some/address-1</p>
<p name="value">some/value-1</p>
</item>
<item>
<p name="address1">some/address-2</p>
<p name="value">some/value-2</p>
</item>
<item>
<p name="address1">some/address-3</p>
<p name="value">some/value-3</p>
</item>
<item>
<p name="address1">some/address-4</p>
<p name="value">some/value-4</p>
</item>
<item>
<p name="address1">some/address-5</p>
<p name="value">some/value-5</p>
</item>
<item>
<p name="address1">some/address-6</p>
<p name="value">some/value-6</p>
</item>
</list>
<list name="list_name2">
<item>
<p name="address1">some/address-1</p>
<p name="value">1</p>
</item>
<item>
<p name="address1">some/address-2</p>
<p name="value">2</p>
</item>
<item>
<p name="address1">some/address-3</p>
<p name="value">3</p>
</item>
<item>
<p name="address1">some/address-4</p>
<p name="value">4</p>
</item>
<item>
<p name="address1">some/address-5</p>
<p name="value">5</p>
</item>
<item>
<p name="address1">some/address-6</p>
<p name="value">6</p>
</item>
</list>
<list name="list_name3">
<item>
<p name="address1">some/address-1</p>
<p name="address1_1">some/address-1_1</p>
<p name="value">1</p>
</item>
<item>
<p name="address1_1">some/address-1_1</p>
<p name="value">1_1</p>
</item>
<item>
<p name="address1">some/address-2</p>
<p name="value">2</p>
</item>
<item>
<p name="address1">some/address-3</p>
<p name="address1_1">some/address-1_1</p>
<p name="value">3</p>
</item>
<item>
<p name="address1_1">some/address-1_1</p>
<p name="value">3_3</p>
</item>
<item>
<p name="address1">some/address-4</p>
<p name="value">4</p>
</item>
<item>
<p name="address1">some/address-5</p>
<p name="value">5</p>
</item>
<item>
<p name="address1">some/address-6</p>
<p name="value">6</p>
</item>
</list>
</sampleObject>
</newData>
</raml>
output_file.xml should look like this:
<raml xmlns="abcd.xsd" version="0.1">
<newData type="hw">
<header>
<log action="create" dateTime="2020-01-15T16:45:12.001Z" />
</header>
<sampleObject class="com.abcd.efgh:VASDF" distName="some_unique_name" operation="update" version="HDGEKB_8363_845">
<p name="p_name1">true</p>
<!--<list name="list_name1">
<item>
<p name="address1">some/address-1</p>
<p name="value">some/value-1</p>
</item>
<item>
<p name="address1">some/address-2</p>
<p name="value">some/value-2</p>
</item>
<item>
<p name="address1">some/address-3</p>
<p name="value">some/value-3</p>
</item>
<item>
<p name="address1">some/address-4</p>
<p name="value">some/value-4</p>
</item>
<item>
<p name="address1">some/address-5</p>
<p name="value">some/value-5</p>
</item>
<item>
<p name="address1">some/address-6</p>
<p name="value">some/value-6</p>
</item>
</list> -->
<list name="list_name2">
<item>
<p name="address1">some/address-1</p>
<p name="value">1</p>
</item>
<item>
<p name="address1">some/address-2</p>
<p name="value">2</p>
</item>
<item>
<p name="address1">some/address-3</p>
<p name="value">3</p>
</item>
<item>
<p name="address1">some/address-4</p>
<p name="value">4</p>
</item>
<item>
<p name="address1">some/address-5</p>
<p name="value">5</p>
</item>
<item>
<p name="address1">some/address-6</p>
<p name="value">6</p>
</item>
</list>
<list name="list_name3">
<item>
<p name="address1">some/address-1</p>
<!--<p name="address1_1">some/address-1_1</p>-->
<p name="value">1</p>
</item>
<item>
<p name="address1_1">some/address-1_1</p>
<p name="value">1_1</p>
</item>
<item>
<p name="address1">some/address-2</p>
<p name="value">2</p>
</item>
<item>
<p name="address1">some/address-3</p>
<!--<p name="address1_1">some/address-1_1</p>-->
<p name="value">3</p>
</item>
<item>
<p name="address1_1">some/address-1_1</p>
<p name="value">3_3</p>
</item>
<item>
<p name="address1">some/address-4</p>
<p name="value">4</p>
</item>
<item>
<p name="address1">some/address-5</p>
<p name="value">5</p>
</item>
<item>
<p name="address1">some/address-6</p>
<p name="value">6</p>
</item>
</list>
</sampleObject>
</newData>
</raml>
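lxml can replace any element with another one, including a comment. Unfortunately, if you build the comment text from an existing element, lxml writes the default namespace declaration (xmlns="abcd.xsd") into the comment text again.
So I decided to use BeautifulSoup instead of lxml, since it handles namespaces more "leniently".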
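The namespace problem is easy to reproduce with the standard library's xml.etree.ElementTree, which, like lxml as described above, re-declares the namespace when a child element is serialized on its own. This is a minimal sketch assuming a shortened copy of the sample input; the names SAMPLE, NS and comment_text are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample input (assumption: same default namespace).
SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
  <list name="list_name1">
    <item><p name="address1">some/address-1</p></item>
  </list>
</raml>"""

NS = "{abcd.xsd}"  # ElementTree spells namespaced tags as {uri}localname

# Write abcd.xsd back as the default namespace instead of an "ns0:" prefix.
ET.register_namespace("", "abcd.xsd")

root = ET.fromstring(SAMPLE)
lst = root.find(NS + "list")

# Serializing the child alone puts a namespace declaration on it, so a
# comment built from this text would carry xmlns="abcd.xsd" as well.
comment_text = ET.tostring(lst, encoding="unicode")
print(comment_text.splitlines()[0])
```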
Try the following code:
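from bs4 import BeautifulSoup, Comment

with open('Input.xml') as f:
    soup = BeautifulSoup(f, 'xml')  # 'xml' selects the lxml-based XML parser
# Replace every <list> element with a comment containing its markup.
for elem in soup.findAll('list'):
    elem.replace_with(Comment(str(elem)))
print(soup.prettify())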
For your input XML, slightly shortened, I got:
<?xml version="1.0" encoding="utf-8"?>
<raml version="0.1" xmlns="abcd.xsd">
<newData type="hw">
<header>
<log action="create" dateTime="2020-01-15T16:45:12.001Z"/>
</header>
<sampleObject class="com.abcd.efgh:VASDF" distName="some_unique_name" operation="update" version="HDGEKB_8363_845">
<p name="p_name1">true</p>
<!--<list name="list_name1">
<item>
<p name="address1">some/address-1</p>
<p name="value">1</p>
</item>
<item>
<p name="address1">some/address-2</p>
<p name="value">2</p>
</item>
</list>-->
<!--<list name="list_name2">
<item>
<p name="address1">some/address-3</p>
<p name="value">3</p>
</item>
<item>
<p name="address1">some/address-4</p>
<p name="value">4</p>
</item>
</list>-->
</sampleObject>
</newData>
</raml>
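If a third-party package is not an option, the same replace-with-a-comment idea can be sketched with the standard library's xml.dom.minidom: calling toxml() on a child element writes only that element's own attributes, so the root's xmlns declaration is not repeated inside the comment. This is a minimal sketch, assuming the whole document fits in memory; comment_out_all_lists and SAMPLE are hypothetical names.

```python
from xml.dom import minidom

# Shortened copy of the sample input.
SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
  <sampleObject>
    <p name="p_name1">true</p>
    <list name="list_name1">
      <item><p name="address1">some/address-1</p></item>
    </list>
    <list name="list_name2">
      <item><p name="address1">some/address-3</p></item>
    </list>
  </sampleObject>
</raml>"""


def comment_out_all_lists(xml_text):
    """Replace every <list> element with a comment holding its markup."""
    doc = minidom.parseString(xml_text)
    # getElementsByTagName returns a list built at call time, so the tree
    # can be modified while looping over it.
    for node in doc.getElementsByTagName("list"):
        # toxml() on a child element does not re-declare the root's xmlns.
        comment = doc.createComment(node.toxml())
        node.parentNode.replaceChild(comment, node)
    # Serializing the root element (not the Document) omits the <?xml ...?> line.
    return doc.documentElement.toxml()


print(comment_out_all_lists(SAMPLE))
```

Note that minidom refuses to write a comment whose text contains "--", which is not an issue for this input.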
Edit
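If you want to comment out only one list element (e.g. the one whose name attribute is set to 'list_name1'), the fix is simple:
findAll also takes an attrs parameter (a dict), in which you can pass any attribute names/values to narrow down the selection.
In this case, change the loop to: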
for elem in soup.findAll('list', attrs={'name': 'list_name1'}):
    elem.replace_with(Comment(str(elem)))
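The attrs filter has a direct counterpart in the standard library's xml.dom.minidom: check the attribute value before replacing the node. A minimal sketch, assuming a shortened sample; comment_out_list is a hypothetical helper name.

```python
from xml.dom import minidom

# Shortened copy of the sample input.
SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
  <list name="list_name1">
    <item><p name="address1">some/address-1</p></item>
  </list>
  <list name="list_name2">
    <item><p name="address1">some/address-3</p></item>
  </list>
</raml>"""


def comment_out_list(xml_text, list_name):
    """Comment out only the <list> elements whose name attribute equals list_name."""
    doc = minidom.parseString(xml_text)
    for node in doc.getElementsByTagName("list"):
        # Same role as attrs={'name': ...}: filter on the attribute value.
        if node.getAttribute("name") == list_name:
            node.parentNode.replaceChild(doc.createComment(node.toxml()), node)
    return doc.documentElement.toxml()


print(comment_out_list(SAMPLE, "list_name1"))
```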
To simply delete the selected elements instead, use the (not very intuitively named) decompose method.
To do this, run:
for elem in soup.findAll('list', attrs={'name': 'list_name1'}):
    elem.decompose()
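For comparison, deleting an element with the standard library's xml.etree.ElementTree looks like this. ElementTree elements have no parent pointer, so the sketch removes matches from their parent's list of children; it also has to spell the tag with its namespace. A minimal sketch under those assumptions; remove_list and SAMPLE are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample input.
SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
  <sampleObject>
    <list name="list_name1">
      <item><p name="address1">some/address-1</p></item>
    </list>
    <list name="list_name2">
      <item><p name="address1">some/address-3</p></item>
    </list>
  </sampleObject>
</raml>"""

NS = "{abcd.xsd}"  # ElementTree spells namespaced tags as {uri}localname
ET.register_namespace("", "abcd.xsd")  # write abcd.xsd back as the default namespace


def remove_list(xml_text, list_name):
    """Delete every <list> whose name attribute equals list_name."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first, then remove matching direct children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == NS + "list" and child.get("name") == list_name:
                parent.remove(child)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


print(remove_list(SAMPLE, "list_name1"))
```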
Edit following the comment about the XML prefix
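One way to get rid of the XML prefix (the <?xml version="1.0" encoding="utf-8"?> declaration) is to call BeautifulSoup without the second argument, 'xml'.
But then the root element of the output is html, which contains a body element, which in turn contains the raml element.
So to strip these two "outer" elements, change the code to: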
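# Parse as HTML (explicitly with lxml): no <?xml ...?> line, but an <html><body> wrapper.
with open('Input.xml') as f:
    soup = BeautifulSoup(f, 'lxml')
for elem in soup.findAll('list'):
    elem.replace_with(Comment(str(elem)))
print(soup.html.body.raml.prettify())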
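Also, elements such as <p> stay on a single line.
Keep in mind that an HTML parser lowercases tag and attribute names (newData becomes newdata), so check whether that matters for your use.
It is a somewhat "dirty" solution, but hopefully it gives the expected result.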
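The second requirement from the question (commenting out the address1_1 <p> inside list_name3) is not covered by the code above. Reading the expected output, the rule seems to be: comment out address1_1 only in items that also contain an address1 <p>, and leave items that hold only address1_1 alone. A minimal sketch of that reading with the standard library's xml.dom.minidom, assuming a shortened sample; comment_out_second_address is a hypothetical name.

```python
from xml.dom import Node, minidom

# Shortened copy of list_name3 from the sample input.
SAMPLE = """<raml xmlns="abcd.xsd" version="0.1">
  <list name="list_name3">
    <item>
      <p name="address1">some/address-1</p>
      <p name="address1_1">some/address-1_1</p>
      <p name="value">1</p>
    </item>
    <item>
      <p name="address1_1">some/address-1_1</p>
      <p name="value">1_1</p>
    </item>
  </list>
</raml>"""


def comment_out_second_address(xml_text, list_name="list_name3"):
    """In items holding both address1 and address1_1, comment out address1_1."""
    doc = minidom.parseString(xml_text)
    for lst in doc.getElementsByTagName("list"):
        if lst.getAttribute("name") != list_name:
            continue
        for item in lst.getElementsByTagName("item"):
            # Direct <p> children of this item, keyed by their name attribute.
            ps = {child.getAttribute("name"): child
                  for child in item.childNodes
                  if child.nodeType == Node.ELEMENT_NODE and child.tagName == "p"}
            if "address1" in ps and "address1_1" in ps:
                target = ps["address1_1"]
                item.replaceChild(doc.createComment(target.toxml()), target)
    # Serializing the root element (not the Document) omits the <?xml ...?> line.
    return doc.documentElement.toxml()


print(comment_out_second_address(SAMPLE))
```

Because the root element is serialized directly, the output also has no XML declaration, and tag names keep their original case.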