Create Smaller XML based on value of element
On Python 3.7, I want to create a subset of an XML document. For example, the larger XML is:
<data>
  <student>
    <result>
      <grade>A</grade>
    </result>
    <details>
      <name>John</name>
      <id>100</id>
      <age>16</age>
      <email>john@mail.com</email>
    </details>
  </student>
  <student>
    <result>
      <grade>B</grade>
    </result>
    <details>
      <name>Alice</name>
      <id>101</id>
      <age>17</age>
      <email>alice@mail.com</email>
    </details>
  </student>
  <student>
    <result>
      <grade>F</grade>
    </result>
    <details>
      <name>Bob</name>
      <id>102</id>
      <age>16</age>
      <email>bob@mail.com</email>
    </details>
  </student>
  <student>
    <result>
      <grade>A</grade>
    </result>
    <details>
      <name>Hannah</name>
      <id>103</id>
      <age>17</age>
      <email>hannah@mail.com</email>
    </details>
  </student>
</data>
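I am looking for a new XML like the one below. The condition for creating the smaller subset is a list of IDs, in this case 101 and 102. All other student blocks should be removed.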
<data>
  <student>
    <result>
      <grade>B</grade>
    </result>
    <details>
      <name>Alice</name>
      <id>101</id>
      <age>17</age>
      <email>alice@mail.com</email>
    </details>
  </student>
  <student>
    <result>
      <grade>F</grade>
    </result>
    <details>
      <name>Bob</name>
      <id>102</id>
      <age>16</age>
      <email>bob@mail.com</email>
    </details>
  </student>
</data>
That is, the output XML depends on a list of IDs, in this case ['101', '102'].
Here is what I tried:
from lxml import etree

# Original large XML
tree = etree.parse(open('students.xml'))
root = tree.getroot()
results = root.findall('student')
textnumbers = [r.find('details/id').text for r in results]
print(textnumbers)
required_ids = ['101','102']
wanted = tree.xpath("//student/details/[not(@id in required_ids)]")
for node in unwanted:
    node.getparent().remove(node)
# New smaller XML
tree.write(open('student_output.xml', 'wb'))
But I get the error "Invalid expression" for this line:
wanted = tree.xpath("//student/details/[not(@id in required_ids)]")
I know this is a long read, but I am quite new to Python. Thanks in advance for your help.
I think you can do it like this:
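from lxml import etree as ET

required_ids = ['101','102']

# By default iterparse yields each element once its closing tag has been parsed
for event, element in ET.iterparse('students.xml'):
    # Drop every student whose id is not in the required list
    if element.tag == 'student' and element.xpath('.//id/text()')[0] not in required_ids:
        element.clear()
        element.getparent().remove(element)
    # The root element closes last, so only the wanted students remain here
    if element.tag == 'data':
        ET.dump(element)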
Instead of dump you will of course want to write to a file, which means using
    if element.tag == 'data':
        tree = ET.ElementTree(element)
        tree.write('student_output.xml')
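Your attempt failed because you can't simply use a Python list variable inside an XPath expression, and "in" is not an XPath 1.0 operator. In addition, "details/[...]" puts a predicate directly after the slash without a node test, and "@id" would select an attribute, whereas id is a child element in your data.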
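If you would rather keep the XPath-based approach from the question, one possible workaround is to build the XPath 1.0 expression in Python from the list, chaining one equality test per ID with "or". The sketch below is only an illustration: the helper name build_unwanted_xpath and the trimmed-down inline sample are made up for this example, and it assumes the IDs contain no quote characters.

```python
from lxml import etree

# Trimmed-down sample data, inlined so the sketch runs on its own
# (an assumption; the real input is students.xml).
SAMPLE = b"""<data>
<student><details><name>John</name><id>100</id></details></student>
<student><details><name>Alice</name><id>101</id></details></student>
<student><details><name>Bob</name><id>102</id></details></student>
<student><details><name>Hannah</name><id>103</id></details></student>
</data>"""

required_ids = ['101', '102']


def build_unwanted_xpath(ids):
    """Hypothetical helper: XPath 1.0 expression for students whose id is not in ids."""
    if not ids:
        # With nothing to keep, every student is unwanted.
        return "//student"
    # XPath 1.0 has no "in" operator, so emulate it with chained "or" tests.
    # Assumes the ids contain no quote characters.
    conditions = " or ".join("details/id='{}'".format(i) for i in ids)
    return "//student[not({})]".format(conditions)


root = etree.fromstring(SAMPLE)

# Remove each student selected by the generated expression.
for student in root.xpath(build_unwanted_xpath(required_ids)):
    student.getparent().remove(student)

remaining_ids = root.xpath("//student/details/id/text()")
print(remaining_ids)
```

Unlike the iterparse loop, this loads the whole document into memory first, so it is better suited to files that fit comfortably in RAM.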