Parsing XML using Python ElementTree
I have an XML document in the following format:
<root>
  <H D="14/11/2017">
    <FC>
      <F LV="0">The quick</F>
      <F LV="1">brown</F>
      <F LV="2">fox</F>
    </FC>
  </H>
  <H D="14/11/2017">
    <FC>
      <F LV="0">The lazy</F>
      <F LV="1">fox</F>
    </FC>
  </H>
</root>
How do I extract the value of 'D' inside the H tags, and also all the text inside the F tags?
We can import this data by reading from a file:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
Or directly from a string:
root = ET.fromstring(country_data_as_string)
And later on the same page, in section 20.5.1.4, Finding interesting elements:
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
Applied to your document, that translates to:
import xml.etree.ElementTree as ET
root = ET.fromstring("""
<root>
<H D="14/11/2017">
<FC>
<F LV="0">The quick</F>
<F LV="1">brown</F>
<F LV="2">fox</F>
</FC>
</H>
<H D="14/11/2017">
<FC>
<F LV="0">The lazy</F>
<F LV="1">fox</F>
</FC>
</H>
</root>""")
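# root = tree.getroot()  # only needed when the document was loaded with ET.parse()
# Print the D attribute of every H element
for h in root.iter("H"):
    print(h.attrib["D"])
# Print the attributes and text of every F element
for f in root.iter("F"):
    print(f.attrib, f.text)
Output: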
14/11/2017
14/11/2017
{'LV': '0'} The quick
{'LV': '1'} brown
{'LV': '2'} fox
{'LV': '0'} The lazy
{'LV': '1'} fox
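The two loops above print the dates and the texts separately. If you need to know which F texts belong to which H, one possible sketch is to walk each H element and search only below it. The names `xml_data` and `records`, and the choice of a list of (date, texts) tuples, are my own assumptions for illustration and are not part of the question.

```python
import xml.etree.ElementTree as ET

# Same sample document as in the question
xml_data = """
<root>
  <H D="14/11/2017">
    <FC>
      <F LV="0">The quick</F>
      <F LV="1">brown</F>
      <F LV="2">fox</F>
    </FC>
  </H>
  <H D="14/11/2017">
    <FC>
      <F LV="0">The lazy</F>
      <F LV="1">fox</F>
    </FC>
  </H>
</root>"""

root = ET.fromstring(xml_data)

# One (date, [texts]) tuple per H element; h.iter("F") only visits
# the F elements nested under that particular H
records = []
for h in root.iter("H"):
    texts = [f.text for f in h.iter("F")]
    records.append((h.get("D"), texts))

for date, texts in records:
    print(date, texts)
```

Calling `iter` on `h` rather than on `root` is what keeps each group of F texts tied to its own H.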
You didn't specify which library you want to use, so I'd suggest lxml for Python. There are several ways to get the values you want:
With a loop:
from lxml import etree
tree = etree.parse('XmlTest.xml')
root = tree.getroot()
text = []
# Each direct child of root is an H element
for element in root:
    text.append(element.get('D'))
    # H -> FC -> F: collect the text of every F
    for child in element:
        for grandchild in child:
            text.append(grandchild.text)
print(text)
Output:
['14/11/2017', 'The quick', 'brown', 'fox', '14/11/2017', 'The lazy', 'fox']
Using XPath:
from lxml import etree
tree = etree.parse('XmlTest.xml')
root = tree.getroot()
D = root.xpath("./H")
F = root.xpath(".//F")
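for each in D:
    print(each.get('D'))
for each in F:
    print(each.text)
Output:
14/11/2017
14/11/2017
The quick
brown
fox
The lazy
fox
Both have their advantages, and either one gives you a good starting point.
I'd recommend XPath, because it is easier to work with when values are missing.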
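To make the "missing values" case concrete, here is a small sketch. It assumes a modified copy of the sample (my own, not from the question) in which one H has no D attribute and one F is empty, and the fallback string "unknown" is likewise just a placeholder. It uses the path expressions supported by the standard library ElementTree, which cover only a subset of XPath, so that it runs without lxml installed.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the sample: the first H lacks D, one F has no text
sample = """
<root>
  <H>
    <FC>
      <F LV="0">The quick</F>
      <F LV="1"/>
    </FC>
  </H>
  <H D="14/11/2017">
    <FC>
      <F LV="0">The lazy</F>
    </FC>
  </H>
</root>"""

root = ET.fromstring(sample)

# get() takes a default, so a missing attribute does not raise KeyError
dates = [h.get("D", "unknown") for h in root.findall("./H")]

# An empty element has text None; substitute an empty string
texts = [f.text or "" for f in root.findall(".//F")]

# A predicate selects only the H elements that actually carry a D attribute
with_date = root.findall("./H[@D]")

print(dates)
print(texts)
print(len(with_date))
```

By contrast, `h.attrib["D"]` would raise a `KeyError` on the first H in this variant.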
This should help you:
import xml.etree.ElementTree as ET
data='''
<root>
<H D="14/11/2017">
<FC>
<F LV="0">The quick</F>
<F LV="1">brown</F>
<F LV="2">fox</F>
</FC>
</H>
<H D="14/11/2017">
<FC>
<F LV="0">The lazy</F>
<F LV="1">fox</F>
</FC>
</H>
</root>
'''
# Lists to collect the results
D_data = []
F_data = []
# Parse the XML string
root = ET.XML(data)
# Collect the D attribute of each direct child of root (the H elements)
for sub in root:
    D_data.append(sub.attrib.get('D'))
# Collect the text of every F element in the document
for f in root.iter("F"):
    # print(f.tag, f.attrib, f.text)
    F_data.append(f.text)
print(D_data)
print(F_data)