Parsing XML in Python
I am using xml.etree.ElementTree to parse an XML file. I have a problem: I don't know how to get the plain-text lines between the tags.
<Sync time="4.496"/>
<Background time="4.496" type="music" level="high"/>
<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>
<Event desc="b" type="noise" extent="instantaneous"/>
Plain text
<Sync time="10.949"/>
Plain text
I already have this code:
import xml.etree.ElementTree as etree
import os

data_file = "./file.xml"
xmlD = etree.parse(data_file)
root = xmlD.getroot()
sections = root.getchildren()[2].getchildren()
for section in sections:
    turns = section.getchildren()
    for turn in turns:
        speaker = turn.get('speaker')
        mode = turn.get('mode')
        childs = turn.getchildren()
        for child in childs:
            time = child.get('time')
            opt = child.get('desc')
            if opt == 'es':
                opt = "ESP:"
            elif opt == "la":
                opt = "LATIN:"
            elif opt == "*":
                opt = "-ININT-"
            elif opt == "fs":
                opt = "-FS-"
            elif opt == "throat":
                opt = "-THROAT-"
            elif opt == "laugh":
                opt = "-LAUGH-"
            else:
                opt = ""
            print speaker, mode, time, opt+child.tail.encode('latin-1')
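I can reach the Sync, Background, and Event tags in the XML, but I cannot extract the text that follows them. I have posted only part of the XML file, not the whole thing. The only problem is with the last part of the code.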
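Thank you very much, @alecxe. Now I can get the information I need, but I have a new small problem. I get the line from the tail attribute, but a newline \n or something similar is emitted before it. I need something like: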
spk1 planned LAN: Plain text from tail
But I get:
spk1 planned LAN:
Plain text from tail
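I have tried many things, including re.match() and running sed on the output after processing the XML, but there seems to be no \n newline character, and I still can't move the plain text up onto the same line! Thanks in advance.
Anyone? Thanks!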
This is called the tail of an element:
The tail attribute can be used to hold additional data associated with
the element. This attribute is usually a string but may be any
application-specific object. If the element is created from an XML
file the attribute will contain any text found after the element’s end
tag and before the next tag.
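To make the quoted definition concrete, here is a minimal sketch in Python 3. The Turn wrapper and its contents are made up for illustration, modeled on the fragment in the question. Each empty element's tail is the text up to the next tag, including the surrounding newlines.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the question's data.
SAMPLE = """<Turn speaker="spk1" mode="planned">
<Sync time="4.496"/>
<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>
More plain text
</Turn>"""

turn = ET.fromstring(SAMPLE)

# tail is the text between an element's end tag and the next tag.
tails = [(child.tag, child.tail) for child in turn]
for tag, tail in tails:
    print(tag, repr(tail))
```

The first Sync is followed directly by another tag, so its tail is only a line break, while the tail of Event carries the plain text with a newline on each side.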
Find the Event tag and get its tail, for example:
section.find("Event").tail
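The line break reported in the follow-up comment most likely comes from the tail itself: the text starts on the line after the tag, so the tail begins with a newline. Below is a sketch of one way to handle that, assuming Python 3. The helper name tail_text and the sample fragment are hypothetical. It strips the surrounding whitespace and treats a missing tail (None) as empty.

```python
import xml.etree.ElementTree as ET

def tail_text(element):
    """Return the element's tail without surrounding whitespace ('' if none)."""
    return (element.tail or "").strip()

# Hypothetical fragment modeled on the question's data.
turn = ET.fromstring(
    '<Turn speaker="spk1" mode="planned">\n'
    '<Sync time="4.496"/>\n'
    '<Event desc="la" type="language" extent="instantaneous"/>\n'
    'Plain text from tail\n'
    '</Turn>'
)

lines = []
for child in turn:
    text = tail_text(child)
    if text:  # skip children that are followed only by whitespace
        lines.append(" ".join([turn.get("speaker"), turn.get("mode"), text]))

print("\n".join(lines))
```

Guarding against None also matters in the question's loop, because calling a string method on the tail of an element that has no trailing text raises an AttributeError.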