Parsing XML in Python

Question

I am using xml.etree.ElementTree to parse an XML file, and I have a problem: I don't know how to get the plain-text lines that sit between the tags.

<Sync time="4.496"/>
<Background time="4.496" type="music" level="high"/>

<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>

<Event desc="b" type="noise" extent="instantaneous"/>
Plain text
<Sync time="10.949"/>
Plain text

This is the code I have so far:

import xml.etree.ElementTree as etree
import os

data_file = "./file.xml"

xmlD = etree.parse(data_file)
root = xmlD.getroot()
sections = root.getchildren()[2].getchildren()
for section in sections:
    turns = section.getchildren()
    for turn in turns:
        speaker = turn.get('speaker')
        mode = turn.get('mode')
        childs = turn.getchildren()

        for child in childs:
            time = child.get('time')
            opt = child.get('desc')
            if opt == 'es':
                 opt = "ESP:"
            elif opt == "la":
                 opt = "LATIN:"
            elif opt == "*":
                 opt = "-ININT-"
            elif opt == "fs":
                 opt = "-FS-"
            elif opt == "throat":
                 opt = "-THROAT-"
            elif opt == "laugh":
                 opt = "-LAUGH-"
            else:
                 opt = ""

            print speaker, mode, time, opt+child.tail.encode('latin-1')

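I can reach the Sync, Background, and Event tags in the XML, but I can't extract the text that follows them. I have posted only part of the XML file, not the whole thing. The only part I am having trouble with is the last piece of the code.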

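Thanks a lot, @alecxe. Now I can get the information I need, but I have a new little problem. I get the lines from the tail attribute, but a newline \n or something similar is emitted before them. I need something like this: spk1 planned LAN: Plain text from tail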

But I get this:

spk1 planned LAN:
Plain text from tail

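I have tried many things, such as re.match() and sed commands run after processing the XML. It seems there is no \n newline character, yet I still can't move the plain text up onto the same line! Thanks in advance.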

Anyone? Thanks!

Answer 1

This is called the tail of an element:

The tail attribute can be used to hold additional data associated with the element. This attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found after the element’s end tag and before the next tag.
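To make the difference between text and tail concrete, here is a minimal sketch. The `<Turn>` wrapper and its `speaker` attribute are assumptions added so the fragment parses on its own; the inner tags are taken from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: <Turn> is assumed here only so the fragment is well-formed.
xml_data = """<Turn speaker="spk1">
<Sync time="4.496"/>
<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>
</Turn>"""

turn = ET.fromstring(xml_data)
event = turn.find("Event")

# <Event/> is an empty element, so it has no inner text.
print(repr(event.text))
# The tail is everything after <Event/> up to the next tag,
# including the surrounding newlines.
print(repr(event.tail))
```

Note that the tail keeps the line breaks that surround the text in the file, which is why printing it directly starts on a new line.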

Find the Event tag and read its tail, for example:

section.find("Event").tail
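For the follow-up about the unwanted line break: the tail usually begins with the newline that follows the tag in the file, so stripping whitespace should put the text on the same line as the rest of the output. A rough sketch, assuming a `<Turn>` wrapper with `speaker` and `mode` attributes (the element name and the helper `tail_lines` are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element; the children are copied from the question.
xml_data = """<Turn speaker="spk1" mode="planned">
<Sync time="4.496"/>
<Background time="4.496" type="music" level="high"/>

<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>

<Event desc="b" type="noise" extent="instantaneous"/>
Plain text
<Sync time="10.949"/>
Plain text
</Turn>"""


def tail_lines(turn):
    """Return (tag, stripped tail) pairs for children that have tail text."""
    lines = []
    for child in turn:  # iterating an element yields its direct children
        # tail can be None; strip() removes the leading and trailing newlines
        tail = (child.tail or "").strip()
        if tail:
            lines.append((child.tag, tail))
    return lines


turn = ET.fromstring(xml_data)
for tag, text in tail_lines(turn):
    print(turn.get("speaker"), turn.get("mode"), tag, text)
```

Iterating over the element directly also avoids `getchildren()`, which was removed in Python 3.9.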

python

xml

parsing

newline

xml-parsing