Extracting Raw XML via lxml etree

Question

I am trying to extract raw XML from an XML file.

So if my data is:

        <xml>
            ... Lots of XML ...

            <getThese>
                <clonedKey>1</clonedKey>
                <clonedKey>2</clonedKey>
                <clonedKey>3</clonedKey>
                <randomStuff>this is a sentence</randomStuff>
            </getThese>         
            <getThese>
                <clonedKey>6</clonedKey>
                <clonedKey>8</clonedKey>
                <clonedKey>3</clonedKey>
                <randomStuff>more words</randomStuff>
            </getThese>

            ... Lots of XML ...

        </xml>

I can easily get the elements I want using etree:

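from lxml import etree
# xml_str holds the XML document shown above
search_me = etree.fromstring(xml_str)
# search_me is already the root <xml> element, so the path is relative to it
search_me.findall('./getThese')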

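But how do I get the actual raw XML content? Everything I have found in the documentation is about getting elements, text, or attributes, not the raw XML.

The output I want is a list with two elements: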

["<getThese>
                <clonedKey>1</clonedKey>
                <clonedKey>2</clonedKey>
                <clonedKey>3</clonedKey>
                <randomStuff>this is a sentence</randomStuff>
            </getThese>",
"<getThese>
                <clonedKey>6</clonedKey>
                <clonedKey>8</clonedKey>
                <clonedKey>3</clonedKey>
                <randomStuff>more words</randomStuff>
            </getThese>"]

Answer 1

You should be able to use tostring() to serialize the XML.

Example...

from lxml import etree

xml = """
<xml>
    <getThese>
        <clonedKey>1</clonedKey>
        <clonedKey>2</clonedKey>
        <clonedKey>3</clonedKey>
        <randomStuff>this is a sentence</randomStuff>
    </getThese>         
    <getThese>
        <clonedKey>6</clonedKey>
        <clonedKey>8</clonedKey>
        <clonedKey>3</clonedKey>
        <randomStuff>more words</randomStuff>
    </getThese>
</xml>
"""

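# remove_blank_text=True drops whitespace-only text between elements,
# which is why the serialized output below has no indentation or newlines
parser = etree.XMLParser(remove_blank_text=True)

tree = etree.fromstring(xml, parser=parser)

elems = []

# tostring() returns bytes by default, so decode() turns each result into str
for elem in tree.xpath("getThese"):
    elems.append(etree.tostring(elem).decode())

print(elems)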

Printed output...

['<getThese><clonedKey>1</clonedKey><clonedKey>2</clonedKey><clonedKey>3</clonedKey><randomStuff>this is a sentence</randomStuff></getThese>', '<getThese><clonedKey>6</clonedKey><clonedKey>8</clonedKey><clonedKey>3</clonedKey><randomStuff>more words</randomStuff></getThese>']

python

lxml

python-3.x