Python 3: cannot get XML element value with lxml.etree.find

I'm trying to process the response to a POST request, and what I get back is XML. The result is stored as bytes (b''):

<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
    <success>false</success>
    <returnType>ERROR</returnType>
    <errors>
        <error>
            <message>Invalid signature</message>
            <code>3002</code>
        </error>
    </errors>
</result>

Code:

from lxml import etree as et

root_node = et.fromstring(response.content)
print('{}'.format(root_node.find('.//returnType')))
return_type = root_node.find('.//returnType').text

The print statement outputs None, so find().text raises an exception.

If I iterate over the child nodes with a for loop, I do get the nodes, but I can't work out how to handle the namespace.

for tag in root_node.getchildren():
    print(tag)

<Element {http://something.com/Schema/V2/Result}returnType at 0x7f6c95542648>

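How do I get an XML node and its value? I've tried the Stack Overflow answers to similar questions, but nothing worked. I also tried stripping the schema with a regular expression and adding a prefix to the namespace.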

Edit: I tried the answer and got the error below; I still can't get the node.

/usr/bin/python3 /home/samoa/Scripts/Python/lxml_test.py
Traceback (most recent call last):
  File "/home/samoa/Scripts/Python/lxml_test.py", line 17, in <module>
    print(root.find("returnType", root.nsmap).text)
  File "src/lxml/lxml.etree.pyx", line 1537, in lxml.etree._Element.find (src/lxml/lxml.etree.c:58520)
  File "/usr/local/lib/python3.6/dist-packages/lxml/_elementpath.py", line 288, in find
    it = iterfind(elem, path, namespaces)
  File "/usr/local/lib/python3.6/dist-packages/lxml/_elementpath.py", line 277, in iterfind
    selector = _build_path_iterator(path, namespaces)
  File "/usr/local/lib/python3.6/dist-packages/lxml/_elementpath.py", line 234, in _build_path_iterator
    raise ValueError("empty namespace prefix is not supported in ElementPath")
ValueError: empty namespace prefix is not supported in ElementPath

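Pass the namespace map to the find() method. Since http://something.com/Schema/V2/Result is the default namespace of the document, this is all you need. Note that some lxml versions reject a map that contains the default (None) prefix, as the traceback above shows; in that case use the {namespace}tag form from the test script below.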

return_type_element = root_node.find('.//returnType', root_node.nsmap)

Or:

return_type_element = root_node.find('returnType', root_node.nsmap)
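
If your lxml version refuses the None prefix, one possible workaround is to build a map with an explicit prefix. The sketch below assumes the sample document from the question; the prefix "r" is an arbitrary name chosen for illustration, since only the URI has to match.

```python
from lxml import etree

# Sample document copied from the question (shortened).
content = b'''<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
    <success>false</success>
    <returnType>ERROR</returnType>
</result>
'''

root = etree.fromstring(content)

# root.nsmap maps None to the default namespace URI.
# "r" is a made-up prefix for this sketch; any non-empty string works.
ns = {"r": root.nsmap[None]}

return_type_element = root.find("r:returnType", ns)
return_type = return_type_element.text
print(return_type)
```

The prefix used in the path does not need to appear in the XML itself; lxml resolves it to the URI before matching.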

Also, the str.format() call in:

print('{}'.format(root_node.find('.//returnType')))

is unnecessary; it can be shortened to:

return_type_element = root_node.find('returnType', root_node.nsmap)
print(return_type_element)

# <Element {http://something.com/Schema/V2/Result}returnType at 0x107c28bc0>

However, if you want to print return_type_element as XML, use the lxml.etree.tostring() function (lxml.etree is imported as ET here, as in the test script below):

print(ET.tostring(return_type_element))

# b'<returnType xmlns="http://something.com/Schema/V2/Result">ERROR</returnType>\n    '

So your return_type can be obtained with:

return_type = root_node.find('returnType', root_node.nsmap).text
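
Because find() returns None when nothing matches, chaining .text onto it raises AttributeError, which is the exception described in the question. As a hedged alternative, findtext() returns the text directly and falls back to a default when the element is missing. The tag name noSuchTag below is hypothetical and only there to show the fallback.

```python
from lxml import etree

# Minimal document with the same default namespace as in the question.
content = (
    b'<result xmlns="http://something.com/Schema/V2/Result">'
    b'<returnType>ERROR</returnType>'
    b'</result>'
)

root = etree.fromstring(content)
ns_uri = root.nsmap[None]  # URI of the default namespace

# Existing element: findtext() returns its text.
return_type = root.findtext("{%s}returnType" % ns_uri)

# Missing element (hypothetical tag): the default is returned, no exception.
missing = root.findtext("{%s}noSuchTag" % ns_uri, default="")

print(repr(return_type), repr(missing))
```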

My test script is:

#!/usr/bin/env python3
from lxml import etree as ET

content = b'''<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
    <success>false</success>
    <returnType>ERROR</returnType>
    <errors>
        <error>
            <message>Invalid signature</message>
            <code>3002</code>
        </error>
    </errors>
</result>
'''

root = ET.fromstring(content)
emptyns = root.nsmap[None]
print(root.find("{%s}returnType" % (emptyns)).text)

# step-by-step

root = ET.fromstring(content)
print("Root element: %s" % (root))

emptyns = root.nsmap[None]
print("Empty namespace: %s" % (emptyns))

return_type_element = root.find("{%s}returnType" % (emptyns))
print("<returnType> element: %s" % (return_type_element))
print("<returnType> element as XML: %s" % (ET.tostring(return_type_element)))

return_type = return_type_element.text
print('<returnType> text: %s' % (return_type))

# children

for element in root:  # iterating an element yields its direct children
    print("Element tag (with namespace): %s" % (element.tag))
    _, _, tag = element.tag.rpartition("}")
    print("Element tag (without namespace): %s" % (tag))

which outputs:

ERROR
Root element: <Element {http://something.com/Schema/V2/Result}result at 0x102f63188>
Empty namespace: http://something.com/Schema/V2/Result
<returnType> element: <Element {http://something.com/Schema/V2/Result}returnType at 0x102f630c8>
<returnType> element as XML: b'<returnType xmlns="http://something.com/Schema/V2/Result">ERROR</returnType>\n    '
<returnType> text: ERROR
Element tag (with namespace): {http://something.com/Schema/V2/Result}success
Element tag (without namespace): success
Element tag (with namespace): {http://something.com/Schema/V2/Result}returnType
Element tag (without namespace): returnType
Element tag (with namespace): {http://something.com/Schema/V2/Result}errors
Element tag (without namespace): errors
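
The same idea extends to the nested <error> entries. The following is a minimal sketch, assuming the document from the question, that collects each error into a dict keyed by local tag name. It uses etree.QName to split off the namespace instead of rpartition("}"); the variable names are my own.

```python
from lxml import etree

content = b'''<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
    <success>false</success>
    <returnType>ERROR</returnType>
    <errors>
        <error>
            <message>Invalid signature</message>
            <code>3002</code>
        </error>
    </errors>
</result>
'''

root = etree.fromstring(content)
ns_uri = root.nsmap[None]  # URI of the default namespace

errors = []
# ".//{uri}error" matches <error> elements at any depth below the root.
for error in root.iterfind(".//{%s}error" % ns_uri):
    # QName(child).localname is the tag name without the "{uri}" part.
    fields = {etree.QName(child).localname: child.text for child in error}
    errors.append(fields)

print(errors)
```

Note that the values stay strings (the code is '3002', not an integer); convert them yourself if you need numbers.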