XML parsing and extracting attribute values with XPath

I want to use XPath to extract N.1.2, N.1.1, N.2.r.1, ...., N.1.3 and N.1.4, so I keep the XPath expressions in a dictionary.

# Value - Types of Message in batch
"N.1.1": R3Item(
    elemId="N.1.1",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/name[@codeSystem='2.16.840.1.113883.3.989.2.1.1.1']/@code",
    required=True,
    comment="N.1.1 - Types of Message in batch",
),
# Types of Message in batch
"N.1.1_csv": R3Item(
    elemId="N.1.1_csv",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/name[@codeSystem='2.16.840.1.113883.3.989.2.1.1.1']/@codeSystemVersion",
    required=True,
),
# Value - Batch Number
"N.1.2": R3Item(
    elemId="N.1.2",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/id[@root='2.16.840.1.113883.3.989.2.1.3.22']/@extension",
    required=True,
    comment="N.1.2 - Batch Number",
),
# Value - Batch Sender Identifier
"N.1.3": R3Item(
    elemId="N.1.3",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/sender[@typeCode='SND']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.13'][1]/@extension",
    required=True,
    comment="N.1.3 - Batch Sender Identifier",
),
# Value - Batch Receiver Identifier
"N.1.4": R3Item(
    elemId="N.1.4",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/receiver[@typeCode='RCV']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.14'][1]/@extension",
    required=True,
    comment="N.1.4 - Batch Receiver Identifier",
),
# Value - Date of Batch Transmission
"N.1.5": R3Item(
    elemId="N.1.5",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/creationTime/@value",
    required=True,
    comment="N.1.5 - Date of Batch Transmission",
),
# Value - Message Identifier
"N.2.r.1": R3Item(
    elemId="N.2.r.1",
    xPath="//PORR_IN049016UV[r]/id[@root='2.16.840.1.113883.3.989.2.1.3.1'][1]/@extension",
    required=True,
    comment="N.2.r.1 - Message Identifier",
),
# Value - Message Sender Identifier
"N.2.r.2": R3Item(
    elemId="N.2.r.2",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/PORR_IN049016UV[r]/sender[@typeCode='SND']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.11'][1]/@extension",
    required=True,
    comment="N.2.r.2 - Message Sender Identifier",
),
# Value - Message Receiver Identifier
"N.2.r.3": R3Item(
    elemId="N.2.r.3",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/PORR_IN049016UV[r]/receiver[@typeCode='RCV']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.12'][1]/@extension",
    required=True,
    comment="N.2.r.3 - Message Receiver Identifier",
),
# Value - Date of Message Creation
"N.2.r.4": R3Item(
    elemId="N.2.r.4",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/PORR_IN049016UV[r]/creationTime/@value",
    required=True,
    comment="N.2.r.4 - Date of Message Creation",
),

Below is part of a sample XML file.

<?xml version="1.0" encoding="UTF-8"?>
<MCCI_IN200100UV01 ITSVersion="XML_1.0" xsi:schemaLocation="urn:hl7-org:v3 MCCI_IN200100UV01.xsd" xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <id extension="N.1.2" root="2.16.840.1.113883.3.989.2.1.3.22"/>
    <creationTime value="N.1.5"/>
    <responseModeCode code="D"/>
    <interactionId extension="MCCI_IN200100UV01" root="2.16.840.1.113883.1.6"/>
    <name code="N.1.1" codeSystem="2.16.840.1.113883.3.989.2.1.1.1" codeSystemVersion="1.01"/>
    <PORR_IN049016UV>
        <id extension="N.2.r.1" root="2.16.840.1.113883.3.989.2.1.3.1"/>
        <creationTime value="N.2.r.4"/>
        <interactionId extension="PORR_IN049016UV" root="2.16.840.1.113883.1.6"/>
        <processingCode code="P"/>
        <processingModeCode code="T"/>
        <acceptAckCode code="AL"/>
        <receiver typeCode="RCV">
            <device classCode="DEV" determinerCode="INSTANCE">
                <id extension="N.2.r.3" root="2.16.840.1.113883.3.989.2.1.3.12"/>
            </device>
        </receiver>
    </PORR_IN049016UV>
    <receiver typeCode="RCV">
        <device classCode="DEV" determinerCode="INSTANCE">
            <id extension="N.1.4" root="2.16.840.1.113883.3.989.2.1.3.14"/>
        </device>
    </receiver>
    <sender typeCode="SND">
        <device classCode="DEV" determinerCode="INSTANCE">
            <id extension="N.1.3" root="2.16.840.1.113883.3.989.2.1.3.13"/>
        </device>
    </sender>
</MCCI_IN200100UV01>

Below is my code, but the result is an empty list. I want to extract values such as "N.1.1".

def extractData(tree):
    """r3 data extracted by xpath"""
    root = tree.getroot()
    keys = getList(R3_DATA)
    for key in keys:
        xPath = getxPath(key)
        print(root.xpath(xPath))

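How can I fix this, or what should I do instead? If another library or some sample code can do this, could you point me to it?

As already mentioned, your XPath expressions need namespaces. The root element declares a default namespace (xmlns="urn:hl7-org:v3"), so every unprefixed element in the document belongs to that namespace, while an unprefixed name in an XPath expression only matches elements in no namespace. That is why you get an empty list. Here is an example of how to use namespaces in lxml. Note the u: and x: prefixes in the XPath.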

In [1]: from lxml import etree

In [2]: root = etree.parse('mcci.xml')

In [3]: NS = {'u': 'urn:hl7-org:v3', 'x': 'http://www.w3.org/2001/XMLSchema-instance'}

In [4]: xpath = "/u:MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@x:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/u:creationTime/@value"

In [5]: root.xpath(xpath, namespaces=NS)
Out[5]: ['N.1.5']
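
Since you asked about other libraries: the same namespace effect can be reproduced with only the standard library. The following is a minimal sketch, assuming a cut-down copy of your sample document held in a string (SAMPLE is my own name, not something from your code). xml.etree.ElementTree supports only a subset of XPath: paths are relative to the element you call find() on, and a path cannot end in an attribute step, so the attribute is read with get() afterwards.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the sample document, kept only for illustration.
SAMPLE = """<MCCI_IN200100UV01 ITSVersion="XML_1.0" xmlns="urn:hl7-org:v3">
    <creationTime value="N.1.5"/>
    <name code="N.1.1" codeSystem="2.16.840.1.113883.3.989.2.1.1.1" codeSystemVersion="1.01"/>
</MCCI_IN200100UV01>"""

# The prefix "u" is arbitrary; only the URI has to match the document.
NS = {"u": "urn:hl7-org:v3"}

root = ET.fromstring(SAMPLE)

# Unprefixed name: searches for creationTime in no namespace, finds nothing.
no_ns = root.find("creationTime")

# Prefixed name, resolved through NS: the element is found.
creation_time = root.find("u:creationTime", NS).get("value")

# Attribute predicates work too; the attribute value is read with get().
name_code = root.find(
    "u:name[@codeSystem='2.16.840.1.113883.3.989.2.1.1.1']", NS
).get("code")

print(no_ns, creation_time, name_code)
```

For full XPath 1.0, including absolute paths and attribute results, lxml remains the more convenient choice.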

I would probably suggest dropping the predicate on the schema location to simplify things a bit.

In [6]: NS = {'u': 'urn:hl7-org:v3'}

In [7]: xpath = "/u:MCCI_IN200100UV01[@ITSVersion='XML_1.0']/u:creationTime/@value"

In [8]: root.xpath(xpath, namespaces=NS)
Out[8]: ['N.1.5']
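
If you would rather not edit every entry in your dictionary by hand, the prefix can be added programmatically. The following is a rough sketch, not a general XPath rewriter: qualify and its parameters are hypothetical names of mine, and the regular expression only handles simple paths like the ones in your question (no "/" inside string literals, no axes such as child::). It also assumes that the [r] in your N.2.r.* paths is a placeholder for a 1-based repetition number. Left as it is, [r] is a predicate that tests for a child element named r, which would also give an empty result.

```python
import re

# Prefix-to-URI map to hand to lxml. "u" is an arbitrary prefix chosen here;
# "xsi" matches the prefix already used inside the dictionary's XPath strings.
NS = {
    "u": "urn:hl7-org:v3",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}

# An element name directly after "/" that is not already prefixed and is not
# a function call such as text(). Attribute steps (@...) never match.
_STEP = re.compile(r"(?<=/)([A-Za-z_][\w.\-]*)(?![\w.\-:(])")


def qualify(xpath, prefix="u", index=None):
    """Return xpath with 'prefix:' added to each unprefixed element step.

    If index is given, the literal "[r]" placeholder is replaced by
    "[index]" (assumption: r stands for a 1-based repetition number).
    """
    if index is not None:
        xpath = xpath.replace("[r]", "[%d]" % index)
    return _STEP.sub(lambda m: "%s:%s" % (prefix, m.group(1)), xpath)


batch_date = qualify(
    "/MCCI_IN200100UV01[@ITSVersion='XML_1.0']/creationTime/@value"
)
message_id = qualify(
    "//PORR_IN049016UV[r]/id[@root='2.16.840.1.113883.3.989.2.1.3.1'][1]/@extension",
    index=1,
)
print(batch_date)
print(message_id)
```

The rewritten strings can then be evaluated with root.xpath(qualify(xPath), namespaces=NS) inside your loop.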