XML parsing and extracting attribute values with XPath
I want to use XPath to extract N.1.2, N.1.1, N.2.r.1, ..., N.1.3, N.1.4, so I keep the XPath expressions in a dictionary.
# Value - Types of Message in batch
"N.1.1": R3Item(
elemId="N.1.1",
xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/name[@codeSystem='2.16.840.1.113883.3.989.2.1.1.1']/@code",
required=True,
comment="N.1.1 - Types of Message in batch",
),
# Types of Message in batch
"N.1.1_csv": R3Item(
elemId="N.1.1_csv",
xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/name[@codeSystem='2.16.840.1.113883.3.989.2.1.1.1']/@codeSystemVersion",
required=True,
),
# Value - Batch Number
"N.1.2": R3Item(
elemId="N.1.2",
xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/id[@root='2.16.840.1.113883.3.989.2.1.3.22']/@extension",
required=True,
comment="N.1.2 - Batch Number",
),
# Value - Batch Sender Identifier
"N.1.3": R3Item(
elemId="N.1.3",
xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/sender[@typeCode='SND']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.13'][1]/@extension",
required=True,
comment="N.1.3 - Batch Sender Identifier",
),
# Value - Batch Receiver Identifier
"N.1.4": R3Item(
elemId="N.1.4",
xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/receiver[@typeCode='RCV']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.14'][1]/@extension",
required=True,
comment="N.1.4 - Batch Receiver Identifier",
),
# Value - Date of Batch Transmission
"N.1.5": R3Item(
elemId="N.1.5",
xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/creationTime/@value",
required=True,
comment="N.1.5 - Date of Batch Transmission",
),
# Value - Message Identifier
"N.2.r.1": R3Item(
elemId="N.2.r.1",
xPath="//PORR_IN049016UV[r]/id[@root='2.16.840.1.113883.3.989.2.1.3.1'][1]/@extension",
required=True,
comment="N.2.r.1 - Message Identifier",
),
# Value - Message Sender Identifier
"N.2.r.2": R3Item(
elemId="N.2.r.2",
xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/PORR_IN049016UV[r]/sender[@typeCode='SND']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.11'][1]/@extension",
required=True,
comment="N.2.r.2 - Message Sender Identifier",
),
# Value - Message Receiver Identifier
"N.2.r.3": R3Item(
elemId="N.2.r.3",
xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/PORR_IN049016UV[r]/receiver[@typeCode='RCV']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.12'][1]/@extension",
required=True,
comment="N.2.r.3 - Message Receiver Identifier",
),
# Value - Date of Message Creation
"N.2.r.4": R3Item(
elemId="N.2.r.4",
xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/PORR_IN049016UV[r]/creationTime/@value",
required=True,
comment="N.2.r.4 - Date of Message Creation",
),
Here is part of a sample XML file:
<?xml version="1.0" encoding="UTF-8"?>
<MCCI_IN200100UV01 ITSVersion="XML_1.0" xsi:schemaLocation="urn:hl7-org:v3 MCCI_IN200100UV01.xsd" xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<id extension="N.1.2" root="2.16.840.1.113883.3.989.2.1.3.22"/>
<creationTime value="N.1.5"/>
<responseModeCode code="D"/>
<interactionId extension="MCCI_IN200100UV01" root="2.16.840.1.113883.1.6"/>
<name code="N.1.1" codeSystem="2.16.840.1.113883.3.989.2.1.1.1" codeSystemVersion="1.01"/>
<PORR_IN049016UV>
<id extension="N.2.r.1" root="2.16.840.1.113883.3.989.2.1.3.1"/>
<creationTime value="N.2.r.4"/>
<interactionId extension="PORR_IN049016UV" root="2.16.840.1.113883.1.6"/>
<processingCode code="P"/>
<processingModeCode code="T"/>
<acceptAckCode code="AL"/>
<receiver typeCode="RCV">
<device classCode="DEV" determinerCode="INSTANCE">
<id extension="N.2.r.3" root="2.16.840.1.113883.3.989.2.1.3.12"/>
</device>
</receiver>
</PORR_IN049016UV>
<receiver typeCode="RCV">
<device classCode="DEV" determinerCode="INSTANCE">
<id extension="N.1.4" root="2.16.840.1.113883.3.989.2.1.3.14"/>
</device>
</receiver>
<sender typeCode="SND">
<device classCode="DEV" determinerCode="INSTANCE">
<id extension="N.1.3" root="2.16.840.1.113883.3.989.2.1.3.13"/>
</device>
</sender>
</MCCI_IN200100UV01>
Below is my code, but the result is an empty list. I want to extract values such as "N.1.1".
def extractData(tree):
    """r3 data extracted by xpath"""
    root = tree.getroot()
    keys = getList(R3_DATA)
    for key in keys:
        xPath = getxPath(key)
        print(root.xpath(xPath))
How can I fix this, or what should I do instead?
If another library or some sample code can do this, could you point me to it?
As already mentioned, your XPath expressions need namespaces. Here is an example of how to use namespaces in lxml. Note the u: and x: prefixes in the XPath.
In [1]: from lxml import etree
In [2]: root = etree.parse('mcci.xml')
In [3]: NS = {'u': 'urn:hl7-org:v3', 'x': 'http://www.w3.org/2001/XMLSchema-instance'}
In [4]: xpath = "/u:MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@x:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/u:creationTime/@value"
In [5]: root.xpath(xpath, namespaces=NS)
Out[5]: ['N.1.5']
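The session above reads from a file. For anyone who wants to reproduce the effect without one, here is a minimal sketch that parses a trimmed copy of the sample document from a byte string and runs the same query with and without prefixes. The names XML, unprefixed, prefixed, no_match and match are made up for this illustration, and the schema location predicate is left out of both paths to keep them short.

```python
from lxml import etree

# Trimmed copy of the sample document from the question (illustration only).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<MCCI_IN200100UV01 ITSVersion="XML_1.0" xsi:schemaLocation="urn:hl7-org:v3 MCCI_IN200100UV01.xsd" xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <id extension="N.1.2" root="2.16.840.1.113883.3.989.2.1.3.22"/>
  <creationTime value="N.1.5"/>
  <name code="N.1.1" codeSystem="2.16.840.1.113883.3.989.2.1.1.1" codeSystemVersion="1.01"/>
</MCCI_IN200100UV01>"""

# Prefix-to-URI map; the prefixes are arbitrary, only the URIs must match the document.
NS = {"u": "urn:hl7-org:v3", "x": "http://www.w3.org/2001/XMLSchema-instance"}

root = etree.fromstring(XML)

# Unprefixed names only match elements in no namespace, so nothing is found.
unprefixed = "/MCCI_IN200100UV01/creationTime/@value"
# Prefixed names are resolved through NS and match the default-namespace elements.
prefixed = "/u:MCCI_IN200100UV01/u:creationTime/@value"

no_match = root.xpath(unprefixed)
match = root.xpath(prefixed, namespaces=NS)

print(no_match)  # []
print(match)     # ['N.1.5']
```

The document declares urn:hl7-org:v3 as its default namespace, so every element belongs to it even though none of them carries a prefix in the XML. XPath 1.0 has no notion of a default namespace, which is why the element names in the expression need an explicit prefix.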
I would probably suggest dropping the predicate on the schema location to simplify things a bit.
In [6]: NS = {'u': 'urn:hl7-org:v3'}
In [7]: xpath = "/u:MCCI_IN200100UV01[@ITSVersion='XML_1.0']/u:creationTime/@value"
In [8]: root.xpath(xpath, namespaces=NS)
Out[8]: ['N.1.5']
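Applied to the dictionary from the question, the same idea might look like the sketch below. It is only an illustration under a few assumptions: XPATHS is a hypothetical, hand-prefixed stand-in for some of the R3Item entries, extract is a made-up helper, and "{r}" is a placeholder I introduced for the 1-based position of a repeating message. That last point matters because in XPath 1.0 a predicate like [r] tests for a child element named r rather than acting as an index, so it has to be replaced with a number before the expression is evaluated.

```python
from lxml import etree

# Trimmed copy of the sample document from the question (illustration only).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<MCCI_IN200100UV01 ITSVersion="XML_1.0" xmlns="urn:hl7-org:v3">
  <id extension="N.1.2" root="2.16.840.1.113883.3.989.2.1.3.22"/>
  <creationTime value="N.1.5"/>
  <name code="N.1.1" codeSystem="2.16.840.1.113883.3.989.2.1.1.1" codeSystemVersion="1.01"/>
  <PORR_IN049016UV>
    <id extension="N.2.r.1" root="2.16.840.1.113883.3.989.2.1.3.1"/>
    <creationTime value="N.2.r.4"/>
  </PORR_IN049016UV>
</MCCI_IN200100UV01>"""

NS = {"u": "urn:hl7-org:v3"}

# Hypothetical prefixed versions of a few dictionary entries.
# "{r}" is filled in with the 1-based position of the repeating message.
XPATHS = {
    "N.1.1": "/u:MCCI_IN200100UV01/u:name[@codeSystem='2.16.840.1.113883.3.989.2.1.1.1']/@code",
    "N.1.2": "/u:MCCI_IN200100UV01/u:id[@root='2.16.840.1.113883.3.989.2.1.3.22']/@extension",
    "N.2.r.1": "/u:MCCI_IN200100UV01/u:PORR_IN049016UV[{r}]/u:id[@root='2.16.840.1.113883.3.989.2.1.3.1'][1]/@extension",
    "N.2.r.4": "/u:MCCI_IN200100UV01/u:PORR_IN049016UV[{r}]/u:creationTime/@value",
}


def extract(root, r=1):
    """Return {element id: first matching attribute value, or None if nothing matched}."""
    result = {}
    for key, template in XPATHS.items():
        values = root.xpath(template.format(r=r), namespaces=NS)
        result[key] = str(values[0]) if values else None
    return result


root = etree.fromstring(XML)
extracted = extract(root)
for key, value in extracted.items():
    print(key, "->", value)
```

Calling extract(root, r=2) on this sample returns None for the N.2.r.* keys, since the trimmed document contains only one PORR_IN049016UV element.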