How to parse nested XML values for multiple child groups
I have a large .xml file, part of which looks like this:
<?xml version="1.0"?>
<data>
  <measData>
    <Mesurment Id="55">
      <granPeriod duration="1" endTime="2021-01-02"/>
      <repPeriod duration="1"/>
      <measTypes>73 74 574 75 35 36 </measTypes>
      <measValue measObj="Group1">
        <measResults>512 52.733 33.5 82 0 0 </measResults>
      </measValue>
      <measValue measObj="Group2">
        <measResults>512 78.175 50 119.5 0 0 </measResults>
      </measValue>
    </Mesurment>
  </measData>
</data>
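I am trying to parse the data I need out of it and write it to a CSV file.
The problem is that in the xml file, <measTypes> appears only once, and the values for those <measTypes> are then reported separately for Group1 and Group2.
This differs for each <Mesurment Id>, and more than 10 groups of values may be reported for a single <measTypes>.
I don't know how to report multiple measResults for one measTypes.
I have the following code to get the values: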
import xml.etree.ElementTree as ET
import pandas as pd

parsDict = dict()
tree = ET.parse('new.xml')
root = tree.getroot()
for itm in tree.iter():
    if (itm.tag.split('}')[-1] == 'Mesurment'):
        parsDict['Mesurment'] = [itm.attrib['Id']]
    if (itm.tag.split('}')[-1] == 'granPeriod'):
        parsDict['duration'] = [itm.attrib['duration']]
        parsDict['endTime'] = [itm.attrib['endTime']]
    if (itm.tag.split('}')[-1] == 'measTypes'):
        parsDict['CounterID'] = [itm.text]
    if (itm.tag.split('}')[-1] == 'measValue'):
        parsDict['measObj'] = [itm.attrib['measObj']]
    if (itm.tag.split('}')[-1] == 'measResults'):
        parsDict['value'] = [itm.text]
df2 = pd.DataFrame(parsDict)
df2.to_csv('123.csv',index=False)
print('finish')
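The result only reports the last group.
The result I want should include every group, and it needs to scale to multiple groups and Mesurment Ids.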
This is probably easier to do with the BeautifulSoup library. Before using it, you should install these dependencies:
beautifulsoup4 = "4.9.3"
lxml = "^4.6.1"
from bs4 import BeautifulSoup, Tag

soup = BeautifulSoup("""
<?xml version="1.0"?>
<data>
  <measData>
    <Mesurment Id="55">
      <granPeriod duration="1" endTime="2021-01-02"/>
      <repPeriod duration="1"/>
      <measTypes>73 74 574 75 35 36 </measTypes>
      <measValue measObj="Group1">
        <measResults>512 52.733 33.5 82 0 0 </measResults>
      </measValue>
      <measValue measObj="Group2">
        <measResults>512 78.175 50 119.5 0 0 </measResults>
      </measValue>
    </Mesurment>
  </measData>
</data>
""", features="xml")

response = []
for tag in soup.data.measData:
    if not isinstance(tag, Tag):
        continue
    # please, update this dict with all the top level attributes you need
    data = {"duration": tag.granPeriod.attrs["duration"], }
    for measValue in tag:
        if not isinstance(measValue, Tag) or getattr(measValue, "measResults") is None:
            continue
        response.append({
            **data,
            "measObj": measValue.attrs["measObj"],
            "value": measValue.measResults.text
        })
print(response)
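Since the goal in the question is a CSV file, here is a minimal sketch of one way to write a list of dicts such as `response` to CSV using only the standard library. The sample rows are typed in by hand to mirror the shape built above, and `rows_to_csv` is a hypothetical helper name, not part of any library.

```python
import csv
import io

# Illustrative rows in the same shape as the `response` list built above.
response = [
    {"duration": "1", "measObj": "Group1", "value": "512 52.733 33.5 82 0 0 "},
    {"duration": "1", "measObj": "Group2", "value": "512 78.175 50 119.5 0 0 "},
]


def rows_to_csv(rows, stream):
    """Write a non-empty list of same-keyed dicts to a text stream as CSV."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
rows_to_csv(response, buffer)
print(buffer.getvalue())

# To write a real file instead (the file name is only an example):
# with open("123.csv", "w", newline="") as fh:
#     rows_to_csv(response, fh)
```

`pd.DataFrame(response).to_csv(...)` would work as well if pandas is already a dependency, as it is in the question.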
Update
Using the library you were already using, it can be done like this:
import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse('new.xml')
root = tree.getroot()
response = []
for mesurment in tree.iter("Mesurment"):
    granPeriod = next(
        it for it in mesurment if it.tag == "granPeriod"
    )
    measTypes = next(
        it for it in mesurment if it.tag == "measTypes"
    )
    measValues = [it for it in mesurment if it.tag == "measValue"]
    mesurment_data = {
        "Mesurment": mesurment.attrib["Id"],
        "duration": granPeriod.attrib["duration"],
        "endTime": granPeriod.attrib["endTime"],
        "CounterId": measTypes.text,
    }
    for value in measValues:
        response.append({
            **mesurment_data,
            "measObj": value.attrib["measObj"],
            "value": next(
                it.text for it in value if it.tag == "measResults"
            )
        })
df2 = pd.DataFrame(response)
print(df2)
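If one row per counter is more useful than one row per group, the same loop can be taken a step further. The sketch below assumes that the i-th number in <measResults> belongs to the i-th id in <measTypes>; the sample data suggests this, but the question does not state it. It also strips any `{namespace}` prefix from tag names, mirroring the `itm.tag.split('}')[-1]` check in the question, in case the real file is namespaced. `local_name`, `child` and `iter_rows` are hypothetical helper names.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML = """<data>
  <measData>
    <Mesurment Id="55">
      <granPeriod duration="1" endTime="2021-01-02"/>
      <repPeriod duration="1"/>
      <measTypes>73 74 574 75 35 36 </measTypes>
      <measValue measObj="Group1">
        <measResults>512 52.733 33.5 82 0 0 </measResults>
      </measValue>
      <measValue measObj="Group2">
        <measResults>512 78.175 50 119.5 0 0 </measResults>
      </measValue>
    </Mesurment>
  </measData>
</data>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def child(parent, name):
    """Return the first direct child with the given local name, or None."""
    return next((el for el in parent if local_name(el.tag) == name), None)


def iter_rows(root):
    """Yield one dict per (Mesurment, measValue, counter) combination."""
    for mes in root.iter():
        if local_name(mes.tag) != "Mesurment":
            continue
        gran = child(mes, "granPeriod")
        types = child(mes, "measTypes")
        if gran is None or types is None:
            continue  # skip measurements missing the expected children
        counter_ids = (types.text or "").split()
        for value in mes:
            if local_name(value.tag) != "measValue":
                continue
            results = child(value, "measResults")
            if results is None:
                continue
            numbers = (results.text or "").split()
            # Assumption: counters and results line up by position.
            for counter_id, number in zip(counter_ids, numbers):
                yield {
                    "Mesurment": mes.get("Id"),
                    "duration": gran.get("duration"),
                    "endTime": gran.get("endTime"),
                    "measObj": value.get("measObj"),
                    "CounterId": counter_id,
                    "value": number,
                }


rows = list(iter_rows(ET.fromstring(XML)))

# Write to an in-memory buffer; swap in open("123.csv", "w", newline="")
# to produce a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Note that `zip` silently stops at the shorter sequence, so a group that reports fewer values than there are counters would lose the trailing counters rather than raise an error. A length check before the inner loop may be worth adding for real data.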