How to parse and stack XML nodes and children correctly?

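I am currently trying to analyse voting behaviour in the European Parliament using the Parliament's XML interface. Although I can import the data and manipulate it to some extent, I cannot get a meaningful pandas DataFrame out of it.

For example, I tried to build two DataFrames, one for the "for" votes and one for the "against" votes. However, both DataFrames come out with the same size and in the same order... Can anyone help?

Thanks!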

import xml.etree.ElementTree as ET
from urllib.request import urlopen

import pandas as pd

var_url = urlopen('https://www.europarl.europa.eu/doceo/document/PV-9-2020-12-18-RCV_FR.xml')
xmldoc = ET.parse(var_url)
xmlroot = xmldoc.getroot()

vote_items = []
all_vote_items = []
for avote in xmlroot.iter('RollCallVote.Result'):
    vote_Nr = avote.attrib.get('Identifier')
    for anitem in avote.iter('Result.For'):
        for amep in avote.iter('PoliticalGroup.Member.Name'):
            mep_id = amep.get('MepId')
            vote_items = [vote_Nr, mep_id]
            all_vote_items.append(vote_items)
for_meps = pd.DataFrame(all_vote_items, columns=['VOTE_NUMBER', 'vmep_id'])


vote_items = []
all_vote_items = []
for avote in xmlroot.iter('RollCallVote.Result'):
    vote_Nr = avote.attrib.get('Identifier')
    for anitem in avote.iter('Result.Against'):
        for amep in avote.iter('PoliticalGroup.Member.Name'):
            mep_id = amep.get('MepId')
            vote_items = [vote_Nr, mep_id]
            all_vote_items.append(vote_items)
against_meps = pd.DataFrame(all_vote_items, columns=['VOTE_NUMBER', 'vmep_id'])

Update:

I have now tried to combine all three, but I am back to a (39,4) DataFrame. How can I stack them correctly?

vote_items = []
all_vote_items = []

for avote in xmlroot.iter('RollCallVote.Result'):
    vote_Nr = avote.attrib.get('Identifier')
    
    for anitem in avote.iter('Result.For'):
        for agroup in anitem.iter('Result.PoliticalGroup.List'):
            for amep in agroup.iter('PoliticalGroup.Member.Name'):
                mep_id_for = amep.get('MepId')
    
    for anitem in avote.iter('Result.Against'):
        for agroup in anitem.iter('Result.PoliticalGroup.List'):
            for amep in agroup.iter('PoliticalGroup.Member.Name'):
                mep_id_against = amep.get('MepId')
                
    for anitem in avote.iter('Result.Abstention'):
        for agroup in anitem.iter('Result.PoliticalGroup.List'):
            for amep in agroup.iter('PoliticalGroup.Member.Name'):
                mep_id_abstention = amep.get('MepId')
    
    
    vote_items = [vote_Nr, mep_id_for, mep_id_against, mep_id_abstention]
    all_vote_items.append(vote_items)
            
all_meps = pd.DataFrame(all_vote_items, columns=['vote_nr', 'vote_for', 'vote_against', 'vote_abstention'])

I think your code has a bug on this line:

for amep in avote.iter('PoliticalGroup.Member.Name'):

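You should probably iterate over anitem rather than avote. avote.iter() walks every descendant of the whole vote, so both loops collect the same members regardless of how they voted. There are two places that need fixing. I just checked, and with this change the two all_vote_items lists differ.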
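
As for the update, one way to stack the three result types might be to append one row per MEP inside the innermost loop rather than once per vote, which gives a long table instead of one row per vote. Below is a minimal sketch of that idea. The sample XML is a made-up miniature (only the element and attribute names are taken from your code; the root tag, IDs and names are hypothetical), and `collect_votes` and `POSITIONS` are names I introduced for illustration, so the real file may need adjustments.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document: element/attribute names follow the
# question's code, but the root tag, IDs and member names are invented.
SAMPLE_XML = """
<PV.RollCallVoteResults>
  <RollCallVote.Result Identifier="1">
    <Result.For>
      <Result.PoliticalGroup.List Identifier="A">
        <PoliticalGroup.Member.Name MepId="101">Alpha</PoliticalGroup.Member.Name>
        <PoliticalGroup.Member.Name MepId="102">Beta</PoliticalGroup.Member.Name>
      </Result.PoliticalGroup.List>
    </Result.For>
    <Result.Against>
      <Result.PoliticalGroup.List Identifier="B">
        <PoliticalGroup.Member.Name MepId="103">Gamma</PoliticalGroup.Member.Name>
      </Result.PoliticalGroup.List>
    </Result.Against>
  </RollCallVote.Result>
  <RollCallVote.Result Identifier="2">
    <Result.Abstention>
      <Result.PoliticalGroup.List Identifier="A">
        <PoliticalGroup.Member.Name MepId="101">Alpha</PoliticalGroup.Member.Name>
      </Result.PoliticalGroup.List>
    </Result.Abstention>
  </RollCallVote.Result>
</PV.RollCallVoteResults>
"""

# Maps each result element to a short label (labels are my own choice).
POSITIONS = {
    "Result.For": "for",
    "Result.Against": "against",
    "Result.Abstention": "abstention",
}


def collect_votes(root):
    """Return one (vote_nr, position, mep_id) tuple per MEP per vote."""
    rows = []
    for avote in root.iter("RollCallVote.Result"):
        vote_nr = avote.get("Identifier")
        for tag, position in POSITIONS.items():
            for anitem in avote.iter(tag):
                # Search below anitem, not avote, so that only the members
                # listed under this result type are picked up.
                for amep in anitem.iter("PoliticalGroup.Member.Name"):
                    # Append inside the innermost loop: one row per MEP.
                    rows.append((vote_nr, position, amep.get("MepId")))
    return rows


root = ET.fromstring(SAMPLE_XML)
rows = collect_votes(root)
for row in rows:
    print(row)
```

If the real document has the same shape, passing the result to `pd.DataFrame(rows, columns=['vote_nr', 'position', 'mep_id'])` should give one row per MEP per vote, and filtering on the `position` column would recover the separate "for" and "against" frames.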