使用 python "None type error" 将 XML 解析为 CSV

parsing XML to CSV using python "None type error"

正在尝试将 XML 转换为 CSV。我是 python 解析的新手。

数据样本(“虚拟数据”)

<users>
  <user firstName="Hannah" lastName="Jones" age="21" sex="Female" retired="False" dependants="2" marital_status="married or civil partner" salary="20603" pension="0" company="Ward and Sons" commute_distance="6.56" address_postcode="N06 4LG"/>
  <user firstName="Tracy" lastName="Rowley" age="50" sex="Female" retired="False" dependants="1" marital_status="single" salary="39509" pension="0" company="Fuller, King and Robinson" commute_distance="11.01" address_postcode="M1 6JD"/>
  <user firstName="Shane" lastName="Thompson" age="87" sex="Male" retired="True" dependants="2" marital_status="single" salary="53134" pension="13409" company="N/A" commute_distance="0" address_postcode="WF84 1EA"/>
  <user firstName="Michael" lastName="Anderson" age="85" sex="Male" retired="True" dependants="2" marital_status="married or civil partner" salary="58524" pension="39479" company="N/A" commute_distance="0" address_postcode="BN1 7TL"/>
</users>

我试过这个代码

import csv
import xml.etree.ElementTree as Xet
import pandas as pd

cols = ["firstName", "lastName", "age", "sex", "retired", "dependants", "marital_status", "salary", "pension", "company", "commute_distance", "address_postcode"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse('/content/drive/MyDrive/DATAtask1/user_data.xml')
root = xmlparse.getroot()
for user in root:
    firstName = user.find("firstName").text
    lastName = user.find("lastName").text
    age = user.find("age").text
    sex = user.find("sex").text
    retired = user.find("retired").text
    dependants = user.find("dependants").text
    marital_status = user.find("marital_status").text
    salary = user.find("salary").text
    pension = user.find("pension").text
    company = user.find("company").text
    commute_distance = user.find("commute_distance").text
    address_postcode = user.find("address_postcode").text
  
    rows.append({"firstName": firstName,
                 "lastName": lastName,
                 "age": age,
                 "sex": sex,
                 "retired": retired,
                 "dependants": dependants,
                 "marital_status": marital_status,
                 "pension": pension,
                 "salary": salary,
                 "company": company,
                 "commute_distance": commute_distance,
                 "address_postcode": address_postcode})
  
df = pd.DataFrame(rows, columns=cols)
  
# Writing dataframe to csv
df.to_csv('/content/drive/MyDrive/DATAtask1/XMLtoCSV.csv') 

收到此错误

AttributeError Traceback(最后一次调用)

<ipython-input-3-c6016197ed71> in <module>()
      7 root = xmlparse.getroot()
      8 for user in root:
----> 9     firstName = user.find("firstName").text
     10     lastName = user.find("lastName").text
     11     age = user.find("age").text

AttributeError: 'NoneType' object has no attribute 'text'

该标签有一个带有 firstName 的属性。因此你应该使用:

user.attrib['firstName']

如果你检查:user.attrib,它会return一个字典(这不是真的,它returns lxml.etree._Attrib 可以使用 (dict(user.attrib)) 将其转换为字典。这将使您有机会简化代码,因为您可以像普通 python 字典一样使用字典。

例如,您可以创建一个列表并将所有词典附加到列表中。最后,可以将字典列表转换为 pandas 数据帧:

d1 = {'name': 'john', 'age': 19}
d2 = {'name': 'Steve', 'age': 16}
# A dictionary with an extra key:
d3 = {'name': 'Jim', 'age': 25, 'additional': 'something'}
df = pd.DataFrame([d1, d2, d3])

    name  age additional
0   john   19        NaN
1  Steve   16        NaN
2    Jim   25  something