Parsing XML to CSV using Python: "NoneType" error
I am trying to convert XML to CSV. I am new to parsing in Python.
Sample data ("dummy data"):
<users>
<user firstName="Hannah" lastName="Jones" age="21" sex="Female" retired="False" dependants="2" marital_status="married or civil partner" salary="20603" pension="0" company="Ward and Sons" commute_distance="6.56" address_postcode="N06 4LG"/>
<user firstName="Tracy" lastName="Rowley" age="50" sex="Female" retired="False" dependants="1" marital_status="single" salary="39509" pension="0" company="Fuller, King and Robinson" commute_distance="11.01" address_postcode="M1 6JD"/>
<user firstName="Shane" lastName="Thompson" age="87" sex="Male" retired="True" dependants="2" marital_status="single" salary="53134" pension="13409" company="N/A" commute_distance="0" address_postcode="WF84 1EA"/>
<user firstName="Michael" lastName="Anderson" age="85" sex="Male" retired="True" dependants="2" marital_status="married or civil partner" salary="58524" pension="39479" company="N/A" commute_distance="0" address_postcode="BN1 7TL"/>
</users>
This is the code I tried:
import csv
import xml.etree.ElementTree as Xet
import pandas as pd

cols = ["firstName", "lastName", "age", "sex", "retired", "dependants", "marital_status", "salary", "pension", "company", "commute_distance", "address_postcode"]
rows = []

# Parsing the XML file
xmlparse = Xet.parse('/content/drive/MyDrive/DATAtask1/user_data.xml')
root = xmlparse.getroot()
for user in root:
    firstName = user.find("firstName").text
    lastName = user.find("lastName").text
    age = user.find("age").text
    sex = user.find("sex").text
    retired = user.find("retired").text
    dependants = user.find("dependants").text
    marital_status = user.find("marital_status").text
    salary = user.find("salary").text
    pension = user.find("pension").text
    company = user.find("company").text
    commute_distance = user.find("commute_distance").text
    address_postcode = user.find("address_postcode").text

    rows.append({"firstName": firstName,
                 "lastName": lastName,
                 "age": age,
                 "sex": sex,
                 "retired": retired,
                 "dependants": dependants,
                 "marital_status": marital_status,
                 "pension": pension,
                 "salary": salary,
                 "company": company,
                 "commute_distance": commute_distance,
                 "address_postcode": address_postcode})

df = pd.DataFrame(rows, columns=cols)

# Writing dataframe to csv
df.to_csv('/content/drive/MyDrive/DATAtask1/XMLtoCSV.csv')
I get this error:
AttributeError                            Traceback (most recent call last)
<ipython-input-3-c6016197ed71> in <module>()
7 root = xmlparse.getroot()
8 for user in root:
----> 9 firstName = user.find("firstName").text
10 lastName = user.find("lastName").text
11 age = user.find("age").text
AttributeError: 'NoneType' object has no attribute 'text'
The user tag has firstName as an attribute, not as a child element, so find("firstName") returns None. You should use:
user.attrib['firstName']
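If you inspect user.attrib, you get a dictionary of all the attributes. With xml.etree.ElementTree it is a plain dict; with lxml it is an lxml.etree._Attrib object, which you can convert with dict(user.attrib). This lets you simplify your code, because you can work with the attributes as you would with an ordinary Python dictionary.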
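To make the difference between child elements and attributes concrete, here is a minimal sketch. It assumes the standard-library xml.etree.ElementTree parser and a trimmed copy of the sample data; the names xml_text, first_user, missing_child and middle_name are only illustrative.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the sample data; only three attributes are kept for brevity.
xml_text = """<users>
<user firstName="Hannah" lastName="Jones" age="21"/>
<user firstName="Tracy" lastName="Rowley" age="50"/>
</users>"""

root = ET.fromstring(xml_text)
first_user = root[0]

# find() searches for a child *element* named "firstName".
# The <user> tags have no children, so the result is None,
# and calling .text on it raises the AttributeError from the question.
missing_child = first_user.find("firstName")

# The values are stored as attributes, so read them from .attrib instead.
first_name = first_user.attrib["firstName"]

# get() returns a default instead of raising KeyError when an attribute is absent.
# "middleName" is a made-up attribute used only to show the fallback.
middle_name = first_user.get("middleName", "")

print(missing_child)
print(first_name)
print(type(first_user.attrib))
```

Using get() with a default is a reasonable choice when some records may lack an attribute; indexing attrib directly fails fast, which can be preferable while debugging.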
For example, you can create a list and append all the dictionaries to it. Finally, the list of dictionaries can be converted to a pandas DataFrame:
d1 = {'name': 'john', 'age': 19}
d2 = {'name': 'Steve', 'age': 16}
# A dictionary with an extra key:
d3 = {'name': 'Jim', 'age': 25, 'additional': 'something'}
df = pd.DataFrame([d1, d2, d3])
    name  age additional
0   john   19        NaN
1  Steve   16        NaN
2    Jim   25  something
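Applied to the XML from the question, the same idea could look roughly like the sketch below. It is only an illustration: it inlines two of the sample records as a string and writes the CSV to an in-memory buffer so that it runs anywhere; xml_text, buffer and csv_text are names introduced here, not part of the original code.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Two records copied from the sample data in the question.
xml_text = """<users>
<user firstName="Hannah" lastName="Jones" age="21" sex="Female" retired="False" dependants="2" marital_status="married or civil partner" salary="20603" pension="0" company="Ward and Sons" commute_distance="6.56" address_postcode="N06 4LG"/>
<user firstName="Tracy" lastName="Rowley" age="50" sex="Female" retired="False" dependants="1" marital_status="single" salary="39509" pension="0" company="Fuller, King and Robinson" commute_distance="11.01" address_postcode="M1 6JD"/>
</users>"""

root = ET.fromstring(xml_text)

# One dictionary per <user> element, built straight from its attributes.
rows = [dict(user.attrib) for user in root.iter("user")]

# Column names come from the dictionary keys, so no explicit column list is needed.
df = pd.DataFrame(rows)

# index=False leaves the pandas row index out of the CSV.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, ET.parse(path).getroot() would replace ET.fromstring(xml_text), and df.to_csv(path, index=False) would write to disk. Every value read from an attribute is a string, so columns such as age or salary would need converting (for example with pd.to_numeric) if numeric types are required.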