Creating an XML File from a DataFrame using Python and LXML
I am trying to create an XML file that uses a Pandas dataframe to populate its elements and sub-elements. Here is the code I wrote:
import pandas as pd
from lxml import etree as et
df = pd.DataFrame({'id_profile': [439, 444654, 56454, 12222],
                   'ServiceDate': ['2017-12-05', '2017-01-25', '2017-12-05', '2017-01-25'],
                   'PrimaryServiceCategory': [25, 25, 33, 25]})
root = et.Element('ClientReport')
idnum = et.SubElement(root, 'ID')
prime_serv = et.SubElement(root, 'ServiceCategory')
serv_date = et.SubElement(root, 'ServiceDate')
for row in df.iterrows():
    idnum.text = df['id_profile']
    prime_serv.text = df['PrimaryServiceCategory']
    serv_date.text = df['ServiceDate']
print(et.tostring(root, pretty_print=True))
My expected result is:
<ClientReport>
  <ID>439</ID>
  <ServiceCategory>25</ServiceCategory>
  <ServiceDate>2017-12-05</ServiceDate>
</ClientReport>
<ClientReport>
  <ID>444654</ID>
  <ServiceCategory>25</ServiceCategory>
  <ServiceDate>2017-01-25</ServiceDate>
</ClientReport>
<ClientReport>
  <ID>12222</ID>
  <ServiceCategory>25</ServiceCategory>
  <ServiceDate>2017-01-25</ServiceDate>
</ClientReport>
Instead, I get:
TypeError: Argument must be bytes or unicode, got 'Series'
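I'm not sure how to get the iterated rows, rather than static values, to populate the attributes of the XML file. Is the Pandas package appropriate here? Is LXML the right choice as well?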
One solution is to iterate over the dataframe and populate the XML elements for each row separately:
import pandas as pd
from lxml import etree as et

df = pd.DataFrame({'id_profile': [439, 444654, 56454, 12222],
                   'ServiceDate': ['2017-12-05', '2017-01-25', '2017-12-05', '2017-01-25'],
                   'PrimaryServiceCategory': [25, 25, 33, 25]})

# A single wrapper element holds all of the reports
root = et.Element('root')

# iterrows() yields (index, row) pairs; only the row is needed here
for _, row in df.iterrows():
    # Create a fresh ClientReport with its own children for every row
    report = et.SubElement(root, 'ClientReport')
    idnum = et.SubElement(report, 'ID')
    prime_serv = et.SubElement(report, 'ServiceCategory')
    serv_date = et.SubElement(report, 'ServiceDate')
    # Element text must be a string, so convert each value explicitly
    idnum.text = str(row['id_profile'])
    prime_serv.text = str(row['PrimaryServiceCategory'])
    serv_date.text = str(row['ServiceDate'])

print(et.tostring(root, pretty_print=True).decode('utf-8'))
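The original error occurs because an element's `.text` has to be a string, while `df['id_profile']` is the whole column (a `Series`). The elements also have to be created inside the loop, otherwise the same three elements are overwritten on every pass. The sketch below is one possible variation on the loop above, not the only way to do it: it uses `itertuples()` instead of `iterrows()` and writes the result to disk, since the question asks for an XML file. The helper names `dataframe_to_tree` and `write_report`, and any output path passed to them, are made up for illustration.

```python
import pandas as pd
from lxml import etree as et


def dataframe_to_tree(df):
    """Build one <ClientReport> element per DataFrame row under a single root."""
    root = et.Element('root')
    # itertuples() yields one namedtuple per row; index=False omits the index
    for row in df.itertuples(index=False):
        report = et.SubElement(root, 'ClientReport')
        # .text must be a str, so every value is converted explicitly
        et.SubElement(report, 'ID').text = str(row.id_profile)
        et.SubElement(report, 'ServiceCategory').text = str(row.PrimaryServiceCategory)
        et.SubElement(report, 'ServiceDate').text = str(row.ServiceDate)
    return et.ElementTree(root)


def write_report(df, path):
    """Serialize the tree to `path` (a hypothetical output location)."""
    tree = dataframe_to_tree(df)
    tree.write(path, pretty_print=True, xml_declaration=True, encoding='utf-8')


df = pd.DataFrame({
    'id_profile': [439, 444654, 56454, 12222],
    'ServiceDate': ['2017-12-05', '2017-01-25', '2017-12-05', '2017-01-25'],
    'PrimaryServiceCategory': [25, 25, 33, 25],
})

tree = dataframe_to_tree(df)
print(et.tostring(tree.getroot(), pretty_print=True).decode('utf-8'))
```

Calling `write_report(df, 'client_reports.xml')` would then save the same content to a file. Note that a well-formed XML document has exactly one root element, which is why the `ClientReport` elements are wrapped in `<root>` rather than emitted side by side as in the expected output.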