Create a DataFrame from an XML File
I am new to XML and would like to know how to create a DataFrame in Python from this XML file.
<EXTENDEDPROPERTIES>
<DEBTCONFIGURATION>
<row Key="guid" Value="2018438038"/>
<row Key="status" Value="0"/>
<row Key="forma_pago" Value="DEBITO A CUENTA"/>
<row Key="monto" Value="23699.1"/>
<row Key="monto_abono" Value="360.55"/>
<row Key="entidad" Value="BANCO CAPRICHOSO S.A."/>
<row Key="tipo" Value="PREST. AUTO"/>
<row Key="balance" Value="19617.5"/>
<row Key="KIND_ID" Value="PRINCIPAL"/>
<row Key="TYPE_ID" Value="CEDULA_IDENTIDAD"/>
<row Key="CUSTOMER_ID" Value="777-555-888"/>
<row Key="MEMBER_TYPE" Value="DEUDOR"/>
</DEBTCONFIGURATION>
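I have the following code. It creates the DataFrame, but when I try to append the row values I keep getting "None", and I don't know why.
I'm not sure whether I have to change the call I'm using, i.e. attrib.get.
I also tried changing attrib.get to find("value").text, but that gives me an error saying it has no text attribute.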
import pandas as pd
import xml.etree.ElementTree as ET
xtree = ET.parse("davi_apc.xml")
xroot = xtree.getroot()
df_cols = ["guid", "status", "forma_pago", "monto", "monto_abono", "entidad", "tipo", "balance","KIND_ID", "TYPE_ID", "CUSTOMER_ID", "MEMBER_TYPE"]
rows = []
for node in xroot:
    s_guid = node.attrib.get("guid")
    s_status = node.attrib.get("status")
    s_formapago = node.attrib.get("forma_pago")
    s_monto = node.attrib.get("monto")
    s_monto_abono = node.attrib.get("monto_abono")
    s_entidad = node.attrib.get("entidad")
    s_tipo = node.attrib.get("tipo")
    s_balance = node.attrib.get("balance")
    s_kind_id = node.attrib.get("KIND_ID")
    s_type_id = node.attrib.get("TYPE_ID")
    s_customer_id = node.attrib.get("CUSTOMER_ID")
    s_member_type = node.attrib.get("MEMBER_TYPE")
    rows.append({
        "guid" : s_guid,
        "status" : s_status,
        "forma_pago" : s_formapago,
        "monto" : s_monto,
        "monto_abono" : s_monto_abono,
        "entidad" : s_entidad,
        "tipo" : s_tipo,
        "balance" : s_balance,
        "KIND_ID" : s_kind_id,
        "TYPE_ID" : s_type_id,
        "CUSTOMER_ID" : s_customer_id,
        "MEMBER_TYPE" : s_member_type
    })
out_df = pd.DataFrame(rows, columns = df_cols)
This is the output of print(rows):
[{'guid': None, 'status': None, 'forma_pago': None, 'monto': None, 'monto_abono': None, 'entidad': None, 'tipo': None, 'balance': None, 'KIND_ID': None, 'TYPE_ID': None, 'CUSTOMER_ID': None, 'MEMBER_TYPE': None}]
This is the printed DataFrame:
guid status forma_pago monto monto_abono entidad tipo balance KIND_ID
0 None None None None None None None None None
TYPE_ID CUSTOMER_ID MEMBER_TYPE
0 None None None
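Here is a working solution:
1/ Remove the first line (<EXTENDEDPROPERTIES>) from the XML file. I'm not sure that first tag conforms to the XML standard, since it is never closed in the snippet: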
<DEBTCONFIGURATION>
<row Key="guid" Value="2018438038"/>
<row Key="status" Value="0"/>
<row Key="forma_pago" Value="DEBITO A CUENTA"/>
<row Key="monto" Value="23699.1"/>
<row Key="monto_abono" Value="360.55"/>
<row Key="entidad" Value="BANCO CAPRICHOSO S.A."/>
<row Key="tipo" Value="PREST. AUTO"/>
<row Key="balance" Value="19617.5"/>
<row Key="KIND_ID" Value="PRINCIPAL"/>
<row Key="TYPE_ID" Value="CEDULA_IDENTIDAD"/>
<row Key="CUSTOMER_ID" Value="777-555-888"/>
<row Key="MEMBER_TYPE" Value="DEUDOR"/>
</DEBTCONFIGURATION>
2/ Code:
import pandas as pd
import xml.etree.ElementTree as ET
xtree = ET.parse("davi_apc.xml")
xroot = xtree.getroot()
rows = [{}]  # a single record that collects every Key/Value pair
for node in xroot:
    print(node.attrib)  # debug: show the attributes of each <row>
    rows[0][node.attrib['Key']] = node.attrib['Value']
out_df = pd.DataFrame(rows)
3/ Output of out_df:
out_df.head(10)
guid status ... CUSTOMER_ID MEMBER_TYPE
0 2018438038 0 ... 777-555-888 DEUDOR
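If editing the file is not an option, the same idea may work without removing the outer tag. The sketch below is only an illustration under a few assumptions: the XML is inlined as a string with a closing </EXTENDEDPROPERTIES> added (the snippet in the question does not show one), only some of the rows are included, and the names XML_TEXT and records are made up for the example. It also shows why the original loop printed None: iterating over the root visits <DEBTCONFIGURATION>, which has no attributes of its own, while the data sits in the Key and Value attributes of each <row>.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Assumption: the real file closes the outer <EXTENDEDPROPERTIES> tag.
# The closing tag is added here so the sample parses; only a few rows are shown.
XML_TEXT = """
<EXTENDEDPROPERTIES>
  <DEBTCONFIGURATION>
    <row Key="guid" Value="2018438038"/>
    <row Key="status" Value="0"/>
    <row Key="monto" Value="23699.1"/>
    <row Key="MEMBER_TYPE" Value="DEUDOR"/>
  </DEBTCONFIGURATION>
</EXTENDEDPROPERTIES>
"""

xroot = ET.fromstring(XML_TEXT.strip())

# The loop in the question visits <DEBTCONFIGURATION>, which carries no
# attributes, so every attrib.get(...) call on it returns None.
first_child = xroot[0]
print(first_child.tag, first_child.attrib)  # DEBTCONFIGURATION {}

# Build one record per <DEBTCONFIGURATION> block, reading the Key/Value
# attributes of each <row> inside it.
records = [
    {row.get("Key"): row.get("Value") for row in block.iter("row")}
    for block in xroot.iter("DEBTCONFIGURATION")
]

out_df = pd.DataFrame(records)
print(out_df)
```

Because Element.iter() also matches the element it is called on, this should behave the same whether or not the outer tag is present. All values arrive as strings, so numeric columns such as monto would still need converting, for example with pd.to_numeric.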