Is it possible to convert each row of a pandas DataFrame into a predefined text file?
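My DataFrame looks like this:
I want to insert each row into a predefined text file so that each value ends up at a specific place in the document.
This is what I came up with: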
for i in range(len(df)):
    with open("%s.xml" %index, "w") as f:
        f.write(
"""<?xml version="1.0"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ccts="urn:un:unece:uncefact:documentation:2" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
<cbc:UBLVersionID>2.1</cbc:UBLVersionID>
<cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
<cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
<cbc:ID> """df[Factuurdatum[i]]" </cbc:ID>
<cbc:IssueDate> Totaal </cbc:IssueDate>
<cbc:DueDate> Factuurdatum[i] </cbc:DueDate>"
<cbc:InvoiceTypeCode listID="UNCL1001" listAgencyID="6">380</cbc:InvoiceTypeCode>
<cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
<cac:AccountingSupplierParty>
My ideal output for the first row:
<?xml version="1.0"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ccts="urn:un:unece:uncefact:documentation:2" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
<cbc:UBLVersionID>2.1</cbc:UBLVersionID> <cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
<cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
<cbc:ID> ""0606194584" </cbc:ID>
<cbc:IssueDate> 12.93 </cbc:IssueDate>
<cbc:DueDate> 2020-09-18 </cbc:DueDate>"
<cbc:InvoiceTypeCode listID="UNCL1001" listAgencyID="6">380</cbc:InvoiceTypeCode>
<cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
<cac:AccountingSupplierParty>
My ideal output for the second row:
<?xml version="1.0"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ccts="urn:un:unece:uncefact:documentation:2" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
<cbc:UBLVersionID>2.1</cbc:UBLVersionID> <cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
<cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
<cbc:ID> ""20200633369" </cbc:ID>
<cbc:IssueDate> 30.25 </cbc:IssueDate>
<cbc:DueDate> 2020-06-26 </cbc:DueDate>"
<cbc:InvoiceTypeCode listID="UNCL1001" listAgencyID="6">380</cbc:InvoiceTypeCode>
<cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
<cac:AccountingSupplierParty>
And so on for every row.
What would be a possible way to do this? Can anyone help me?
You're almost there. You can use string formatting to insert your values into a string, like this:
data = "some data i want to insert"
result = "This is what I want to say: {}".format(data)
# or
result = f"This is what I want to say: {data}"
Reference:
https://docs.python.org/3/library/stdtypes.html?highlight=format#str.format
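One caveat when the inserted values end up in XML: characters such as `&` and `<` have to be escaped, otherwise the generated file will not parse. A minimal sketch using the standard library's `xml.sax.saxutils.escape` together with a named placeholder; the value `"A&B <123>"` is made up purely for illustration.

```python
from xml.sax.saxutils import escape

# Named placeholders are easier to read than positional {} once a template grows.
template = "<cbc:ID>{invoice_id}</cbc:ID>"

# Hypothetical value containing characters that are special in XML.
raw_value = "A&B <123>"

# escape() replaces &, < and > with &amp;, &lt; and &gt;
result = template.format(invoice_id=escape(raw_value))
print(result)  # <cbc:ID>A&amp;B &lt;123&gt;</cbc:ID>
```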
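If you iterate over the rows, you get an (index, series) tuple, where series holds the column values of a single row. That series can be expanded into a str.format call on a template of the XML you want to generate. A simple example: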
>>> df=pd.DataFrame([[1,2,3],[4,5,6]], columns=['A','B','C'])
>>> df
   A  B  C
0  1  2  3
1  4  5  6
>>> template = "<xml>\n <a>{A}</a>\n <b>{B}</b>\n <c>{C}</c>\n</xml>"
>>> for row in df.iterrows():
...     print(template.format(**row[1]))
...
<xml>
 <a>1</a>
 <b>2</b>
 <c>3</c>
</xml>
<xml>
 <a>4</a>
 <b>5</b>
 <c>6</c>
</xml>
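One thing to watch with `iterrows()` when the formatted values are numbers: each row comes back as a single Series, so pandas upcasts a row that mixes integer and float columns to a common float dtype, and an integer can render as `1.0`. If the invoice numbers are stored as integers next to a float column such as a total, this could affect the output. A minimal sketch with made-up columns `A` and `B`, comparing it with `itertuples()`, which keeps each column's own type:

```python
import pandas as pd

# Hypothetical data: an integer column next to a float column.
df = pd.DataFrame({"A": [1, 4], "B": [2.5, 5.5]})
template = "<a>{A}</a><b>{B}</b>"

# iterrows() packs each row into one Series, so "A" is upcast to float64.
via_iterrows = [template.format(**row) for _, row in df.iterrows()]

# itertuples() iterates column by column, so "A" stays an integer.
via_itertuples = [template.format(**row._asdict()) for row in df.itertuples(index=False)]

print(via_iterrows)    # ['<a>1.0</a><b>2.5</b>', '<a>4.0</a><b>5.5</b>']
print(via_itertuples)  # ['<a>1</a><b>2.5</b>', '<a>4</a><b>5.5</b>']
```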
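Extending this example, I split the document you need into the canned boilerplate of the surrounding XML document and a {fac_details} format variable for the per-row information. I didn't know a good name for this data, so I called it "fac"; you'll want something more descriptive. I tried to make the XML more complete, but it does not cover all of the columns you are interested in.
Note: the OP did not provide a complete runnable program, so this is untested pseudocode.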
# XML document to be expanded with per-row details
fac_doc_template = """<?xml version="1.0"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ccts="urn:un:unece:uncefact:documentation:2" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
<cbc:UBLVersionID>2.1</cbc:UBLVersionID>
<cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
<cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
{fac_details}
</Invoice>"""
# per-row details
# todo: expand for all of the column values you want
fac_details_xml_template = """
<cbc:ID>{Factuurnumer}</cbc:ID>
<cbc:IssueDate>{Factuurdatum}</cbc:IssueDate>
"""

def series_to_fac_details_xml(s):
    # expand the row's column values as keyword arguments for the placeholders
    return fac_details_xml_template.format(**s)

# df is your DataFrame; write one file per row, named after the row index
for index, row in df.iterrows():
    details = series_to_fac_details_xml(row)
    with open(f"{index}.xml", "w") as f:
        f.write(fac_doc_template.format(fac_details=details))
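For reference, a self-contained sketch of the same template approach that can be run as-is. It assumes hypothetical column names `Factuurnummer`, `Factuurdatum` and `Totaal` (guessed from the desired output in the question, so adjust them to the real DataFrame), trims the `<Invoice>` start tag to the two namespaces it actually uses, escapes every value, and iterates with `to_dict("records")` instead of `iterrows()` so each column keeps its own type. Parsing each file with `xml.etree.ElementTree` is only a quick well-formedness check, not UBL schema validation.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.sax.saxutils import escape

import pandas as pd

# Trimmed header: only the namespaces used below are declared here.
# In practice, copy the full <Invoice ...> start tag from the template above.
DOC_TEMPLATE = """<?xml version="1.0"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
<cbc:UBLVersionID>2.1</cbc:UBLVersionID>
{details}
</Invoice>"""

# Hypothetical column names; unused columns (e.g. Totaal) are simply ignored by format().
DETAILS_TEMPLATE = """<cbc:ID>{Factuurnummer}</cbc:ID>
<cbc:IssueDate>{Factuurdatum}</cbc:IssueDate>"""


def row_to_xml(row: dict) -> str:
    # Escape every value so characters like & or < cannot break the XML.
    safe = {key: escape(str(value)) for key, value in row.items()}
    return DOC_TEMPLATE.format(details=DETAILS_TEMPLATE.format(**safe))


def write_invoices(df: pd.DataFrame, out_dir: Path) -> list:
    paths = []
    # to_dict("records") yields one plain dict per row, in index order.
    for index, row in zip(df.index, df.to_dict("records")):
        path = out_dir / f"{index}.xml"
        path.write_text(row_to_xml(row), encoding="utf-8")
        paths.append(path)
    return paths


# Hypothetical sample data; invoice numbers kept as strings so leading zeros survive.
df = pd.DataFrame(
    {
        "Factuurnummer": ["0606194584", "20200633369"],
        "Factuurdatum": ["2020-09-18", "2020-06-26"],
        "Totaal": [12.93, 30.25],
    }
)

out_dir = Path(tempfile.mkdtemp())
written = write_invoices(df, out_dir)
for path in written:
    ET.parse(path)  # raises ParseError if a file is not well-formed XML
    print(path.name)
```

Writing to a temporary directory keeps the sketch side-effect free; pass your own output directory instead when generating real invoices.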