Is it possible to convert each row of a pandas DataFrame into a predefined text file?

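My data frame looks like this:

I want to insert each row into a predefined text file, so that each value lands at a specific position in the document. This is what I came up with: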

for i in range(len(df)):
with open("%s.xml" %index, "w") as f:
    f.write(
     """<?xml version="1.0"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ccts="urn:un:unece:uncefact:documentation:2" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
  <cbc:UBLVersionID>2.1</cbc:UBLVersionID>
  <cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
  <cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
  <cbc:ID> """df[Factuurdatum[i]]" </cbc:ID>
  <cbc:IssueDate> Totaal </cbc:IssueDate>
  <cbc:DueDate> Factuurdatum[i] </cbc:DueDate>"
  <cbc:InvoiceTypeCode listID="UNCL1001" listAgencyID="6">380</cbc:InvoiceTypeCode>
  <cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
  <cac:AccountingSupplierParty>

My desired output for the first row:

<?xml version="1.0"?>
    <Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ccts="urn:un:unece:uncefact:documentation:2" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
      <cbc:UBLVersionID>2.1</cbc:UBLVersionID>         <cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
      <cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
      <cbc:ID> ""0606194584" </cbc:ID>
      <cbc:IssueDate> 12.93 </cbc:IssueDate>
      <cbc:DueDate> 2020-09-18 </cbc:DueDate>"
      <cbc:InvoiceTypeCode listID="UNCL1001" listAgencyID="6">380</cbc:InvoiceTypeCode>
      <cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
      <cac:AccountingSupplierParty>

My desired output for the second row:

<?xml version="1.0"?>
    <Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ccts="urn:un:unece:uncefact:documentation:2" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
      <cbc:UBLVersionID>2.1</cbc:UBLVersionID>         <cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
      <cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
      <cbc:ID> ""20200633369" </cbc:ID>
      <cbc:IssueDate> 30.25 </cbc:IssueDate>
      <cbc:DueDate> 2020-06-26 </cbc:DueDate>"
      <cbc:InvoiceTypeCode listID="UNCL1001" listAgencyID="6">380</cbc:InvoiceTypeCode>
      <cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
      <cac:AccountingSupplierParty>

And so on for every row. What would be a possible way to do this? Can anyone help me?

You're almost there. You can use string formatting to insert your values into the string, like this:

data = "some data i want to insert"

result = "This is what I want to say: {}".format(data)
# or
result = f"This is what I want to say: {data}"

References:

https://docs.python.org/3/library/stdtypes.html?highlight=format#str.format

https://docs.python.org/3/library/string.html#formatstrings
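
Applied to a fragment of the question's template, named placeholders tend to be easier to follow than positional ones. The following is a minimal sketch, assuming the values live in a plain dict; the keys `invoice_id` and `issue_date` are made up for illustration and are not from the question's data.

```python
# Named placeholders keep a long template readable compared to bare {}.
template = (
    "<cbc:ID>{invoice_id}</cbc:ID>\n"
    "<cbc:IssueDate>{issue_date}</cbc:IssueDate>"
)

# Hypothetical values; the dict keys must match the placeholder names exactly.
values = {"invoice_id": "0606194584", "issue_date": "2020-09-18"}

# ** unpacks the dict into keyword arguments for str.format.
snippet = template.format(**values)
print(snippet)
```

Note that a template passed to `str.format` treats every `{` and `}` as a placeholder delimiter, so any literal brace in the text has to be doubled (`{{` and `}}`).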

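If you iterate over the rows, you get (index, series) tuples, where the series holds the column values of a single row. That series can be expanded into a str.format call on a template of the XML you want to generate. A simple example: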

>>> import pandas as pd
>>> df=pd.DataFrame([[1,2,3],[4,5,6]], columns=['A','B','C'])
>>> df
   A  B  C
0  1  2  3
1  4  5  6
>>> template = "<xml>\n  <a>{A}</a>\n  <b>{B}</b>\n  <c>{C}</c>\n</xml>"
>>> for row in df.iterrows():
...     print(template.format(**row[1]))
... 
<xml>
  <a>1</a>
  <b>2</b>
  <c>3</c>
</xml>
<xml>
  <a>4</a>
  <b>5</b>
  <c>6</c>
</xml>
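
The same loop can write one file per row instead of printing. This is a minimal sketch under a few assumptions: it reuses the toy frame and template from the session above, names each file after the row index, and writes into a temporary directory (`out_dir` and `written` are illustrative names).

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
template = "<xml>\n  <a>{A}</a>\n  <b>{B}</b>\n  <c>{C}</c>\n</xml>"

# Write into a throwaway directory so the sketch leaves nothing behind
# in the current working directory.
out_dir = Path(tempfile.mkdtemp())

written = []
for index, row in df.iterrows():
    path = out_dir / f"{index}.xml"
    # row.to_dict() maps column name -> value, matching the placeholders.
    path.write_text(template.format(**row.to_dict()), encoding="utf-8")
    written.append(path)

print([p.name for p in written])
```

One caveat: `iterrows` returns each row as a Series, so columns of mixed numeric dtypes are upcast to a common dtype. With one integer and one float column, an integer such as 1 would be formatted as `1.0`. Keeping identifier-like columns as strings is one way to avoid that.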

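Expanding on this example, I split the desired document into a boilerplate template for the surrounding canned XML document and a {fac_details} format variable for the information unique to each row. I didn't know a good name for this data, so I called it "fac"; you'll want something more descriptive. I tried to make the XML more complete, but did not cover all of the columns you're interested in.

Note: the OP did not provide a complete runnable program, so this is untested pseudocode.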

# XML document to be expanded with per-row details
fac_doc_template = """<?xml version="1.0"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ccts="urn:un:unece:uncefact:documentation:2" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
  <cbc:UBLVersionID>2.1</cbc:UBLVersionID>
  <cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
  <cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
  {fac_details}
</Invoice>"""

# Per-row details
# TODO: expand for all of the column values you want; each {name} must
# match a column name in df exactly
fac_details_xml_template = """
<cbc:ID>{Factuurnumer}</cbc:ID>
<cbc:IssueDate>{Factuurdatum}</cbc:IssueDate>
"""

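def series_to_fac_details_xml(s):
    # Expand one row (a Series) into the per-row XML fragment
    return fac_details_xml_template.format(**s)

# df is the DataFrame from the question; one output file per row,
# named after the row index
for index, row in df.iterrows():
    details = series_to_fac_details_xml(row)
    with open(f"{index}.xml", "w") as f:
        f.write(fac_doc_template.format(fac_details=details))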
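
For something that can be run end to end, here is a hedged sketch of the same idea. The DataFrame is a hypothetical stand-in built from the sample values in the question, and the column names `Factuurnummer` and `Factuurdatum` are assumptions, since the real frame is not shown. The boilerplate is shortened to two namespaces. Two things go beyond the approach above: cell values are passed through `xml.sax.saxutils.escape` so that characters such as `&` or `<` cannot break the markup, and each file is parsed back as a cheap well-formedness check (this is not schema validation).

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.sax.saxutils import escape

import pandas as pd

# Hypothetical stand-in for the question's data; real column names may differ.
df = pd.DataFrame(
    {
        "Factuurnummer": ["0606194584", "20200633369"],
        "Factuurdatum": ["2020-09-18", "2020-06-26"],
    }
)

# Shortened boilerplate; the namespace URIs are the ones used in the question.
doc_template = """<?xml version="1.0"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:UBLVersionID>2.1</cbc:UBLVersionID>
  <cbc:ID>{Factuurnummer}</cbc:ID>
  <cbc:IssueDate>{Factuurdatum}</cbc:IssueDate>
</Invoice>"""

# Temporary output directory so the sketch does not clutter the working dir.
out_dir = Path(tempfile.mkdtemp())

for index, row in df.iterrows():
    # Escape &, < and > so a cell value cannot break the markup.
    values = {col: escape(str(val)) for col, val in row.items()}
    path = out_dir / f"{index}.xml"
    path.write_text(doc_template.format(**values), encoding="utf-8")
    # Parsing raises ParseError if the generated file is not well-formed XML.
    ET.parse(str(path))

print(sorted(p.name for p in out_dir.iterdir()))
```

String templating is fine for a fixed layout like this one. If the structure itself starts to vary per row, building the tree with `xml.etree.ElementTree` may be the more robust choice.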