How to read a complex txt file with blocks of data and save it as a csv file in Python?
If I have a file organized like this:
++++++++++++++
Country 1
**this sentence is not important.
**date 25.09.2017, also not important
*******
Address
**Office
Address A, 100 City. Country X
**work time 09h00-16h00<br>9h00-14h00
**www.example.com
**emal@example.com;
**012/345 67 89
**téléfax 123/456 67 89
*******
Address
**Home Office
Address A, 200 City. Country X
**email2@example.com;
**001/000 00 00
**téléfax 111/111 11 11
*******
Address
**Living address
Address 0, 123 City
**info@example.ch
**000/000 00 00
**téléfax 222/222 22 22
++++++++++++++
Country 2
**this sentence is not important.
**date 25.09.2017, also not important
*******
Address
**Office
AAA 11, 30 City
BBB 22, 30 City
**work time 08h00-12h30
**www.example.com
**info@example.com
**000/000 00 00
**téléfax 111/11 11 11
*******
ETC
I want to put the data into a csv file with these columns:
Country (Line right after ++++++++++++++), Address (Line right after *******), Office (after **), WorkTime (after **), Website (after **), Email (after **), Phone (after **), Fax (after **)
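How can I do this in Python? The problem is that some entries are missing data, so I know some rows in the csv file will end up messy, but I don't mind making some manual adjustments in the database afterwards. Another problem is that the country names differ, so I need to use ++++++++++++++ as the delimiter.
I tried something like this: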
import csv

with open('listofdata.txt', 'r') as FILE:
    DATA = FILE.read()

LIST = DATA.split('++++++++++++++')
LIST2 = []
LIST3 = []
LIST4 = []

for ITEMS in LIST:
    LIST2 = ITEMS.split('*******')
    for items2 in LIST2:
        LIST3 = items2.split('**')
        LIST4.append(LIST3)

with open('file.csv', 'w') as CSV:
    for ITEMS in LIST4:
        csv.write(ITEMS)
But it doesn't work.
Error:
Traceback (most recent call last):
  File "test.py", line 22, in <module>
    csv.write(ITEMS)
AttributeError: 'module' object has no attribute 'write'
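Use csv.writer when you save to a csv file. But first you have to write a parser for the structure of listofdata.txt; only then can you save the data to a csv file in the columns you want.
Alternatively, you can use csv.DictWriter, but you still have to write the parser first.
In the last line you called write on "csv", which is the module, instead of on the file object "CSV"; that is why you get the error. Note that CSV.write(ITEMS) would not work either, because ITEMS is a list and a file's write method expects a string.
I added the steps for using the csv module in Python to your code.
All you have to do now is work on your parsing approach.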
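As a possible starting point for that parser, here is a minimal sketch. It assumes Python 3 and the layout shown in the question: the line after ++++++++++++++ is the country, each ******* starts a section whose first line is a heading, the first ** field holds the office name followed by the address lines, and the remaining ** fields can be told apart by their content. The names COLUMNS, classify, parse and write_csv are made up for illustration, and the rules in classify are guesses based only on the sample data, so they will likely need adjusting for the real file.

```python
import csv

COLUMNS = ["Country", "Address", "Office", "WorkTime",
           "Website", "Email", "Phone", "Fax"]


def classify(field):
    """Guess the column for one '**' field (heuristics assumed from the sample)."""
    lowered = field.lower()
    if lowered.startswith("work time"):
        return "WorkTime", field[len("work time"):].strip()
    if lowered.startswith("www."):
        return "Website", field
    if "@" in field:
        return "Email", field.rstrip(";")
    if lowered.startswith("téléfax"):
        return "Fax", field[len("téléfax"):].strip()
    return "Phone", field  # anything unrecognised is treated as a phone number


def parse(text):
    """Return one dict per address section; missing fields are simply absent."""
    rows = []
    for block in text.split("++++++++++++++"):
        sections = block.split("*******")
        header = [ln.strip() for ln in sections[0].splitlines() if ln.strip()]
        if not header:
            continue  # empty text before the first separator
        country = header[0]  # the '**' lines after it are ignored
        for section in sections[1:]:
            # parts[0] is the heading line; the rest are the '**' fields.
            parts = section.split("**")
            fields = [p.strip() for p in parts[1:] if p.strip()]
            if not fields:
                continue  # e.g. a trailing section without any data
            first = fields[0].splitlines()
            row = {
                "Country": country,
                "Office": first[0].strip(),
                "Address": "; ".join(ln.strip() for ln in first[1:] if ln.strip()),
            }
            for field in fields[1:]:
                column, value = classify(field)
                row.setdefault(column, value)  # keep the first value per column
            rows.append(row)
    return rows


def write_csv(rows, path):
    """Write the rows with csv.DictWriter; restval fills columns a row lacks."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

With the file contents read into a string named text, calling write_csv(parse(text), 'file.csv') would produce one row per address section. Sections that lack a field, such as the "Home Office" entry with no work time or website, get empty cells in those columns instead of shifting the other values.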
Your code with csv.writer added:
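import csv

with open('listofdata.txt', 'r') as FILE:
    DATA = FILE.read()

# Split the text into country blocks, then sections, then '**' fields.
LIST = DATA.split('++++++++++++++')
LIST2 = []
LIST3 = []
LIST4 = []

for ITEMS in LIST:
    LIST2 = ITEMS.split('*******')
    for items2 in LIST2:
        LIST3 = items2.split('**')
        LIST4.append(LIST3)

# Write one csv row per section, using a csv.writer rather than the csv module itself.
# (On Python 3, open the file with newline='' as the csv documentation recommends.)
with open('file.csv', 'w') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for ITEMS in LIST4:
        spamwriter.writerow(ITEMS)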
Output:
""
"
Country 1
","this sentence is not important.
","date 25.09.2017, also not important
"
"
Address
","Office
Address A, 100 City. Country X
","work time 09h00-16h00<br>9h00-14h00
","www.example.com
","emal@example.com;
","012/345 67 89
","téléfax 123/456 67 89
"
"
Address
","Home Office
Address A, 200 City. Country X
","email2@example.com;
","001/000 00 00
","téléfax 111/111 11 11
"
"
Address
","Living address
Address 0, 123 City
","info@example.ch
","000/000 00 00
","téléfax 222/222 22 22
"
"
Country 2
","this sentence is not important.
","date 25.09.2017, also not important
"
"
Address
","Office
AAA 11, 30 City
BBB 22, 30 City
","work time 08h00-12h30
","www.example.com
","info@example.com
","000/000 00 00
","téléfax 111/11 11 11
"
"
"