Update existing csv file with updated info in another csv file
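I have 2 csv files:
- File zulu has the basic information, split over several columns.
- File bommel has only updated information for the same records, in the same columns.
I want to solve this in Python (using the csv module from the standard library) without needing Pandas or other external resources.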
#!/usr/bin/env python3
import csv

# Define column names
fields = ['capcode', 'discipline', 'region', 'location', 'description', 'remark']

# Open the necessary files
with open('bommel_db_capcodes.txt', 'r') as readFile_bommel:
    with open('results.csv', 'w') as results:
        with open('zulu_db_capcodes.txt', 'r') as readFile_zulu:
            master = csv.DictReader(readFile_zulu, fieldnames=fields)
            update = csv.DictReader(readFile_bommel, fieldnames=fields)
            writer = csv.DictWriter(results, fieldnames=fields)
            # Saves and skips header to output file
            writer.writerow(next(master))
            # Goes through whole zulu csv
            for row in master:
                for row2 in update:
                    if row['capcode'] in update:
                        writer.writerow(row2)
                    else:
                        writer.writerow(row)
ReadFilezulu.close()
ReadFilebommel.close()
results.close()
Contents of the zulu csv:
capcode,discipline,region,location,description,remark
000400001,Brandweer,Groningen,Groningen,Regionaal,Pelotonscommandant Logistiek/Water (Noord)
000400002,Brandweer,Groningen,Groningen,,
000400003,Brandweer,Groningen,Groningen,Regionaal,Pelotonscommandant Logistiek/Water) (Oost)
000100000,Brandweer,Amsterdam-Amstelland,Amsterdam-Amstelland,Aalsmeer,Postalarm
000100001,Brandweer,Amsterdam-Amstelland,Amsterdam-Amstelland,,
000100002,Brandweer,Amsterdam-Amstelland,Amsterdam-Amstelland,,Banaanzulu
000100003,Brandweer,Amsterdam-Amstelland,Amsterdam-Amstelland,,
Contents of the bommel csv:
capcode,discipline,region,location,description,remark
000100000,Brandweer,Amsterdam-Amstelland,,banaanProefalarm,
000100001,Brandweer,Amsterdam-Amstelland,Aalsmeer,Bevelvoerders,
000100004,Brandweer,Amsterdam-Amstelland,Aalsmeer,Korpsalarm,
Current result
capcode,discipline,region,location,description,remark
000400001,Brandweer,Groningen,Groningen,Regionaal,Pelotonscommandant Logistiek/Water (Noord)
000400001,Brandweer,Groningen,Groningen,Regionaal,Pelotonscommandant Logistiek/Water (Noord)
000400001,Brandweer,Groningen,Groningen,Regionaal,Pelotonscommandant Logistiek/Water (Noord)
Expected result
capcode,discipline,region,location,description,remark < from saving header
000400001,Brandweer,Groningen,Groningen,Regionaal,Pelotonscommandant Logistiek/Water (Noord) < from zulu
000400002,Brandweer,Groningen,Groningen,, < from zulu
000400003,Brandweer,Groningen,Groningen,Regionaal,Pelotonscommandant Logistiek/Water) (Oost) < from zulu
000100000,Brandweer,Amsterdam-Amstelland,,banaanProefalarm, < from bommel
000100001,Brandweer,Amsterdam-Amstelland,Aalsmeer,Bevelvoerders, < from bommel
000100002,Brandweer,Amsterdam-Amstelland,Amsterdam-Amstelland,,Banaanzulu < from zulu
000100003,Brandweer,Amsterdam-Amstelland,Amsterdam-Amstelland,, < from zulu
000100004,Brandweer,Amsterdam-Amstelland,Aalsmeer,Korpsalarm, < from bommel
Any ideas on how to get this working?
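The first time your condition row['capcode'] in update is evaluated, it consumes the entire update file. Because update is an iterator (it behaves much like a generator), iterating over it exhausts it, and there is nothing left for the next master row.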
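A small sketch of that exhaustion, using an io.StringIO object as a stand-in for the real update file (the two sample rows are only an illustration):

```python
import csv
import io

# In-memory stand-in for the update file (illustration only)
data = io.StringIO(
    "capcode,discipline\n"
    "000100000,Brandweer\n"
    "000100001,Brandweer\n"
)
update = csv.DictReader(data)

# 'in' walks the reader element by element; every element is a dict,
# which never equals a capcode string, so the whole reader is consumed
found = '000100000' in update
# A second pass over the same reader yields nothing
remaining = list(update)

print(found)      # False
print(remaining)  # []
```

This is why the inner loop only ever does work for the first master row.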
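Also, your condition looks for the capcode string among whole rows of the update file (each row is a dict), not among their capcode values, so it can never match.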
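You want to read the update rows into memory once, and then skip rows from the master file when you see a row with the same key (not the whole row!).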
I'm assuming the first field (capcode) is the key here, though other arrangements are possible.
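If capcode alone did not identify a record, a tuple of several columns can serve as the key in the same way. The choice of capcode plus discipline below is a hypothetical example, not something the question states:

```python
import csv
import io

# Hypothetical choice: treat capcode + discipline together as the key
KEY_COLUMNS = ('capcode', 'discipline')


def row_key(row):
    # A tuple is hashable, so it can go into a set or be used as a dict key
    return tuple(row[col] for col in KEY_COLUMNS)


# In-memory sample data (illustration only)
sample = io.StringIO(
    "capcode,discipline,region\n"
    "000100000,Brandweer,Amsterdam-Amstelland\n"
    "000100001,Brandweer,Amsterdam-Amstelland\n"
)
seen = {row_key(row) for row in csv.DictReader(sample)}

print(('000100000', 'Brandweer') in seen)  # True
```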
Tangentially, you can combine all the with statements into one; and when you use with open, you don't need to call .close() on anything.
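A minimal sketch of both points, using throwaway files in a temporary directory (the file names are placeholders, not the question's files):

```python
import os
import tempfile

# Hypothetical scratch files so the sketch does not touch real data
folder = tempfile.mkdtemp()
in_path = os.path.join(folder, 'in.csv')
out_path = os.path.join(folder, 'out.csv')

with open(in_path, 'w', newline='') as f:
    f.write('capcode\n000100000\n')

# Several context managers in one with statement, separated by commas
with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
    dst.write(src.read())

# Both files were closed when the with block ended; no .close() needed
print(src.closed, dst.closed)  # True True
```

On Python 3.10 and later the managers can also be wrapped in parentheses across several lines, which avoids the backslash continuations.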
#!/usr/bin/env python3
import csv

fields = ['capcode', 'discipline', 'region', 'location', 'description', 'remark']

# The csv module documentation recommends opening its files with newline=''
with open('bommel_db_capcodes.txt', 'r', newline='') as readFile_bommel, \
     open('results.csv', 'w', newline='') as results, \
     open('zulu_db_capcodes.txt', 'r', newline='') as readFile_zulu:
    master = csv.DictReader(readFile_zulu, fieldnames=fields)
    update = csv.DictReader(readFile_bommel, fieldnames=fields)
    writer = csv.DictWriter(results, fieldnames=fields)
    # Save header to output file and skip it
    writer.writerow(next(master))
    # Skip header from updates
    next(update)
    # Read, remember, and write updated lines
    # (note: updated rows are written first, before the zulu rows)
    seen = set()
    for row in update:
        writer.writerow(row)
        seen.add(row['capcode'])
    # Copy master rows whose capcode was not updated
    for row in master:
        if row['capcode'] not in seen:
            writer.writerow(row)
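The code above writes all bommel rows first and then the remaining zulu rows, so its output order differs from the expected result in the question. If zulu's order matters, one possible variation (a sketch, not part of the original answer) reads the updates into a dict keyed by capcode, swaps matching master rows in place, and appends the leftover updates at the end. The function name merge_updates and its file-object parameters are illustrative choices:

```python
import csv

FIELDS = ['capcode', 'discipline', 'region', 'location', 'description', 'remark']


def merge_updates(master_file, update_file, out_file, key='capcode'):
    """Write master rows in their original order, swapping in update rows
    that share the same key, then append update rows whose key is new."""
    update = csv.DictReader(update_file, fieldnames=FIELDS)
    next(update)  # skip the update file's header row
    # Read the updates into memory once; dicts preserve insertion order
    updates = {row[key]: row for row in update}

    master = csv.DictReader(master_file, fieldnames=FIELDS)
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writerow(next(master))  # copy the header row through
    for row in master:
        # pop() so a row that replaced a master row is not written again below
        writer.writerow(updates.pop(row[key], row))
    # Anything left over has a key that the master file does not contain
    for row in updates.values():
        writer.writerow(row)
```

To use it on the real files, open zulu_db_capcodes.txt and bommel_db_capcodes.txt for reading and results.csv for writing, all with newline='', and pass the three file objects in that order. With the sample data from the question this should produce the expected result row for row, including 000100004 at the end.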