How to split one CSV into multiple files in Python
I have a CSV file (world.csv) that looks like this:
"city","city_alt","lat","lng","country"
"Mjekić","42.6781","20.9728","Kosovo"
"Mjekiff","42.6781","20.9728","Kosovo"
"paris","42.6781","10.9728","France"
"Bordeau","16.6781","52.9728","France"
"Menes","02.6781","50.9728","Morocco"
"Fess","6.6781","3.9728","Morocco"
"Tanger","8.6781","5.9728","Morocco"
I want to split it into multiple files by country, like this:
Kosovo.csv :
"city","city_alt","lat","lng","country"
"Mjekić","42.6781","20.9728","Kosovo"
"Mjekiff","42.6781","20.9728","Kosovo"
France.csv :
"city","city_alt","lat","lng","country"
"paris","42.6781","10.9728","France"
"Bordeau","16.6781","52.9728","France"
Morocco.csv :
"city","city_alt","lat","lng","country"
"Menes","02.6781","50.9728","Morocco"
"Fess","6.6781","3.9728","Morocco"
"Tanger","8.6781","5.9728","Morocco"
Try this:
Filter the rows by country name, then use to_csv in pandas to write each subset to its own CSV file.
import pandas as pd

df = pd.read_csv('world.csv')
# Keep only the rows whose country column matches
france = df[df['country'] == 'France']
kosovo = df[df['country'] == 'Kosovo']
morocco = df[df['country'] == 'Morocco']
# index=False leaves out the DataFrame index column
france.to_csv('France.csv', index=False)
kosovo.to_csv('Kosovo.csv', index=False)
morocco.to_csv('Morocco.csv', index=False)
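The three filters above hard-code the country names. Here is a minimal sketch of the same boolean-mask technique that loops over whatever values appear in the country column. The function name `split_by_country` and the `out_dir` parameter are illustrative assumptions, not part of the original answer:

```python
import os
import pandas as pd


def split_by_country(df, out_dir="."):
    """Write one CSV per distinct value in df['country']; return the paths written."""
    written = []
    # dropna() skips rows with no country instead of producing a 'nan.csv'
    for country in df["country"].dropna().unique():
        subset = df[df["country"] == country]  # same boolean filter as above
        path = os.path.join(out_dir, f"{country}.csv")
        subset.to_csv(path, index=False)
        written.append(path)
    return written
```

It could be called as `split_by_country(pd.read_csv('world.csv'))`, assuming the file has a `country` column.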
If you can't use pandas, you can use the built-in csv module together with itertools.groupby() to group the rows by country.
from itertools import groupby
import csv

with open('world.csv') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip header

    # Group by column (country)
    lst = sorted(reader, key=lambda x: x[4])
    groups = groupby(lst, key=lambda x: x[4])

    # Write file for each country
    for k, g in groups:
        filename = k + '.csv'
        with open(filename, 'w', newline='') as fout:
            csv_output = csv.writer(fout)
            csv_output.writerow(["city", "city_alt", "lat", "lng", "country"])  # header
            for line in g:
                csv_output.writerow(line)
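itertools.groupby() only groups consecutive rows, which is why the rows are sorted first. As an alternative sketch that skips the sort, the rows can be bucketed in a dictionary keyed by country. The helper name `split_csv` and its parameters are hypothetical; it looks the column up by header name rather than by position, and it assumes every row has as many fields as the header:

```python
import csv
import os
from collections import defaultdict


def split_csv(src, column="country", out_dir="."):
    """Split src into one CSV per distinct value of `column`; return the values seen."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        idx = header.index(column)  # raises ValueError if the column is missing
        buckets = defaultdict(list)
        for row in reader:
            buckets[row[idx]].append(row)  # rows keep their original order

    for key, rows in buckets.items():
        path = os.path.join(out_dir, key + ".csv")
        with open(path, "w", newline="", encoding="utf-8") as fout:
            writer = csv.writer(fout)
            writer.writerow(header)  # reuse the header read from the source file
            writer.writerows(rows)
    return list(buckets)
```

Like the sorted() version, this holds all rows in memory before writing, so it suits files that fit comfortably in RAM.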
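The simplest way is as follows:
# Create a folder named "adata" in your working directory first
import glob
import os
import pandas as pd

df = pd.read_csv('world.csv')
for name, g in df.groupby('country'):
    # index=False keeps the output columns identical to the input
    g.to_csv(os.path.join('adata', '{}.csv'.format(name)), index=False)

filenames = sorted(glob.glob(os.path.join('adata', '*.csv')))
print(filenames)
for f in filenames:
    pass  # your intended processing goes here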
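To make the placeholder loop concrete, here is a hedged sketch that writes the per-group files and then reads each one back to count its rows. The function name `split_and_count` is hypothetical, row counting is only a stand-in for whatever processing is actually intended, and `os.makedirs` is used so the folder does not have to be created by hand:

```python
import glob
import os
import pandas as pd


def split_and_count(df, out_dir="adata", column="country"):
    """Write one CSV per group, then read each file back and count its rows."""
    os.makedirs(out_dir, exist_ok=True)  # create the folder if it is missing
    for name, group in df.groupby(column):
        group.to_csv(os.path.join(out_dir, f"{name}.csv"), index=False)

    counts = {}
    # Note: this picks up every .csv in out_dir, including files from earlier runs
    for path in sorted(glob.glob(os.path.join(out_dir, "*.csv"))):
        counts[os.path.basename(path)] = len(pd.read_csv(path))
    return counts
```

Using `os.path.join` instead of a literal backslash keeps the paths valid on Linux and macOS as well as Windows.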