How to copy unique individuals from a CSV file into new files
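I'm new to Python, so please bear with me. I have a csv file that looks like this:
[image of the csv file, not shown]
I am trying to loop through the file and, for each unique individual in it, create a new csv file and copy that individual's rows into it. I managed to do this for one animal, but I'm having trouble working out the syntax for a more general approach. This is what I have so far: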
import arcpy
import csv
from csv import DictReader

WS = arcpy.env.workspace = raw_input("Where is your workspace?")
infile = raw_input("Where is your file?")
outfile = raw_input("What is your outfile name?")
arcpy.env.overwriteOutput = True

with open(infile, "r") as csvFile, open(outfile, "w") as out, open("outfile2.csv", "w") as out2:
    reader = csv.DictReader(csvFile)
    writer = csv.writer(out)
    writer.writerow(reader.fieldnames)
    for row in reader:
        # Only rows for animal "1" are copied
        if row["Animal"] == "1":
            values = [row[field] for field in reader.fieldnames]
            writer.writerow(values)
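To write each Animal to its own CSV file, you need to open a different file for each animal. This can be done by using a dictionary to store the file object and csv writer object for each animal. At the end, the same dictionary can be used to close all of the files properly: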
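import csv

# Python 2: csv files are opened in binary mode ('rb' / 'wb').
# On Python 3, open them in text mode with newline='' instead.
output_csvs = {}    # e.g. {'1' : [file_object, csv_object]}

with open('input.csv', 'rb') as f_input:
    csv_reader = csv.reader(f_input)
    header = next(csv_reader)

    for row in csv_reader:
        animal = row[0]    # the animal is taken from the first column
        if animal in output_csvs:
            output_csvs[animal][1].writerow(row)
        else:
            # First row for this animal: open its file and write the header
            f_output = open('animal_{}.csv'.format(animal), 'wb')
            csv_output = csv.writer(f_output)
            output_csvs[animal] = [f_output, csv_output]
            csv_output.writerow(header)
            csv_output.writerow(row)

# Close every output file that was opened
for csv_file, csv_writer in output_csvs.values():
    csv_file.close()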
This gives you a set of output CSV files named after each animal, e.g. animal_1.csv
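If you happen to be on Python 3 rather than Python 2, the same dictionary idea could be sketched roughly as follows. This is only an illustration: the function name split_by_first_column and the out_dir parameter are made up for the example, and I am assuming the animal is in the first column, as above. contextlib.ExitStack is used here in place of the manual close loop so the files are closed even if an error occurs part-way through.

```python
import csv
import os
from contextlib import ExitStack


def split_by_first_column(input_path, out_dir="."):
    """Write each distinct value of column 0 to its own animal_<value>.csv.

    Hypothetical helper for illustration; returns the sorted list of values seen.
    """
    writers = {}  # e.g. {'1': csv_writer}
    # ExitStack closes every file registered with it when the block exits.
    with ExitStack() as stack:
        f_input = stack.enter_context(open(input_path, newline=""))
        csv_reader = csv.reader(f_input)
        header = next(csv_reader)
        for row in csv_reader:
            if not row:  # skip blank lines
                continue
            animal = row[0]
            if animal not in writers:
                # First row for this animal: open its file and write the header
                path = os.path.join(out_dir, "animal_{}.csv".format(animal))
                f_output = stack.enter_context(open(path, "w", newline=""))
                writers[animal] = csv.writer(f_output)
                writers[animal].writerow(header)
            writers[animal].writerow(row)
    return sorted(writers)
```

You would call it as split_by_first_column('input.csv'). Keep in mind that this approach holds one open file per animal, so with a very large number of animals you could run into the operating system's limit on open files.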
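Alternatively, if the data is small enough to be read into memory, it can be sorted by animal and then written out one block at a time using Python's itertools.groupby() function: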
from itertools import groupby
import csv

with open('input.csv', 'rb') as f_input:
    csv_reader = csv.reader(f_input)
    header = next(csv_reader)

    # Sort the rows so each animal's rows are adjacent, then write one group per file
    for animal, group in groupby(sorted(csv_reader), lambda x: x[0]):
        with open('animal_{}.csv'.format(animal), 'wb') as f_output:
            csv_output = csv.writer(f_output)
            csv_output.writerow(header)
            csv_output.writerows(group)
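Using sorted() ensures that all rows for the same animal end up next to each other, which matters because groupby() only groups consecutive rows with the same key. If the data is already ordered that way, the sort is not needed.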
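To see why the sort matters, here is a small standalone sketch with made-up rows (the data and the first_column helper are just for illustration). Without sorting, groupby() starts a new group every time the key changes, so the same animal shows up more than once:

```python
from itertools import groupby

# Made-up rows: [animal, value]
rows = [["2", "a"], ["1", "b"], ["2", "c"], ["1", "d"]]


def first_column(row):
    return row[0]


# Unsorted input: a new group starts each time the key changes
unsorted_keys = [key for key, _ in groupby(rows, first_column)]

# Sorting on the key alone keeps each animal's rows in their original order,
# because Python's sort is stable
sorted_rows = sorted(rows, key=first_column)
grouped = {key: list(group) for key, group in groupby(sorted_rows, first_column)}

print(unsorted_keys)  # ['2', '1', '2', '1']
print(grouped)        # {'1': [['1', 'b'], ['1', 'd']], '2': [['2', 'a'], ['2', 'c']]}
```

Note that plain sorted(csv_reader) compares whole rows, so the rows within each animal's file may come out in a different order than in the input. Passing key=lambda x: x[0] as in this sketch is one way to avoid that if the original row order matters to you.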
To access these files afterwards, you could use glob.glob():
import matplotlib.pyplot as plt
import csv
import glob

for animal_filename in glob.glob('animal_*.csv'):
    with open(animal_filename, 'rb') as f_input:
        csv_input = csv.reader(f_input)
        heading = next(csv_input)
        x, y = [], []

        # Columns 1 and 2 are read as integer x and y values
        for row in csv_input:
            x.append(int(row[1]))
            y.append(int(row[2]))

        # One scatter plot per animal file
        fig, ax = plt.subplots()
        plt.title(animal_filename)
        ax.scatter(x, y)

plt.show()
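If you only want the values and not the plots, the reading part could be pulled out into a small function. This is a Python 3 sketch, and load_points is a name I made up for it. Like the code above, it assumes the second and third columns hold integers.

```python
import csv
import glob


def load_points(pattern="animal_*.csv"):
    """Return {filename: (x_values, y_values)} for every file matching pattern.

    Hypothetical helper; assumes columns 1 and 2 contain integers.
    """
    points = {}
    for filename in sorted(glob.glob(pattern)):
        with open(filename, newline="") as f_input:
            csv_input = csv.reader(f_input)
            next(csv_input, None)  # skip the header row, if there is one
            x, y = [], []
            for row in csv_input:
                x.append(int(row[1]))
                y.append(int(row[2]))
        points[filename] = (x, y)
    return points
```

The returned dictionary can then be passed to whatever plotting or analysis code you like, for example by looping over points.items().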