I would like to get data from a CSV file

Question

My CSV file looks roughly like this:

customer,quantity
a,250
a,166
c,354
b,185
a,58
d,68
c,263
c,254
d,320
b,176
d,127
...

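This CSV file has 8,000 rows. I want to separate the rows for each customer in the customer column ("a", "b", "c", ... "z"), together with their quantities. The file above is only an example; in reality there are many more customers, and I don't know their names in advance. What I want is a separate CSV file for each customer. I have to do this in Python.

Sorry for my poor English.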

Answer 1

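I'm not very good with the pandas module, but this code does what you want: it creates a .csv file named after each customer and writes that customer's name and quantities into it. Give it a try, and if you run into any errors, please let me know in the comments. Note: please try it on a copy of your data file.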

# pip install pandas
import pandas as pd

data = pd.read_csv('abc.csv')  # write your CSV file name here

all_customer_value = list(data['customer'])
all_customer_quantity = list(data['quantity'])
all_customer_name = set(data['customer'])

for user in all_customer_name:
    with open(f'{user}.csv', 'w') as file:
        file.write('customer,quantity\n')  # header row

for index, value in enumerate(all_customer_value):
    # re-opens the customer's file in append mode for every single row
    with open(f'{value}.csv', 'a') as file:
        file.write(f'{value},{all_customer_quantity[index]}\n')

Answer 2

I think the simplest approach is to use a dict to store the customer names and all their related quantities:

from csv import reader, writer

with open('the_file.csv', 'r') as file_in:
    csv_in = reader(file_in)
    customers = {}
    first_line = True
    for cust, qty in csv_in:
        if first_line:
            first_line = False
            continue
        if cust not in customers:
            customers[cust] = [qty]
        else:
            customers[cust].append(qty)
    for cust in customers:
        with open(f'{cust}.csv', 'w', newline='') as file_out:
            csv_out = writer(file_out)
            for qty in customers[cust]:
                csv_out.writerow([cust, qty])

Answer 3

If you're fine with Pandas, I would use .groupby() on the dataframe created by .read_csv() (grouping by the customers), and then write the pieces to CSV files with .to_csv() (replace file.csv with the name of your input file):

import pandas as pd

for customer, df in pd.read_csv("file.csv").groupby("customer"):
    df.to_csv(f"{customer}.csv", index=False)

If you want to do more or less the same thing without Pandas, you could use groupby() from the standard library's itertools module and the reader and writer from the standard library's csv module:

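import csv
from operator import itemgetter
from itertools import groupby

# newline="" is what the csv module expects for files passed to reader/writer
with open("file.csv", "r", newline="") as file:
    data = list(csv.reader(file))
header, data = data[0], data[1:]
key = itemgetter(0)  # group on the first column (customer)
# groupby() only groups consecutive rows, so sort by customer first
for customer, group in groupby(sorted(data, key=key), key=key):
    with open(f"{customer}.csv", "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(header)
        writer.writerows(group)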

If you want to avoid sorting, then the solution from @gimix might be the better choice.
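To see why the sort is needed here: itertools.groupby() only groups consecutive items with equal keys. The following is a small illustrative sketch with made-up rows (not the real file); because sorted() is stable, the rows of each customer keep their original relative order.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample rows: [customer, quantity]
rows = [["a", "250"], ["c", "354"], ["a", "166"]]
key = itemgetter(0)

# Without sorting, customer "a" shows up as two separate groups
unsorted_keys = [customer for customer, _ in groupby(rows, key=key)]

# After sorting by customer, every customer forms exactly one group
sorted_keys = [customer for customer, _ in groupby(sorted(rows, key=key), key=key)]

# Quantities per customer, in their original relative order
quantities = {
    customer: [row[1] for row in group]
    for customer, group in groupby(sorted(rows, key=key), key=key)
}

print(unsorted_keys)  # ['a', 'c', 'a']
print(sorted_keys)    # ['a', 'c']
print(quantities)     # {'a': ['250', '166'], 'c': ['354']}
```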

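The most efficient solution I could come up with is the following, in which no file is closed until the end (handled via ExitStack()) and each input row is sorted directly into the right slot: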

import csv
from contextlib import ExitStack

writers = {}
# ExitStack closes every file entered into it when the with-block ends
with ExitStack() as files:
    reader = csv.reader(files.enter_context(open("file.csv", "r", newline="")))
    header = next(reader)
    for row in reader:
        customer = row[0]
        if customer not in writers:
            # first row for this customer: open its file and write the header
            fout = files.enter_context(open(f"{customer}.csv", "w", newline=""))
            writers[customer] = csv.writer(fout)
            writers[customer].writerow(header)
        writers[customer].writerow(row)

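The tradeoff here: no intermediate data structure, but some file-handling bookkeeping.

(I would avoid solutions that might involve an excessive number of file openings/closings: the runtime can deteriorate significantly.)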
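One caveat with keeping every output file open: operating systems limit how many files a process may have open at the same time, so with a very large number of customers open() can fail with an OSError. A possible middle ground, sketched below under my own assumptions (the function name split_with_handle_cap and the max_open parameter are hypothetical and not part of the timed solutions), is to keep only a bounded number of files open and re-open evicted ones in append mode. I have not timed this variant.

```python
import csv
import os
from collections import OrderedDict


def split_with_handle_cap(in_path, out_dir, max_open=256):
    """Split rows by the first column, keeping at most max_open output files open."""
    open_files = OrderedDict()  # customer -> (file, writer), least recently used first
    seen = set()                # customers whose file already contains the header
    try:
        with open(in_path, newline="") as fin:
            reader = csv.reader(fin)
            header = next(reader)
            for row in reader:
                customer = row[0]
                if customer in open_files:
                    # mark as most recently used
                    open_files.move_to_end(customer)
                else:
                    if len(open_files) >= max_open:
                        # close the least recently used file to stay under the cap
                        _, (old_file, _) = open_files.popitem(last=False)
                        old_file.close()
                    # append if the file was created earlier, otherwise start fresh
                    mode = "a" if customer in seen else "w"
                    path = os.path.join(out_dir, f"{customer}.csv")
                    fout = open(path, mode, newline="")
                    open_files[customer] = (fout, csv.writer(fout))
                    if customer not in seen:
                        open_files[customer][1].writerow(header)
                        seen.add(customer)
                open_files[customer][1].writerow(row)
    finally:
        for fout, _ in open_files.values():
            fout.close()
    return len(seen)
```

If the rows of a customer tend to appear close together, few files get evicted and re-opened; with a fully random order and a small cap, this degrades towards the open-per-row behaviour that should be avoided.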

I did some timings on a sample file with 1_000_000 rows and 1_000 customers. Times for 10 executions (using timeit):

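1. Sharim Iqbal's solution (the accepted answer): 933 seconds
2. The Pandas version in this answer: 16 seconds
3. The Python-groupby version in this answer: 17 seconds
4. A slightly modified version of gimix's solution: 16 seconds
5. The latest version in this answer: 11 seconds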

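So, if you want to minimize the amount of code, use 2. Otherwise, pick one of 3.-5. But don't use 1.: it is an example of how you shouldn't do it (and, to be honest, that answer shouldn't have been accepted).

PS: Don't shoot the messenger ;)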
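For anyone who wants to run a comparison of this kind themselves, here is a minimal sketch of how such a timing could be set up. It is not the exact script behind the numbers above: the helper names (make_sample, split_exitstack, time_split), the generated customer names and the small default sizes are my assumptions. Any of the solutions can be timed by wrapping it in a function that takes an input path and an output directory.

```python
import csv
import random
import tempfile
import timeit
from contextlib import ExitStack
from pathlib import Path


def make_sample(path, n_rows, n_customers, seed=0):
    """Write a sample CSV with a header and random customer/quantity rows."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        out = csv.writer(f)
        out.writerow(["customer", "quantity"])
        for _ in range(n_rows):
            out.writerow([f"c{rng.randrange(n_customers)}", rng.randint(1, 500)])


def split_exitstack(in_path, out_dir):
    """One file per customer; all files stay open until the with-block ends."""
    writers = {}
    with ExitStack() as files:
        reader = csv.reader(files.enter_context(open(in_path, newline="")))
        header = next(reader)
        for row in reader:
            customer = row[0]
            if customer not in writers:
                out_path = Path(out_dir) / f"{customer}.csv"
                fout = files.enter_context(open(out_path, "w", newline=""))
                writers[customer] = csv.writer(fout)
                writers[customer].writerow(header)
            writers[customer].writerow(row)
    return len(writers)


def time_split(split_func, n_rows=10_000, n_customers=100, repeats=10):
    """Return the total seconds for `repeats` runs of split_func on a fresh sample."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.csv"
        make_sample(sample, n_rows, n_customers)
        return timeit.timeit(lambda: split_func(sample, tmp), number=repeats)


if __name__ == "__main__":
    print(f"{time_split(split_exitstack):.2f} s for 10 runs")
```

The defaults are deliberately much smaller than 1_000_000 rows and 1_000 customers so the sketch runs quickly; raise n_rows and n_customers to approach the setup described above.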
