Python: read from one CSV file and look up the corresponding row in another CSV when an element matches
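I want to read through file1.csv and, for each policy that also appears in file2.csv, get that policy's ID and then look up the count for that ID in file3.csv.
I have three CSV files, file1.csv, file2.csv and file3.csv, shown below, each with thousands of similar rows.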
file1.csv
Name Policies
Raj 12345, 676, 909
Sam 786
Lucy 899, 7676, 09
file2.csv
Policies ID
676, 8787 212
909,898,707 342
89, 98,09 345
file3.csv
ID Count
212 56
342 23
345 07
In the end, my final output should look like this, stored in a file or CSV. pandas, NumPy, or anything else is fine.
Final.csv
Name tuple of [Policies, ID, Count]
Raj [676,212,56]
Raj [909, 342, 23]
Lucy [09, 345, 07]
I'm stuck with the code below:
policyid = csv.reader( 'file2.csv', delimiter=',')
with open('file1.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        data = row['Policies'].split(",")
        if data:
            for policy in data:
                for policy, id in policyid:
                    data2 = policy.split(",")
                    if policy in data2:
                        print id
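One approach is to read in all three CSV files, take a value from file 1, and then scan file 2 and file 3 for it. There is an added difficulty: a comma-separated list inside a field is an anti-pattern, and it forces us to do some extra work to parse the text.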
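As a small illustration of that extra parsing work, here is a minimal sketch of a helper that splits one such field. The name `split_policies` is hypothetical, and the sample string is taken from the file2.csv rows in the question, where the spacing after commas is inconsistent.

```python
def split_policies(field):
    """Split a comma-separated field into a list of policy strings.

    Whitespace around each item is stripped and empty items are dropped.
    Values stay strings so that leading zeros such as "09" survive.
    """
    return [part.strip() for part in field.split(",") if part.strip()]


print(split_policies("89, 98,09"))
```

Splitting on `", "` alone would miss items in a field like `909,898,707`, which is why the sketch splits on the bare comma and strips afterwards.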
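Another approach is to load all three CSV files into SQL tables or dataframes and perform some JOINs, but the comma-separated lists still make this difficult.
Here is an example of the first approach, though it is admittedly messy: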
import csv

# Assumes standard comma-delimited files with the headers shown in the
# question (Name, Policies, ID, Count) and the list fields quoted.
# DictReader consumes the header row itself, so nothing needs skipping.
with open('file1.csv') as f:
    file1 = list(csv.DictReader(f))

with open('file2.csv') as f:
    file2 = list(csv.DictReader(f))

with open('file3.csv') as f:
    file3 = list(csv.DictReader(f))

def split_policies(field):
    # Tolerate inconsistent spacing such as "909,898,707" and "89, 98,09".
    return [p.strip() for p in field.split(',')]

def get_policy_id(policy):
    # Returns None when no row in file2 lists this policy.
    for line in file2:
        if policy in split_policies(line['Policies']):
            return line['ID']

def get_id_count(id):
    # Returns None when the ID does not appear in file3.
    for line in file3:
        if id == line['ID']:
            return line['Count']

output = []
for line in file1:
    for policy in split_policies(line['Policies']):
        id = get_policy_id(policy)
        if id is None:
            continue  # policy not present in file2, so no output row
        count = get_id_count(id)
        output.append({'name': line['Name'],
                       'policy': policy,
                       'id': id,
                       'count': count})
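The JOIN-based alternative mentioned earlier could look roughly like the sketch below. It uses the standard-library `sqlite3` module with an in-memory database. The table and column names (`name_policy`, `policy_id`, `id_count`, `cnt`) and the helper `split_policies` are made up for illustration, and the sample rows are typed in from the question rather than read from the files. The key step is flattening each comma-separated list into one row per policy before joining.

```python
import sqlite3


def split_policies(field):
    """Split a comma-separated field into stripped, non-empty strings."""
    return [p.strip() for p in field.split(",") if p.strip()]


# Sample rows copied from the question; in practice these would be read
# with csv.DictReader or csv.reader from the three files.
file1_rows = [("Raj", "12345, 676, 909"), ("Sam", "786"), ("Lucy", "899, 7676, 09")]
file2_rows = [("676, 8787", "212"), ("909,898,707", "342"), ("89, 98,09", "345")]
file3_rows = [("212", "56"), ("342", "23"), ("345", "07")]

conn = sqlite3.connect(":memory:")
# TEXT columns keep values such as "09" and "07" exactly as written.
conn.execute("CREATE TABLE name_policy (name TEXT, policy TEXT)")
conn.execute("CREATE TABLE policy_id (policy TEXT, id TEXT)")
conn.execute("CREATE TABLE id_count (id TEXT, cnt TEXT)")

# Normalize: one row per (name, policy) and one row per (policy, id).
conn.executemany(
    "INSERT INTO name_policy VALUES (?, ?)",
    [(name, p) for name, field in file1_rows for p in split_policies(field)],
)
conn.executemany(
    "INSERT INTO policy_id VALUES (?, ?)",
    [(p, id_) for field, id_ in file2_rows for p in split_policies(field)],
)
conn.executemany("INSERT INTO id_count VALUES (?, ?)", file3_rows)

# Inner joins drop policies that have no match in file2 or file3.
final = conn.execute(
    """
    SELECT n.name, n.policy, p.id, c.cnt
    FROM name_policy AS n
    JOIN policy_id AS p ON p.policy = n.policy
    JOIN id_count AS c ON c.id = p.id
    ORDER BY n.rowid
    """
).fetchall()

for row in final:
    print(row)
```

Once the lists are flattened, the lookups become ordinary joins, and the rows in `final` could be written to Final.csv with `csv.writer`. With thousands of rows this also avoids rescanning file 2 and file 3 for every policy.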