Python: read from one CSV file and look up the corresponding row in another CSV when an element matches

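I want to read through the first file, file1.csv, and, if one of its policies exists in file2.csv, get the ID for that policy and then look up the count for that ID in file3.csv. I have three CSV files, file1.csv, file2.csv, and file3.csv, as shown below, each with thousands of similar rows.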

file1.csv
Name   Policies
Raj    12345, 676, 909
Sam    786
Lucy   899, 7676, 09

file2.csv
Policies       ID
676, 8787      212
909,898,707    342
89, 98,09      345

file3.csv
ID  Count
212 56
342 23
345 07

In the end, my final output should look like this, stored in a file or CSV. Using pandas, numpy, or anything else is fine.

Final.csv
Name  tuple of [Policies, ID, Count]
Raj     [676,212,56]
Raj     [909, 342, 23]
Lucy    [09, 345, 07]

I'm stuck with the code below:

policyid = csv.reader( 'file2.csv', delimiter=',')
with open('file1.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        data = row['Policies'].split(",")
        if data:
            for policy in data:
                for policy, id in policyid:
                    data2 = policy.split(",")
                        if policy in data2:
                            print id

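One approach is to read in all three CSV files, take each value from file 1, and then scan file 2 and file 3 for the matching values. There is an added difficulty here: a comma-separated list inside a field is an anti-pattern, and it forces us to do extra work to parse the text.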
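To make the parsing problem concrete, here is a minimal sketch of a helper that splits such a field. The name `split_policies` is made up for illustration; the sample strings are taken from the question's file2.csv, where the spacing after the commas is inconsistent.

```python
def split_policies(field):
    """Split a comma-separated field into items, ignoring stray whitespace."""
    return [item.strip() for item in field.split(",") if item.strip()]


# Splitting on ", " misses entries that have no space after the comma.
print("909,898,707".split(", "))
print(split_policies("909,898,707"))
print(split_policies("89, 98,09"))
```

The items are kept as strings on purpose, so that a value such as 09 keeps its leading zero and still matches the same text in another file.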

Another approach is to load all three CSV files into SQL tables or data frames and perform some JOINs, but the comma-separated lists still make this difficult.
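As a rough sketch of the SQL route, the comma-separated lists can be normalized into one row per policy before joining. This uses an in-memory SQLite database from the standard library. The table and column names (`name_policy`, `policy_id`, `id_count`, `cnt`) are my own choices, and the rows are typed in from the question's sample rather than read from the files.

```python
import sqlite3

# Sample rows copied from the question; real code would read them from the CSVs.
file1 = [("Raj", "12345, 676, 909"), ("Sam", "786"), ("Lucy", "899, 7676, 09")]
file2 = [("676, 8787", "212"), ("909,898,707", "342"), ("89, 98,09", "345")]
file3 = [("212", "56"), ("342", "23"), ("345", "07")]


def split_list(field):
    """Split a comma-separated field into stripped, non-empty items."""
    return [item.strip() for item in field.split(",") if item.strip()]


conn = sqlite3.connect(":memory:")
# TEXT columns keep values such as "09" and "07" exactly as written.
conn.execute("CREATE TABLE name_policy (name TEXT, policy TEXT)")
conn.execute("CREATE TABLE policy_id (policy TEXT, id TEXT)")
conn.execute("CREATE TABLE id_count (id TEXT, cnt TEXT)")

# Normalize: one row per (name, policy) pair and per (policy, id) pair.
conn.executemany(
    "INSERT INTO name_policy VALUES (?, ?)",
    [(name, p) for name, field in file1 for p in split_list(field)],
)
conn.executemany(
    "INSERT INTO policy_id VALUES (?, ?)",
    [(p, id_) for field, id_ in file2 for p in split_list(field)],
)
conn.executemany("INSERT INTO id_count VALUES (?, ?)", file3)

# Inner joins drop policies that have no ID, and IDs that have no count.
rows = conn.execute(
    """
    SELECT np.name, np.policy, pi.id, ic.cnt
    FROM name_policy AS np
    JOIN policy_id AS pi ON pi.policy = np.policy
    JOIN id_count AS ic ON ic.id = pi.id
    ORDER BY np.rowid
    """
).fetchall()

for row in rows:
    print(row)
```

Once the lists are split into separate rows, the lookup itself is just two joins, which is why the list-in-a-field layout is the real obstacle here.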

Here is an example of the first approach, though it is admittedly messy:

import csv


def load(path):
    # DictReader consumes the header row itself, so there is nothing to skip.
    # Assumes comma-delimited files in which the multi-value Policies field
    # is quoted, so each row parses into the columns shown in the question.
    with open(path, newline='') as f:
        return list(csv.DictReader(f))


file1 = load('file1.csv')
file2 = load('file2.csv')
file3 = load('file3.csv')


def split_list(field):
    # Spacing after the commas is inconsistent ("909,898,707" vs "676, 8787"),
    # so split on the bare comma and strip each item.
    return [item.strip() for item in field.split(',')]


def get_policy_id(policy):
    # Return the ID of the first file2 row that lists this policy, else None.
    for line in file2:
        if policy in split_list(line['Policies']):
            return line['ID']


def get_id_count(id):
    # Return the count for this ID from file3, else None.
    for line in file3:
        if id == line['ID']:
            return line['Count']


output = []
for line in file1:
    for policy in split_list(line['Policies']):
        id = get_policy_id(policy)
        if id is None:
            continue  # Policy does not appear in file2
        count = get_id_count(id)
        output.append({'name': line['Name'],
                       'policy': policy,
                       'id': id,
                       'count': count})
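Since the question asks for the result to be stored in a file, here is one possible way to write the `output` list to Final.csv with `csv.DictWriter`. The `write_output` helper and the hard-coded sample rows are only for illustration; in practice you would pass in the list built by the loop.

```python
import csv

# Hypothetical sample of the rows collected in `output`, using the question's data.
output = [
    {"name": "Raj", "policy": "676", "id": "212", "count": "56"},
    {"name": "Raj", "policy": "909", "id": "342", "count": "23"},
    {"name": "Lucy", "policy": "09", "id": "345", "count": "07"},
]


def write_output(rows, path):
    """Write the result rows to a CSV file with a header line."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "policy", "id", "count"])
        writer.writeheader()
        writer.writerows(rows)


write_output(output, "Final.csv")
```

This writes one column per value instead of a bracketed list in a single field, which avoids recreating the comma-separated-list problem in the output file.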