I want to check whether a value in a csv exists in another csv file and return 1

Question

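I have 2 csv files: one is dictionary.csv and the other is file.csv, which contains a lot of words. I want to check whether the words in dictionary.csv exist in a specific column of file.csv.

If they exist, a new file new.csv should be created. That file should contain all the data from file.csv plus one extra column holding 1 if a word exists and 0 if it does not.

This is my script: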

import csv
import pandas as pd

news=pd.read_csv("file.csv")

dictionary=pd.read_csv("dictionary.csv", squeeze=True)

pattern = '|'.join(dictionary)

exist=news['sentences'].str.contains(pattern, na=False)

with open('new.csv', 'w') as outFile:
    for cols in exist:
        if pattern in exist:
            outFile.write(exist, "1")

As a result, I get an empty csv file, so I think I might be missing something.

file.csv
id      sentences
0        Roses are red
1        burgers are delicious

dictionary.csv
red
blue
green

The new.csv file should contain the following output:

id      sentences                exist/not exist
0        Roses are red               1
1        burgers are delicious       0

Answer 1

Given that we have

file

   id              sentences
0   0          Roses are red
1   1  burgers are delicious

and

dictionary
       0
0    red
1   blue
2  green

you can do this:

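# Column 0 holds the dictionary words (dictionary.csv read with header=None)
words = list(dictionary[0])
# 1 if any dictionary word occurs in the sentence, otherwise 0
file['exist'] = file['sentences'].apply(lambda x: int(any(w in x for w in words)))
print(file)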

   id              sentences  exist
0   0          Roses are red      1
1   1  burgers are delicious      0

Then you can save it:

file.to_csv('new.csv', index=False)
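
For completeness, here is a minimal end-to-end sketch of the same idea. It assumes that file.csv is comma-separated with a header row and that dictionary.csv has no header (hence header=None, which puts the words in column 0). The in-memory strings stand in for the two files, and the helper name flag_sentences is made up for this illustration.

```python
import io

import pandas as pd

# Stand-ins for file.csv and dictionary.csv, using the sample data from the question
# (assumed to be comma-separated; adjust sep= if the real files differ)
FILE_CSV = "id,sentences\n0,Roses are red\n1,burgers are delicious\n"
DICTIONARY_CSV = "red\nblue\ngreen\n"


def flag_sentences(file_src, dictionary_src):
    """Return the file data with an extra 0/1 column named 'exist'."""
    file = pd.read_csv(file_src)
    # header=None keeps "red" as data, so the words land in column 0
    dictionary = pd.read_csv(dictionary_src, header=None)
    words = [str(w) for w in dictionary[0].dropna()]
    # Treat missing sentences as empty strings so the substring test cannot fail on NaN
    sentences = file["sentences"].fillna("").astype(str)
    file["exist"] = sentences.apply(lambda s: int(any(w in s for w in words)))
    return file


result = flag_sentences(io.StringIO(FILE_CSV), io.StringIO(DICTIONARY_CSV))
print(result)
# result.to_csv("new.csv", index=False)  # uncomment to write the output file
```

With real files, pass the paths "file.csv" and "dictionary.csv" instead of the StringIO objects.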

Answer 2

You can use numpy.where to create the new column and pandas.DataFrame.to_csv to write the result to a new file.

import numpy as np

# 1 where the sentence matches any dictionary word, otherwise 0
news["exist/not exist"] = np.where(
    news["sentences"].str.contains('|'.join(dictionary), na=False),
    1, 0
)

news.to_csv("new.csv", index=False)
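
One thing to watch with a pattern built by '|'.join is that it matches substrings, so "red" would also match inside "bored", and any regex metacharacters in a dictionary word are interpreted as regex syntax. If whole-word matching is what you want, a possible variation looks like the sketch below. The helper name flag_whole_words and the "I am bored" sample row are invented for illustration and are not part of the original data.

```python
import io
import re

import pandas as pd


def flag_whole_words(news, words, column="sentences"):
    """Add a 0/1 column that is 1 when any dictionary word appears as a whole word."""
    # re.escape keeps characters such as "." or "+" in a word from acting as regex syntax
    alternatives = "|".join(re.escape(str(w)) for w in words)
    # \b anchors the match at word boundaries; (?:...) is a non-capturing group,
    # which avoids the pandas warning about match groups in str.contains
    pattern = rf"\b(?:{alternatives})\b"
    out = news.copy()
    out["exist/not exist"] = out[column].str.contains(pattern, na=False).astype(int)
    return out


# Hypothetical sample: the second sentence contains "red" only inside "bored"
news = pd.read_csv(io.StringIO("id,sentences\n0,Roses are red\n1,I am bored\n"))
flagged = flag_whole_words(news, ["red", "blue", "green"])
print(flagged)
```

Here .astype(int) on the boolean result plays the same role as np.where(..., 1, 0); which one to use is mostly a matter of taste.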

csv

python-3.x

export-to-csv

pandas