Get all rows that contain at least one element from an array, using Pandas and Python

Consider the following code:

import pandas as pd
import csv

bigArrayOfValues = ['XXX' , 'YYY' , 'ZZZ' ....... ........ .........]

# Find which values from the array are in another CSV
with open('...........csv') as inf, open('out.csv','w') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in reader:        
        for column in row:
            if column in bigArrayOfValues:
                print('Found: {}'.format(row))
                writer.writerow(row)


print('Done...')    

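I am trying to extract all rows from a CSV file that contain at least one value from the array bigArrayOfValues, but it does not work (it always produces an empty CSV with no results).

Any idea what is wrong with the code?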

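csv.reader splits on commas by default. If your input file is tab-separated (as your writer settings suggest), each line is read as a single field, which never equals an entry of bigArrayOfValues. You just need to replace the following line: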

reader = csv.reader(inf)

with:

reader = csv.reader(inf, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
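
To see why the delimiter matters, here is a minimal sketch that assumes the input is tab-separated. The sample text, the `wanted` list and the `matching_rows` helper are made up for illustration and are not part of the original code.

```python
import csv
import io

# Hypothetical tab-separated sample standing in for the real input file
sample = "a\tXXX\tc\nd\te\tf\n"
wanted = ['XXX', 'YYY', 'ZZZ']

def matching_rows(text, **reader_kwargs):
    """Return the rows that have at least one field found in `wanted`."""
    reader = csv.reader(io.StringIO(text), **reader_kwargs)
    return [row for row in reader if any(field in wanted for field in row)]

# Default comma delimiter: each line is parsed as one field, so nothing matches
default_rows = matching_rows(sample)

# Tab delimiter: fields are split properly and the first row matches
tab_rows = matching_rows(sample, delimiter='\t')

print(default_rows)
print(tab_rows)
```

With the default settings the whole line `a<TAB>XXX<TAB>c` is compared against the list as one string, which is why the output file stays empty.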

If you want to use pandas instead, you can do this:

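import pandas as pd

bigArrayOfValues = ['XXX', 'YYY', 'ZZZ']

# Add sep='\t' here if the input file is tab-separated
df = pd.read_csv('input.csv')

def _check_row(row):
    # True if any cell in the row matches one of the wanted values
    for x in row:
        if x in bigArrayOfValues:
            return True
    return False

# Boolean Series with one entry per row
mask = df.apply(_check_row, axis=1)

# Keep only the rows where the mask is True
out_df = df[mask]

out_df.to_csv('output.csv', index=False)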
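
As a possible alternative to the row-wise apply, the same mask can probably be built with `DataFrame.isin` followed by `any(axis=1)`, which avoids a Python-level loop over each row. This is only a sketch: the in-memory CSV, its column names and the `wanted` list are invented for the example.

```python
import io
import pandas as pd

wanted = ['XXX', 'YYY', 'ZZZ']

# Hypothetical in-memory CSV standing in for input.csv
sample = io.StringIO("c1,c2,c3\na,XXX,c\nd,e,f\nZZZ,h,i\n")
df = pd.read_csv(sample)

# isin gives a boolean frame; any(axis=1) collapses it to one flag per row
mask = df.isin(wanted).any(axis=1)
out_df = df[mask]

print(out_df)
```

Like the apply version, this compares whole cell values, so a cell containing 'XXX' as part of a longer string would not match.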