Get all the rows that have at least one element from an array using Pandas and Python
Consider this code:
import pandas as pd
import csv

bigArrayOfValues = ['XXX' , 'YYY' , 'ZZZ' ....... ........ .........]

# Find which values from the array are in another CSV
with open('...........csv') as inf, open('out.csv','w') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        for column in row:
            if column in bigArrayOfValues:
                print('Found: {}'.format(row))
                writer.writerow(row)
print('Done...')
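I'm trying to extract from a CSV file all the rows that contain at least one value from the array bigArrayOfValues, but it doesn't work (it always produces an empty CSV with no results).
Any idea what's wrong with the code?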
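You just need to replace the following line:
reader = csv.reader(inf)
with:
reader = csv.reader(inf, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
This assumes the input file is tab-separated, like the output you are writing. With the default comma delimiter, csv.reader returns each line (if it contains no commas) as a single field, for example ['1\tXXX'], which never equals any value in bigArrayOfValues, so nothing is ever written.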
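The effect of the delimiter fix above can be reproduced with a small in-memory sample. This is a minimal sketch, assuming a tab-separated input; the sample data, the `values` set, and the helper `matching_rows` are hypothetical stand-ins for the real file and bigArrayOfValues. It also uses `any()` so that a row containing several matching values is kept only once (the original loop writes such a row once per match).

```python
import csv
import io

# Hypothetical tab-separated sample standing in for the real input file.
SAMPLE = "id\tcode\n1\tXXX\n2\tAAA\n3\tZZZ\n"

# A set gives fast membership tests, which helps when the list of values is large.
values = {'XXX', 'YYY', 'ZZZ'}


def matching_rows(text, delimiter):
    """Return the rows in which at least one field is in `values` (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # any() stops at the first matching field, so each row is kept once
    return [row for row in reader if any(field in values for field in row)]


# With the default comma delimiter each line is one field, e.g. ['1\tXXX'],
# which never equals 'XXX', so nothing matches.
print(matching_rows(SAMPLE, ','))   # []
# With the tab delimiter the fields are split correctly.
print(matching_rows(SAMPLE, '\t'))  # [['1', 'XXX'], ['3', 'ZZZ']]
```

In the real script, each returned row would be passed to writer.writerow; the csv module documentation also recommends opening the output file with newline='' to avoid blank lines between rows on some platforms.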
If you want to use pandas instead, you can do this:
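import pandas as pd

bigArrayOfValues = ['XXX', 'YYY', 'ZZZ']

# Add sep='\t' if the input file is tab-separated
df = pd.read_csv('input.csv')

def _check_row(row):
    # True if any cell in this row is one of the wanted values
    for x in row:
        if x in bigArrayOfValues:
            return True
    return False

# Run the check on every row (axis=1) to get a boolean mask
mask = df.apply(_check_row, axis=1)
out_df = df[mask]
out_df.to_csv('output.csv', index=False)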
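A vectorized alternative to the row-wise apply() above is DataFrame.isin combined with any(axis=1). This is a sketch under the assumption of a tab-separated input; the in-memory sample is hypothetical and stands in for 'input.csv'.

```python
import io

import pandas as pd

values = ['XXX', 'YYY', 'ZZZ']

# Hypothetical tab-separated sample standing in for the real input file.
SAMPLE = "id\tcode\tname\n1\tXXX\ta\n2\tAAA\tb\n3\tfoo\tZZZ\n"

df = pd.read_csv(io.StringIO(SAMPLE), sep='\t')

# isin() builds a boolean DataFrame (True where a cell is in `values`);
# any(axis=1) reduces it to one flag per row: does any cell match?
mask = df.isin(values).any(axis=1)
out_df = df[mask]

print(out_df)
```

With the real file, pass its path instead of io.StringIO(SAMPLE) and write out_df with to_csv(..., index=False) as before. Because read_csv infers column types, a numeric-looking cell such as 123 is read as a number and will not match the string '123'; passing dtype=str to read_csv keeps every cell as text.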