Filter list based on another list in python

Question

I am trying to filter list1 based on another list, list2, with the following code:

import csv

with open('screen.csv') as f: #A file with a list of all the article titles
    reader = csv.reader(f)
    list1 = list(reader)

print(list1)

list2 = ["Knowledge Management", "modeling language"] #key words that article title should have (at least one of them)
list2 = [str(x) for x in list2]

occur = [i for i in list1  for j in list2 if str(j) in i]

print(occur)

But the output is empty.

My list1 looks like this:

Answer 1

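import pandas as pd
import numpy as np

# `data`, `another_list` and `column_of_list` are placeholders for your own data
df = pd.DataFrame(data)
# keep rows whose value (for list cells, every element) appears in another_list
print(df[df.column_of_list.map(lambda x: np.isin(x, another_list).all())])
# or, if the column has the default label 0:
print(df[df[0].map(lambda x: np.isin(x, another_list).all())])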

Trying it with real data:

import numpy as np
import pandas as pd 
data = ["Knowledge Management", "modeling language"]
another_list=["modeling language","natural language"]
df = pd.DataFrame(data) 
a = df[df[0].map(lambda x: np.isin(x, another_list).all())]

print(a)

Answer 2

list_1 is actually a list of lists, not a list of strings, so you need to flatten it before comparing elements, for example like this:

list_1 = [['foo bar'], ['baz beep bop']]
list_2 = ['foo', 'bub']

flattened_list_1 = [
    element 
    for sublist in list_1 
    for element in sublist
]
occurrences = [
    phrase 
    for phrase in flattened_list_1 if any(
        word in phrase 
        for word in list_2
    )
]
print(occurrences)

# output:
# ['foo bar']

Answer 3

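Your list1 is a list of lists, because the csv.reader you use to create it always returns a list for each row, even if the row has only one item. (If you expect just one name per line, I'm not sure why you are using csv here; it is only going to get in the way.)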
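To see the nesting for yourself, here is a minimal sketch. The two titles and the io.StringIO stand-in for screen.csv are made up for illustration; only csv.reader comes from the question.

```python
import csv
import io

# Hypothetical one-column contents standing in for screen.csv
sample = io.StringIO(
    "Knowledge Management in Practice\n"
    "A modeling language for robots\n"
)

rows = list(csv.reader(sample))
print(rows)
# [['Knowledge Management in Practice'], ['A modeling language for robots']]
```

Each row comes back as a list, even though it holds a single field.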

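Later, when you check `if str(j) in i` as part of the filtering list comprehension, you are testing whether the string j is an element of the list i. Since the values in list2 are key phrases rather than full titles, you won't find any matches. If you were checking against the inner strings you would get a substring check, but a list membership test requires an exact match.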
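The difference between the two kinds of `in` test can be shown in a few lines. The row and keyword below are hypothetical values chosen to resemble the question's data.

```python
row = ["Knowledge Management in Practice"]  # hypothetical row, as csv.reader would yield it
keyword = "Knowledge Management"

in_list = keyword in row       # list membership: needs an element equal to keyword
in_string = keyword in row[0]  # string containment: substring check

print(in_list, in_string)
# False True
```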

Probably the best way to fix this is to get rid of the nested lists in list1. Try creating it like this instead:

with open('screen.csv') as f:
    list1 = [line.strip() for line in f]
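Putting it together, the whole filter might look like the sketch below. It assumes one title per line; the three titles and the io.StringIO object standing in for `open('screen.csv')` are invented so the example runs on its own.

```python
import io

# Stand-in for open('screen.csv'); the titles are hypothetical
f = io.StringIO(
    "Knowledge Management in Practice\n"
    "Graph databases\n"
    "A modeling language for robots\n"
)
list1 = [line.strip() for line in f]  # flat list of title strings

list2 = ["Knowledge Management", "modeling language"]

# keep a title if at least one key phrase is a substring of it
occur = [title for title in list1 if any(key in title for key in list2)]
print(occur)
# ['Knowledge Management in Practice', 'A modeling language for robots']
```

Using any() rather than a second `for` clause in the comprehension means a title that matches several key phrases is still listed only once.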
