Comparing two lists of lists with a dataframe column in Python

I want to compare two lists of lists with the columns of a dataframe.
list1=[[r2,r4,r6],[r6,r7]]
list2=[[p4,p5,p8],[p86,p21,p0,p94]]

Dataset:

rid pid value
r2 p0 banana
r2 p4 chocolate
r4 p89 apple
r6 p5 milk
r7 p0 bread

Expected output:

[[chocolate,milk],[bread]]

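Since r2 and p4 appear in list1[0] and list2[0] respectively, and also in the same row of the dataset, chocolate must be stored. Similarly, r6 and p5 appear at the same position in both lists and in the same row of the dataset, so milk must be stored.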

You can do it as follows:

from itertools import product

import pandas as pd

df = pd.DataFrame({'rid': {0: 'r2', 1: 'r2', 2: 'r4', 3: 'r6', 4: 'r7'},
 'pid': {0: 'p0', 1: 'p4', 2: 'p89', 3: 'p5', 4: 'p0'},
 'value': {0: 'banana', 1: 'chocolate', 2: 'apple', 3: 'milk', 4: 'bread'}})
list1 = [['r2','r4','r6'],['r6','r7']]
list2 = [['p4','p5','p8'],['p86','p21','p0','p94']]

# Generate all possible associations.
associations = (product(l1, l2) for l1, l2 in zip(list1, list2))

# Index for speed and convenience of the lookup.
df = df.set_index(['rid', 'pid']).sort_index()

output = [[df.loc[assoc, 'value'] for assoc in assoc_list if assoc in df.index] 
          for assoc_list in associations]

print(output)
# [['chocolate', 'milk'], ['bread']]
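
To make the lookup step more concrete, here is a minimal sketch of what product generates for the first pair of sublists and which of those combinations exist as rows once rid and pid form the index. The names indexed, pairs and found are illustrative only and are not part of the answer above.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({
    "rid": ["r2", "r2", "r4", "r6", "r7"],
    "pid": ["p0", "p4", "p89", "p5", "p0"],
    "value": ["banana", "chocolate", "apple", "milk", "bread"],
})
indexed = df.set_index(["rid", "pid"]).sort_index()

# Every (rid, pid) combination from the first pair of sublists: 3 * 3 = 9 tuples.
pairs = list(product(["r2", "r4", "r6"], ["p4", "p5", "p8"]))

# Only combinations that are actual rows of the dataset pass the membership test.
found = [pair for pair in pairs if pair in indexed.index]
print(found)
```

The results follow the order in which product emits the combinations, not the row order of the dataset.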

Answer

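# Assumes df still has 'rid' and 'pid' as regular columns (not set as the index).
result = []
for l1, l2 in zip(list1, list2):
    # Keep rows whose rid is in l1 and whose pid is in l2, then collect their values.
    res = df.loc[df["rid"].isin(l1) & df["pid"].isin(l2), "value"].tolist()
    result.append(res)
print(result)
# [['chocolate', 'milk'], ['bread']]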

Explanation

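  • zip pairs the two lists element by element, which is equivalent to:
for i in range(len(list1)):
    l1 = list1[i]
    l2 = list2[i]
  • df["rid"].isin(l1) & df["pid"].isin(l2) combines the two conditions with the element-wise AND operator &, so a row is kept only if its rid is in l1 and its pid is in l2.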
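
The way the two conditions combine can be seen by inspecting the boolean masks directly. This is a small sketch on the question's sample data; the names rid_mask, pid_mask and both are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "rid": ["r2", "r2", "r4", "r6", "r7"],
    "pid": ["p0", "p4", "p89", "p5", "p0"],
    "value": ["banana", "chocolate", "apple", "milk", "bread"],
})

l1, l2 = ["r2", "r4", "r6"], ["p4", "p5", "p8"]

rid_mask = df["rid"].isin(l1)  # one boolean per row: is rid in l1?
pid_mask = df["pid"].isin(l2)  # one boolean per row: is pid in l2?
both = rid_mask & pid_mask     # True only where both masks are True

print(both.tolist())
print(df.loc[both, "value"].tolist())
```

Because the mask is applied to the dataframe, the selected values come back in the row order of the dataset.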

Note

  • list1 and list2 must have the same length; otherwise zip silently ignores the remaining elements of the longer list.
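
The truncation is easy to miss, so here is a short sketch of it together with one possible guard. The third sublist ["r9"] and the helper checked_pairs are hypothetical and exist only for this illustration.

```python
list1 = [["r2", "r4", "r6"], ["r6", "r7"], ["r9"]]  # hypothetical extra sublist
list2 = [["p4", "p5", "p8"], ["p86", "p21", "p0", "p94"]]

# zip stops at the shorter input, so ["r9"] is dropped without any warning.
paired = list(zip(list1, list2))
print(len(paired))


def checked_pairs(a, b):
    """Pair up two lists, raising instead of silently truncating."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    return list(zip(a, b))


try:
    checked_pairs(list1, list2)
except ValueError as exc:
    print(exc)
```

On Python 3.10 and later, zip(list1, list2, strict=True) raises a ValueError in the same situation without a helper.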