Use keywords from a dataframe to detect if any are present in another dataframe or string

Question

I have two questions. The first is...

I have a dataframe with categories and keywords, like this:

  Category                   Keywords
0    Fruit            ['apple', 'pear', 'plum', 'grape']
1    Color            ['red', 'purple', 'green']

And another dataframe like this:

              Summary
0        This is a basket of red apples. They are sour.
1        We found a bushel of fruit. They are red.
2        There is a peck of pears that taste sweet.
3        We have a box of plums.

I want the final result to look like this:

      Category                                            Summary
0    Fruit, Color     This is a basket of red apples. They are sour.
1           Color     We found a bushel of fruit. They are red.
2           Fruit     There is a peck of pears that taste sweet.
3           Fruit     We have a box of plums.

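The second is...

I should be able to check whether a string contains any of the keywords and, if it does, output a list of the matching categories.

Example: sample_sentence = "This line contains a red plum?"

Output: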

result_list = ['Color', 'Fruit']

Edit: It is somewhat similar but not the same. Use this for reference:

Edit 2:

I also have another version of the first dataframe, which looks like this:

  Category                   Filters
0    Fruit  apple, pear, plum, grape
1    Color        red, purple, green

Answer 1

You can use a list comprehension to achieve this:

Dataframe setup:

import pandas as pd

df1 = pd.DataFrame({'Category': {0: 'Fruit', 1: 'Color'},
 'Keywords': {0: 'apple,pear,plum,grape', 1: 'red,purple,green'}})
df2 = pd.DataFrame({'Summary': {0: 'This is a basket of red apples. They are sour.',
  1: 'We found a bushel of fruit. They are red.',
  2: 'There is a peck of pears that taste sweet.',
  3: 'We have a box of plums.'}})
# Turn each comma-separated keyword string into a list of keywords
df1['Keywords'] = df1['Keywords'].str.split(',')
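
If the keywords arrive as in the Filters version from Edit 2 of the question, the separator is a comma followed by a space, so splitting on ',' alone would leave a leading space on every keyword after the first. A minimal sketch of one way to handle that, assuming the column is named Filters as in the question (df1_alt is a name made up for this illustration):

```python
import pandas as pd

# Hypothetical name df1_alt: the 'Filters' variant of the first dataframe
df1_alt = pd.DataFrame({'Category': ['Fruit', 'Color'],
                        'Filters': ['apple, pear, plum, grape', 'red, purple, green']})

# Split on commas, then strip the whitespace left around each keyword
df1_alt['Keywords'] = df1_alt['Filters'].str.split(',').apply(
    lambda kws: [k.strip() for k in kws])
print(df1_alt['Keywords'].tolist())
```

After this step the Keywords column holds clean lists, the same shape the setup above produces.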

Code:

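df2['Category'] = (df2['Summary'].str.split(' ').apply(
    # dict.fromkeys removes duplicates and, unlike set, keeps df1's category order
    lambda x: list(dict.fromkeys(
        str(a)
        for a, b in zip(df1['Category'], df1['Keywords'])  # category and its keyword list
        for y in x                                          # each word of the summary
        for c in b                                          # each keyword of the category
        # Substring match. Or you can use: "if str(c) == str(y)" or
        # "if str(c).lower() == str(y).lower()"
        if str(c) in str(y)))).str.join(', '))
df2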

Output:

Out[1]: 
                                          Summary      Category
0  This is a basket of red apples. They are sour.  Fruit, Color
1       We found a bushel of fruit. They are red.         Color
2      There is a peck of pears that taste sweet.         Fruit
3                         We have a box of plums.         Fruit

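a, b, and x iterate over rows (vertically): x is the list of words in each row of df2, while a and b are the category and keyword list in each row of df1. c and y iterate over the lists within a row (horizontally): c is a keyword in b, and y is a word in x. To iterate horizontally through a list, you first need to iterate vertically through the rows. That is why we have all of these variables (see the figure). You can use zip to iterate over two or more columns of the first dataframe at the same time.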
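
The same iteration can be written as explicit loops, which also covers the second part of the question (checking a single string). This is only a sketch under stated assumptions: matching_categories is a hypothetical helper name, the keywords are given as plain lists so the snippet runs on its own, and matching is a case-insensitive substring test, so a keyword like 'red' would also match a word such as 'hundred'.

```python
def matching_categories(text, categories, keyword_lists):
    """Return the categories that have a keyword inside any word of text."""
    found = []
    for a, b in zip(categories, keyword_lists):  # a: category, b: its keyword list
        for y in text.split(' '):                # y: one word of the text
            for c in b:                          # c: one keyword of the category
                # Case-insensitive substring match; skip categories already found
                if str(c).lower() in str(y).lower() and a not in found:
                    found.append(a)
    return found


# Illustrative data mirroring the question's first dataframe
categories = ['Fruit', 'Color']
keyword_lists = [['apple', 'pear', 'plum', 'grape'], ['red', 'purple', 'green']]

sample_sentence = "This line contains a red plum?"
result_list = matching_categories(sample_sentence, categories, keyword_lists)
print(result_list)  # ['Fruit', 'Color']
```

The result follows the order of the category list rather than the order in which the words appear in the sentence. With the dataframes from this answer, the two columns can be passed directly, for example matching_categories(sample_sentence, df1['Category'], df1['Keywords']).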

python

filtering

dataframe

pandas