Partial Matching of name in a corpus to names in another column in a Pandas dataframe
I have a dataframe like this:
   Name                           Corpus
0  James Bond Junior Bristleback  Agent James Bond went missing
1  Batman Bin Superman            Superman saves the day again
2  Thor S/O Odin                  Loki was last seen in March 2020
I would like to get this output:
   Name                           Corpus                            Value
0  James Bond Junior Bristleback  Agent James Bond went missing     True
1  Batman Bin Superman            Superman saves the day again      True
2  Thor S/O Odin                  Loki was last seen in March 2020  False
I've tried regular expressions before, but I can't seem to get the desired output. Is there a way to do this with regex or with some other libraries/packages?
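A regex can express this check as "does any whole word of Name appear as a whole word in Corpus". The following is a minimal sketch using only the standard `re` module, assuming a "word" means a whitespace-separated token and the comparison is case-sensitive. The helper `name_in_corpus` is hypothetical, written for illustration. The lookarounds `(?<!\S)` and `(?!\S)` are used instead of `\b` so that tokens such as `S/O` are matched as complete tokens, and `re.escape` keeps characters like `/` or `.` literal.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Name": ["James Bond Junior Bristleback", "Batman Bin Superman", "Thor S/O Odin"],
    "Corpus": [
        "Agent James Bond went missing",
        "Superman saves the day again",
        "Loki was last seen in March 2020",
    ],
})


def name_in_corpus(name: str, corpus: str) -> bool:
    """Hypothetical helper: True if any whitespace-separated word of `name`
    occurs as a complete whitespace-separated token in `corpus`."""
    words = name.split()
    if not words:  # an empty alternation would match the empty string
        return False
    # (?<!\S) / (?!\S): the match must not touch another non-space character,
    # so "Bond" matches "Bond" but not "Bondage".
    pattern = r"(?<!\S)(?:" + "|".join(map(re.escape, words)) + r")(?!\S)"
    return re.search(pattern, corpus) is not None


df["Value"] = [name_in_corpus(n, c) for n, c in zip(df["Name"], df["Corpus"])]
print(df)
```

Passing `flags=re.IGNORECASE` to `re.search` would make the match case-insensitive. Note that punctuation attached to a word (for example `Bond,`) makes it a different token under this definition.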
Not sure if this is exactly what you need. It essentially turns each sentence into a set of words and checks whether the two sets overlap:
df.Name.str.split().apply(set) & df.Corpus.str.split().apply(set)
Output:
0 True
1 True
2 False
dtype: bool
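The one-liner relies on how pandas evaluates `&` between two object-dtype Series: each pair of sets is intersected element-wise and the intersections are then coerced to booleans (non-empty means True). A more explicit sketch of the same set-overlap idea, which also stores the result in the `Value` column from the desired output, might look like this. As in the answer, words are whitespace-separated and matching is case-sensitive.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["James Bond Junior Bristleback", "Batman Bin Superman", "Thor S/O Odin"],
    "Corpus": [
        "Agent James Bond went missing",
        "Superman saves the day again",
        "Loki was last seen in March 2020",
    ],
})

# For each row, build the set of words in Name and check it against the words in Corpus.
# set.isdisjoint accepts any iterable and is True when there are NO shared words,
# so its negation is True when at least one word overlaps.
df["Value"] = [
    not set(name.split()).isdisjoint(corpus.split())
    for name, corpus in zip(df["Name"], df["Corpus"])
]
print(df)
```

`isdisjoint` can stop at the first shared word instead of building the full intersection, and the list comprehension makes the row-by-row logic visible without depending on pandas' handling of `&` for object columns.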