How to match words in 2 lists against another string of words without substring matching in Python?

I have 2 lists containing keywords:

slangNames = ["Vikes", "Demmies", "D", "MS Contin"]
riskNames = ["enough", "pop", "final", "stress", "trade"]

I also have a dictionary called overallDict that contains tweets. The key-value pairs are {ID: tweet text}, for example:

{1:"Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}

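I am trying to isolate only the tweets that contain at least one keyword from both slangNames and riskNames. That is, a tweet must contain any keyword from slangNames AND any keyword from riskNames. So from the example above, my code should return keys 1 and 3, i.e.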

{1:"Vikes is not enough for me", 3:"pop a D"}. 

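But my code is picking up substrings instead of whole words. So basically, anything containing the letter 'D' gets picked up. How do I match these as whole words rather than substrings? Please help. Thanks!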

My code so far is as follows:

for key in overallDict:
    if any(x in overallDict[key] for x in strippedRisks) and (any(x in overallDict[key] for x in strippedSlangs)):
        output.append(key)
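A minimal sketch of why this condition over-matches, assuming strippedSlangs and strippedRisks hold the same keywords as the lists above (that is an assumption, and the tweet with key 4 is hypothetical). On a string, the in operator tests for a substring, so 'D' matches any text containing a capital D:

```python
# Assumed to mirror the keyword lists from the question.
strippedSlangs = ["Vikes", "Demmies", "D", "MS Contin"]
strippedRisks = ["enough", "pop", "final", "stress", "trade"]

# Key 4 is a hypothetical tweet added only to expose the problem.
overallDict = {
    1: "Vikes is not enough for me",
    2: "Demmies is okay",
    3: "pop a D",
    4: "Dad is under stress",
}

output = []
for key, text in overallDict.items():
    # `x in text` is a substring test because `text` is a str.
    if any(x in text for x in strippedRisks) and any(x in text for x in strippedSlangs):
        output.append(key)

print(output)  # [1, 3, 4]
```

Key 4 is returned even though it contains no slang keyword as a whole word: 'D' is found inside 'Dad'.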

Store slangNames and riskNames as sets, split each string into words, and check whether any of the words appears in each of the two sets:

slangNames = set(["Vikes", "Demmies", "D", "MS", "Contin"])
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d =  {1: "Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}

for k,v in d.items():
    spl = v.split() # split once
    if any(word in slangNames for word in spl) and any(word  in riskNames for word in spl):
        print(k,v)

Output:

1 Vikes is not enough for me
3 pop a D

Or use not set.isdisjoint:

slangNames = set(["Vikes", "Demmies", "D", "MS", "Contin"])
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d =  {1: "Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}

for k,v in d.items():
    spl = v.split()
    if not slangNames.isdisjoint(spl) and not riskNames.isdisjoint(spl):
        print(k, v)

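Using any should be efficient, since it short-circuits on the first match. Two sets are disjoint if their intersection is the empty set, so if not slangNames.isdisjoint(spl) is true, at least one word of the tweet is also in slangNames.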
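As a rough illustration of that equivalence, the sketch below checks the isdisjoint test against an explicit intersection and collects the matching tweets into a dictionary, which is the shape the question asked for. The helper name has_common_word and the matches variable are made up for this example:

```python
slangNames = {"Vikes", "Demmies", "D", "MS", "Contin"}
riskNames = {"enough", "pop", "final", "stress", "trade"}
d = {1: "Vikes is not enough for me", 2: "Demmies is okay", 3: "pop a D"}

def has_common_word(keywords, words):
    """Return True if at least one item of `words` is in the set `keywords`."""
    via_isdisjoint = not keywords.isdisjoint(words)
    # A non-empty intersection means the two collections are not disjoint.
    via_intersection = bool(keywords & set(words))
    assert via_isdisjoint == via_intersection
    return via_isdisjoint

matches = {}
for k, v in d.items():
    spl = v.split()  # whole words, split on whitespace
    if has_common_word(slangNames, spl) and has_common_word(riskNames, spl):
        matches[k] = v

print(matches)  # {1: 'Vikes is not enough for me', 3: 'pop a D'}
```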

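If MS Contin is actually a single term, you will need to catch that as well, since splitting on whitespace breaks it into two words: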

import re
slangNames = set(["Vikes", "Demmies", "D"])
r = re.compile(r"\bMS Contin\b")
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d =  {1: "Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}

for k,v in d.items():
    spl = v.split()
    if (not slangNames.isdisjoint(spl) or r.search(v)) and not riskNames.isdisjoint(spl):
        print(k,v)
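If several keywords contain spaces, or tweets carry punctuation that str.split leaves attached to words, one possible variation is to compile every keyword into a single word-boundary regex. This is only a sketch under those assumptions: build_pattern is a hypothetical helper, and the tweets with keys 4 and 5 are invented for the demonstration.

```python
import re

slangNames = ["Vikes", "Demmies", "D", "MS Contin"]
riskNames = ["enough", "pop", "final", "stress", "trade"]

def build_pattern(keywords):
    """Compile keywords into one regex that matches any of them as whole words."""
    # re.escape guards against keywords containing regex metacharacters.
    alternation = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternation + r")\b")

slang_re = build_pattern(slangNames)
risk_re = build_pattern(riskNames)

d = {
    1: "Vikes is not enough for me",
    2: "Demmies is okay",
    3: "pop a D",
    4: "MS Contin under stress.",  # hypothetical: multi-word keyword, trailing period
    5: "Dad is under stress",      # hypothetical: 'D' appears only inside 'Dad'
}

# Keep tweets that contain at least one slang keyword and one risk keyword.
matches = {k: v for k, v in d.items() if slang_re.search(v) and risk_re.search(v)}
print(sorted(matches))  # [1, 3, 4]
```

Note that \b treats any non-word character as a boundary, so a keyword like D would also match in text such as "D-day"; whether that is acceptable depends on the data.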