Iterate one list of synsets over another

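I have two sets of WordNet synsets, held in two separate list objects, s1 and s2. For each synset in s1, I want to find its maximum path similarity score against the synsets in s2, so that the output has the same length as s1. For example, if s1 contains 4 synsets, the output should have length 4.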

Here is the code I have tried so far:

import numpy as np
import nltk
from nltk.corpus import wordnet as wn
import pandas as pd

#two wordnet synsets (s1, s2)

s1 = [wn.synset('be.v.01'),
 wn.synset('angstrom.n.01'),
 wn.synset('trial.n.02'),
 wn.synset('function.n.01')]

s2 = [wn.synset('use.n.01'),
 wn.synset('function.n.01'),
 wn.synset('check.n.01'),
 wn.synset('code.n.01'),
 wn.synset('inch.n.01'),
 wn.synset('be.v.01'),
 wn.synset('correct.v.01')]
 
# define a function to find the highest path similarity score for each synset in s1 onto s2, with the length of output equal that of s1

ps_list = []
def similarity_score(s1, s2):
    for word1 in s1:
        best = max(wn.path_similarity(word1, word2) for word2 in s2)
        ps_list.append(best)
    return ps_list

similarity_score(s1, s2)

But it returns the following error message:

'>' not supported between instances of 'NoneType' and 'float'

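I don't know what is wrong with the code. Would anyone be willing to look at it and share their insight on the for loop? It would be much appreciated.

Thanks.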

The full error traceback is here:

TypeError                                 Traceback (most recent call last)
<ipython-input-73-4506121e17dc> in <module>()
     38     return word_list
     39 
---> 40 s = similarity_score(s1, s2)
     41 
     42 

<ipython-input-73-4506121e17dc> in similarity_score(s1, s2)
     33 def similarity_score(s1, s2):
     34     for word1 in s1:
---> 35         best = max(wn.path_similarity(word1, word2) for word2 in s2)
     36         word_list.append(best)
     37 

TypeError: '>' not supported between instances of 'NoneType' and 'float'

[Edit] I came up with this temporary solution:

s_list = []
for word1 in s1:
    best = [word1.path_similarity(word2) for word2 in s2]
    b = pd.Series(best).max()
    s_list.append(b)

It is not elegant, but it works. I wonder if anyone has a better solution or a handy trick for handling this?
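As far as I can tell, the workaround succeeds because pandas stores the None entries as NaN once the Series holds at least one float, and Series.max() skips NaN by default. The sketch below shows the same idea without pandas. The scores are made-up values standing in for path_similarity results, not real WordNet output.

```python
# Made-up scores standing in for path_similarity results; None marks a
# pair for which no score was available.
scores = [0.25, None, 0.5]

try:
    max(scores)
    failed = False
except TypeError:
    # None cannot be ordered against a float, so max() raises.
    failed = True

# Drop the missing values before taking the maximum.
best = max(score for score in scores if score is not None)

print(failed, best)  # True 0.5
```

Filtering before calling max() avoids the comparison between None and a float that the traceback complains about.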

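I have no experience with the nltk module, but from reading the documentation I can see that path_similarity is a method of whatever object wn.synset(args) returns. You are treating it as a function instead.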

What you should do is something like this:

ps_list = []
for word1 in s1:
    best = max(word1.path_similarity(word2) for word2 in s2) #path_similarity is a method of each synset
    ps_list.append(best)

I think the error comes from this line:

best = max(wn.path_similarity(word1, word2) for word2 in s2)

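If wn.path_similarity(word1, word2) returns None, max() cannot compare it with the float scores, so you should add a condition that filters those values out. For example, you could rewrite the line like this: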

best = max([word1.path_similarity(word2) for word2 in s2 if word1.path_similarity(word2) is not None])
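The filter above can be wrapped in a reusable function that scores each pair only once and also copes with the case where every score is None. This is a minimal sketch, not a definitive implementation: the names best_similarities and toy_similarity are hypothetical, and the scorer is passed in as a parameter so the example runs without the WordNet corpus.

```python
def best_similarities(s1, s2, similarity):
    """For each item in s1, return its highest non-None score against s2."""
    results = []
    for word1 in s1:
        # Score each pair once, then drop the pairs that have no score.
        scores = (similarity(word1, word2) for word2 in s2)
        valid = [score for score in scores if score is not None]
        # default=None keeps max() from raising when every score is None.
        results.append(max(valid, default=None))
    return results


# Stand-in scorer for demonstration only (an assumption, not NLTK code).
def toy_similarity(a, b):
    if a[-1] != b[-1]:  # pretend that different suffixes have no path
        return None
    return 1.0 if a == b else 0.5


print(best_similarities(["be.v", "inch.n", "x.z"], ["use.n", "be.v"], toy_similarity))
# [1.0, 0.5, None]
```

With NLTK, one would presumably pass something like lambda a, b: a.path_similarity(b) as the scorer together with the s1 and s2 lists from the question. The output always has the same length as s1, with None marking a synset that has no comparable partner in s2.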