Iterate one list of synsets over another
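I have two sets of WordNet synsets (held in two separate list objects, s1 and s2). For each synset in s1, I want to find the maximum path similarity score against the synsets in s2, so that the output has the same length as s1. For example, if s1 contains 4 synsets, the output should have length 4.
I have tried the following code (so far):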
import numpy as np
import nltk
from nltk.corpus import wordnet as wn
import pandas as pd

# two lists of wordnet synsets (s1, s2)
s1 = [wn.synset('be.v.01'),
      wn.synset('angstrom.n.01'),
      wn.synset('trial.n.02'),
      wn.synset('function.n.01')]
s2 = [wn.synset('use.n.01'),
      wn.synset('function.n.01'),
      wn.synset('check.n.01'),
      wn.synset('code.n.01'),
      wn.synset('inch.n.01'),
      wn.synset('be.v.01'),
      wn.synset('correct.v.01')]

# define a function to find the highest path similarity score for each synset
# in s1 against s2, with the output having the same length as s1
ps_list = []
def similarity_score(s1, s2):
    for word1 in s1:
        best = max(wn.path_similarity(word1, word2) for word2 in s2)
        ps_list.append(best)
    return ps_list

similarity_score(s1, s2)
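But it returns the following error message:
'>' not supported between instances of 'NoneType' and 'float'
I don't know what is wrong with the code. Would anyone be willing to look at it and share their insight on the for loop? It would be much appreciated.
Thanks.
Here is the full error traceback: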
TypeError                                 Traceback (most recent call last)
<ipython-input-73-4506121e17dc> in <module>()
     38     return word_list
     39
---> 40 s = similarity_score(s1, s2)
     41
     42

<ipython-input-73-4506121e17dc> in similarity_score(s1, s2)
     33 def similarity_score(s1, s2):
     34     for word1 in s1:
---> 35         best = max(wn.path_similarity(word1, word2) for word2 in s2)
     36         word_list.append(best)
     37

TypeError: '>' not supported between instances of 'NoneType' and 'float'
[Edit]
I came up with this temporary solution:
s_list = []
for word1 in s1:
    best = [word1.path_similarity(word2) for word2 in s2]
    b = pd.Series(best).max()
    s_list.append(b)
It is not elegant, but it works. Is there a better solution or a handy trick for handling this?
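I have no experience with the nltk module, but from reading the documentation I can see that path_similarity is a method of whatever object wn.synset(args) returns. You are treating it as a function instead.
What you should do is something like this: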
ps_list = []
for word1 in s1:
    best = max(word1.path_similarity(word2) for word2 in s2)  # path_similarity is a method of each synset
    ps_list.append(best)
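One thing worth checking before relying on this: if any of the scores is None, max() fails no matter how the call is spelled. Here is a minimal sketch with made-up scores (no WordNet data needed; `try_max` and the values are hypothetical, used only for illustration):

```python
# Hypothetical scores standing in for path_similarity results;
# None marks a pair for which no similarity was returned.
scores = [0.25, None, 0.5]


def try_max(values):
    """Return max(values), or the TypeError message if a comparison fails."""
    try:
        return max(values)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_max(scores))        # a TypeError message mentioning NoneType and float
print(try_max([0.25, 0.5]))   # 0.5
```

So the None values have to be dealt with before calling max().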
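I think the error comes from this line:
best = max(wn.path_similarity(word1, word2) for word2 in s2)
If wn.path_similarity(word1, word2) returns None for some pair, max() cannot compare it with the float scores. You should add a condition that filters those values out; for example, you could rewrite the line as:
best = max([word1.path_similarity(word2) for word2 in s2 if word1.path_similarity(word2) is not None])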
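Two caveats with the one-liner: it calls path_similarity twice per pair, and max() raises ValueError on an empty list if every score for a synset turns out to be None. A possible sketch that computes each score once and falls back to a default is below. The names `best_scores`, `similarity`, and `default` are my own, and `toy_similarity` is a made-up stand-in so the snippet runs without WordNet data:

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def best_scores(
    s1: Sequence[T],
    s2: Sequence[T],
    similarity: Callable[[T, T], Optional[float]],
    default: Optional[float] = None,
) -> list:
    """For each item in s1, return its highest non-None score against s2.

    The result always has len(s1) entries; `default` fills the slot
    when no pair yields a score.
    """
    results = []
    for a in s1:
        # Compute each score once, then drop the None values.
        scores = (similarity(a, b) for b in s2)
        valid = [s for s in scores if s is not None]
        results.append(max(valid, default=default))
    return results


# Made-up stand-in for path_similarity: returns None when the last
# character (playing the role of the part of speech) differs.
def toy_similarity(a: str, b: str) -> Optional[float]:
    if a[-1] != b[-1]:
        return None
    return 1.0 if a == b else 0.5


print(best_scores(["be.v", "trial.n", "odd.x"], ["use.n", "be.v"], toy_similarity))
# [1.0, 0.5, None]
```

With the lists from the question, this would presumably be called as best_scores(s1, s2, lambda a, b: a.path_similarity(b)), which keeps the output the same length as s1.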