Evaluating a word2vec model using SimLex-999
I have trained my model with Gensim. Now I want to evaluate it with SimLex-999, but it gives me an error.
My code:
model.wv.evaluate_word_analogies('SimLex-999.txt')
2019-08-25 13:43:22,766 : INFO : Evaluating word analogies for top 300000 words in the model on SimLex-999.txt
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-60cb96c45579> in <module>()
----> 1 model.wv.evaluate_word_analogies('SimLex-999.txt')
C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in evaluate_word_analogies(self, analogies, restrict_vocab, case_insensitive, dummy4unknown)
1088 else:
1089 if not section:
-> 1090 raise ValueError("Missing section header before line #%i in %s" % (line_no, analogies))
1091 try:
1092 if case_insensitive:
ValueError: Missing section header before line #0 in SimLex-999.txt
I also tried:
from gensim.test.utils import datapath
similarities = model.wv.evaluate_word_pairs(datapath('SimLex-999.txt'), dummy4unknown=True)
print(similarities)
But it gave me a KeyError. Please help me solve the problem.
KeyError Traceback (most recent call last)
<ipython-input-29-caeb682cb7ff> in <module>()
1 from gensim.test.utils import datapath
2
----> 3 similarities = model.wv.evaluate_word_pairs(datapath('SimLex-999.txt'),dummy4unknown=True)
4
5 print(similarities)
C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in evaluate_word_pairs(self, pairs, delimiter, restrict_vocab, case_insensitive, dummy4unknown)
1287
1288 """
-> 1289 ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
1290 ok_vocab = {w.upper(): v for w, v in reversed(ok_vocab)} if case_insensitive else dict(ok_vocab)
1291
C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in <listcomp>(.0)
1287
1288 """
-> 1289 ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
1290 ok_vocab = {w.upper(): v for w, v in reversed(ok_vocab)} if case_insensitive else dict(ok_vocab)
1291
KeyError: 'movie'
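The traceback shows the KeyError is raised on line 1289, before the SimLex file is even read: Gensim builds its lookup table by evaluating self.vocab[w] for every word in self.index2word, so 'movie' must be listed in index2word but missing from vocab. A minimal pure-Python sketch of that check, assuming (as in Gensim 3.x) that index2word is a list and vocab is a dict; find_missing_vocab is a hypothetical helper, not a Gensim function, and the toy data is made up:

```python
def find_missing_vocab(index2word, vocab):
    """Return the words listed in index2word that have no entry in vocab.

    Mirrors the lookup on line 1289 of keyedvectors.py, which raises
    KeyError on the first such word instead of collecting them all.
    """
    return [w for w in index2word if w not in vocab]


# Hypothetical stand-ins for model.wv.index2word and model.wv.vocab.
index2word = ["the", "movie", "film"]
vocab = {"the": 0, "film": 2}

print(find_missing_vocab(index2word, vocab))
```

On a Gensim 3.x model the equivalent call would be find_missing_vocab(model.wv.index2word, model.wv.vocab). A non-empty result means the two structures are out of sync, and evaluate_word_pairs() will fail on any dataset until they agree.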
SimLex-999.txt does not seem to be a list of word analogies, so it is not a suitable argument for the evaluate_word_analogies() function.
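The ValueError comes from the analogy parser: evaluate_word_analogies() expects a file in the style of Gensim's questions-words.txt, where a line beginning with ":" opens a section and the following lines each hold four words (a is to b as c is to d). SimLex-999 holds scored word pairs instead, so its very first line (line #0) is rejected. The sketch below is a simplified, hypothetical re-implementation of that rule for illustration only; first_header_error is not a Gensim function, and the sample lines merely imitate the two formats.

```python
def first_header_error(lines):
    """Return the 0-based number of the first line that appears before any
    section header, or None if the file starts with a header.

    Simplified sketch: a line starting with ':' opens a section; any other
    non-empty line seen before the first header is an error.
    """
    section = None
    for line_no, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # this sketch simply skips blank lines
        if line.startswith(":"):
            section = line[1:].strip()
        elif section is None:
            return line_no
    return None


# Analogy-style file: header first, then four words per line.
analogy_lines = [
    ": capital-common-countries",
    "Athens Greece Baghdad Iraq",
]
# Word-similarity-style file: tab-separated pairs with scores, no header.
simlex_lines = [
    "word1\tword2\tPOS\tSimLex999",
    "old\tnew\tA\t1.58",
]

print(first_header_error(analogy_lines))
print(first_header_error(simlex_lines))
```

The second call reports line 0, matching the "Missing section header before line #0" message in the question.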
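Have you tried the evaluate_word_pairs() function? Its description is at:
Use this (it reads the copy of SimLex-999 bundled with Gensim's test data): model.wv.evaluate_word_pairs(datapath('simlex999.txt'))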
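evaluate_word_pairs() scores a word-similarity dataset: for each pair found in the vocabulary it compares the model's cosine similarity with the human rating, then reports Pearson and Spearman correlations plus the share of out-of-vocabulary pairs. As I understand Gensim's implementation, it expects one word1, word2, score triple per line separated by the delimiter (tab by default) and skips lines starting with "#"; treat that as an assumption and check the documentation. The stdlib-only sketch below reproduces the core computation on toy vectors so the numbers are easy to follow. evaluate_pairs, its helpers and all data are hypothetical; this is not Gensim's code, and its OOV ratio is simply the percentage of all pairs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def evaluate_pairs(vectors, pairs, dummy4unknown=False):
    """Return (pearson, spearman, oov_percent) for (word1, word2, gold) triples."""
    model_sims, gold_sims, oov = [], [], 0
    for w1, w2, gold in pairs:
        if w1 not in vectors or w2 not in vectors:
            oov += 1
            if dummy4unknown:  # keep the pair, scored with similarity 0.0
                model_sims.append(0.0)
                gold_sims.append(gold)
            continue
        model_sims.append(cosine(vectors[w1], vectors[w2]))
        gold_sims.append(gold)
    spearman = pearson(ranks(model_sims), ranks(gold_sims))  # Spearman = Pearson on ranks
    return pearson(model_sims, gold_sims), spearman, 100.0 * oov / len(pairs)


# Toy vectors and made-up gold scores; ("sun", "moon") is out of vocabulary.
vectors = {"car": [1.0, 0.0], "auto": [0.9, 0.1], "cat": [0.0, 1.0]}
pairs = [("car", "auto", 9.0), ("car", "cat", 1.0), ("auto", "cat", 2.0), ("sun", "moon", 3.0)]

print(evaluate_pairs(vectors, pairs))
print(evaluate_pairs(vectors, pairs, dummy4unknown=True))
```

With Gensim itself, the call in the answer returns three values that can be unpacked as pearson, spearman, oov_ratio = model.wv.evaluate_word_pairs(datapath('simlex999.txt')). Spearman is usually the headline number for SimLex-999, since only the ranking of pairs is compared.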