Pretrained vectors not loading in spaCy
I am training a custom NER model from scratch, starting from the spacy.blank("en") model, and I add custom word vectors to it. The vectors are loaded as follows:
from gensim.models.word2vec import Word2Vec
from gensim.models import KeyedVectors
med_vec = KeyedVectors.load_word2vec_format('./wikipedia-pubmed-and-PMC-w2v.bin', binary=True, limit = 300000)
I then add them to the blank model in this snippet:
import random

import spacy

def main(model=None, n_iter=3, output_dir=None):
    """Set up the pipeline and entity recognizer, and train the new entity."""
    random.seed(0)
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("en")  # create blank Language class
        nlp.vocab.reset_vectors(width=200)
        # copy each gensim vector into the spaCy vocab, one word at a time
        for idx in range(len(med_vec.index2word)):
            word = med_vec.index2word[idx]
            vector = med_vec.vectors[idx]
            nlp.vocab.set_vector(word, vector)
        # make sure every vector key has its string in the StringStore
        for key, vector in nlp.vocab.vectors.items():
            nlp.vocab.strings.add(nlp.vocab.strings[key])
        nlp.vocab.vectors.name = 'spacy_pretrained_vectors'
        print("Created blank 'en' model")
    # ...... code for training the NER
I then save this model.
When I try to load the model,
nlp = spacy.load("./NDLA/vectorModel0")
I get the following error:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\thinc\neural\_classes\static_vectors.py in __init__(self, lang, nO, drop_factor, column)
47 if self.nM == 0:
48 raise ValueError(
---> 49 "Cannot create vectors table with dimension 0.\n"
50 "If you're using pre-trained vectors, are the vectors loaded?"
51 )
ValueError: Cannot create vectors table with dimension 0.
If you're using pre-trained vectors, are the vectors loaded?
I also get this warning:
UserWarning: [W019] Changing vectors name from spacy_pretrained_vectors to spacy_pretrained_vectors_336876, to avoid clash with previously loaded vectors. See Issue #3853.
"__main__", mod_spec)
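The vocab directory in the saved model contains a vectors file of 270 MB, so I know it is not empty. What is causing this error?
You can try passing all the vectors at once instead of using a for loop.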
nlp.vocab.vectors = spacy.vocab.Vectors(data=med_vec.syn0, keys=med_vec.vocab.keys())
So your else branch would become:
else:
    nlp = spacy.blank("en")  # create blank Language class
    nlp.vocab.reset_vectors(width=200)
    nlp.vocab.vectors = spacy.vocab.Vectors(data=med_vec.syn0, keys=med_vec.vocab.keys())
    nlp.vocab.vectors.name = 'spacy_pretrained_vectors'
    print("Created blank 'en' model")
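To check that the vectors actually survive saving and loading, a small round trip like the one below may help. It is only a sketch under assumptions: the three words and the all-ones matrix are hypothetical stand-ins for med_vec, and Vectors is imported from spacy.vectors, which should be the same class as the spacy.vocab.Vectors used above. Row i of the matrix has to belong to the i-th key.

```python
import tempfile

import numpy as np
import spacy
from spacy.vectors import Vectors

# Hypothetical stand-in for med_vec: three words, 200 dimensions each.
words = ["aspirin", "ibuprofen", "paracetamol"]
data = np.ones((len(words), 200), dtype="float32")

nlp = spacy.blank("en")
# Pass the whole table at once; row i of `data` is the vector for words[i].
nlp.vocab.vectors = Vectors(data=data, keys=words)

# The width should be 200 here; a width of 0 is what triggers the ValueError.
width_before = nlp.vocab.vectors.shape[1]

# Save to a temporary directory and load again to confirm nothing is lost.
with tempfile.TemporaryDirectory() as tmp_dir:
    nlp.to_disk(tmp_dir)
    reloaded = spacy.load(tmp_dir)
    width_after = reloaded.vocab.vectors.shape[1]
    has_vector = reloaded.vocab.has_vector("aspirin")

print(width_before, width_after, has_vector)
```

If the width is already 0 before saving, the problem is in how the table is filled rather than in loading.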