Why is the following tfidf vectorization failing?

Question

Hello, I am running the following experiment. First, I created a TF-IDF vectorizer:

tfidf_vectorizer = TfidfVectorizer(min_df=10,ngram_range=(1,3),analyzer='word',max_features=500)

Then I vectorized the following list:

tfidf = tfidf_vectorizer.fit_transform(listComments)

My list of comments looks like this:

listComments = ["hello this is a test","the car is red",...]

I tried to save the model as follows:

#Saving tfidf
with open('vectorizerTFIDF.pickle','wb') as idxf:
    pickle.dump(tfidf, idxf, pickle.HIGHEST_PROTOCOL)

I would like to use my vectorizer to apply the same tfidf to the following list:

lastComment = ["this is a car"]

Loading the pickle:

with open('vectorizerTFIDF.pickle', 'rb') as infile:
    tdf = pickle.load(infile)

vector = tdf.transform(lastComment)

However, I get:

Traceback (most recent call last):
  File "C:/Users/LDA_test/ldaTest.py", line 141, in <module>
    vector = tdf.transform(lastComment)
  File "C:\Program Files\Anaconda3\lib\site-packages\scipy\sparse\base.py", line 559, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: transform not found

I hope someone can help me with this issue. Thanks in advance.

Answer 1

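You pickled the vectorized array (the sparse matrix returned by fit_transform), not the transformer. A SciPy sparse matrix has no transform method, which is why loading it and calling transform raises the AttributeError. You need to pickle the vectorizer itself: pickle.dump(tfidf_vectorizer, idxf, pickle.HIGHEST_PROTOCOL)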
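As a rough sketch of the full round trip, the snippet below pickles the fitted vectorizer and reuses it on a new comment. The three-sentence corpus, the temporary file location, and the snake_case variable names are assumptions made for illustration. min_df is lowered to 1 here because min_df=10 would prune every term from such a small list.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical); the original list is much longer.
list_comments = ["hello this is a test", "the car is red", "this car is a test"]

# min_df=1 is an assumption for this tiny corpus; the question uses min_df=10.
tfidf_vectorizer = TfidfVectorizer(
    min_df=1, ngram_range=(1, 3), analyzer="word", max_features=500
)

# fit_transform returns a SciPy sparse matrix, which has no .transform method.
tfidf = tfidf_vectorizer.fit_transform(list_comments)

# Temporary directory used only so the sketch does not write into the cwd.
path = os.path.join(tempfile.mkdtemp(), "vectorizerTFIDF.pickle")

# Save the fitted vectorizer, not the matrix it produced.
with open(path, "wb") as idxf:
    pickle.dump(tfidf_vectorizer, idxf, pickle.HIGHEST_PROTOCOL)

# Load the vectorizer back and apply the learned vocabulary and IDF weights.
with open(path, "rb") as infile:
    loaded_vectorizer = pickle.load(infile)

last_comment = ["this is a car"]
vector = loaded_vectorizer.transform(last_comment)

# One row for the new comment, one column per feature learned during fitting.
print(vector.shape)
```

The transformed vector has the same number of columns as the original tfidf matrix, so it should be usable with anything trained on that matrix. Note that a pickle is generally only safe to load with the same scikit-learn version that produced it.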

tf-idf

scikit-learn