sklearn LinearSVC - X has 1 features per sample; expecting 5
I am trying to predict the class of a test array, but I get the following error, along with its stack trace:
Traceback (most recent call last):
  File "/home/radu/PycharmProjects/Recommender/Temporary/classify_dict_test.py", line 24, in <module>
    print classifier.predict(test)
  File "/home/radu/.local/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 215, in predict
    scores = self.decision_function(X)
  File "/home/radu/.local/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 196, in decision_function
    % (X.shape[1], n_features))
ValueError: X has 1 features per sample; expecting 5
The code that produces this error is:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
corpus = [
    "I am super good with Java and JEE",
    "I am super good with .NET and C#",
    "I am really good with Python and R",
    "I am really good with C++ and pointers"
]
classes = ["java developer", ".net developer", "data scientist", "C++ developer"]
test = ["I think I'm a good developer with really good understanding of .NET"]
tvect = TfidfVectorizer(min_df=1, max_df=1)
X = tvect.fit_transform(corpus)
classifier = LinearSVC()
classifier.fit(X, classes)
print classifier.predict(test)
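I have looked through the LinearSVC documentation for guidance or hints about what might raise this error, but I could not figure it out.
Any help is greatly appreciated!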
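The variable test holds a raw string, but the SVC needs a feature vector with the same number of dimensions as X. Transform the test string into a feature vector with the same vectorizer instance before feeding it to the SVC: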
# Reuse the vectorizer already fitted on the corpus; do not fit it again on the test data.
X_test = tvect.transform(test)
classifier.predict(X_test)
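To see where the "expecting 5" comes from, here is a rough pure-Python stand-in for what the fitted vectorizer does. It is a toy sketch, not scikit-learn's implementation. It assumes the default tokenization (lowercased tokens of two or more word characters) and that the integer max_df=1 keeps only terms occurring in at most one document. The helper names tokenize, fit_vocabulary and transform are made up for illustration, and the output is raw counts rather than TF-IDF weights.

```python
import re
from collections import Counter

corpus = [
    "I am super good with Java and JEE",
    "I am super good with .NET and C#",
    "I am really good with Python and R",
    "I am really good with C++ and pointers",
]
test = ["I think I'm a good developer with really good understanding of .NET"]


def tokenize(text):
    # Assumed default behaviour: lowercase, keep tokens of 2+ word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def fit_vocabulary(docs, max_df=1):
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    # Keep only terms that occur in at most max_df documents.
    return sorted(term for term, count in df.items() if count <= max_df)


def transform(docs, vocabulary):
    # Count terms over the fitted vocabulary only, so every row has
    # exactly len(vocabulary) columns, whatever the input text contains.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocabulary])
    return rows


vocabulary = fit_vocabulary(corpus)
X_test = transform(test, vocabulary)
print(vocabulary)  # ['java', 'jee', 'net', 'pointers', 'python']
print(X_test)      # [[0, 0, 1, 0, 0]]
```

Under those assumptions, only five terms survive fitting, so the classifier is trained on five columns. The test sentence has to be mapped onto those same five columns, which is what calling transform on the already fitted vectorizer does. Passing the raw list of strings skips that step, hence the shape mismatch.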