'Multiclass-multioutput is not supported' error in scikit-learn for KNN classifier
I have two variables, X and Y.
The structure of X (an np.array):
[[26777 24918 26821 ... -1 -1 -1]
[26777 26831 26832 ... -1 -1 -1]
[26777 24918 26821 ... -1 -1 -1]
...
[26811 26832 26813 ... -1 -1 -1]
[26830 26831 26832 ... -1 -1 -1]
[26830 26831 26832 ... -1 -1 -1]]
The structure of Y:
[[1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [25197, 26777, 26781], [25197, 26777, 26781], [25197, 26777, 26781], [26764, 25803, 26781], [26764, 25803, 26781], [25197, 26777, 26781], [25197, 26777, 26781], [1252, 26777, 16172], [1252, 26777, 16172]]
Each array in Y, for example [1252, 26777, 26831], holds three separate features.
I am using the KNN classifier from the scikit-learn module:
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X, Y)
predictions = classifier.predict(X)
print(accuracy_score(Y, predictions))
But I get an error message:
ValueError: multiclass-multioutput is not supported
I guess the structure of 'Y' is not supported. What changes do I need to make for the program to run?
Input:
Deluxe Single room with sea view
Expected output:
c_class = Deluxe
c_occ = single
c_view = sea
As the error says, KNN does not support multi-output regression/classification.
For your problem, you need MultiOutputClassifier():
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
classifier = MultiOutputClassifier(knn, n_jobs=-1)
classifier.fit(X, Y)
Working example:
>>> import numpy as np
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> corpus = [
... 'This is the first document.',
... 'This document is the second document.',
... 'And this is the third one.',
... 'Is this the first document?',
... ]
>>> vectorizer = TfidfVectorizer()
>>> X = vectorizer.fit_transform(corpus)
>>> Y = [[124323,1234132,1234],[124323,4132,14],[1,4132,1234],[1,4132,14]]
>>> from sklearn.multioutput import MultiOutputClassifier
>>> from sklearn.neighbors import KNeighborsClassifier
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> classifier = MultiOutputClassifier(knn, n_jobs=-1)
>>> classifier.fit(X,Y)
>>> predictions = classifier.predict(X)
>>> predictions
array([[124323, 4132, 14],
[124323, 4132, 14],
[ 1, 4132, 1234],
[124323, 4132, 14]])
>>> classifier.score(X,np.array(Y))
0.5
>>> test_data = ['I want to test this']
>>> classifier.predict(vectorizer.transform(test_data))
array([[124323, 4132, 14]])
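The `score` call in the example reports subset accuracy: a row counts as correct only when all three outputs match. If a per-output figure is more useful, one option is to call `accuracy_score` one column at a time, since, as far as I know, `accuracy_score` itself rejects a 2-D multiclass target with the same `multiclass-multioutput is not supported` message. A minimal sketch, using invented toy data rather than the data from the question:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier

# Toy data (hypothetical): 6 samples, 2 features, 3 label columns.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
Y = np.array([[1, 10, 100], [1, 10, 100], [1, 10, 100],
              [2, 20, 200], [2, 20, 200], [2, 20, 200]])

classifier = MultiOutputClassifier(KNeighborsClassifier(n_neighbors=3))
classifier.fit(X, Y)
predictions = classifier.predict(X)

# accuracy_score is applied to one output column at a time.
per_output = [accuracy_score(Y[:, i], predictions[:, i])
              for i in range(Y.shape[1])]

# Subset accuracy: a row is correct only if every output matches.
subset = float(np.mean(np.all(Y == predictions, axis=1)))

print(per_output, subset)
```

The per-column numbers show which of the outputs is dragging the overall score down, which the single subset-accuracy value cannot.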
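# Use a DataFrame instead of a list, for example:
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

dataset = list()
#....
df = pd.DataFrame(dataset)
# The last column is the target; every other column is a feature.
y, X = df[df.columns[-1]], df.drop(df.columns[-1], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
# p=2 selects the Euclidean (L2) distance.
knn_model = KNeighborsClassifier(n_neighbors=5, p=2)
knn_model.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)
print(classification_report(y_test, y_pred))
print('Accuracy score: ', round(accuracy_score(y_test, y_pred), 2))
# f1_score defaults to average='binary'; pass e.g. average='macro' for multiclass labels.
print('F1 Score: ', round(f1_score(y_test, y_pred), 2))
print(confusion_matrix(y_test, y_pred))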
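Because the snippet leaves `dataset` empty, here is a small self-contained sketch of the same flow. The dataset, the column names, and the choice of `n_neighbors=1` and `stratify=y` are all invented for illustration; note that this approach treats only the last column as the target, so it covers a single output rather than three.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data: two numeric features and a binary label in the last column.
dataset = [[0.0, 0.1, 0], [0.2, 0.0, 0], [0.1, 0.2, 0], [0.3, 0.1, 0],
           [5.0, 5.1, 1], [5.2, 5.0, 1], [5.1, 5.3, 1], [5.3, 5.2, 1]]
df = pd.DataFrame(dataset, columns=["f1", "f2", "label"])

# Last column is the target; the remaining columns are features.
y, X = df[df.columns[-1]], df.drop(df.columns[-1], axis=1)

# stratify=y keeps both classes present in the small train and test halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42, stratify=y)

# n_neighbors must not exceed the number of training samples (4 here).
knn_model = KNeighborsClassifier(n_neighbors=1, p=2)
knn_model.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print('Accuracy score: ', round(accuracy, 2))
```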