Can't understand this error: "Unknown label type: 'continuous-multioutput'"
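I am trying to run several machine learning algorithms on a dataset to predict whether a person's salary/income is greater than 50K or at most 50K. I created a function and passed it sample sets of different sizes: 1%, 10%, and 100% of the training data.
I get the error "Unknown label type: 'continuous-multioutput'".
I don't understand what this error means.
I tried changing the classification algorithm, but it didn't help: every algorithm shows the same error.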
from sklearn.metrics import fbeta_score, accuracy_score

def train_predict (learner, sample_size, X_train, X_test, y_train, y_test):
    results = {}
    start = time()
    learner = learner.fit(X_train[:sample_size], y_train)
    end = time()
    results['train_time'] = end - start
    start = time()
    predictions_test = learner.predict(X_test)
    predictions_train = learner.predict(X_train[:sample_size])
    end = time()
    results['pred_time'] = end - start
    results['acc_train'] = accuracy_score(X_train[:sample_size], y_train[:sample_size])
    results['acc_test'] = accuracy_score[X_test, y_test]
    results['f_train'] = fbeta_score(X_train[:sample_size], y_train[:sample_size], beta = 1)
    resutts['f_test'] = fbeta_score(X_test, y_test, beta = 1)
    print("{} trained on {} samples. ".format(learner.__class__.__name__, sample_size))
    return results
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

clf_A = DecisionTreeClassifier()
clf_B = GaussianNB()
clf_C = SVC()

samples_100 = len(X_train) #taking 100% i.e. all the data in training set
samples_10 = int(len(X_train)*.1) # taking 10% of the training data
samples_1 = int(len(X_train)*.01) #taking 1% of the training data

results= {}
for clf in [clf_A, clf_B, clf_C]:
    clf_name = clf.__class__.__name__
    results[clf_name] = {}
    for i, samples in enumerate([samples_1, samples_10, samples_100]):
        results[clf_name][i] = \
        train_predict(clf, samples, X_train, y_train, X_test, y_test)
vs.evaluate(results, accuracy, fscore)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-66-e06d42fbd15b> in <module>
      5     for i, samples in enumerate([samples_1, samples_10, samples_100]):
      6         results[clf_name][i] = \
----> 7         train_predict(clf, samples, X_train, y_train, X_test, y_test)
      8 vs.evaluate(results, accuracy, fscore)

<ipython-input-62-4484b803a707> in train_predict(learner, sample_size, X_train, X_test, y_train, y_test)
      2     results = {}
      3     start = time()
----> 4     learner = learner.fit(X_train[:sample_size], y_train)
      5     end = time()
      6     results['train_time'] = end - start

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    799             sample_weight=sample_weight,
    800             check_input=check_input,
--> 801             X_idx_sorted=X_idx_sorted)
    802         return self
    803

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    138
    139         if is_classification:
--> 140             check_classification_targets(y)
    141             y = np.copy(y)
    142

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
    169     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    170                       'multilabel-indicator', 'multilabel-sequences']:
--> 171         raise ValueError("Unknown label type: %r" % y_type)
    172
    173

ValueError: Unknown label type: 'continuous-multioutput'
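The last frame shows where the message comes from: check_classification_targets asks sklearn.utils.multiclass.type_of_target what kind of y it received and rejects anything that is not a discrete class label. A small sketch on made-up arrays (hypothetical values, not the census data) suggests how this plays out: a 1-D vector of 0/1 labels is reported as 'binary', while a 2-D matrix of float features is reported as 'continuous-multioutput', which is what fit() would see if a feature matrix ended up in the y position.

```python
import numpy as np
from sklearn.utils.multiclass import check_classification_targets, type_of_target

# A well-formed binary income label: one value per sample (0 = "<=50K", 1 = ">50K").
y_good = np.array([0, 1, 1, 0, 1])

# A hypothetical 2-D matrix of scaled float features, standing in for X_test.
# Passing something shaped like this as `y` is what triggers the error.
X_like = np.array([[0.30, 0.80], [0.12, 0.55], [0.91, 0.07]])

print(type_of_target(y_good))  # binary
print(type_of_target(X_like))  # continuous-multioutput

# check_classification_targets is the helper that raised in the traceback.
error_message = ""
try:
    check_classification_targets(X_like)
except ValueError as err:
    error_message = str(err)
print(error_message)
```

The exact wording of the message differs between scikit-learn versions, but it always names the detected label type.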
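I expect this code to run and show the accuracy and other metrics of these algorithms on this dataset.
P.S. I know the code is long and cumbersome, but please bear with it and let me know the solution. I am new to machine learning, and any help would be greatly appreciated.
P.S. Please don't mark this question as a duplicate. I have already gone through similar questions and tried everything suggested there, without success. None of it worked for me.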
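OK, I found the cause: the problem was in the way I was sampling the dataset.
I changed the sampling code to use the frac parameter, and the error went away.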
samples_100 = df.sample(frac = 1) #taking 100% i.e. all the data in training set
samples_10 = df.sample(frac = .1) #taking 10% of the training data
samples_1 = df.sample(frac = .01) #taking 1% of the training data
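Note that DataFrame.sample(frac=...) returns a random subset of rows (a DataFrame), not a row count, so its result cannot be used directly as the integer sample_size that train_predict slices with. If you want fractional samples, the features and labels have to be drawn together so they stay aligned. A minimal sketch, assuming X_train is a pandas DataFrame and y_train a Series sharing its index; the toy data and the helper sample_fraction are hypothetical:

```python
import pandas as pd

# Hypothetical toy training data; X_train and y_train share the same index.
X_train = pd.DataFrame({"age": [25, 38, 28, 44, 18, 34, 29, 63, 24, 55],
                        "hours_per_week": [40, 50, 40, 40, 30, 45, 40, 32, 40, 60]})
y_train = pd.Series([0, 0, 1, 1, 0, 0, 0, 1, 0, 1], name="income")


def sample_fraction(X, y, frac, random_state=0):
    """Hypothetical helper: draw the same random fraction of rows from X and y."""
    X_sub = X.sample(frac=frac, random_state=random_state)
    y_sub = y.loc[X_sub.index]  # labels for exactly the sampled rows, same order
    return X_sub, y_sub


X_10, y_10 = sample_fraction(X_train, y_train, frac=0.1)
X_100, y_100 = sample_fraction(X_train, y_train, frac=1.0)
print(len(X_10), len(X_100))  # 1 10
```

Selecting the labels by the sampled index (rather than sampling X and y separately) is what keeps each row paired with its own label.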
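You get the error "Unknown label type: 'continuous-multioutput'" because of the argument order in this call in your code:
train_predict(clf, samples, X_train, y_train, X_test, y_test)
It needs to be rearranged as
train_predict(clf, samples, X_train, X_test, y_train, y_test)
to match the signature train_predict(learner, sample_size, X_train, X_test, y_train, y_test). With the original order, y_train is passed in the X_test slot and X_test in the y_train slot, so fit() receives the 2-D matrix of continuous features as its labels, which scikit-learn classifies as 'continuous-multioutput'. Since the data is split into training and test sets for both X and y, the order of these arguments matters for each variable to be assigned correctly.
I had the same problem, and this fixed it for me.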
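Once the argument order is fixed, the function in the question still has a few bugs that would surface next: fit() receives the full y_train instead of y_train[:sample_size] (a length mismatch for the 1% and 10% runs), accuracy_score is called with square brackets, the metrics compare features with labels instead of true labels with predictions, and `resutts` is a typo. Below is a corrected sketch, assuming synthetic data from make_classification in place of the census dataset and leaving out the project-specific vs.evaluate call. It also passes the data as keyword arguments, so the X/y order cannot be mixed up again.

```python
from time import time

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, fbeta_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def train_predict(learner, sample_size, X_train, X_test, y_train, y_test):
    """Fit `learner` on the first `sample_size` training rows and score it."""
    results = {}

    start = time()
    # Slice X and y with the same bound so features and labels stay aligned.
    learner = learner.fit(X_train[:sample_size], y_train[:sample_size])
    results['train_time'] = time() - start

    start = time()
    predictions_test = learner.predict(X_test)
    predictions_train = learner.predict(X_train[:sample_size])
    results['pred_time'] = time() - start

    # Metrics take (y_true, y_pred): true labels vs. predictions, never features.
    results['acc_train'] = accuracy_score(y_train[:sample_size], predictions_train)
    results['acc_test'] = accuracy_score(y_test, predictions_test)
    results['f_train'] = fbeta_score(y_train[:sample_size], predictions_train, beta=1)
    results['f_test'] = fbeta_score(y_test, predictions_test, beta=1)

    print("{} trained on {} samples.".format(learner.__class__.__name__, sample_size))
    return results


# Synthetic binary data (assumption) standing in for the census features and labels.
X, y = make_classification(n_samples=5000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

samples_100 = len(X_train)
samples_10 = int(len(X_train) * 0.1)
samples_1 = int(len(X_train) * 0.01)

results = {}
for clf in [DecisionTreeClassifier(random_state=0), GaussianNB(), SVC()]:
    clf_name = clf.__class__.__name__
    results[clf_name] = {}
    for i, samples in enumerate([samples_1, samples_10, samples_100]):
        # Keyword arguments make the X/y order impossible to mix up.
        results[clf_name][i] = train_predict(
            clf, samples,
            X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test)
```

On the real dataset you would pass your own X_train, X_test, y_train and y_test and then call vs.evaluate as before.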