Sklearn model, "The truth value of an array with more than one element is ambiguous" error

Question

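I've been learning about decision trees and how to build them in sklearn. But when I try it, none of my attempts to get rid of the following value error have worked: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". Here is the full error: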

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15136/2104431115.py in <module>
      2 dt = DecisionTreeRegressor(max_depth= 5, random_state= 1, min_samples_leaf=.1)
      3 dt.fit(x_train.reshape(-1,1), y_train.reshape(-1,1))
----> 4 y_pred = dt.predict(x_test, y_test)

~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in predict(self, X, check_input)
    465         """
    466         check_is_fitted(self)
--> 467         X = self._validate_X_predict(X, check_input)
    468         proba = self.tree_.predict(X)
    469         n_samples = X.shape[0]

~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in _validate_X_predict(self, X, check_input)
    430     def _validate_X_predict(self, X, check_input):
    431         """Validate the training data on predict (probabilities)."""
--> 432         if check_input:
    433             X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
    434             if issparse(X) and (

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Here is all the code I have written for this model so far:

x = np.array(bat[["TB_x"]])
y = np.array(bat[["TB_y"]])

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size= .2, random_state= 1)
dt = DecisionTreeRegressor(max_depth= 5, random_state= 1, min_samples_leaf=.1)
dt.fit(x_train.reshape(-1,1), y_train.reshape(-1,1))
y_pred = dt.predict(x_test, y_test)

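Originally I got an error saying that it expected a 2D array but got a 1D array. I solved that by using reshape, but now I get this value error, which I don't understand.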

Answer 1

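This is a small misunderstanding of how the predict function works. Think about it conceptually: if you are trying to predict something, why would you need to pass in the expected labels?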

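In DecisionTreeRegressor the signature of predict is predict(X, check_input=True). As with sklearn estimators in general, you only pass in the features, not the expected labels. (The check_input parameter is specific to the tree estimators; many other sklearn models simply take predict(X).)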
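If you want to check the signature yourself rather than take it on trust, a small sketch with the standard-library inspect module could look like this. LinearRegression is only included here as a comparison and is not part of the original question:

```python
import inspect

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Parameter names of predict on the tree regressor from the question.
tree_sig = inspect.signature(DecisionTreeRegressor.predict)
tree_params = list(tree_sig.parameters)

# For comparison only: a non-tree estimator (not used in the question).
linear_params = list(inspect.signature(LinearRegression.predict).parameters)

print(tree_params)    # names of the tree regressor's predict parameters
print(linear_params)  # names of the linear model's predict parameters
print(tree_sig.parameters["check_input"].default)
```

Neither signature has a slot for labels, so anything passed after X ends up in whatever the second parameter happens to be.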

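You are calling y_pred = dt.predict(x_test, y_test), but the second parameter that predict expects is just a boolean that lets you disable some sanity checks on x_test. Your y_test array ends up as check_input, and when sklearn reaches the line "if check_input:" in your traceback, NumPy refuses to turn a multi-element array into a single True or False. That is where the ValueError comes from.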
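The same error can be reproduced without sklearn at all. The sketch below uses a made-up array as a stand-in for y_test (the values are hypothetical) and a small helper, truthiness_error, that exists only for this illustration:

```python
import numpy as np

# Hypothetical stand-in for y_test: any array with more than one element.
labels = np.array([[3.0], [5.0], [8.0]])


def truthiness_error(value):
    """Return the message raised by `if value:`, or None if no error occurs."""
    try:
        if value:
            pass
    except ValueError as exc:
        return str(exc)
    return None


# An array in a boolean context raises the error from the question.
message = truthiness_error(labels)
print(message)

# A plain boolean, which is what check_input is meant to be, works fine.
print(truthiness_error(True))
```

This is the same check that "if check_input:" performs inside sklearn, just with the array passed in directly.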

All you need to do is:

y_pred = dt.predict(x_test)

You can refer to the sklearn documentation for DecisionTreeRegressor for more information.
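To put the fix in context, here is a minimal end-to-end sketch. The bat DataFrame from the question is not available, so the data below is synthetic and purely an assumption. The use of mean_squared_error is also an addition, shown only to illustrate where y_test belongs: in the evaluation step, after predicting.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for bat[["TB_x"]] and bat[["TB_y"]] (assumed data).
rng = np.random.default_rng(1)
x = rng.uniform(0, 300, size=(200, 1))            # 2D: (n_samples, n_features)
y = 0.8 * x[:, 0] + rng.normal(0, 20, size=200)   # 1D target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=1
)

dt = DecisionTreeRegressor(max_depth=5, random_state=1, min_samples_leaf=0.1)
dt.fit(x_train, y_train)

# Pass only the features to predict.
y_pred = dt.predict(x_test)

# The true labels are used afterwards, to score the predictions.
mse = mean_squared_error(y_test, y_pred)
print(y_pred.shape, mse)
```

Because x is already 2D here, no reshape is needed before fit or predict.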
