Decision tree always predicting class label as Yes

I am trying to fit a decision tree model on a small dataset in R, but it always predicts the class label as yes, no matter what input I give it.

Data

outlook <- c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast", "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain")
temperature <- c("hot", "hot", "hot", "mild", "cool", "cool", "cool", "mild", "cool", "mild", "mild", "mild", "hot", "mild")
humidity <- c("high", "high", "high", "high", "normal", "normal", "normal", "high", "normal", "normal", "normal", "high", "normal", "high")
wind <- c("weak", "strong", "weak", "weak", "weak", "strong", "strong", "weak", "weak", "weak", "strong", "strong", "weak", "strong")
class <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")

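# stringsAsFactors = TRUE keeps the columns as factors on R >= 4.0 as well
data <- data.frame(outlook, temperature, humidity, wind, class, stringsAsFactors = TRUE)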
data

Encoding the data

outlook_new <- as.numeric(as.factor(outlook))
temperature_new <- as.numeric(as.factor(temperature))
humidity_new <- as.numeric(as.factor(humidity))
wind_new <- as.numeric(as.factor(wind))
class_new <- as.numeric(as.factor(class))

data_new <- data.frame(outlook_new, temperature_new, humidity_new, wind_new, class_new)
data_new
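As a reading aid for the encoded columns, the sketch below shows which integer each label receives. It relies on `as.factor()` sorting levels alphabetically; `level_codes` is a hypothetical helper name introduced only for this illustration.

```r
# Distinct values of each variable from the question
outlook     <- c("sunny", "overcast", "rain")
temperature <- c("hot", "mild", "cool")
humidity    <- c("high", "normal")
wind        <- c("weak", "strong")
class       <- c("no", "yes")

# Hypothetical helper: name each integer code with the level it stands for
level_codes <- function(x) {
  f <- as.factor(x)
  setNames(seq_along(levels(f)), levels(f))
}

level_codes(outlook)      # overcast = 1, rain = 2, sunny = 3
level_codes(temperature)  # cool = 1, hot = 2, mild = 3
level_codes(humidity)     # high = 1, normal = 2
level_codes(wind)         # strong = 1, weak = 2
level_codes(class)        # no = 1, yes = 2
```

Note that these codes impose an ordering (for example overcast < rain < sunny) that the original categories do not have.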

Building the model

library(rpart)  # provides rpart() and rpart.control()
model <- rpart(class_new ~ ., data=data_new)

Creating a test data point

test_data <- data.frame(outlook_new = 2, temperature_new = 2, humidity_new = 1, wind_new = 1)
test_data

Prediction

predict(model, test_data, type='response')

No matter what the input is, the predict function always gives "yes" as the result.

What is going wrong?

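With a training set this small, you need to change the model's control parameters, and take the results with a grain of salt! By default `rpart` will not attempt to split a node that holds fewer than 20 observations (`minsplit = 20`), so with only 14 rows the tree never grows past its root and returns the same prediction for every input. Fitting on the original `data`, with `class` as a factor, also makes `rpart` build a classification tree instead of a regression tree on the numeric codes.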
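To check this on the question's own encoding, here is a minimal sketch that refits the default model and inspects it. The object names `d` and `fit_default` are made up for this example; the data values are copied from the question.

```r
library(rpart)

outlook <- c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
             "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain")
temperature <- c("hot", "hot", "hot", "mild", "cool", "cool", "cool",
                 "mild", "cool", "mild", "mild", "mild", "hot", "mild")
humidity <- c("high", "high", "high", "high", "normal", "normal", "normal",
              "high", "normal", "normal", "normal", "high", "normal", "high")
wind <- c("weak", "strong", "weak", "weak", "weak", "strong", "strong",
          "weak", "weak", "weak", "strong", "strong", "weak", "strong")
class <- c("no", "no", "yes", "yes", "yes", "no", "yes",
           "no", "yes", "yes", "yes", "yes", "yes", "no")

# Numeric encoding as in the question (no = 1, yes = 2)
d <- data.frame(
  outlook     = as.numeric(as.factor(outlook)),
  temperature = as.numeric(as.factor(temperature)),
  humidity    = as.numeric(as.factor(humidity)),
  wind        = as.numeric(as.factor(wind)),
  class       = as.numeric(as.factor(class))
)

fit_default <- rpart(class ~ ., data = d)  # default control settings

rpart.control()$minsplit         # 20: more than the 14 rows available
nrow(fit_default$frame)          # 1: the tree consists of the root node only
unique(predict(fit_default, d))  # one value, the mean of the codes (23/14, about 1.64)
```

Because the response is numeric, this is a regression tree, and its single prediction of about 1.64 sits closer to the code for "yes" (2) than to the code for "no" (1).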

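# Fit on the original (factor) data and allow splits on nodes of any size
model <- rpart(class ~ ., data = data, control = rpart.control(minsplit = 1))
# Predicted labels for the training rows
predict(model, newdata = data, type = 'class')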
#   1   2   3   4   5   6   7   8   9  10  11  12  13  14 
#  no  no yes yes yes  no yes  no yes yes yes yes yes  no 
# Levels: no yes
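For completeness, a sketch of scoring a new observation with the factor-based model. It assumes the question's test point (2, 2, 1, 1) decodes to rain / hot / high / strong under alphabetical level ordering; the names `weather`, `fit`, and `new_obs` are illustrative only.

```r
library(rpart)

weather <- data.frame(
  outlook = c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
              "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"),
  temperature = c("hot", "hot", "hot", "mild", "cool", "cool", "cool",
                  "mild", "cool", "mild", "mild", "mild", "hot", "mild"),
  humidity = c("high", "high", "high", "high", "normal", "normal", "normal",
               "high", "normal", "normal", "normal", "high", "normal", "high"),
  wind = c("weak", "strong", "weak", "weak", "weak", "strong", "strong",
           "weak", "weak", "weak", "strong", "strong", "weak", "strong"),
  class = c("no", "no", "yes", "yes", "yes", "no", "yes",
            "no", "yes", "yes", "yes", "yes", "yes", "no"),
  stringsAsFactors = TRUE
)

fit <- rpart(class ~ ., data = weather, control = rpart.control(minsplit = 1))

# Build the new row with the same factor levels as the training data
new_obs <- data.frame(
  outlook     = factor("rain",   levels = levels(weather$outlook)),
  temperature = factor("hot",    levels = levels(weather$temperature)),
  humidity    = factor("high",   levels = levels(weather$humidity)),
  wind        = factor("strong", levels = levels(weather$wind))
)

pred_class <- predict(fit, newdata = new_obs, type = "class")  # predicted label
pred_prob  <- predict(fit, newdata = new_obs, type = "prob")   # class proportions in the leaf
pred_class
pred_prob
```

A tree grown this deep on 14 rows will usually memorize the training data, so a prediction for an unseen combination like this one deserves little trust.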