Decision tree always predicting class label as yes
I am trying to fit a decision tree model on a small dataset in R, but it always predicts the class label as yes, no matter what input I give it.
Data
outlook <- c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast", "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain")
temperature <- c("hot", "hot", "hot", "mild", "cool", "cool", "cool", "mild", "cool", "mild", "mild", "mild", "hot", "mild")
humidity <- c("high", "high", "high", "high", "normal", "normal", "normal", "high", "normal", "normal", "normal", "high", "normal", "high")
wind <- c("weak", "strong", "weak", "weak", "weak", "strong", "strong", "weak", "weak", "weak", "strong", "strong", "weak", "strong")
class <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
data <- data.frame(outlook, temperature, humidity, wind, class)
data
Encoding the data
outlook_new <- as.numeric(as.factor(outlook))
temperature_new <- as.numeric(as.factor(temperature))
humidity_new <- as.numeric(as.factor(humidity))
wind_new <- as.numeric(as.factor(wind))
class_new <- as.numeric(as.factor(class))
data_new <- data.frame(outlook_new, temperature_new, humidity_new, wind_new, class_new)
data_new
Building the model
library(rpart)
model <- rpart(class_new ~ ., data=data_new)
Creating a test data point
test_data <- data.frame(outlook_new = 2, temperature_new = 2, humidity_new = 1, wind_new = 1)
test_data
Prediction
predict(model, test_data, type='response')
Regardless of the input, the predict function always returns "yes".
What is going wrong?
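With such a small training set you need to adjust the model's control parameters, and take the results with a grain of salt! By default rpart does not attempt to split a node with fewer than 20 observations (minsplit = 20), so with only 14 rows the tree never grows past the root and every prediction is the majority class.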
model <- rpart(class ~ ., data = data, control = rpart.control(minsplit = 1))
predict(model, newdata = data, type = 'class')
#   1   2   3   4   5   6   7   8   9  10  11  12  13  14
#  no  no yes yes yes  no yes  no yes yes yes yes yes  no
# Levels: no yes
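To see the effect of minsplit directly, here is a minimal sketch that fits the same 14 rows twice, once with the default control and once with the relaxed control used above. The names weather, fit_default and fit_small are made up for this illustration, and stringsAsFactors = TRUE is my assumption so that the response is a factor in any R version.

```r
library(rpart)

# Same 14 rows as in the question. stringsAsFactors = TRUE keeps every
# column, including the response, as a factor (assumption for this sketch).
weather <- data.frame(
  outlook     = c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
                  "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"),
  temperature = c("hot", "hot", "hot", "mild", "cool", "cool", "cool",
                  "mild", "cool", "mild", "mild", "mild", "hot", "mild"),
  humidity    = c("high", "high", "high", "high", "normal", "normal", "normal",
                  "high", "normal", "normal", "normal", "high", "normal", "high"),
  wind        = c("weak", "strong", "weak", "weak", "weak", "strong", "strong",
                  "weak", "weak", "weak", "strong", "strong", "weak", "strong"),
  class       = c("no", "no", "yes", "yes", "yes", "no", "yes",
                  "no", "yes", "yes", "yes", "yes", "yes", "no"),
  stringsAsFactors = TRUE
)

# Default control: minsplit = 20 is larger than the 14 available rows,
# so the root node is never split.
fit_default <- rpart(class ~ ., data = weather)
nrow(fit_default$frame)  # number of nodes in the fitted tree
table(predict(fit_default, newdata = weather, type = "class"))

# Relaxed control, as in the answer: nodes of any size may be split.
fit_small <- rpart(class ~ ., data = weather,
                   control = rpart.control(minsplit = 1))
nrow(fit_small$frame)
# Confusion table of predicted vs. actual labels on the training data
table(predicted = predict(fit_small, newdata = weather, type = "class"),
      actual    = weather$class)
```

Note that the response stays a factor here. If it is converted with as.numeric(), as class_new is in the question, rpart fits a regression tree instead of a classification tree, so type = "class" predictions are not available. A tree that reproduces 14 training rows exactly is almost certainly overfitted, so treat this as a demonstration of the control parameter rather than a usable model.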