Why does a decision tree give the wrong classification in R?
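After fitting a decision tree to Fisher's iris data, I get a misclassification error rate of 0.02667 = 4 / 150. But I can see only 3 errors in my plot:
[Figure: DS for the iris.]
If we look at the predicted probabilities for this point, they look fine (virginica, the same as in the plot above):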
setosa versicolor virginica
0 0.1666667 0.83333333
Can you explain why this misclassification occurs (4 errors reported, rather than the 3 errors clearly visible in the plot)?
Code:
# install.packages("tree")
# install.packages("ggplot2")
library('tree')
library('ggplot2')
data(iris)
iris <- iris[ , c('Petal.Length', 'Petal.Width', 'Species')]
myTree <- tree(Species ~ Petal.Length + Petal.Width, data = iris)
summary(myTree)
# Classification tree:
# tree(formula = Species ~ Petal.Length + Petal.Width, data = iris)
# Number of terminal nodes: 5
# Residual mean deviance: 0.157 = 22.77 / 145
# Misclassification error rate: 0.02667 = 4 / 150
# The errors were found by comparing predict(myTree, iris, type = "class")
# with the species recorded in the original data set
errors <- data.frame(
  Species = c('versicolor', 'versicolor', 'versicolor', 'virginica'),
  Petal.Length = c(4.8, 5.0, 5.1, 4.5),
  Petal.Width = c(1.8, 1.7, 1.6, 1.7))
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, colour = Species)) +
  geom_point(size = 2.1) +
  geom_vline(xintercept = 2.45) +
  geom_hline(yintercept = 1.75) +
  geom_vline(xintercept = 4.95) +
  geom_point(data = errors, shape = 1, size = 5, colour = "black")
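The comparison mentioned in the code comments can be sketched as follows. This is a minimal sketch, assuming the `tree` package is installed; the names `iris_petal`, `pred`, `is_wrong`, and `misclassified` are introduced here for illustration only and are not part of the original code.

```r
library(tree)

data(iris)
# Work on a copy restricted to the two petal measurements and the label.
iris_petal <- iris[, c("Petal.Length", "Petal.Width", "Species")]

# Fit the same model as in the question.
myTree <- tree(Species ~ Petal.Length + Petal.Width, data = iris_petal)

# Predicted class for every training row.
pred <- predict(myTree, iris_petal, type = "class")

# Rows where the predicted class differs from the recorded species.
is_wrong <- pred != iris_petal$Species
misclassified <- cbind(iris_petal[is_wrong, ], Predicted = pred[is_wrong])
print(misclassified)

# Confusion table: off-diagonal cells are the misclassified observations.
print(table(Actual = iris_petal$Species, Predicted = pred))
```

Printing the rows keeps the row names, so each error can be traced back to an individual observation rather than to a position on the plot.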
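The point you are looking at is not misclassified.
But there are multiple observations at that point, and they are not all the same species. Add some jitter to your plot...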
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, colour = Species)) +
  geom_point(position = "jitter") +
  geom_vline(xintercept = 4.95) +
  geom_vline(xintercept = 2.45) +
  geom_hline(yintercept = 1.75)
...and you will see what is actually happening.
From the data...
> iris[iris$Petal.Length == 4.8 & iris$Petal.Width == 1.8,]
    Petal.Length Petal.Width    Species
71           4.8         1.8 versicolor
127          4.8         1.8  virginica
139          4.8         1.8  virginica
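The same check can be run for every coordinate at once instead of looking up one point by hand. The following is a rough sketch in base R, not part of the original answer; the names `species_per_point`, `n_species`, and `mixed` are made up for illustration.

```r
data(iris)

# For each distinct (Petal.Length, Petal.Width) coordinate, count how many
# different species are recorded there.
species_per_point <- aggregate(
  Species ~ Petal.Length + Petal.Width,
  data = iris,
  FUN = function(s) length(unique(s))
)
names(species_per_point)[3] <- "n_species"

# Coordinates shared by more than one species: in a scatter plot without
# jitter these are drawn on top of each other, so only one colour is visible.
mixed <- species_per_point[species_per_point$n_species > 1, ]
print(mixed)
```

Any coordinate listed in `mixed` is a place where the tree cannot classify every observation correctly using these two predictors alone, because identical inputs carry different labels.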