What are the "pale colored regions" in these plots?
I am using R to plot "decision boundary surfaces" for different machine learning algorithms.
First, I simulated some data:
#load library
library(RSSL)
#generate data
d <- generateCrescentMoon(1000,2,1)
Then I set up several different machine learning algorithms for this data:
#load library
library(mlr)
#specify data
aa = makeClassifTask(data = d, target = "Class")
#specify and train machine learning algorithms
learners = list(
  makeLearner("classif.svm", kernel = "linear"),
  makeLearner("classif.svm", kernel = "polynomial"),
  makeLearner("classif.svm", kernel = "radial"),
  "classif.rpart",
  "classif.randomForest",
  "classif.knn"
)
Now, when I visualize the results:
plotLearnerPrediction(learner = learners[[5]], task = aa)
plotLearnerPrediction(learner = learners[[4]], task = aa)
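For the plot on the left (rpart), can someone please help me understand what the "pale colored regions" mean? I know that "blue" should mean the "triangle class" and "red" should mean the "circle class", but what are the "pink" and "light blue" regions? Are these supposed to represent regions of overlap?
Can someone please help me understand this?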
From the documentation for plotLearnerPrediction:
...potentially through color alpha blending the posterior probabilities are shown.
...
prob.alpha (logical(1)) For classification: Set alpha value of background to probability for predicted class? Allows visualization of "confidence" for prediction. If not, only a constant color is displayed in the background for the predicted label. Default is TRUE.
Your model is less certain in those regions: the estimated class probabilities there are further from 0 and 1.
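One way to check this is to look at the predicted probabilities directly. The following is a minimal sketch that reuses the data and task from the question; the names lrn, mod, pred, probs and win_prob are mine, and it assumes the RSSL, mlr and rpart packages are installed. For rpart, the class probabilities are the class proportions in each leaf, so each rectangular region of the plot gets a single shade.

```r
library(RSSL)
library(mlr)

# Data and task as in the question.
d <- generateCrescentMoon(1000, 2, 1)
aa <- makeClassifTask(data = d, target = "Class")

# Ask the learner for class probabilities instead of hard labels.
lrn <- makeLearner("classif.rpart", predict.type = "prob")
mod <- train(lrn, aa)
pred <- predict(mod, task = aa)

# One column of estimated probabilities per class, one row per observation.
probs <- getPredictionProbabilities(pred, cl = getTaskDesc(aa)$class.levels)

# Probability of the predicted (winning) class for each observation.
win_prob <- apply(probs, 1, max)
summary(win_prob)

# Same plot, but with one constant background color per predicted class.
plotLearnerPrediction(learner = lrn, task = aa, prob.alpha = FALSE)
```

With prob.alpha = FALSE the pale regions should disappear, since the background then only encodes the predicted label.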
For more detail, it looks like we need to go to the source. There we find this snippet:
if (taskdim == 2L) {
  p = ggplot(grid, aes_string(x = x1n, y = x2n))
  if (hasLearnerProperties(learner, "prob") && prob.alpha) {
    # max of rows is prob for selected class
    prob = apply(getPredictionProbabilities(pred.grid, cl = td$class.levels), 1, max)
    grid$.prob.pred.class = prob
    p = p + geom_raster(data = grid, mapping = aes_string(fill = target, alpha = ".prob.pred.class"),
      show.legend = TRUE) + scale_fill_discrete(drop = FALSE)
    p = p + scale_alpha(limits = range(grid$.prob.pred.class))
  } else {
    p = p + geom_raster(mapping = aes_string(fill = target))
  }
  ...
So the alpha value of the color is set according to the predicted probability of the winning class.
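To make that concrete, here is a small base R sketch of the same computation on made-up numbers. The probability matrix and the names pred_class, win_prob and rel are hypothetical, and the rescaling step is only my reading of what scale_alpha(limits = range(...)) does before mapping values to transparency.

```r
# Hypothetical class probabilities for three grid points (rows sum to 1).
probs <- matrix(
  c(0.95, 0.05,
    0.60, 0.40,
    0.10, 0.90),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("circle", "triangle"))
)

# Predicted class: the column with the largest probability in each row.
pred_class <- colnames(probs)[max.col(probs, ties.method = "first")]

# Probability of that winning class, as in the apply(..., 1, max) call.
win_prob <- apply(probs, 1, max)

# Position of each value within the observed range, from 0 to 1.
# Assumption: this is roughly how the alpha scale ranks the grid points.
rel <- (win_prob - min(win_prob)) / diff(range(win_prob))

data.frame(pred_class, win_prob, rel)
```

The second point is predicted as "circle" with probability 0.60, the lowest winning probability here, so it would get the palest fill. Because the limits come from the range seen in the plot, paleness is relative to that plot rather than an absolute probability.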
If you really need to dig into how that is done, you will need to know ggplot better than I do; to start, see the documentation pages for geom_raster, aes_string, and scale_alpha.