What are the "pale colored regions" in these plots?

Question

I am using R to plot "decision boundary surfaces" for different machine learning algorithms.

First, I simulated some data:

#load library
library(RSSL)

#generate data
d <- generateCrescentMoon(1000,2,1)

Then I trained several different machine learning algorithms on this data:

#load library
library(mlr)

#specify data
aa = makeClassifTask(data = d, target = "Class")

#specify and train machine learning algorithms
learners = list(
    makeLearner("classif.svm", kernel = "linear"),
    makeLearner("classif.svm", kernel = "polynomial"),
    makeLearner("classif.svm", kernel = "radial"),
    "classif.rpart",
    "classif.randomForest",
    "classif.knn"
)

Now, when I visualize the results:

 plotLearnerPrediction(learner = learners[[5]], task = aa)
 plotLearnerPrediction(learner = learners[[4]], task = aa)

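For the plot on the left (rpart), can someone help me understand what the "pale colored regions" mean? I know that "blue" should represent the "triangle" class and "red" should represent the "circle" class, but what are the "pink" and "light blue" regions? Are they supposed to represent regions where the classes overlap?

Can someone help me understand this?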

Answer 1

From the documentation for plotLearnerPrediction:

...potentially through color alpha blending the posterior probabilities are shown.

...

prob.alpha
(logical(1)) For classification: Set alpha value of background to probability for predicted class? Allows visualization of “confidence” for prediction. If not, only a constant color is displayed in the background for the predicted label. Default is TRUE.

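Your model is less sure about those regions: the estimated probability it gives for the predicted class there is further away from 0 and 1.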
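One way to check this reading is to switch the blending off and compare the two plots. The sketch below is only an illustration: it assumes the mlr package is installed and uses the built-in iris data with two arbitrarily chosen features instead of the simulated data from the question.

```r
# Minimal sketch, assuming mlr is installed. The iris task and the two
# feature names are illustrative choices, not taken from the question.
library(mlr)

task <- makeClassifTask(data = iris, target = "Species")
feats <- c("Petal.Length", "Petal.Width")

# Default (prob.alpha = TRUE): background alpha follows the predicted-class probability.
p_blend <- plotLearnerPrediction(learner = "classif.rpart", task = task,
                                 features = feats)

# prob.alpha = FALSE: one constant color per predicted label, no pale regions.
p_flat <- plotLearnerPrediction(learner = "classif.rpart", task = task,
                                features = feats, prob.alpha = FALSE)

print(p_blend)
print(p_flat)
```

With the objects from the question, the equivalent call would be plotLearnerPrediction(learner = learners[[4]], task = aa, prob.alpha = FALSE).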

For more detail, it looks like we need to go to the source. There we find this snippet:

    if (taskdim == 2L) {
      p = ggplot(grid, aes_string(x = x1n, y = x2n))
      if (hasLearnerProperties(learner, "prob") && prob.alpha) {
        # max of rows is prob for selected class
        prob = apply(getPredictionProbabilities(pred.grid, cl = td$class.levels), 1, max)
        grid$.prob.pred.class = prob
        p = p + geom_raster(data = grid, mapping = aes_string(fill = target, alpha = ".prob.pred.class"),
          show.legend = TRUE) + scale_fill_discrete(drop = FALSE)
        p = p + scale_alpha(limits = range(grid$.prob.pred.class))
      } else {
        p = p + geom_raster(mapping = aes_string(fill = target))
      }
...

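So the alpha value of the color is set according to the predicted probability of the winning class. If you really need to dig into how that is done, you will need to know ggplot better than I do; to start, see the documentation pages for geom_raster, aes_string, and scale_alpha.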
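The row-wise maximum in that snippet can be reproduced in base R. The probabilities below are made up for illustration, and the alpha rescaling is an assumption: it mimics ggplot2's default alpha range of 0.1 to 1 with the limits set to the observed range, as the scale_alpha call in the snippet does.

```r
# Hypothetical posterior probabilities: four grid points (rows), two classes (columns).
probs <- matrix(
  c(0.95, 0.05,
    0.60, 0.40,
    0.45, 0.55,
    0.02, 0.98),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("circle", "triangle"))
)

# Predicted class: the column holding the largest probability in each row.
pred_class <- colnames(probs)[max.col(probs, ties.method = "first")]

# Probability of the predicted class: the row-wise maximum, as in the snippet above.
prob_pred_class <- apply(probs, 1, max)

# Assumed mapping to alpha: rescale the observed range linearly onto [0.1, 1].
rng <- range(prob_pred_class)
alpha <- 0.1 + 0.9 * (prob_pred_class - rng[1]) / (rng[2] - rng[1])

print(data.frame(pred_class, prob_pred_class, alpha))
```

Grid points where the winning probability is close to the lowest observed value get the smallest alpha and so appear pale. Because the limits come from the observed range, the palest color marks the least confident prediction in that particular plot, not a fixed probability.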
