How to optimize accuracy code for multiple predictive models in R?

Question

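I wrote a cross-validation function that works with multiple models.

I have a function containing the models I want to evaluate, and I call it inside the cross-validation, so I end up with a data frame named results that holds the class (the label) and each model's prediction for every iteration: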

 head(results)
     iteration class ksvm rf
65          1     4    4  4
306         1     2    2  2
300         1     4    4  4
385         1     2    2  2
431         1     2    2  2
205         1     4    4  4

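(The row index can be ignored; it comes from the sampled data.)

Since I am using 5-fold cross-validation, I have 5 iterations of predictions for ksvm and rf in this case. (The model names are stored in a variable named algorithms.)

After this, I compute the accuracy like so: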

   results %>% 
     group_by(iteration) %>% 
     summarise(acc_ksvm = sum(ksvm == class) / n() , acc_rf = sum(rf == class) / n() )

Output:

   iteration  acc_ksvm    acc_rf
      (int)     (dbl)     (dbl)
 1         1 0.9603175 0.9603175
 2         2 0.9760000 0.9680000
 3         3 0.9603175 0.9523810
 4         4 0.9840000 0.9920000
 5         5 0.9444444 0.9523810

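Question: Is there a way to optimize this? I will eventually add more models, and I would like to pass just the algorithms variable to a function and compute the accuracy of all models, without manually writing summarise(acc_ksvm = sum(ksvm == class) / n() , acc_rf = sum(rf == class) / n() ) for each model.

Can this be done with apply? Or do I have to change how the df is built so that I can group by model?

Thanks!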

Answer 1

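Because sum(ksvm == class) / n() is really the group mean of the TRUE matches between an algorithm column and class, consider first converting each algorithm column to logical values (TRUE/FALSE for match/no match), and then applying dplyr's summarise_each to all the remaining columns: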

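library(dplyr)

# Placeholder names; use the prediction columns actually present in results
# (e.g. c("ksvm", "rf") for the data shown in the question)
algorithms <- c("alg1", "alg2", "alg3", "alg4", "alg5")

# Overwrite each prediction column with TRUE/FALSE: does it match class?
results[algorithms] <- sapply(algorithms, function(i){
  results[[i]] == results$class
})

# The mean of a logical column is the share of TRUE values, i.e. the accuracy.
# Note: summarise_each() and funs() are deprecated in recent dplyr releases.
summarydf <-
  results[c("iteration", algorithms)] %>%
  group_by(iteration) %>%
  summarise_each(funs(mean)) %>%
  setNames(c("iteration", paste0("acc_", algorithms)))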
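
The same idea can be wrapped in a function that takes only the data frame and the algorithms vector, which is what the question asks for. The following is a minimal sketch, not part of the original answer: it assumes dplyr 1.0.0 or later (for across() and its .names argument), the helper name fold_accuracy is hypothetical, and the toy results data frame contains made-up values that only mimic the layout shown in the question.

```r
library(dplyr)

# Toy stand-in for the results data frame from the question (made-up values)
results <- data.frame(
  iteration = c(1, 1, 1, 1, 2, 2, 2, 2),
  class     = c(4, 2, 4, 2, 2, 2, 4, 4),
  ksvm      = c(4, 2, 4, 4, 2, 2, 4, 4),
  rf        = c(4, 2, 2, 4, 2, 4, 4, 4)
)
algorithms <- c("ksvm", "rf")

# Hypothetical helper: per-iteration accuracy for every column in `algorithms`
fold_accuracy <- function(results, algorithms) {
  results %>%
    group_by(iteration) %>%
    summarise(
      # For each prediction column, take the share of rows equal to class
      across(all_of(algorithms), ~ mean(.x == class), .names = "acc_{.col}"),
      .groups = "drop"
    )
}

summarydf <- fold_accuracy(results, algorithms)
print(summarydf)
```

Adding a model then only requires appending its column name to algorithms. Unlike the approach above, this sketch leaves the prediction columns in results untouched instead of overwriting them with logical values.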

