R caret: Combine rfe() and train()

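I want to combine recursive feature elimination via rfe() with hyperparameter tuning and model selection via trainControl(), using method rf (random forest). Instead of the standard summary statistics I would like to get the MAPE (mean absolute percentage error). I therefore tried the following code on the ChickWeight data set: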

library(caret)
library(randomForest)
library(MLmetrics)

# Compute MAPE instead of other metrics
mape <- function(data, lev = NULL, model = NULL){
  mape <- MAPE(y_pred = data$pred, y_true = data$obs)
  c(MAPE = mape)
}

# specify trainControl
trc <- trainControl(method = "repeatedcv", number = 10, repeats = 3, search = "grid",
                    savePredictions = TRUE, summaryFunction = mape)
# set up grid
tunegrid <- expand.grid(.mtry=c(1:3))

# specify rfeControl
rfec <- rfeControl(functions=rfFuncs, method="cv", number=10, saveDetails = TRUE)

set.seed(42)
results <- rfe(weight ~ Time + Chick + Diet, 
           sizes=c(1:3), # candidate subset sizes (numbers of predictors) to evaluate
           data = ChickWeight, 
           method="rf",
           ntree = 250, 
           metric= "RMSE", 
           tuneGrid=tunegrid,
           rfeControl=rfec,
           trControl = trc)

The code runs without errors. But where can I find the MAPE that I defined as the summaryFunction in trainControl? Is trainControl executed or ignored?

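How can I rewrite the code so that rfe performs the recursive feature elimination, the hyperparameter mtry is tuned via trainControl within rfe, and an additional error measure (MAPE) is computed at the same time?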

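trainControl is ignored, as its description

Control the computational nuances of the train function

would suggest. With rfFuncs, rfe() fits the forest by calling randomForest() directly and never calls train(), so trControl, tuneGrid and method have no effect. To use MAPE, you need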

rfec$functions$summary <- mape

and then call rfe() with metric = "MAPE" and maximize = FALSE, since a lower MAPE is better:

rfe(weight ~ Time + Chick + Diet, 
    sizes = c(1:3),
    data = ChickWeight, 
    method ="rf",
    ntree = 250, 
    metric = "MAPE", # Modified
    maximize = FALSE, # Modified
    rfeControl = rfec)
#
# Recursive feature selection
#
# Outer resampling method: Cross-Validated (10 fold) 
#
# Resampling performance over subset size:
#
#  Variables   MAPE  MAPESD Selected
#          1 0.1903 0.03190         
#          2 0.1029 0.01727        *
#          3 0.1326 0.02136         
#         53 0.1303 0.02041         
#
# The top 2 variables (out of 2):
#    Time, Chick.L
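
The run above reports MAPE but still does not tune mtry. One possible way to get both is to swap rfFuncs for caretFuncs, whose fit function calls train() and forwards the extra arguments of rfe() to it. The following is only a sketch under assumptions, not a verified solution: the names rmse_mape, inner and outer are made up for illustration, the x/y interface is used so that Chick is not expanded into dummy variables, and 5 folds with 100 trees are chosen purely to keep the nested resampling short. The summary function keeps caret's default metrics so that train() can still tune on its default RMSE while MAPE is reported alongside.

```r
library(caret)
library(randomForest)

# Hypothetical summary function: caret's default regression metrics plus MAPE.
# MAPE is computed in base R and assumes no observed value is exactly zero.
rmse_mape <- function(data, lev = NULL, model = NULL) {
  c(defaultSummary(data, lev, model),
    MAPE = mean(abs((data$obs - data$pred) / data$obs)))
}

cw <- as.data.frame(ChickWeight)           # drop the groupedData class
x  <- cw[, c("Time", "Chick", "Diet")]
y  <- cw$weight

# Inner resampling: used by train() to tune mtry for each candidate subset
inner <- trainControl(method = "cv", number = 5, summaryFunction = rmse_mape)

# Outer resampling: used by rfe() to compare subset sizes
outer <- rfeControl(functions = caretFuncs, method = "cv", number = 5)
outer$functions$summary <- rmse_mape

set.seed(42)
results <- rfe(x, y,
               sizes = 1:2,                 # the full set (3) is always evaluated too
               metric = "MAPE", maximize = FALSE,
               rfeControl = outer,
               # the arguments below are forwarded to train() by caretFuncs
               method = "rf",
               ntree = 100,
               tuneGrid = expand.grid(mtry = 1:3),
               trControl = inner)

results$results        # outer resampling metrics per subset size, including MAPE
results$fit$bestTune   # mtry chosen by train() for the final subset
```

For subsets with fewer predictors than the largest mtry in the grid, randomForest() should warn and reset mtry to a valid value, so some warnings are to be expected. Because every outer fold runs a full inner tuning loop, the run time grows quickly with the number of folds, repeats and grid points.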