Is there a discrepancy between createMultiFolds behavior and the resampling summary of a caret object?

Question

I'm running into a strange issue when using custom folds for cross-validation with caret.

MWE（其中使用 createMultiFolds 并没有真正意义）

library(caret) #version 6.0-47
data(iris)

set.seed(1)    
train.idx <- createDataPartition(iris$Species, p = .75,
                                 list = FALSE,
                                 times = 1)

train_1 <- iris[train.idx, ]

#I create specific folds
set.seed(1)    
id_1 <- createMultiFolds(train_1$Species, k=10, times = 10)

# And use them in my cross validation
cvCtrl_2 <- trainControl(method = "repeatedcv",
                         index = id_1,
                         classProbs = TRUE)

trainX <- train_1[, names(train_1) != "Species"]

# I fit my model
set.seed(1111)
rfTune2 <- train(trainX, train_1$Species,
                 method = "rf",
                 trControl = cvCtrl_2)

rfTune2

My model summary reads:

##Random Forest 
...
##Resampling: Cross-Validated (10 fold, repeated 1 times)

id_1 is a list of 100 index vectors for 10-fold cross-validation repeated 10 times, and I asked trainControl to use this list for resampling.

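So why does my model summary describe the resampling as

(10 fold, repeated 1 times)

when length(rfTune2$control$index) equals 100? From that length I assume the model was in fact trained on all the folds. Is that correct?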
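One way to check what an index list actually encodes, independently of the printed label, is to read its layout back from the element names. The sketch below uses a hypothetical helper, infer_cv_layout (not part of caret), and assumes the names follow the "FoldXX.RepYY" pattern that createMultiFolds produces. It builds a mock list with the same shape as id_1 so it runs without caret; the row indices in the mock are placeholders.

```r
# Hypothetical helper (not a caret function): infer folds and repeats
# from index names. ASSUMPTION: names look like "Fold01.Rep01", as
# produced by createMultiFolds(); other index lists may be named differently.
infer_cv_layout <- function(index) {
  nm <- names(index)
  folds <- sub("\\..*$", "", nm)  # "Fold01.Rep01" -> "Fold01"
  reps  <- sub("^.*\\.", "", nm)  # "Fold01.Rep01" -> "Rep01"
  list(number    = length(unique(folds)),
       repeats   = length(unique(reps)),
       resamples = length(index))
}

# Mock list shaped like id_1: 10 folds x 10 repeats = 100 elements.
# The row indices (1:5) are placeholders, not real training rows.
mock_names <- as.vector(outer(sprintf("Fold%02d", 1:10),
                              sprintf("Rep%02d", 1:10),
                              paste, sep = "."))
mock_index <- setNames(rep(list(1:5), 100), mock_names)

layout <- infer_cv_layout(mock_index)
layout$number     # 10
layout$repeats    # 10
layout$resamples  # 100
```

Applied to the real id_1, the same call should report 10 folds and 10 repeats, which is what the index list holds regardless of what the summary label says.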

Should I file an issue on GitHub, or am I missing something obvious about how trainControl works?

Answer 1

trainControl has these defaults:

number = ifelse(grepl("cv", method), 10, 25),
repeats = ifelse(grepl("cv", method), 1, number)
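Evaluating those two default expressions for the method used in the question shows where the printed label comes from. This is a minimal base-R sketch that copies the expressions as quoted above (caret 6.0-47) rather than calling caret itself; other caret versions may define these defaults differently.

```r
# The two default expressions quoted above, evaluated for the
# method used in the question. No caret call is made here.
method  <- "repeatedcv"
number  <- ifelse(grepl("cv", method), 10, 25)      # "cv" matches -> 10
repeats <- ifelse(grepl("cv", method), 1, number)   # "cv" matches -> 1

# number = 10, repeats = 1: exactly the "10 fold, repeated 1 times" label,
# no matter how many elements the supplied index list has.
c(number = number, repeats = repeats)
```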

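When you supply index, the code has no way of knowing what kind of resampling that list represents, so the label falls back to these defaults. You have to specify number and repeats along with index for the label to be right.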
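A minimal sketch of that fix, reusing the question's setup and assuming caret 6.0-47 behaves as described in this answer: pass number = 10 and repeats = 10 (matching k and times in createMultiFolds) alongside index. Model fitting is left out to keep the example fast; passing this control object to train() as in the question should then print "10 fold, repeated 10 times", though that printout is not checked here.

```r
library(caret)
data(iris)

# Same split and folds as in the question.
set.seed(1)
train.idx <- createDataPartition(iris$Species, p = .75,
                                 list = FALSE, times = 1)
train_1 <- iris[train.idx, ]

set.seed(1)
id_1 <- createMultiFolds(train_1$Species, k = 10, times = 10)

# State the layout explicitly so the summary label matches id_1:
# number = folds per repeat (k), repeats = number of repeats (times).
cvCtrl_2 <- trainControl(method = "repeatedcv",
                         number = 10,
                         repeats = 10,
                         index = id_1,
                         classProbs = TRUE)

cvCtrl_2$number          # 10
cvCtrl_2$repeats         # 10
length(cvCtrl_2$index)   # 100
```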
