Issue using custom summary function in parallel execution (caret)
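I am trying to use MAPE as the metric for evaluating model performance.
With LOOCV and parallel execution everything works fine, but if I use another resampling method I get this error:
Error in { : task 1 failed - "could not find function "mape""
In serial execution, by contrast, the problem disappears.
The code below provides an example.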
library(caret)
library(doParallel)
data("environmental")
registerDoParallel(makeCluster(detectCores(), outfile = ''))
mape <- function(y, yhat) mean(abs((y - yhat) / y))
mapeSummary <- function(data, lev = NULL, model = NULL) {
  out <- mape(data$obs, data$pred)
  names(out) <- "MAPE"
  out
}
# LOOCV - parallel
trControlLoocvPar <- trainControl(allowParallel = TRUE,
                                  verboseIter = TRUE,
                                  method = "LOOCV",
                                  summaryFunction = mapeSummary)
# LOOCV - serial
trControlLoocvSer <- trainControl(allowParallel = FALSE,
                                  verboseIter = TRUE,
                                  method = "LOOCV",
                                  summaryFunction = mapeSummary)
# Bootstrapping - parallel
trControlBootPar <- trainControl(allowParallel = TRUE,
                                 verboseIter = TRUE,
                                 method = "boot",
                                 summaryFunction = mapeSummary)
# Bootstrapping - serial
trControlBootSer <- trainControl(allowParallel = FALSE,
                                 verboseIter = TRUE,
                                 method = "boot",
                                 summaryFunction = mapeSummary)
trControlList <- list(trControlLoocvSer,
                      trControlLoocvPar,
                      trControlBootSer,
                      trControlBootPar)
models <- lapply(trControlList,
                 function(control) {
                   train(y = environmental$ozone,
                         x = environmental[, -1],
                         method = "glmnet",
                         trControl = control,
                         metric = "MAPE",
                         maximize = FALSE)
                 })
My OS is El Capitan 10.11.4 and my caret version is 6.0.62.
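As the message says, your parallel processes cannot find the mape function. It is defined only in the global environment of your main R session, and that environment is not copied to the workers along with mapeSummary.
The simplest solution is to define mape inside mapeSummary, as shown below. Your parallel processes will then work correctly.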
mapeSummary <- function(data, lev = NULL, model = NULL) {
  # Helper defined locally so it travels with mapeSummary to each worker
  mape <- function(y, yhat) mean(abs((y - yhat) / y))
  out <- mape(data$obs, data$pred)
  names(out) <- "MAPE"
  out
}
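To check that the nested version really is self-contained, it can be shipped to worker processes directly, without caret. The following is only a minimal sketch: the `toy` data frame is hypothetical and merely mimics the `obs`/`pred` columns that caret hands to a summary function, and it uses a plain `parallel` cluster rather than `doParallel`.

```r
library(parallel)

# Summary function with its helper defined inside, so it does not depend
# on anything in the main session's global environment.
mapeSummary <- function(data, lev = NULL, model = NULL) {
  mape <- function(y, yhat) mean(abs((y - yhat) / y))
  out <- mape(data$obs, data$pred)
  names(out) <- "MAPE"
  out
}

# Hypothetical toy data in the obs/pred layout (an assumption for illustration).
toy <- data.frame(obs = c(100, 200, 400), pred = c(110, 180, 400))

# Evaluate once in the main session ...
local_result <- mapeSummary(toy)

# ... and once on each of two fresh worker processes.
cl <- makePSOCKcluster(2)
worker_results <- clusterCall(cl, mapeSummary, toy)
stopCluster(cl)

print(local_result)
print(worker_results[[1]])
```

The workers never had `mape` defined in their own global environment, yet the call succeeds because the helper is created each time `mapeSummary` runs.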
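Bonus:
You can also use the clusterEvalQ function, one of the clusterApply family of functions, to define mape on every worker. It works as shown below, but it is not the most elegant solution and requires more typing: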
cl <- makePSOCKcluster(detectCores() - 1)
# Define mape in the global environment of every worker
clusterEvalQ(cl, mape <- function(y, yhat) mean(abs((y - yhat) / y)))
registerDoParallel(cl)
mapeSummary <- function(data, lev = NULL, model = NULL) {
  out <- mape(data$obs, data$pred)
  names(out) <- "MAPE"
  out
}
# Bootstrapping - parallel
trControlBootPar <- trainControl(allowParallel = TRUE,
                                 verboseIter = TRUE,
                                 method = "boot",
                                 summaryFunction = mapeSummary)
train(y = environmental$ozone,
      x = environmental[, -1],
      method = "glmnet",
      trControl = trControlBootPar,
      metric = "MAPE",
      maximize = FALSE)
# Shut the workers down and return to sequential execution
stopCluster(cl)
registerDoSEQ()
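A related option, not part of the answer above, is `clusterExport` from the `parallel` package, which copies an object that already exists in the main session to each worker instead of redefining it there. The sketch below is a hedged illustration that uses only `parallel`; with caret one would presumably call `registerDoParallel(cl)` after the export, as in the clusterEvalQ example. The input vectors are made up for the demonstration.

```r
library(parallel)

# The helper is defined once, in the main session.
mape <- function(y, yhat) mean(abs((y - yhat) / y))

cl <- makePSOCKcluster(2)

# Copy the existing definition of mape into every worker's global environment.
clusterExport(cl, "mape", envir = environment())

# Each worker can now resolve mape by name.
found <- unlist(clusterCall(cl, function() exists("mape")))
values <- unlist(clusterCall(cl, function() mape(c(100, 200), c(90, 220))))

stopCluster(cl)

print(found)
print(values)
```

Compared with clusterEvalQ, this avoids writing the function body twice, at the cost of having to remember to export every helper the summary function relies on.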