Why am I getting an error related to seeds?
I am using this example:
train$`1stFlrSF`<-train$S1stFlrSF
train$`2ndFlrSF`<-train$S2ndFlrSF
train$`3SsnPorch`<-train$S3SsnPorch
library("randomForest")
set.seed(1)
rf.model <- randomForest(SalePrice ~ .,
                         data = train,
                         ntree = 50,
                         nodesize = 5,
                         mtry = 2,
                         importance = TRUE,
                         metric = "RMSE")
library("caret")
caret.oob.model <- train(train[,-ncol(train)], train$SalePrice,
                         method = "rf",
                         ntree = 50,
                         tuneGrid = data.frame(mtry = 2),
                         nodesize = 5,
                         importance = TRUE,
                         metric = "RMSE",
                         trControl = trainControl(method = "oob", seed = 1),
                         allowParallel = FALSE)
But fitting caret.oob.model fails with this error:
Error: Bad seeds: the seed object should be a list of length 2 with 1 integer vectors of size 1 and the last list element having at least a single integer.
Here is my dataset: https://drive.google.com/file/d/1el-gAgA93EbYnM6VnDqzhT5c5uWsnKvq/view?usp=sharing
How can I fix this?
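randomForest is a stochastic algorithm that relies on sampling rows and columns. Setting the RNG seed makes the results reproducible. For randomForest, a single seed set before calling the training function is enough. In caret, things are more complicated because of resampling and the fact that more than one model is fitted.
In your case, even without resampling, you are fitting two models: one for the OOB evaluation of the mtry hyperparameter, and the final model.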
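To illustrate why a single seed is enough when there is only one fit, here is a minimal base R sketch. The function draw_sample is hypothetical and only mimics the kind of row and column sampling described above; it is not how randomForest is implemented.

```r
# Illustrative only: mimic the row and column sampling a random forest relies on.
# draw_sample is a hypothetical helper, not part of randomForest or caret.
draw_sample <- function(n_rows, n_cols, mtry) {
  list(rows = sample.int(n_rows, n_rows, replace = TRUE),  # bootstrap sample of rows
       cols = sample.int(n_cols, mtry))                    # candidate columns for a split
}

set.seed(1)
first <- draw_sample(32, 10, 2)

set.seed(1)
second <- draw_sample(32, 10, 2)

# Same seed before each call, so the draws match exactly.
print(identical(first, second))

# Without reseeding, the RNG state has advanced and the draws are new.
third <- draw_sample(32, 10, 2)
```

With several fits in one call, as in caret, each fit needs its own reseeding point, which is what the seeds argument provides.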
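The help page ?trainControl states that the seeds argument is an optional set of integers that will be used to set the seed at each resampling iteration.
It can be given as a list of B+1 elements, where B is the number of resamples (except for the "boot632" method). The first B elements of the list should be vectors of integers of length M, where M is the number of models being evaluated (1 in your case). The last element of the list only needs to be a single integer (for the final model).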
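The list layout described above can be sketched in base R. The helper make_seeds and its arguments are hypothetical names, not part of caret; only the shape of the list follows the help page.

```r
# Hypothetical helper (not part of caret): build a seeds list with
# B integer vectors of length M, plus one integer for the final model.
make_seeds <- function(B, M, seed = 1L) {
  set.seed(seed)                               # make the generated seeds reproducible
  seeds <- vector("list", B + 1L)
  for (i in seq_len(B)) {
    seeds[[i]] <- sample.int(10000L, M)        # one seed per model being evaluated
  }
  seeds[[B + 1L]] <- sample.int(10000L, 1L)    # single seed for the final model
  seeds
}

# OOB with a single mtry value: B = 1 and M = 1, so the list has 2 elements.
oob_seeds <- make_seeds(B = 1, M = 1)
str(oob_seeds)

# Assumed usage: trainControl(method = "oob", seeds = oob_seeds)
```

Under the same assumptions, 10-fold cross-validation over three mtry values would use make_seeds(B = 10, M = 3), giving ten vectors of three integers followed by one integer.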
Example:
library(randomForest)
library(caret)
data(mtcars)
set.seed(1)
rf.model <- randomForest(mpg ~ .,
                         data = mtcars,
                         ntree = 50,
                         nodesize = 5,
                         mtry = 2,
                         importance = TRUE,
                         metric = "RMSE")
rf.model
Call:
randomForest(formula = mpg ~ ., data = mtcars, ntree = 50, nodesize = 5, mtry = 2, importance = TRUE, metric = "RMSE")
Type of random forest: regression
Number of trees: 50
No. of variables tried at each split: 2
Mean of squared residuals: 7.353122
% Var explained: 79.1
caret.oob.model <- train(mpg ~ .,
                         data = mtcars,
                         method = "rf",
                         ntree = 50,
                         tuneGrid = data.frame(mtry = 2),
                         nodesize = 5,
                         importance = TRUE,
                         metric = "RMSE",
                         trControl = trainControl(method = "oob", seeds = list(1, 1)))
caret.oob.model$finalModel
Call:
randomForest(x = x, y = y, ntree = 50, mtry = param$mtry, nodesize = 5, importance = TRUE)
Type of random forest: regression
Number of trees: 50
No. of variables tried at each split: 2
Mean of squared residuals: 7.353122
% Var explained: 79.1
It seems to me that the models are identical, based on the exact same Mean of squared residuals and % Var explained.