Training datasets in RStudio
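I split my dataset into a 70% training set and a 30% validation set. The data contain many NaN values, which may be why I cannot train a model. Splitting the data into training and testing sets works, but when I try to train, I get this error: ("Error in na.fail.default(list(ndvi = c(0.426755102040816, 0.409, 0.501735849056604, : missing values in object").
I want to estimate biomass from NDVI and then look at how the estimates relate to the observed biomass.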
set.seed(123)
inTrain = createDataPartition(newdata$ndvi, p = 0.7, list = FALSE)
training = newdata[ inTrain,]
testing = newdata[-inTrain,]
cols <- c("ndvi", "first", "second", "third","DMY_kg_ha")
newdata[cols] <- lapply(newdata[cols], factor) ## as.factor() could also be used
set.seed(32343)
modelFit<-train(DMY_kg_ha~first+second+third+treatment, data=training, method='glm',na.rm = na.omit)
modelFit
After creating modelFit, I want to use 'vif' in R to find out which variables are important.
Try this:
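# load library
library(caret)
# set seed value
set.seed(123)
# remove rows with missing values (na.omit() also drops NaN)
newdata <- na.omit(newdata)
# convert columns to factors BEFORE splitting, so training and testing inherit them
# (assumes first, second and third are categorical; skip this step if they are numeric)
# ndvi and DMY_kg_ha are continuous and stay numeric for a regression
cols <- c("first", "second", "third")
newdata[cols] <- lapply(newdata[cols], factor) ## as.factor() could also be used
# split data set
inTrain <- createDataPartition(newdata$ndvi, p = 0.7, list = FALSE)
training <- newdata[inTrain, ]
testing <- newdata[-inTrain, ]
# reset seed value
set.seed(32343)
# train model; the missing-value argument of train() is na.action, not na.rm
modelFit <- train(DMY_kg_ha ~ first + second + third + treatment, data = training, method = "glm", na.action = na.omit)
# view model
modelFit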
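The question also asks about 'vif', which the code above does not cover. Here is a minimal sketch of the idea in base R only, with the VIF computed by hand as 1 / (1 - R^2). The data frame `toy` and its values are made up for illustration and are not the asker's data. The predictors are assumed to be numeric.

```r
# Hypothetical stand-in data: three numeric predictors, two of them collinear
set.seed(123)
n <- 100
toy <- data.frame(first = runif(n, 0.2, 0.8))
toy$second <- toy$first + rnorm(n, sd = 0.05)  # deliberately collinear with `first`
toy$third <- runif(n, 0.2, 0.8)
toy$third[c(5, 17)] <- NaN                     # simulate missing values

# Drop incomplete rows; na.omit() treats NaN as missing
toy <- na.omit(toy)

# VIF of predictor p is 1 / (1 - R^2), where R^2 comes from
# regressing p on all the other predictors
predictors <- c("first", "second", "third")
vif_values <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = toy))$r.squared
  1 / (1 - r2)
})
print(round(vif_values, 2))
```

Note that VIF measures collinearity among the predictors rather than their importance. For importance, caret provides varImp(modelFit). If the car package is installed, car::vif(modelFit$finalModel) is the usual shortcut and should give comparable values for numeric predictors.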