Running and plotting Random Forest with categorical data as a feature vector in r
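I have the data below. It has three columns: the first is categorical, the second is numeric, and the last is my class label. I want to run a random forest on my data and plot the tree as well as the variable importance. My goal is to find which subject_result is the most important, which comes next, and also to see the tree.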
This code gives an error:
library(randomForest)
randomForest(ENSC_Disc~.,data = df)
Error in randomForest.default(m, y, ...) :
NA/NaN/Inf in foreign function call (arg 1)
rpart and ctree also return errors.
df <- data.frame(stringsAsFactors=FALSE,
subject_result = c("ENSCPassed", "CHEMPassed", "ENSCPassed", "OTHERPassed",
"ENSCPassed", "MATHPassed", "ENSCPassed", "OTHERPassed",
"OTHERPassed", "OTHERPassed", "PHYSPassed", "CHEMPassed",
"MATHPassed", "ENSCPassed", "CMPTPassed", "OTHERPassed",
"CMPTPassed"),
semester_num = c(9L, 4L, 16L, 7L, 7L, 2L, 8L, 11L, 4L, 12L, 1L, 4L, 3L,
11L, 8L, 11L, 12L),
ENSC_Disc = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
)
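A likely cause of the error is that subject_result is stored as character (stringsAsFactors=FALSE), while randomForest() expects categorical predictors as factors. ENSC_Disc is also numeric 0/1, so randomForest() would fit a regression forest rather than a classifier. The following is a minimal base-R sketch of the conversion, assuming the data frame is named df as in the code above; converting the columns the same way is also a reasonable first step before retrying rpart() and ctree().

```r
# Rebuild the data from the question
df <- data.frame(
  stringsAsFactors = FALSE,
  subject_result = c("ENSCPassed", "CHEMPassed", "ENSCPassed", "OTHERPassed",
                     "ENSCPassed", "MATHPassed", "ENSCPassed", "OTHERPassed",
                     "OTHERPassed", "OTHERPassed", "PHYSPassed", "CHEMPassed",
                     "MATHPassed", "ENSCPassed", "CMPTPassed", "OTHERPassed",
                     "CMPTPassed"),
  semester_num = c(9L, 4L, 16L, 7L, 7L, 2L, 8L, 11L, 4L, 12L, 1L, 4L, 3L,
                   11L, 8L, 11L, 12L),
  ENSC_Disc = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
)

# Turn every character column into a factor, leave other columns untouched
df[] <- lapply(df, function(col) if (is.character(col)) factor(col) else col)

# A factor label makes randomForest() fit a classifier instead of a regression
df$ENSC_Disc <- factor(df$ENSC_Disc)

str(df)
```

With these conversions, randomForest(ENSC_Disc ~ ., data = df) should fit a classification forest. Keep in mind that its importance scores then rate subject_result as a whole rather than each subject separately.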
Here is an attempt using caret:
library(tidyverse)
library(caret)
df<-data.frame(stringsAsFactors=FALSE,
subject_result = c("ENSCPassed", "CHEMPassed", "ENSCPassed", "OTHERPassed",
"ENSCPassed", "MATHPassed", "ENSCPassed", "OTHERPassed",
"OTHERPassed", "OTHERPassed", "PHYSPassed", "CHEMPassed",
"MATHPassed", "ENSCPassed", "CMPTPassed", "OTHERPassed",
"CMPTPassed"),
semester_num = c(9L, 4L, 16L, 7L, 7L, 2L, 8L, 11L, 4L, 12L, 1L, 4L, 3L,
11L, 8L, 11L, 12L),
ENSC_Disc = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
)
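set.seed(233)
str(df)
# Convert the 0/1 label to a factor so that train() fits a classifier, not a regression
df$ENSC_Disc <- as.factor(df$ENSC_Disc)
# Random forest evaluated with 5-fold cross-validation
fit.rf <- train(ENSC_Disc ~ ., data = df, metric = "Accuracy", method = "rf",
                trControl = trainControl(method = "cv", number = 5))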
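The caret version runs because train()'s formula interface converts subject_result into dummy (indicator) columns before the forest is fitted, which is also why varImp() can list individual subjects. The sketch below reproduces that kind of expansion in base R with model.matrix(), which is handy for fitting randomForest() directly. Dropping the intercept with `- 1` keeps an indicator for every subject; caret's own encoding may differ in which columns it keeps.

```r
df <- data.frame(
  stringsAsFactors = FALSE,
  subject_result = c("ENSCPassed", "CHEMPassed", "ENSCPassed", "OTHERPassed",
                     "ENSCPassed", "MATHPassed", "ENSCPassed", "OTHERPassed",
                     "OTHERPassed", "OTHERPassed", "PHYSPassed", "CHEMPassed",
                     "MATHPassed", "ENSCPassed", "CMPTPassed", "OTHERPassed",
                     "CMPTPassed"),
  semester_num = c(9L, 4L, 16L, 7L, 7L, 2L, 8L, 11L, 4L, 12L, 1L, 4L, 3L,
                   11L, 8L, 11L, 12L),
  ENSC_Disc = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
)

df$subject_result <- factor(df$subject_result)

# "- 1" drops the intercept, so the first factor gets one 0/1 column per level
# instead of treatment coding with a dropped reference level
X <- model.matrix(~ subject_result + semester_num - 1, data = df)

colnames(X)
head(X)
```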
First, the variable importance:
plot(varImp(fit.rf))
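To get a similar per-subject ranking without caret, one option is to fit randomForest() on an indicator matrix and read importance() directly. A minimal sketch, assuming the randomForest package is installed; ntree = 500 and the seed are arbitrary choices, and with only 17 rows the ranking is likely to vary from seed to seed.

```r
library(randomForest)

df <- data.frame(
  stringsAsFactors = FALSE,
  subject_result = c("ENSCPassed", "CHEMPassed", "ENSCPassed", "OTHERPassed",
                     "ENSCPassed", "MATHPassed", "ENSCPassed", "OTHERPassed",
                     "OTHERPassed", "OTHERPassed", "PHYSPassed", "CHEMPassed",
                     "MATHPassed", "ENSCPassed", "CMPTPassed", "OTHERPassed",
                     "CMPTPassed"),
  semester_num = c(9L, 4L, 16L, 7L, 7L, 2L, 8L, 11L, 4L, 12L, 1L, 4L, 3L,
                   11L, 8L, 11L, 12L),
  ENSC_Disc = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
)

# Class label as a factor -> classification forest
y <- factor(df$ENSC_Disc)

# One 0/1 column per subject plus semester_num (intercept dropped)
df$subject_result <- factor(df$subject_result)
X <- model.matrix(~ subject_result + semester_num - 1, data = df)

set.seed(233)
rf <- randomForest(x = X, y = y, ntree = 500, importance = TRUE)

# Class-wise columns plus MeanDecreaseAccuracy and MeanDecreaseGini;
# sort by MeanDecreaseAccuracy to see which columns matter most
imp <- importance(rf)
imp[order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE), ]

# Dot chart of both importance measures
varImpPlot(rf)
```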
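Now the actual tree. This does not work the way I wanted, because plot() on the underlying randomForest object draws the error rate against the number of trees rather than an individual tree. A better approach is library(rattle), but that only works with "rpart", not "rf". Still, here it is: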
plot(fit.rf$finalModel)
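Since a random forest averages many trees, there is no single tree to draw. One workaround, in line with the rattle suggestion above, is to fit a single classification tree with rpart and plot that instead. A minimal sketch; the rpart.control() values are assumptions chosen for this 17-row example, because rpart's default minsplit of 20 is larger than the data and would leave the tree unsplit.

```r
library(rpart)

df <- data.frame(
  stringsAsFactors = FALSE,
  subject_result = c("ENSCPassed", "CHEMPassed", "ENSCPassed", "OTHERPassed",
                     "ENSCPassed", "MATHPassed", "ENSCPassed", "OTHERPassed",
                     "OTHERPassed", "OTHERPassed", "PHYSPassed", "CHEMPassed",
                     "MATHPassed", "ENSCPassed", "CMPTPassed", "OTHERPassed",
                     "CMPTPassed"),
  semester_num = c(9L, 4L, 16L, 7L, 7L, 2L, 8L, 11L, 4L, 12L, 1L, 4L, 3L,
                   11L, 8L, 11L, 12L),
  ENSC_Disc = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
)
df$subject_result <- factor(df$subject_result)
df$ENSC_Disc <- factor(df$ENSC_Disc)

# Lowered minsplit so the small data set can actually be split
tree <- rpart(ENSC_Disc ~ ., data = df, method = "class",
              control = rpart.control(minsplit = 4, cp = 0.01))
print(tree)

# Base-graphics drawing of the tree, with class counts at each node
plot(tree, margin = 0.1)
text(tree, use.n = TRUE)
```

If rattle is installed, rattle::fancyRpartPlot(tree) gives a more polished drawing. To inspect one tree from the forest itself, randomForest::getTree(fit.rf$finalModel, k = 1, labelVar = TRUE) returns its splits as a table rather than a plot.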