Is there a way to extract which rows of your data fall into each quadrant of a confusion matrix in R?
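I am building a random forest model for a binary classification problem (0 and 1). I would like to pull out the rows of the training and validation sets and see which rows correspond to each quadrant of the confusion matrix I computed. Is there a way to create a variable in the dataset that labels each data point as something like "Predicted: 1, Actual: 1"? I want to know specifically which data points are false positives.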
library(randomForest)

# Fit a random forest; apart from the class cutoff, parameters are left at their defaults
model1 <- randomForest(failure ~ customer_count + mfgr_yr + age + kva_rating + existing_phasing + manufacturer + mounting + owner_name + secondary_nominal_voltage + secondary_voltage_connection + structure_mounting + type_vl + primary_nominal_voltage + existing_phases + temp70 + temp80 + temp90 + temp40 + temp30 + temp20 + humidity75 + humidity85 + humidity95 + wind6 + wind10 + wind15 + rain01 + rain07 + rain15 + percentoverloaded,
                       data = TrainSet, importance = TRUE, cutoff = c(.08, .92))
model1

# Predicting on the training set
predTrain <- predict(model1, TrainSet, type = "class")
# Confusion matrix for the training set (rows: predicted, columns: actual)
table(predTrain, TrainSet$failure)

# Predicting on the validation set
predValid <- predict(model1, ValidSet, type = "class")
# Classification accuracy on the validation set
mean(predValid == ValidSet$failure)
# Confusion matrix for the validation set (rows: predicted, columns: actual)
table(predValid, ValidSet$failure)
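That is how I set up the confusion matrix. I don't necessarily need a new variable in the dataset; I just need to be able to see which rows of the data correspond to each quadrant. Thanks!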
To avoid overthinking a very simple problem, here are some very simple suggestions:
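# example data
predTrain <- c(1,1,1,1,0,0,0,0)
TrainSet <- data.frame(failure=c(1,0,1,0,1,0,1,0))
# compare against the failure column, not the whole data frame,
# so this still works when TrainSet has more than one column
which(predTrain == 1 & TrainSet$failure == 1)  # true positives
which(predTrain == 1 & TrainSet$failure == 0)  # false positives
which(predTrain == 0 & TrainSet$failure == 1)  # false negatives
which(predTrain == 0 & TrainSet$failure == 0)  # true negatives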
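If you want the rows themselves rather than their positions, the same conditions can be used as logical masks to subset the data frame. The following is only a sketch on toy data: it assumes the predictions and `failure` are factors with levels "0" and "1" (which is what `predict()` returns for a classification forest), and the `age` column is a made-up stand-in for the real predictors.

# toy stand-ins for the real objects; values and the age column are invented
predTrain <- factor(c(1, 1, 1, 1, 0, 0, 0, 0), levels = c(0, 1))
TrainSet <- data.frame(
  failure = factor(c(1, 0, 1, 0, 1, 0, 1, 0), levels = c(0, 1)),
  age     = c(12, 30, 7, 25, 18, 3, 40, 9)
)

# one logical mask per quadrant; factors are compared against their level labels
tp <- predTrain == "1" & TrainSet$failure == "1"
fp <- predTrain == "1" & TrainSet$failure == "0"
fn <- predTrain == "0" & TrainSet$failure == "1"
tn <- predTrain == "0" & TrainSet$failure == "0"

# the actual false-positive rows, with all their columns
falsePositives <- TrainSet[fp, ]
print(falsePositives)

The validation set should work the same way with `predValid` and `ValidSet` in place of `predTrain` and `TrainSet`.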
Or, if you really do want a new column:
# example data
predTrain <- c(1,1,1,1,0,0,0,0)
TrainSet <- data.frame(failure=c(1,0,1,0,1,0,1,0))
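# building a new column in TrainSet: tens digit = prediction, units digit = actual
# (11 = true positive, 10 = false positive, 1 = false negative, 0 = true negative)
TrainSet$confusion <- 10 * predTrain + TrainSet$failure
print(TrainSet)
# alternatively, as a two-character code such as "10" (predicted 1, actual 0)
TrainSet$chrConfusion <- paste0(predTrain, TrainSet$failure)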
print(TrainSet)
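One caveat for the real model: `predict()` returns a factor for a classification forest, and arithmetic such as `10 * predTrain` on a factor yields `NA` with a warning, so the string version is the safer of the two. Here is a rough sketch of the "Predicted: 1, Actual: 1" style label asked for in the question, again on toy data; the column name `quadrant` is just a name picked for illustration.

# toy stand-ins; predictions and actuals are factors, as they would be with randomForest
predTrain <- factor(c(1, 1, 1, 1, 0, 0, 0, 0), levels = c(0, 1))
TrainSet <- data.frame(failure = factor(c(1, 0, 1, 0, 1, 0, 1, 0), levels = c(0, 1)))

# paste0() converts factors to their level labels, so no numeric conversion is needed
TrainSet$quadrant <- paste0("Predicted: ", predTrain, ", Actual: ", TrainSet$failure)

# these counts should agree with table(predTrain, TrainSet$failure)
print(table(TrainSet$quadrant))

# pull out the false positives by label
fpRows <- TrainSet[TrainSet$quadrant == "Predicted: 1, Actual: 0", , drop = FALSE]
print(fpRows)

Comparing `table(TrainSet$quadrant)` with the original confusion matrix is a quick sanity check that the labels line up with the quadrants.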