Machine Learning in R - confusion matrix of an ensemble
I am trying to get the overall accuracy (or confusion matrix) of several classifiers, but I cannot find how to report it.
I have already tried:
confusionMatrix(fits_predicts, reference = mnist_27$test$y)
Error in table(data, reference, dnn = dnn, ...) :
  all arguments must have the same length
library(caret)
library(dslabs)
set.seed(1)
data("mnist_27")

# caret method names of the classifiers to fit
models <- c("glm", "lda", "naive_bayes", "svmLinear",
            "gamboost", "gamLoess", "qda",
            "knn", "kknn", "loclda", "gam",
            "rf", "ranger", "wsrf", "Rborist",
            "avNNet", "mlp", "monmlp",
            "adaboost", "gbm",
            "svmRadial", "svmRadialCost", "svmRadialSigma")

# Train every model with caret's default settings
fits <- lapply(models, function(model) {
  print(model)  # progress indicator
  train(y ~ ., method = model, data = mnist_27$train)
})
names(fits) <- models

# One column of test-set predictions per model
fits_predicts <- sapply(fits, function(fit) {
  predict(fit, mnist_27$test)
})
I would like to report the confusionMatrix across the different models.
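You have not trained any ensemble; you have only trained a list of several models without combining them in any way, which is definitely not an ensemble.
Given that, the error you get is not unexpected, since confusionMatrix expects a single set of predictions (which would be the case if you really had an ensemble), not several.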
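To make the length mismatch concrete, here is a minimal sketch in base R. The toy labels and the three "model" names (m1, m2, m3) are hypothetical stand-ins for mnist_27$test$y and the real fits; the only assumption carried over from the question is that table() is the call that fails, as the error message shows.

```r
# Hypothetical toy data: 5 test labels and predictions from 3 "models".
reference <- factor(c("2", "7", "2", "7", "7"), levels = c("2", "7"))
toy_preds <- list(
  m1 = factor(c("2", "7", "7", "7", "2"), levels = c("2", "7")),
  m2 = factor(c("2", "2", "2", "7", "7"), levels = c("2", "7")),
  m3 = factor(c("7", "7", "2", "7", "7"), levels = c("2", "7"))
)

# sapply() simplifies the list into a 5 x 3 matrix, i.e. 15 values in total.
pred_matrix <- sapply(toy_preds, as.character)
dim(pred_matrix)
length(pred_matrix)
length(reference)

# Cross-tabulating 15 predictions against 5 labels fails in table().
msg <- tryCatch(
  {
    table(pred_matrix, reference)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

With 23 models and the 200 test rows shown in the output further down, the same mismatch arises, only with a larger matrix.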
For simplicity, keep only the first 4 models in your list and change the fits_predicts definition slightly so that it gives a data frame, i.e.:
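models <- c("glm", "lda", "naive_bayes", "svmLinear")

# rest of your code as-is (train the models into fits)

# stringsAsFactors = TRUE keeps the columns as factors; since R 4.0 the
# default would give character columns, which confusionMatrix rejects
fits_predicts <- as.data.frame(sapply(fits, function(fit) {
  predict(fit, mnist_27$test)
}), stringsAsFactors = TRUE)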
Here is how to get the confusion matrix for each model:
cm <- lapply(fits_predicts, function(pred) {
  confusionMatrix(pred, reference = mnist_27$test$y)
})
This gives
> cm
$glm
Confusion Matrix and Statistics
Reference
Prediction 2 7
2 82 26
7 24 68
Accuracy : 0.75
95% CI : (0.684, 0.8084)
No Information Rate : 0.53
P-Value [Acc > NIR] : 1.266e-10
Kappa : 0.4976
Mcnemar's Test P-Value : 0.8875
Sensitivity : 0.7736
Specificity : 0.7234
Pos Pred Value : 0.7593
Neg Pred Value : 0.7391
Prevalence : 0.5300
Detection Rate : 0.4100
Detection Prevalence : 0.5400
Balanced Accuracy : 0.7485
'Positive' Class : 2
$lda
Confusion Matrix and Statistics
Reference
Prediction 2 7
2 82 26
7 24 68
Accuracy : 0.75
95% CI : (0.684, 0.8084)
No Information Rate : 0.53
P-Value [Acc > NIR] : 1.266e-10
Kappa : 0.4976
Mcnemar's Test P-Value : 0.8875
Sensitivity : 0.7736
Specificity : 0.7234
Pos Pred Value : 0.7593
Neg Pred Value : 0.7391
Prevalence : 0.5300
Detection Rate : 0.4100
Detection Prevalence : 0.5400
Balanced Accuracy : 0.7485
'Positive' Class : 2
$naive_bayes
Confusion Matrix and Statistics
Reference
Prediction 2 7
2 88 23
7 18 71
Accuracy : 0.795
95% CI : (0.7323, 0.8487)
No Information Rate : 0.53
P-Value [Acc > NIR] : 5.821e-15
Kappa : 0.5873
Mcnemar's Test P-Value : 0.5322
Sensitivity : 0.8302
Specificity : 0.7553
Pos Pred Value : 0.7928
Neg Pred Value : 0.7978
Prevalence : 0.5300
Detection Rate : 0.4400
Detection Prevalence : 0.5550
Balanced Accuracy : 0.7928
'Positive' Class : 2
$svmLinear
Confusion Matrix and Statistics
Reference
Prediction 2 7
2 81 24
7 25 70
Accuracy : 0.755
95% CI : (0.6894, 0.8129)
No Information Rate : 0.53
P-Value [Acc > NIR] : 4.656e-11
Kappa : 0.5085
Mcnemar's Test P-Value : 1
Sensitivity : 0.7642
Specificity : 0.7447
Pos Pred Value : 0.7714
Neg Pred Value : 0.7368
Prevalence : 0.5300
Detection Rate : 0.4050
Detection Prevalence : 0.5250
Balanced Accuracy : 0.7544
'Positive' Class : 2
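If only the overall accuracy per model is needed, it can be computed directly from the data frame of predictions. The sketch below uses base R and hypothetical toy data (toy_predicts, m1 to m3) in place of fits_predicts and mnist_27$test$y, so it runs without training anything.

```r
# Hypothetical toy data standing in for fits_predicts and mnist_27$test$y.
reference <- factor(c("2", "7", "2", "7", "7"), levels = c("2", "7"))
toy_predicts <- data.frame(
  m1 = c("2", "7", "7", "7", "2"),
  m2 = c("2", "2", "2", "7", "7"),
  m3 = c("7", "7", "2", "7", "7"),
  stringsAsFactors = TRUE
)

# Accuracy per model: share of predictions that match the reference label.
acc <- sapply(toy_predicts, function(p) {
  mean(as.character(p) == as.character(reference))
})
acc

# Plain 2 x 2 cross-tabulation per model, without the extra statistics.
tabs <- lapply(toy_predicts, function(p) {
  table(Prediction = p, Reference = reference)
})
tabs$m1
```

With the caret objects above, the same figure is stored in each confusion matrix, so sapply(cm, function(x) x$overall["Accuracy"]) should return one accuracy per model.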
You can also access the individual confusion matrix of each model, e.g. for lda:
> cm['lda']
$lda
Confusion Matrix and Statistics
Reference
Prediction 2 7
2 82 26
7 24 68
Accuracy : 0.75
95% CI : (0.684, 0.8084)
No Information Rate : 0.53
P-Value [Acc > NIR] : 1.266e-10
Kappa : 0.4976
Mcnemar's Test P-Value : 0.8875
Sensitivity : 0.7736
Specificity : 0.7234
Pos Pred Value : 0.7593
Neg Pred Value : 0.7391
Prevalence : 0.5300
Detection Rate : 0.4100
Detection Prevalence : 0.5400
Balanced Accuracy : 0.7485
'Positive' Class : 2
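For comparison, an actual ensemble would combine the per-model predictions into a single vector first. One simple way to do that is a majority vote; this is an illustrative technique chosen here, not something the question's code does. The toy data and model names below are hypothetical.

```r
# Hypothetical toy data standing in for fits_predicts and mnist_27$test$y.
reference <- factor(c("2", "7", "2", "7", "7"), levels = c("2", "7"))
toy_predicts <- data.frame(
  m1 = c("2", "7", "7", "7", "2"),
  m2 = c("2", "2", "2", "7", "7"),
  m3 = c("7", "7", "2", "7", "7"),
  stringsAsFactors = FALSE
)

# Share of models voting "7" for each test row; an odd number of models
# avoids ties in this two-class setting.
votes_for_7 <- rowMeans(toy_predicts == "7")

# The ensemble prediction is a single vector, one label per test row.
ensemble_pred <- factor(ifelse(votes_for_7 > 0.5, "7", "2"),
                        levels = levels(reference))

# One confusion table and one accuracy for the whole ensemble.
ensemble_table <- table(Prediction = ensemble_pred, Reference = reference)
ensemble_acc <- mean(ensemble_pred == reference)
ensemble_table
ensemble_acc
```

Because ensemble_pred has the same length as the reference, a vector built this way from the real predictions could be passed to confusionMatrix in a single call.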