Why isnt my logistic regression model output a factor of 2 levels? (Error: `data` and `reference` should be factors with the same levels.)

Why isnt my logistic regression model output a factor of 2 levels? (Error: `data` and `reference` should be factors with the same levels.)

通过阅读类似的问题,我知道问题在于 yhat.logisticReg 不是 2 级的因子,而 training.prepped$TARGET_FLAG 是。我假设这个问题可以通过改变我的模型或在预测中得到解决,所以 yhat.logisticReg 是 2 级的一个因素。我该怎么做?

logisticReg = glm(TARGET_FLAG ~ .,
                  data = training.prepped,
                  family = binomial())
yhat.logisticReg = predict(logisticReg, training.prepped, type = "response")
confusionMatrix(yhat.logisticReg, training.prepped$TARGET_FLAG)

Error: `data` and `reference` should be factors with the same levels.
str(training.prepped$TARGET_FLAG)
Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 2 2 1 ...

str(yhat.logisticReg)
 Named num [1:8161] 0.1656 0.2792 0.3717 0.0894 0.272 ...
 - attr(*, "names")= chr [1:8161] "1" "2" "3" "4" ...

您可能需要先选择一个阈值,然后将您的实数值数据转换为二进制值,例如

a <- c(0.2, 0.7, 0.4)
threshold <- 0.5
binary_a <- factor(as.numeric(a>threshold))

str(binary_a)
Factor w/ 2 levels "0","1": 1 2 1

库插入符号的方法 confusionMatrix 实现了多个指标。调用 overall 你可以得到准确度。如果你想要另一个指标,你可以检查他们是否已经实施并调用它。

library(caret)
acc = c()
for(value in yhat.logisticReg)
{
  predictions <- ifelse(yhat.logisticReg <= value, 0, 1)
  confusion_matrix = confusionMatrix(predictions, yhat.logisticReg)
  acc = c(acc,confusion_matrix$overall["Accuracy"])
}

best_acc = max(acc)
best_threshold  = yhat.logisticReg[which.max(acc)]