In ML text classification, what if a text doesn't belong to any category?

Question

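I am using logistic regression to classify news text into categories such as sports, politics, business, and entertainment. Text that does not belong to any of these categories still gets predicted as one of them. How can I prevent this from happening? And how can I assign out-of-category text to an other_category label?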

Answer 1

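The predict method only gives you the class with the highest probability. You can also call the predict_proba method, which gives you a probability score for every class. Take the largest of those scores with max(), then use a simple if statement: if the maximum probability is greater than the threshold you want, print the prediction, otherwise print "other". If that isn't clear, have a look at the sample code.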

# Assumed here: model is your classifier (for example a scikit-learn pipeline of a
# text vectorizer and LogisticRegression); text and tags are your training texts and labels.
model.fit(text, tags)

textToBeClassified = ["Google's shares are going down"]  # predict expects a list of texts, so wrap the string in []; classify more texts by adding them here, separated by commas.

prediction = model.predict(textToBeClassified)  # returns an array of predicted tags, e.g. politics, sports or business.
predictionConfidence = model.predict_proba(textToBeClassified)  # returns an array with one row per input text and one probability per class.

maxConfidence = max(predictionConfidence[0])  # only one text is being classified, so take the maximum of the first row.

if maxConfidence > 0.8:  # accept the prediction only if the model is more than 80% confident; change 0.8 to 0.9 to require 90% confidence.
    print(prediction)
else:
    print("Sorry the text doesn't fall under any of the categories")

Try adding print statements everywhere so you can see what is happening:

model.fit(text, tags)

textToBeClassified = ["Google's shares are going down"]

prediction = model.predict(textToBeClassified)
print("Predicted as:", prediction)

predictionConfidence = model.predict_proba(textToBeClassified)
print("The confidence scores:", predictionConfidence)

maxConfidence = max(predictionConfidence[0])
print("maximum confidence score is:", maxConfidence)

if maxConfidence > 0.8:
    print(prediction)
else:
    print("Sorry the text doesn't fall under any of the categories")

Like this :)
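If you want an actual other_category label for a whole batch of texts instead of a printed message, the same threshold check can be wrapped in a small helper. This is only a sketch: the name label_with_fallback and its parameters are made up for illustration, and the hard-coded probabilities stand in for what model.predict_proba(texts) would return, so it runs without a trained model.

```python
def label_with_fallback(classes, probability_rows, threshold=0.8,
                        fallback="other_category"):
    """Return one label per row of class probabilities.

    A row gets its most probable class only if that probability is
    greater than `threshold`; otherwise it gets `fallback`.
    """
    labels = []
    for row in probability_rows:
        scores = [float(p) for p in row]
        # Index of the highest probability in this row.
        best_index = max(range(len(scores)), key=scores.__getitem__)
        if scores[best_index] > threshold:
            labels.append(str(classes[best_index]))
        else:
            labels.append(fallback)
    return labels


# Hypothetical stand-ins for model.classes_ and model.predict_proba(texts).
classes = ["business", "entertainment", "politics", "sports"]
probability_rows = [
    [0.91, 0.03, 0.04, 0.02],  # clearly business
    [0.30, 0.25, 0.25, 0.20],  # no class stands out
]

labels = label_with_fallback(classes, probability_rows)
print(labels)  # ['business', 'other_category']
```

With a fitted scikit-learn classifier you would pass model.classes_ and model.predict_proba(textToBeClassified) instead of the hard-coded values; the columns of predict_proba are in the same order as classes_. The 0.8 threshold is just the value from the answer above, so tune it on texts you know are out of category.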

machine-learning

python-3.x

logistic-regression