In ML text classification, what if the text doesn't belong to any category?
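I am using logistic regression to classify news text into categories such as sports, politics, business, and entertainment. Text that does not belong to any of these categories is still predicted as one of them. How can I prevent this? And how can I assign out-of-category text to an other_category label?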
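The predict method only gives you the category with the highest probability. You can use the predict_proba method instead, which gives you a probability score for every category. Take the largest of those scores with max(), then use an if statement to check whether it is greater than the threshold you want: if it is, print the prediction, else print "other".
If that isn't clear, have a look at the sample code.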
# model is assumed here to be a pipeline (vectorizer + LogisticRegression), so it accepts raw strings
model.fit(text, tags)
textToBeClassified = ["Google's shares are going down"]  # must be a list, because predict expects a collection of inputs; add more texts separated by commas
prediction = model.predict(textToBeClassified)  # array with one tag per input, e.g. politics, sports or business
predictionConfidence = model.predict_proba(textToBeClassified)  # 2D array: one row per input, one probability per category
maxConfidence = max(predictionConfidence[0])  # only one text is classified here, so take the highest probability in the first row
if maxConfidence > 0.8:  # accept the prediction only when the top probability exceeds 0.8; use 0.9 for a stricter threshold
    print(prediction)
else:
    print("Sorry, the text doesn't fall under any of the categories")
Try adding print statements throughout, so you can see what is happening at each step.
model.fit(text, tags)
textToBeClassified = ["Google's shares are going down"]
prediction = model.predict(textToBeClassified)
print("Predicted as:", prediction)
predictionConfidence = model.predict_proba(textToBeClassified)
print("The confidence scores:", predictionConfidence)
maxConfidence = max(predictionConfidence[0])
print("Maximum confidence score is:", maxConfidence)
if maxConfidence > 0.8:
    print(prediction)
else:
    print("Sorry, the text doesn't fall under any of the categories")
Like this :)
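If you want the other_category label returned rather than printed, and for several texts at once, the same thresholding idea can be wrapped in a small function. This is only a minimal sketch under some assumptions: the helper name predict_with_fallback, the toy training sentences, and the TfidfVectorizer + LogisticRegression pipeline are illustrative choices and not taken from the question, and the 0.8 threshold is just the value used above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def predict_with_fallback(model, texts, threshold=0.8, fallback="other_category"):
    """Return one label per text; use `fallback` when the top probability
    does not exceed `threshold`. (Hypothetical helper, not a scikit-learn API.)"""
    probabilities = model.predict_proba(texts)      # shape: (n_texts, n_classes)
    best_index = np.argmax(probabilities, axis=1)   # column of the most likely class
    best_probability = probabilities.max(axis=1)    # its probability
    labels = model.classes_[best_index]             # map column index back to tag
    return [
        str(label) if p > threshold else fallback
        for label, p in zip(labels, best_probability)
    ]


# Toy training data, made up for illustration only.
train_texts = [
    "The team won the championship game",
    "The striker scored two goals in the match",
    "Parliament passed the new election law",
    "The senator announced her campaign",
    "Shares fell after the earnings report",
    "The company announced a merger",
]
train_tags = ["sports", "sports", "politics", "politics", "business", "business"]

# A pipeline lets the model accept raw strings directly.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_tags)

new_texts = ["Google's shares are going down", "How to bake sourdough bread"]
print(predict_with_fallback(model, new_texts, threshold=0.8))
```

The right threshold depends on your data, so it is worth trying a few values on texts you know are outside the categories. A high probability does not guarantee that the text really belongs to a category, so treat this as a filter and not a proof.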