Sklearn for Python: Is there a way to see how close a prediction was?

I'm using this code to run a prediction that classifies text:

predicted = clf.predict(X_new_tfidf)

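Each prediction says that a text snippet belongs to either Subject A or Subject B. However, I want to do further analysis on the shaky predictions, that is, the cases where the model is really unsure whether the text is A or B but has to pick one anyway. Is there a way to extract the relative confidence of a prediction?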

Code:

X_train contains the training sentences: ["Sentence I know belongs to Subject A", "Another sentence that describes Subject A", "A sentence about Subject B", "Another sentence about Subject B", ...], and so on.

Y_train contains the corresponding class labels: ["Subject A", "Subject A", "Subject B", "Subject B", ...]

predict_these_X is the list of sentences I want to classify: ["Some random sentence", "Another sentence", "Another sentence again", ...]

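    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.naive_bayes import BernoulliNB

    # The excerpt ends in `return`, so it lives inside a function whose header
    # was not shown; the name `classify` is a placeholder, not the original name.
    def classify(X_train, Y_train, predict_these_X):
        count_vect = CountVectorizer()
        tfidf_transformer = TfidfTransformer()

        # Learn the vocabulary and IDF weights from the training sentences
        X_train_counts = count_vect.fit_transform(X_train)
        X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

        # Reuse the fitted transformers on the sentences to classify
        X_new_counts = count_vect.transform(predict_these_X)
        X_new_tfidf = tfidf_transformer.transform(X_new_counts)

        estimator = BernoulliNB()
        estimator.fit(X_train_tfidf, Y_train)
        predictions = estimator.predict(X_new_tfidf)

        # One row per sentence, one column per class
        print(estimator.predict_proba(X_new_tfidf))
        return predictions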

Result:

[[  9.97388646e-07   9.99999003e-01]
 [  9.99996892e-01   3.10826824e-06]
 [  9.40063326e-01   5.99366742e-02]
 [  9.99999964e-01   3.59816546e-08]
 ...
 [  1.95070084e-10   1.00000000e+00]
 [  3.21721965e-15   1.00000000e+00]
 [  1.00000000e+00   3.89012777e-10]]

from sklearn.datasets import make_classification
from sklearn.naive_bayes import BernoulliNB

# generate some artificial data
X, y = make_classification(n_samples=1000, n_features=50, weights=[0.1, 0.9])


# your estimator
estimator = BernoulliNB()
estimator.fit(X, y)

# generate predictions
estimator.predict(X)
Out[164]: array([1, 1, 1, ..., 0, 1, 1])

# to get confidence on the prediction
estimator.predict_proba(X)

Out[163]: 
array([[ 0.0043,  0.9957],
       [ 0.0046,  0.9954],
       [ 0.0071,  0.9929],
       ..., 
       [ 0.8392,  0.1608],
       [ 0.0018,  0.9982],
       [ 0.0339,  0.9661]])

As you can see, for each of the first 3 observations the model assigns a probability of more than 99% to the positive class.
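
To pick out the shaky predictions the question asks about, one option is to take the largest probability in each row of `predict_proba` and flag the rows where it falls below a cutoff. The following is only a minimal sketch: the helper name `flag_uncertain` and the 0.75 threshold are made up for illustration and are not part of scikit-learn, and the data is the same kind of artificial data as above (with a `random_state` added so the run is repeatable). The columns of `predict_proba` follow the order of `estimator.classes_`, which is how a column index is mapped back to a label.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import BernoulliNB


def flag_uncertain(estimator, X, threshold=0.75):
    """Return (labels, confidence, uncertain_mask) for the rows of X.

    `threshold` is an arbitrary illustrative cutoff, not a recommended value.
    """
    proba = estimator.predict_proba(X)        # shape (n_samples, n_classes)
    best = np.argmax(proba, axis=1)           # column of the most likely class
    labels = estimator.classes_[best]         # columns follow estimator.classes_
    confidence = proba.max(axis=1)            # probability of the chosen class
    uncertain = confidence < threshold        # True where the model is unsure
    return labels, confidence, uncertain


# Artificial data, as in the answer above (random_state added for repeatability)
X, y = make_classification(n_samples=1000, n_features=50,
                           weights=[0.1, 0.9], random_state=0)
estimator = BernoulliNB().fit(X, y)

labels, confidence, uncertain = flag_uncertain(estimator, X)
print("uncertain predictions:", int(uncertain.sum()), "of", len(X))
print("indices of the first few:", np.flatnonzero(uncertain)[:5])
```

With the text pipeline from the question you would pass `X_new_tfidf` instead of `X`. Keep in mind that naive Bayes probabilities tend to be pushed toward 0 or 1, as the question's own output shows, so they are probably better used to rank predictions by confidence than read as exact probabilities.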