Why is the predict_proba function of sklearn.svm.SVC giving a probability greater than 1?

Question

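I have an sklearn.svm.SVC (RBF kernel) model trained on two classes, each containing 140 samples. I set probability=True, and when I predict, the predicted probabilities for the two classes vary.

For some test samples it gives a probability greater than 1, and for others less than 1.

For example ('sample-1': 1.55478334, 'sample-2': 0.999984).
In other cases, both probabilities are less than 1.

For example ('sample-1': 0.4182294947776875, 'sample-2': 0.58177035052223113).

Is my model working correctly, or is there something wrong with my training or testing?

My code is as follows: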

    # Training code
    import time
    from itertools import zip_longest

    import joblib
    import numpy as np
    from sklearn.svm import SVC

    # One row of 18 features per training image (2 classes x 140 samples).
    tcdf512_d1 = np.empty((280, 18), dtype=float)
    labels = []
    model512_d1 = SVC(probability=True)
    k = 0
    # catA and catB are the image lists of the two classes.
    for img, img2 in zip_longest(catA, catB):
        if img is not None:
            # extract_features is a placeholder for the feature computation
            # (18 features, i.e. skewness, variance, standard deviation, etc.)
            tcdf512_d1[k] = extract_features(img)
            k += 1
            labels.append('Cat-VI')
        if img2 is not None:
            tcdf512_d1[k] = extract_features(img2)
            k += 1
            labels.append('Cat-VII')
        if k % 50 == 0:
            print(k)

    print("LBP Calculated")
    print(time.strftime('%d/%m/%Y %H:%M:%S'))
    model512_d1.fit(tcdf512_d1, labels)
    tcdf512_d1 = None
    labels = None
    k = None
    print("Model Trained")
    print(time.strftime('%d/%m/%Y %H:%M:%S'))
    joblib.dump(model512_d1, "Cat/momentsCat_6-7_128_d1.pkl", compress=3)
    print("Model Saved")
    print(time.strftime('%d/%m/%Y %H:%M:%S'))
    model512_d1 = None
    # Testing code
    import collections
    import gc
    import time

    import cvutils
    import joblib

    size = 128
    Cat_I_II = joblib.load("Cat/momentsCat_6-7_128_d1.pkl")
    name1 = "VII"
    print(name1)
    images_address = "Catagory/Testbg/" + name1 + "/"
    name1 = "Cat-" + str(name1)
    test_images = cvutils.imlist(images_address)

    count = images_address.rfind("/") + 1
    results1 = []
    print(len(test_images))
    print("Start Time ")
    print(time.strftime('%d/%m/%Y %H:%M:%S'))
    # j starts at the total and is decremented for every correct prediction,
    # so it ends up holding the number of incorrect results.
    j = float(len(test_images))
    k = 0
    for img3 in test_images:
        results1.append("Image : " + str(img3[count:]))
        results1.append("\n")
        varientarray = []
        array = []
        # extract_features is a placeholder for the feature computation
        # (18 features, i.e. skewness, variance, standard deviation, etc.)
        array.append(extract_features(img3))
        print(array)
        prediction = Cat_I_II.predict(array)[0]
        prob = Cat_I_II.predict_proba(array)[0]
        prob_per_class_dictionary = dict(zip(Cat_I_II.classes_, prob))
        print(prediction, prob_per_class_dictionary)
        results1.append("Result of Cat_I_II is : " + str(prediction) + "\t" + str(prob_per_class_dictionary))
        varientarray.append(prediction)

        print(k)
        final_result = collections.Counter(varientarray).most_common(1)[0][0]
        print("Final Result of image " + str(img3[count:]) + " is : " + str(final_result))
        results1.append("Final Result of image " + str(img3[count:]) + " is : " + str(final_result))

        # The file name prefix (up to the first '0') is the expected label.
        if str(img3[count:img3.index('0')]) == final_result:
            j -= 1
        gc.collect()
        k += 1
    k = float(j * 100 / len(test_images))  # percentage of incorrect results
    Accuracy = float((len(test_images) - j) * 100 / len(test_images))
    print(j)
    print(k)
    print(Accuracy)
    with open("CatResults/_Finalresults.txt", 'a') as f:
        f.write(str("The accuracy for " + str(name1) + " is :" + str(Accuracy)) + "\n")
    results1.append("Incorrect Results are :" + str(j))
    results1.append("The percentage of incorrect result is :" + str(k))
    results1.append("The accuracy is :" + str(Accuracy))
    with open("CatResults/Cat-" + str(name1) + "resultsp2.txt", 'w') as f:
        for s in results1:
            f.write(str(s) + "\n")
    print("End Time")
    print(time.strftime('%d/%m/%Y %H:%M:%S'))

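A snippet of my results is as follows: [screenshots: "Probability greater than 1", "Probability less than 1"]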

Answer 1

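Note the e-06 or e-08 at the end of those probabilities. That is scientific notation: the number is multiplied by 10^(-6) or 10^(-8). So the probabilities you thought were above 1 are actually very, very small.

For example:

2.798594e-06 = 0.000002798594

Similarly,

7.7173288137e-08 = 0.000000077173288137

So when you sum the class probabilities for a sample, you get 1. If it is not exactly 1, it will be something like 0.99999999..., which is expected because the displayed results are rounded.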
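As a quick check of the arithmetic above, here is a minimal sketch in plain Python. The two small values are the ones quoted in this answer; the complementary probability `p_other` is a hypothetical value added only to illustrate the sum.

```python
# Values quoted in the answer. At a glance they look like "2.79..." and
# "7.71...", but the exponent makes them tiny.
p_small = float("2.798594e-06")
p_tiny = float("7.7173288137e-08")

# Fixed-point formatting makes the magnitude obvious.
small_fixed = f"{p_small:.12f}"
tiny_fixed = f"{p_tiny:.18f}"
print(small_fixed)
print(tiny_fixed)

# Hypothetical two-class case: the other class gets the remaining mass,
# so the pair sums to 1 up to floating-point rounding.
p_other = 1.0 - p_small
total = p_small + p_other
print(total)
```

If the probabilities are printed from a NumPy array, calling `np.set_printoptions(suppress=True)` beforehand should make NumPy print small values in fixed-point form instead of scientific notation, which avoids this kind of misreading.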

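So the predict_proba results are not contradictory. They are actually correct.

As for why the predicted label does not always match the class with the highest predicted probability: that is described in the documentation and is expected behavior that follows from the algorithm's internals. See the documentation: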

http://scikit-learn.org/dev/modules/svm.html#scores-and-probabilities

The probability estimates may be inconsistent with the scores, in the sense that the “argmax” of the scores may not be the argmax of the probabilities. (E.g., in binary classification, a sample may be labeled by predict as belonging to a class that has probability <½ according to predict_proba.)
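The behavior described in that passage can be inspected directly. The following is a minimal sketch, not the asker's setup: the data comes from `make_classification` as a synthetic stand-in (an assumption), and the train/test split sizes are arbitrary. It checks that every `predict_proba` row is a valid distribution and counts how often `predict` disagrees with the argmax of the probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the asker's data (assumption): 2 classes, 18 features.
X, y = make_classification(n_samples=280, n_features=18, random_state=0)
X_train, y_train = X[:200], y[:200]
X_test = X[200:]

model = SVC(kernel="rbf", probability=True, random_state=0)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)       # columns ordered like model.classes_
scores = model.decision_function(X_test)  # one signed score per sample (binary case)
labels = model.predict(X_test)

# Every row should be a valid distribution: entries in [0, 1] that sum to 1.
row_sums = proba.sum(axis=1)
in_unit_interval = bool(np.all((proba >= 0.0) & (proba <= 1.0)))

# Label implied by the probabilities, which may differ from predict().
labels_from_proba = model.classes_[proba.argmax(axis=1)]
n_disagree = int(np.sum(labels_from_proba != labels))

# Label implied by the scores: in the binary case a positive score
# corresponds to model.classes_[1].
labels_from_scores = model.classes_[(scores > 0).astype(int)]
n_match_scores = int(np.sum(labels_from_scores == labels))

print("rows sum to 1:", bool(np.allclose(row_sums, 1.0)))
print("all values in [0, 1]:", in_unit_interval)
print("predict vs argmax(predict_proba) disagreements:", n_disagree)
print("predict vs sign(decision_function) matches:", n_match_scores, "of", len(labels))
```

On a given dataset the disagreement count may well be zero; the documentation only says that a mismatch is possible, typically for samples close to the decision boundary.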

python

svm

svc

scikit-learn