predict_proba(X) of RandomForestClassifier (sklearn) seems to be static?

Question

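I want to retrieve the prediction score/probability of a given sample for all classes. I am using sklearn's RandomForestClassifier. With .predict() my code runs fine. To show the probabilities, however, I am using .predict_proba(X), and it always returns the same values, even when X changes. Why is that, and how can I fix it?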

I have broken my code down to the relevant parts:

# ... code ... feature generation / gets the feature data
if rf is None:
    rf = RandomForestClassifier(n_estimators=80)
    rf.fit(featureData, classes)
else:
    prediction = rf.predict(featureData)  # gets the right class / always different
    proba = rf.predict_proba(featureData)
    print(proba)  # this always prints the same values for all my 40 classes

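Interestingly, max(proba) retrieves the class that .predict() returned in the very first run. Since .predict() works as expected, I believe the error is on sklearn's side, i.e. I guess there is a flag that needs to be set.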

Does anyone have an idea?

Answer 1

I guess the problem is that you are always passing the same argument to predict_proba. Here is my code to build a forest from the iris dataset:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
iris = datasets.load_iris()
X = iris.data
y = iris.target
rf = RandomForestClassifier(n_estimators=80)
rf.fit(X, y)
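
As background for what predict_proba reports, here is a minimal sketch that rebuilds the forest's probabilities by averaging the per-tree probabilities by hand. The fixed random_state and the names tree_probas, manual_proba, forest_proba and same are assumptions added for illustration; they are not part of the code above.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# random_state is fixed only so this sketch is reproducible (an assumption).
rf = RandomForestClassifier(n_estimators=80, random_state=0)
rf.fit(X, y)

# Collect each tree's class probabilities: shape (n_trees, n_samples, n_classes).
tree_probas = np.array([tree.predict_proba(X) for tree in rf.estimators_])

# Average over the trees by hand.
manual_proba = tree_probas.mean(axis=0)

# Compare with what the forest itself reports.
forest_proba = rf.predict_proba(X)
same = np.allclose(manual_proba, forest_proba)
print(same)
```

Since the result is an average over 80 trees, a forest of fully grown trees will often yield values that are multiples of 1/80, so the probabilities can look coarse even when the inputs differ.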

When I call the methods predict and predict_proba with different arguments, the predicted classes and class probabilities differ as well, as one would reasonably expect.

Sample run:

In [82]: a, b = X[:3], X[-3:]

In [83]: a
Out[83]: 
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2]])

In [84]: b
Out[84]: 
array([[ 6.5,  3. ,  5.2,  2. ],
       [ 6.2,  3.4,  5.4,  2.3],
       [ 5.9,  3. ,  5.1,  1.8]])

In [85]: rf.predict(a)
Out[85]: array([0, 0, 0])

In [86]: rf.predict(b)
Out[86]: array([2, 2, 2])

In [87]: rf.predict_proba(a)
Out[87]: 
array([[ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.]])

In [88]: rf.predict_proba(b)
Out[88]: 
array([[ 0.    ,  0.    ,  1.    ],
       [ 0.    ,  0.0125,  0.9875],
       [ 0.    ,  0.0375,  0.9625]])
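
The pattern in the run above can also be checked programmatically. The following is a minimal sketch, assuming the same iris setup: it verifies that the most probable column of predict_proba maps back to the label from predict, and that different inputs produce different probabilities. The fixed random_state and the names labels_a, labels_b, matches_predict, inputs_differ and outputs_differ are illustrative assumptions.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# random_state is fixed only so this sketch is reproducible (an assumption).
rf = RandomForestClassifier(n_estimators=80, random_state=0)
rf.fit(X, y)

a, b = X[:3], X[-3:]
proba_a = rf.predict_proba(a)
proba_b = rf.predict_proba(b)

# The columns of predict_proba follow the order of rf.classes_, so the index
# of the largest probability in each row maps back to a class label.
labels_a = rf.classes_[np.argmax(proba_a, axis=1)]
labels_b = rf.classes_[np.argmax(proba_b, axis=1)]

matches_predict = (
    np.array_equal(labels_a, rf.predict(a))
    and np.array_equal(labels_b, rf.predict(b))
)

# Different inputs should give different probabilities; identical outputs
# are a hint that the same input is being passed each time.
inputs_differ = not np.array_equal(a, b)
outputs_differ = not np.allclose(proba_a, proba_b)

print(matches_predict, inputs_differ, outputs_differ)
```

In the asker's loop, a similar np.array_equal comparison between the current featureData and a copy kept from the previous iteration (a hypothetical variable, not in the original code) would show whether the input really changes between calls.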
