Why does the predict_proba function return 2 columns?

Question

Why does the predict_proba function give 2 columns?

I looked at this page: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.predict_proba

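However, it only says: returns: T: array-like of shape (n_samples, n_classes)

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

I still don't understand why the output always has 2 columns.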

import numpy as np
import pandas as pd
from pylab import rcParams
import seaborn as sb 
from sklearn.preprocessing import scale
from collections import Counter

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

%matplotlib inline
rcParams['figure.figsize'] = 5,4
sb.set_style('whitegrid')

from sklearn.linear_model import LogisticRegression


import os
# Build the path portably instead of concatenating with a backslash
file_path = os.path.join(os.getcwd(), 'Default.xlsx')
default_data = pd.read_excel(file_path)

default_data = default_data.drop(['Unnamed: 0'], axis=1)
default_data['default_factor'] = default_data.default.factorize()[0]
default_data['student_factor'] = default_data.student.factorize()[0]

X = default_data[['balance']]
y = default_data['default_factor']

lr = LogisticRegression()
lr.fit(X, y)

X_pred = np.linspace(start = 0, stop = 3000, num = 2).reshape(-1,1)
y_pred = lr.predict_proba(X_pred)

X_pred
X_pred.shape
y_pred.shape

Answer 1

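Short answer

Each column gives the probability that the sample belongs to one class: column 0 is the probability of class 0, column 1 is the probability of class 1, and so on. Your target has two classes, so you get two columns.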
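To make the column-to-class mapping concrete, here is a minimal sketch. It assumes synthetic data in place of the Default.xlsx file from the question (the random values and the 1800 threshold are made up for illustration); only the shape of the output and the role of classes_ matter here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the 'balance' feature (assumption: not the real data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 3000, size=(200, 1))
# Label is 1 for a high balance and 0 otherwise, so exactly two classes exist.
y = (X[:, 0] > 1800).astype(int)

lr = LogisticRegression(max_iter=1000)
lr.fit(X, y)

X_pred = np.linspace(start=0, stop=3000, num=2).reshape(-1, 1)
proba = lr.predict_proba(X_pred)

print(lr.classes_)         # class labels, in the same order as the columns
print(proba.shape)         # (n_samples, n_classes)
print(proba.sum(axis=1))   # the probabilities in each row sum to 1
```

With a target that had three distinct labels, the same call would return three columns, one per entry of classes_.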

Detailed answer

Suppose y_pred.shape gives you (2, 2). That means you have 2 samples and 2 classes.

Suppose your X_pred looks like this:

 In: print(X_pred)
Out: [[   0.],
      [3000.]]

This means you have two samples:

sample one, with the single feature x = [0], and
sample two, with the single feature x = [3000]

Suppose your prediction output looks like this:

In:  print(y_pred)

Out: [[0.28, 0.72]
      [0.65, 0.35]]

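This means sample one most likely belongs to class 1 (the first row says it is class 0 with probability 28% and class 1 with probability 72%),

and sample two most likely belongs to class 0 (the second row says it is class 0 with probability 65% and class 1 with probability 35%).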
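The reading of the two rows above can be sketched in plain Python. The probabilities are the hypothetical numbers from this answer, not real model output, and the class order [0, 1] is an assumption standing in for lr.classes_.

```python
# Hypothetical probabilities copied from the example above (not real model output).
y_pred = [[0.28, 0.72],
          [0.65, 0.35]]
classes = [0, 1]  # assumed order of lr.classes_

predicted = []
for row in y_pred:
    # The probabilities in a row cover every class, so they sum to 1.
    assert abs(sum(row) - 1.0) < 1e-9
    # The most likely class is the one whose column holds the largest value.
    best = max(range(len(row)), key=lambda j: row[j])
    predicted.append(classes[best])

print(predicted)  # [1, 0]
```

Picking the column with the largest probability in each row is, in effect, how the class labels returned by lr.predict relate to the output of lr.predict_proba.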

