How can I plot the results of Logit in statsmodels using matplotlib

Question

In this dataset I have a categorical response with two values (0 and 1), and I want to fit a Logit model using statsmodels.

X_incl_const = sm.add_constant(X)
model = sm.Logit(y, X_incl_const)
results = model.fit()
results.summary()

When I try to plot the line and the points with the following code:

plt.scatter(X, y)
plt.plot(X, model.predict(X))

I get the following error:

    ValueError                                Traceback (most recent call last)
    <ipython-input-16-d69741b1f0ad> in <module>
          1 plt.scatter(X, y)
    ----> 2 plt.plot(X, model.predict(X))
    
    ~\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in predict(self, params, exog, linear)
        461             exog = self.exog
        462         if not linear:
    --> 463             return self.cdf(np.dot(exog, params))
        464         else:
        465             return np.dot(exog, params)
    
    <__array_function__ internals> in dot(*args, **kwargs)
    
    ValueError: shapes (518,2) and (518,) not aligned: 2 (dim 1) != 518 (dim 0)

How can I plot the prediction line for this model?

Answer 1

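Look at the error you are getting: ValueError: shapes (518,2) and (518,) not aligned: 2 (dim 1) != 518 (dim 0). It says precisely that your X is 518x2, meaning it has two "columns" (that is, the features are two-dimensional). With two features, you cannot draw a single scatter plot of one x against one y. You can only plot one feature at a time.

Hint: this is why it is best to include a data sample when asking on Stack Overflow. As it stands, it is hard to tell you what went wrong: is your data really two-dimensional, or is it just a bug in the code?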

Answer 2

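The array you pass to predict must have the same number of columns (predictors) as the array used for fitting. Also, you should call predict on the fitted object, results, not on model. Using an example dataset: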

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.datasets import load_breast_cancer

dat = load_breast_cancer()
df = pd.DataFrame(dat.data,columns=dat.feature_names)
df['target'] = dat.target
X = df['mean radius']
y = df['target']

X_incl_const = sm.add_constant(X)
model = sm.Logit(y, X_incl_const)
results = model.fit()
results.summary()

The model fits fine. Now, if we simply call predict on the model with the raw X, we get the same error you saw:

model.predict(X)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-180-2558e7096c7c> in <module>
----> 1 model.predict(X)
      2 
      3 

~/anaconda2/lib/python3.7/site-packages/statsmodels/discrete/discrete_model.py in predict(self, params, exog, linear)
    482             exog = self.exog
    483         if not linear:
--> 484             return self.cdf(np.dot(exog, params))
    485         else:
    486             return np.dot(exog, params)

<__array_function__ internals> in dot(*args, **kwargs)

ValueError: shapes (569,2) and (569,) not aligned: 2 (dim 1) != 569 (dim 0)
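The traceback shows the signature predict(self, params, exog, linear), so on the unfitted model the first positional argument is read as the coefficient vector, not as data. The shape mismatch can be reproduced with plain NumPy. This is only a sketch: the feature values and the coefficients in `params` below are made up for illustration.

```python
import numpy as np

n = 569
# Design matrix as stored by the model: a constant column plus one feature.
exog = np.column_stack([np.ones(n), np.linspace(7.0, 28.0, n)])  # shape (569, 2)

# Passing the raw feature where the coefficients are expected gives shape (569,).
wrong_params = exog[:, 1]
try:
    np.dot(exog, wrong_params)
    message = None
except ValueError as err:
    message = str(err)
print(message)

# A coefficient vector has one entry per column of the design matrix.
params = np.array([15.0, -1.0])  # hypothetical intercept and slope
linear_pred = np.dot(exog, params)
print(linear_pred.shape)
```

The dot product only works when the second operand has as many entries as the design matrix has columns, which is what the fitted results object supplies on its own.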

If we add the constant (intercept) column, it works:

plt.scatter(X,results.predict(sm.add_constant(X)))

Or, if you only want to plot the fitted values, do:

plt.scatter(X,results.predict())
