Difference between predict and fittedvalues in statsmodels

Question

I have a very basic question that I have not been able to find a real answer to.

Say I have a model:

import statsmodels.formula.api as smf
model = smf.ols(....).fit()

What is the difference between model.fittedvalues and model.predict?

Answer 1

model.predict is a method for predicting values, so you can pass it a dataset the model has not seen:

import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100,2),columns=['X','Y'])

model = smf.ols('Y ~ X',data=df).fit()

model.predict(exog=pd.DataFrame({'X':[1,2,3]}))

If you do not supply the exog argument, it returns predictions computed from the data stored on the model object, as the source code shows:

def predict(self, params, exog=None):
        """
        Return linear predicted values from a design matrix.

        Parameters
        ----------
        params : array_like
            Parameters of a linear model.
        exog : array_like, optional
            Design / exogenous data. Model exog is used if None.

        Returns
        -------
        array_like
            An array of fitted values.

        Notes
        -----
        If the model has not yet been fit, params is not optional.
        """
        # JP: this does not look correct for GLMAR
        # SS: it needs its own predict method

        if exog is None:
            exog = self.exog

        return np.dot(exog, params)

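model.fittedvalues, on the other hand, is a property that stores the fitted values. For the reason above, it holds the same values as model.predict() called with no arguments.

You can also look at the methods available for this type.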
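The equivalence is easy to check yourself. The sketch below reuses the setup from the example above; the fixed seed and the variable names (in_sample, fitted, manual) are assumptions added for illustration. It compares model.fittedvalues with model.predict() called without exog, and with the np.dot(exog, params) product from the quoted source.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Fixed seed so the run is reproducible (an assumption, not in the original example)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 2)), columns=["X", "Y"])

model = smf.ols("Y ~ X", data=df).fit()

# No exog given: predictions are computed from the stored training design matrix
in_sample = np.asarray(model.predict())

# Property holding the fitted values for the training rows
fitted = np.asarray(model.fittedvalues)

# The same product the quoted predict source computes, done by hand
manual = np.dot(model.model.exog, model.params.to_numpy())

print(np.allclose(in_sample, fitted))
print(np.allclose(manual, fitted))
```

np.allclose is used instead of == because the comparison is between floating-point arrays.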

Answer 2

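When you call smf.ols(....).fit(), you fit the model to the data. That is, for every data point in your dataset, the model tries to explain it and computes a value for it. At this point the model is only trying to explain your historical data and has not predicted anything yet. Note also that fittedvalues is a property (or attribute) of the model.

model.predict() is a method that the model uses to actually predict unseen values.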
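As a rough illustration of that distinction, the sketch below fits the same kind of model and then predicts for three X values that were not in the training data. The seed, the new_data frame, and the by_hand check are assumptions made for the example. For a formula like 'Y ~ X', the prediction is the fitted intercept plus the fitted slope times X.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Fixed seed so the run is reproducible (an assumption for this sketch)
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((100, 2)), columns=["X", "Y"])
model = smf.ols("Y ~ X", data=df).fit()

# Hypothetical new observations the model was not fitted on
new_data = pd.DataFrame({"X": [1.0, 2.0, 3.0]})
predicted = model.predict(new_data)

# Same numbers from the estimated coefficients: intercept + slope * X
by_hand = model.params["Intercept"] + model.params["X"] * new_data["X"]

print(len(model.fittedvalues))  # one fitted value per training row
print(len(predicted))           # one prediction per new row
print(np.allclose(predicted, by_hand))
```

fittedvalues always has one entry per training row, while predict returns one entry per row of whatever data you pass in.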
