How is the initial bias value chosen in sklearn logistic regression?
When training a logistic regression model, it goes through an iterative process in which, at each step, it computes weights for the x variables and a bias value so as to minimize the loss function.

From the official sklearn code (class LogisticRegression | linear model in scikit-learn), the fit method of the logistic regression class looks like this:
def fit(self, X, y, sample_weight=None):
    """
    Fit the model according to the given training data.

    Parameters
    ----------
    X : {array-like, sparse matrix} of shape (n_samples, n_features)
        Training vector, where n_samples is the number of samples and
        n_features is the number of features.

    y : array-like of shape (n_samples,)
        Target vector relative to X.

    sample_weight : array-like of shape (n_samples,) default=None
        Array of weights that are assigned to individual samples.
        If not provided, then each sample is given unit weight.

        .. versionadded:: 0.17
           *sample_weight* support to LogisticRegression.
    """
My guess is that sample_weight = the weight of each x variable, set to 1 if not given. Is the bias value also 1?
You sound somewhat confused, perhaps looking for an analogy with the weights and biases of a neural network. But that is not the case here; sample_weight has nothing to do with neural network weights, even as a concept.
The idea of sample_weight is that, if the (business) problem requires it, we can give some samples more weight (i.e. more importance) than others, and this importance directly affects the loss. It is sometimes used in cases of imbalanced data; quoting the Tips on practical use section of the documentation (it is about decision trees, but the principle is the same):
Class balancing can be done by sampling an equal number of samples from each class, or preferably by normalizing the sum of the sample weights (sample_weight) for each class to the same value.
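As a minimal sketch of that tip (the toy dataset and the normalization target of 1.0 are my own illustrative choices, not from the documentation), normalizing the per-class sums of sample_weight might look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Imbalanced toy data: 8 samples of class 0, only 2 of class 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# Normalize the sum of sample weights for each class to the same value
# (here 1.0), so the minority class contributes as much to the loss
# as the majority class.
sample_weight = np.empty(len(y))
for cls in np.unique(y):
    mask = (y == cls)
    sample_weight[mask] = 1.0 / mask.sum()

clf = LogisticRegression().fit(X, y, sample_weight=sample_weight)
```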
From a related thread on Cross Validated:
Sample weights are used to increase the importance of a single data-point (let's say, some of your data is more trustworthy, then they receive a higher weight). So: The sample weights exist to change the importance of data-points
You can see a practical demonstration of how changing the weights of some samples changes the final model in this SO thread (again, it is about decision trees, but the rationale is the same).
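The same effect can be shown directly with logistic regression; here is a small sketch (toy data and the 10x weight are arbitrary illustrative choices) where up-weighting one sample shifts the fitted coefficients:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1])

# Default: every sample gets unit weight
clf_uniform = LogisticRegression().fit(X, y)

# Give the last sample 10x the importance of the others;
# the optimum of the (weighted) loss moves, so the fit changes
w = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 10.0])
clf_weighted = LogisticRegression().fit(X, y, sample_weight=w)

print(clf_uniform.coef_, clf_weighted.coef_)  # the two fits differ
```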
With that clarified, it should now be apparent that there is no room for any kind of "bias" parameter here. In fact, the introductory paragraph of your question is mistaken: logistic regression does not compute such weights and biases; it returns coefficients and an intercept term (the intercept is itself sometimes called the bias), and these are unrelated to sample_weight.
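A minimal sketch of where those learned quantities actually live after fitting, namely the coef_ and intercept_ attributes (and, for what it's worth, sklearn's solvers typically start the optimization from all-zero coefficients and intercept, not from 1):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.arange(4, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.coef_)       # learned coefficients, shape (1, n_features)
print(clf.intercept_)  # the intercept (sometimes called the bias term)
```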