How is the initial bias value chosen in sklearn logistic regression?

When training a logistic regression model, it goes through an iterative process in which it computes the weights of the x variables and a bias value so as to minimize the loss function.

From the official sklearn source (class LogisticRegression | linear model in scikit-learn), the fit method of the logistic regression class looks like this:

def fit(self, X, y, sample_weight=None):
    """
    Fit the model according to the given training data.
    Parameters
    ----------
    X : {array-like, sparse matrix} of shape (n_samples, n_features)
        Training vector, where n_samples is the number of samples and
        n_features is the number of features.
    y : array-like of shape (n_samples,)
        Target vector relative to X.
    sample_weight : array-like of shape (n_samples,) default=None
        Array of weights that are assigned to individual samples.
        If not provided, then each sample is given unit weight.
        .. versionadded:: 0.17
           *sample_weight* support to LogisticRegression.

My guess is that sample_weight is the weight of each x variable, set to 1 if not given. Is the bias value also 1?

You sound somewhat confused, perhaps looking for an analogy with the weights and biases of a neural network. But this is not the case; sample_weight here has nothing to do with the weights of a neural network, even as a concept.
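In fact, when sample_weight is not provided, each sample simply gets unit weight, so fitting with the default is equivalent to passing an array of ones explicitly (a quick sanity check on made-up toy data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Omitting sample_weight (default None) means unit weight per sample,
# so these two fits produce identical models:
a = LogisticRegression().fit(X, y)
b = LogisticRegression().fit(X, y, sample_weight=np.ones(40))

print(np.allclose(a.coef_, b.coef_))  # True
```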

What sample_weight does is this: if the (business) problem requires it, we can give some samples more weight (i.e. more importance) than others, and this importance directly affects the loss. It is sometimes used with imbalanced data; quoting from the Tips on practical use section of the documentation (it is about decision trees, but the principle is the same):

Class balancing can be done by sampling an equal number of samples from each class, or preferably by normalizing the sum of the sample weights (sample_weight) for each class to the same value.
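The balancing recipe from that quote can be sketched as follows (a toy example; the dataset and class sizes are made up): each class's sample weights are normalized to sum to the same total.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
# Imbalanced toy data: 90 samples of class 0, 10 of class 1
X = rng.randn(100, 3)
y = np.array([0] * 90 + [1] * 10)

# Normalize the sum of sample weights per class to the same value (1.0):
# each sample's weight is inversely proportional to its class size.
weights = np.where(y == 0, 1.0 / 90, 1.0 / 10)

clf = LogisticRegression().fit(X, y, sample_weight=weights)
print(clf.coef_.shape, clf.intercept_.shape)  # (1, 3) (1,)
```

After this reweighting, both classes contribute equally to the loss despite the 9:1 imbalance in sample counts.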

A related thread from Cross Validated:

Sample weights are used to increase the importance of a single data-point (let's say, some of your data is more trustworthy, then they receive a higher weight). So: The sample weights exist to change the importance of data-points

You can see a practical demonstration of how changing the weights of some samples changes the final model in this SO thread (again, it is about decision trees, but the rationale is the same).
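A similar demonstration with logistic regression itself (a toy sketch; the data and weight values are made up): up-weight a subset of samples and compare the fitted coefficients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(42)
X = rng.randn(50, 2)
y = (X[:, 0] > 0).astype(int)
y[:10] = 1 - y[:10]          # deliberately mislabel the first 10 samples

base = LogisticRegression().fit(X, y)

# Giving the mislabeled samples 10x more importance pulls the decision
# boundary toward them, so the fitted coefficients change.
w = np.ones(50)
w[:10] = 10.0
weighted = LogisticRegression().fit(X, y, sample_weight=w)

print(np.linalg.norm(base.coef_ - weighted.coef_))  # nonzero: the model moved
```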

With that clarified, it should now be apparent that there is no room here for any kind of "bias" parameter. In fact, the introductory paragraph of your question is mistaken: logistic regression does compute such weights and a bias, but it returns them as the coefficients and the intercept term (the latter sometimes itself called the bias), and these have nothing to do with sample_weight.
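In code, those returned quantities are the coef_ and intercept_ attributes of the fitted estimator (a minimal sketch on made-up data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = (X[:, 0] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
# The model's own "weights" and "bias" (intercept), learned from the data;
# neither is related to the sample_weight argument of fit().
print(clf.coef_.shape)       # (1, 4): one weight per feature
print(clf.intercept_.shape)  # (1,): the intercept / "bias" term
```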