Difference between StandardScaler and MinMaxScaler
What is the difference between MinMaxScaler and StandardScaler?
MMS = MinMaxScaler(feature_range=(0, 1))
(used in one program)
sc = StandardScaler()
(in another program they used StandardScaler instead of MinMaxScaler)
From the scikit-learn site:
StandardScaler
removes the mean and scales the data to unit variance.
However, the outliers have an influence when computing the empirical
mean and standard deviation which shrink the range of the feature
values as shown in the left figure below. Note in particular that
because the outliers on each feature have different magnitudes, the
spread of the transformed data on each feature is very different: most
of the data lie in the [-2, 4] range for the transformed median income
feature while the same data is squeezed in the smaller [-0.2, 0.2]
range for the transformed number of households.
StandardScaler therefore cannot guarantee balanced feature scales in
the presence of outliers.
MinMaxScaler
rescales the data set such that all feature values are in
the range [0, 1] as shown in the right panel below. However, this
scaling compresses all inliers into the narrow range [0, 0.005] for the
transformed number of households.
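The effect described in that quote is easy to reproduce on a toy column with a single outlier. This is a minimal sketch (the array and its values are made up for illustration, not the dataset from the scikit-learn example):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])   # 100.0 plays the role of the outlier

# StandardScaler: the outlier inflates the mean and standard deviation,
# so the four inliers get squeezed into a narrow negative band.
print(StandardScaler().fit_transform(X).ravel())
# roughly [-0.54 -0.51 -0.49 -0.46  2.00]

# MinMaxScaler: the outlier maps to 1.0 and the inliers are compressed near 0,
# mirroring the [0, 0.005] effect described in the quote.
print(MinMaxScaler(feature_range=(0, 1)).fit_transform(X).ravel())
# roughly [0.     0.0101 0.0202 0.0303 1.    ]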
MinMaxScaler(feature_range=(0, 1))
will transform each value in the column proportionally into the range [0, 1]. Use this as the first scaler choice for transforming a feature, since it preserves the shape of the dataset (no distortion).
StandardScaler()
will transform each value in the column to a range around mean 0 with standard deviation 1, i.e. each value is normalized by subtracting the mean and dividing by the standard deviation. Use StandardScaler if you know the data distribution is normal.
If there are outliers, use RobustScaler(). Alternatively, you can remove the outliers and use either of the above two scalers (the choice depends on whether the data is normally distributed).
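As a rough sketch of that advice (using the same hypothetical outlier-laden column as above): RobustScaler centers on the median and scales by the interquartile range, so the inliers keep a sensible spread while the outlier is simply pushed far out.

import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# RobustScaler: (x - median) / IQR; the outlier does not affect the median or IQR here.
print(RobustScaler().fit_transform(X).ravel())      # [-1.  -0.5  0.   0.5 48.5]

# StandardScaler for comparison: the outlier dominates the scale.
print(StandardScaler().fit_transform(X).ravel())    # roughly [-0.54 -0.51 -0.49 -0.46  2.00]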
Additional note: if you apply the scaler before train_test_split, data leakage will occur. Use the scaler after train_test_split.
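A minimal sketch of that note, assuming a generic feature matrix X and target y (both are placeholders): fit the scaler on the training split only, then reuse the learned statistics on the test split.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 3)                 # placeholder features
y = np.random.randint(0, 2, size=100)      # placeholder binary target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # statistics learned from the training data only
X_test_scaled = scaler.transform(X_test)         # same statistics applied to the test data; no leakage

Wrapping the scaler and the model in a sklearn.pipeline.Pipeline gives the same guarantee automatically, including inside cross-validation.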
Many machine learning algorithms perform better when numerical input variables are scaled to a standard range.
Scaling the data helps normalize it within a specific range.
MinMaxScaler, also known as normalization, transforms all values into the range (0 to 1).
The formula is x' = (value - min) / (max - min)
StandardScaler performs standardization; the resulting values mostly fall within roughly -3 to +3.
The formula is z = (x - mean) / std_deviation
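As a quick sanity check of the two formulas, here is a throwaway sketch on a made-up column (note that scikit-learn uses the population standard deviation, which matches NumPy's default):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x = np.array([[10.0], [20.0], [30.0], [40.0]])

# Normalization: x' = (value - min) / (max - min)
manual_minmax = (x - x.min()) / (x.max() - x.min())
assert np.allclose(manual_minmax, MinMaxScaler().fit_transform(x))

# Standardization: z = (x - mean) / std
manual_standard = (x - x.mean()) / x.std()
assert np.allclose(manual_standard, StandardScaler().fit_transform(x))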