Apply cubic root transformation and StandardScaler to some specific columns in pandas dataframe

Question

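I have a DataFrame with many columns. I want to first apply a cube root (cbrt) transformation and then apply StandardScaler() to some specific columns, separately for each month, but I am getting errors.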

import numpy as np
import pandas as pd
df = pd.DataFrame({'month': ['1','1','1','1','1','2','2','2','2','2','2','2'],
                   'X1': [30,42,25,32,12,10,4,6,5,10,24,21],
                   'X2': [10,76,100,23,65,94,67,24,67,54,87,81],
                   'X3': [23,78,95,52,60,76,68,92,34,76,34,12]})
df

My code below applies the cube root, but it does not take the month into account:

df['X1']=pd.Series(np.cbrt(df['X1'])).values

The following is my scaling attempt, which needs to be done per month group:

  from sklearn.preprocessing import StandardScaler
  scaler = StandardScaler()
  df['X1_scale'] = scaler.group('Month').fit(df['X1'])

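I want to combine these two operations into one automated function that adds the columns X1_Scale and X2_Scale. Since I have many columns, I want to do this on the first 2 columns (df.loc[:,2:3]) in general. Please help. Thank you.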

Answer 1

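We can use np.cbrt to calculate the element-wise cube root of the first two X columns, then group by month and transform with zscore to calculate the standard score of each sample within its month.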

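from scipy.stats import zscore

# Columns at positions 1 and 2 (X1 and X2); position 0 is 'month'.
c = df.columns[1:3]
# Cube root element-wise, then z-score each column within each month.
df[c + '_Scale'] = np.cbrt(df[c]).groupby(df['month']).transform(zscore)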

   month  X1   X2  X3  X1_Scale  X2_Scale
0      1  30   10  23  0.286075 -1.531934
1      1  42   76  78  1.220298  0.705876
2      1  25  100  95 -0.178042  1.142135
3      1  32   23  52  0.457241 -0.790689
4      1  12   65  60 -1.785572  0.474613
5      2  10   94  76  0.004353  1.026875
6      2   4   67  68 -1.208026  0.093139
7      2   6   24  92 -0.716861 -2.171608
8      2   5   67  34 -0.945947  0.093139
9      2  10   54  76  0.004353 -0.449041
10     2  24   87  34  1.565310  0.804088
11     2  21   81  12  1.296817  0.603408
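
Since the question asks for a reusable function built on StandardScaler, the same steps could be sketched as below. This is only a minimal sketch, not part of the original answer: the function name cbrt_scale_by_group and its parameters (cols, group_col, suffix) are made up for illustration, and it assumes the DataFrame has a unique index. StandardScaler divides by the population standard deviation, which is also the default for zscore (ddof=0), so the results should agree with the table above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def cbrt_scale_by_group(df, cols, group_col="month", suffix="_Scale"):
    """Return a copy of df with cube-rooted, per-group standardized columns.

    Hypothetical helper for illustration; assumes df has a unique index.
    """
    out = df.copy()
    cols = list(cols)
    new_cols = [f"{c}{suffix}" for c in cols]

    # Step 1: element-wise cube root of the selected columns.
    rooted = np.cbrt(out[cols].astype(float))

    # Create the target columns up front so they can be filled group by group.
    for c in new_cols:
        out[c] = np.nan

    # Step 2: fit a fresh scaler on each group's rows, so the mean and
    # standard deviation come from that group only.
    for _, row_labels in out.groupby(group_col).groups.items():
        scaled = StandardScaler().fit_transform(rooted.loc[row_labels].to_numpy())
        out.loc[row_labels, new_cols] = scaled

    return out


df = pd.DataFrame({'month': ['1','1','1','1','1','2','2','2','2','2','2','2'],
                   'X1': [30,42,25,32,12,10,4,6,5,10,24,21],
                   'X2': [10,76,100,23,65,94,67,24,67,54,87,81],
                   'X3': [23,78,95,52,60,76,68,92,34,76,34,12]})

# Scale the columns at positions 1 and 2 (X1 and X2), grouped by month.
result = cbrt_scale_by_group(df, df.columns[1:3])
print(result)
```

The function leaves the input DataFrame untouched and returns a copy. To scale a different set of columns, pass any list of column names as cols.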

python

multiple-columns

pandas