Add new label column based on values in dictionary in python

Question

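I'm new to Python. I have a data frame with different groups and titles. I want to add a column (pred_grp) based on the median for each group, but I'm not sure how to do this. This is what my df looks like: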

df
    title   M18-34       V18-34       18-34      25-54       V25-54      M25-54       18-54      V18-54      M18-54
    HEPEN   0.102488    0.200995    0.312438    0.667662    0.334328    0.321393    0.739303    0.380100    0.344279
    MATED   0.151090    0.208723    0.361371    0.733645    0.428349    0.280374    0.880062    0.503115    0.352025
    PEERT   0.098296    0.157929    0.262779    0.624509    0.325033    0.283093    0.717562    0.384010    0.316514
    RZOEK   0.143695    0.336882    0.503607    0.657216    0.414844    0.214674    0.838560    0.548663    0.255410
    ERKEN   0.204918    0.409836    0.631148    0.467213    0.286885    0.163934    0.877049    0.557377    0.303279

median_dict = {'18-34': 0.395992275,
               '18-54': 0.79392129200000006,
               '25-54': 0.64958055850000007,
               'M18-34': 0.1171878905,
               'M18-54': 0.27340067349999997,
               'M25-54': 0.23422200100000001,
               'V18-34': 0.2283782815,
               'V18-54': 0.4497918595,
               'V25-54': 0.37749252799999999}

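Desired output: basically, for each title I want to compare its values with the medians stored in the dictionary, and if a value equals a particular median, assign the title to that group. E.g. if the median is 0.395992275 then pred_grp is 18-34, and so forth.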

 df_out
        title   M18-34       V18-34       18-34      25-54       V25-54      M25-54        18-54      V18-54      M18-54  pred_grp
        HEPEN   0.102488    0.200995    0.312438    0.667662    0.334328    0.321393    0.739303    0.380100    0.344279 18-54
        MATED   0.151090    0.208723    0.361371    0.733645    0.428349    0.280374    0.880062    0.503115    0.352025
        PEERT   0.098296    0.157929    0.262779    0.624509    0.325033    0.283093    0.717562    0.384010    0.316514
        RZOEK   0.143695    0.336882    0.503607    0.657216    0.414844    0.214674    0.838560    0.548663    0.255410
        ERKEN   0.204918    0.409836    0.631148    0.467213    0.286885    0.163934    0.877049    0.557377    0.303279

Any help is much appreciated!

Thanks in advance.

Answer 1

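From what I understand from the comments, you can try creating a df from the dictionary with the same structure as the input dataframe, and then take, for each row, the column with the smallest difference. Note that the difference here is signed (value minus median), so idxmin picks the column whose value lies furthest below its median, not the one closest to it:

import pandas as pd
u = df.set_index("title")
# One-row frame of medians, reordered to match the columns of u
v = pd.DataFrame.from_dict(median_dict, orient='index').T.reindex(u.columns, axis=1)
# Per row: label of the column with the smallest signed (value - median) difference
df['pred_group'] = (u - v.to_numpy()).idxmin(axis=1).to_numpy()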

print(df)

   title    M18-34    V18-34     18-34     25-54    V25-54    M25-54  \
0  HEPEN  0.102488  0.200995  0.312438  0.667662  0.334328  0.321393   
1  MATED  0.151090  0.208723  0.361371  0.733645  0.428349  0.280374   
2  PEERT  0.098296  0.157929  0.262779  0.624509  0.325033  0.283093   
3  RZOEK  0.143695  0.336882  0.503607  0.657216  0.414844  0.214674   
4  ERKEN  0.204918  0.409836  0.631148  0.467213  0.286885  0.163934   

      18-54    V18-54    M18-54 pred_group  
0  0.739303  0.380100  0.344279      18-34  
1  0.880062  0.503115  0.352025      18-34  
2  0.717562  0.384010  0.316514      18-34  
3  0.838560  0.548663  0.255410     M25-54  
4  0.877049  0.557377  0.303279      25-54
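
If "smallest difference" is meant as "closest to the median", one possible variant takes the absolute difference before calling idxmin. The sketch below rebuilds the question's data so it runs on its own and computes both versions side by side. The column names pred_group_signed and pred_group_abs are made up for illustration, and which interpretation the question actually wants is an assumption on my part.

```python
import pandas as pd

# Data copied from the question.
cols = ["M18-34", "V18-34", "18-34", "25-54", "V25-54",
        "M25-54", "18-54", "V18-54", "M18-54"]
rows = {
    "HEPEN": [0.102488, 0.200995, 0.312438, 0.667662, 0.334328, 0.321393, 0.739303, 0.380100, 0.344279],
    "MATED": [0.151090, 0.208723, 0.361371, 0.733645, 0.428349, 0.280374, 0.880062, 0.503115, 0.352025],
    "PEERT": [0.098296, 0.157929, 0.262779, 0.624509, 0.325033, 0.283093, 0.717562, 0.384010, 0.316514],
    "RZOEK": [0.143695, 0.336882, 0.503607, 0.657216, 0.414844, 0.214674, 0.838560, 0.548663, 0.255410],
    "ERKEN": [0.204918, 0.409836, 0.631148, 0.467213, 0.286885, 0.163934, 0.877049, 0.557377, 0.303279],
}
df = (pd.DataFrame.from_dict(rows, orient="index", columns=cols)
        .rename_axis("title")
        .reset_index())

median_dict = {"18-34": 0.395992275, "18-54": 0.79392129200000006,
               "25-54": 0.64958055850000007, "M18-34": 0.1171878905,
               "M18-54": 0.27340067349999997, "M25-54": 0.23422200100000001,
               "V18-34": 0.2283782815, "V18-54": 0.4497918595,
               "V25-54": 0.37749252799999999}

u = df.set_index("title")
# A Series of medians aligned to the column order of u.
medians = pd.Series(median_dict).reindex(u.columns)
# Subtract each column's median from that column (aligned by column label).
diff = u.sub(medians, axis=1)

# Signed difference: same result as the code in this answer.
df["pred_group_signed"] = diff.idxmin(axis=1).to_numpy()
# Absolute difference: the column whose value is closest to its median.
df["pred_group_abs"] = diff.abs().idxmin(axis=1).to_numpy()

print(df[["title", "pred_group_signed", "pred_group_abs"]])
```

On this data the signed version reproduces the pred_group column printed above, while the absolute version gives M18-34, V18-34, M18-34, 25-54 and M18-54 for the five titles. If two columns tie, idxmin returns the first one in column order.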

python

pandas

data-wrangling