根据 python 中字典中的值添加新标签列
Add new label column based on values in dictionary in python
我是新手 python 我有一个包含不同组和标题的数据框。现在我想为每个组添加一个基于中位数的列 (grp_pred),但我不确定如何完成此操作。
这就是我的 df 的样子
df
title M18-34 V18-34 18-34 25-54 V25-54 M25-54 18-54 V18-54 M18-54
HEPEN 0.102488 0.200995 0.312438 0.667662 0.334328 0.321393 0.739303 0.380100 0.344279
MATED 0.151090 0.208723 0.361371 0.733645 0.428349 0.280374 0.880062 0.503115 0.352025
PEERT 0.098296 0.157929 0.262779 0.624509 0.325033 0.283093 0.717562 0.384010 0.316514
RZOEK 0.143695 0.336882 0.503607 0.657216 0.414844 0.214674 0.838560 0.548663 0.255410
ERKEN 0.204918 0.409836 0.631148 0.467213 0.286885 0.163934 0.877049 0.557377 0.303279
median_dict =
{'18-34': 0.395992275,
'18-54': 0.79392129200000006,
'25-54': 0.64958055850000007,
'M18-34': 0.1171878905,
'M18-54': 0.27340067349999997,
'M25-54': 0.23422200100000001,
'V18-34': 0.2283782815,
'V18-54': 0.4497918595,
'V25-54': 0.37749252799999999}
需要输出
所以基本上我想比较每个标题存储在字典中的中值,然后如果该值等于该特定中值,则分配给某个组。 e.g say if the median is 0.395992275 then pred_grp is 18-24 and so forth
df_out
title M18-34 V18-34 18-34 25-54 V25-54 M25-54 18-54 V18-54 M18-54 pred_grp
HEPEN 0.102488 0.200995 0.312438 0.667662 0.334328 0.321393 0.739303 0.380100 0.344279 18-54
MATED 0.151090 0.208723 0.361371 0.733645 0.428349 0.280374 0.880062 0.503115 0.352025
PEERT 0.098296 0.157929 0.262779 0.624509 0.325033 0.283093 0.717562 0.384010 0.316514
RZOEK 0.143695 0.336882 0.503607 0.657216 0.414844 0.214674 0.838560 0.548663 0.255410
ERKEN 0.204918 0.409836 0.631148 0.467213 0.286885 0.163934 0.877049 0.557377 0.303279
非常感谢您的帮助!!
提前致谢
根据我从评论中了解到的内容,您可以尝试从字典中创建一个与输入数据框具有相同结构的 df,然后获取差异最小的列:
u = df.set_index("title")
v = pd.DataFrame.from_dict(median_dict,orient='index').T.reindex(u.columns,axis=1)
df['pred_group'] = (u - v.to_numpy()).idxmin(axis=1).to_numpy()
print(df)
title M18-34 V18-34 18-34 25-54 V25-54 M25-54 \
0 HEPEN 0.102488 0.200995 0.312438 0.667662 0.334328 0.321393
1 MATED 0.151090 0.208723 0.361371 0.733645 0.428349 0.280374
2 PEERT 0.098296 0.157929 0.262779 0.624509 0.325033 0.283093
3 RZOEK 0.143695 0.336882 0.503607 0.657216 0.414844 0.214674
4 ERKEN 0.204918 0.409836 0.631148 0.467213 0.286885 0.163934
18-54 V18-54 M18-54 pred_group
0 0.739303 0.380100 0.344279 18-34
1 0.880062 0.503115 0.352025 18-34
2 0.717562 0.384010 0.316514 18-34
3 0.838560 0.548663 0.255410 M25-54
4 0.877049 0.557377 0.303279 25-54
我是新手 python 我有一个包含不同组和标题的数据框。现在我想为每个组添加一个基于中位数的列 (grp_pred),但我不确定如何完成此操作。 这就是我的 df 的样子
df
title M18-34 V18-34 18-34 25-54 V25-54 M25-54 18-54 V18-54 M18-54
HEPEN 0.102488 0.200995 0.312438 0.667662 0.334328 0.321393 0.739303 0.380100 0.344279
MATED 0.151090 0.208723 0.361371 0.733645 0.428349 0.280374 0.880062 0.503115 0.352025
PEERT 0.098296 0.157929 0.262779 0.624509 0.325033 0.283093 0.717562 0.384010 0.316514
RZOEK 0.143695 0.336882 0.503607 0.657216 0.414844 0.214674 0.838560 0.548663 0.255410
ERKEN 0.204918 0.409836 0.631148 0.467213 0.286885 0.163934 0.877049 0.557377 0.303279
median_dict =
{'18-34': 0.395992275,
'18-54': 0.79392129200000006,
'25-54': 0.64958055850000007,
'M18-34': 0.1171878905,
'M18-54': 0.27340067349999997,
'M25-54': 0.23422200100000001,
'V18-34': 0.2283782815,
'V18-54': 0.4497918595,
'V25-54': 0.37749252799999999}
需要输出
所以基本上我想比较每个标题存储在字典中的中值,然后如果该值等于该特定中值,则分配给某个组。 e.g say if the median is 0.395992275 then pred_grp is 18-24 and so forth
df_out
title M18-34 V18-34 18-34 25-54 V25-54 M25-54 18-54 V18-54 M18-54 pred_grp
HEPEN 0.102488 0.200995 0.312438 0.667662 0.334328 0.321393 0.739303 0.380100 0.344279 18-54
MATED 0.151090 0.208723 0.361371 0.733645 0.428349 0.280374 0.880062 0.503115 0.352025
PEERT 0.098296 0.157929 0.262779 0.624509 0.325033 0.283093 0.717562 0.384010 0.316514
RZOEK 0.143695 0.336882 0.503607 0.657216 0.414844 0.214674 0.838560 0.548663 0.255410
ERKEN 0.204918 0.409836 0.631148 0.467213 0.286885 0.163934 0.877049 0.557377 0.303279
非常感谢您的帮助!!
提前致谢
根据我从评论中了解到的内容,您可以尝试从字典中创建一个与输入数据框具有相同结构的 df,然后获取差异最小的列:
u = df.set_index("title")
v = pd.DataFrame.from_dict(median_dict,orient='index').T.reindex(u.columns,axis=1)
df['pred_group'] = (u - v.to_numpy()).idxmin(axis=1).to_numpy()
print(df)
title M18-34 V18-34 18-34 25-54 V25-54 M25-54 \
0 HEPEN 0.102488 0.200995 0.312438 0.667662 0.334328 0.321393
1 MATED 0.151090 0.208723 0.361371 0.733645 0.428349 0.280374
2 PEERT 0.098296 0.157929 0.262779 0.624509 0.325033 0.283093
3 RZOEK 0.143695 0.336882 0.503607 0.657216 0.414844 0.214674
4 ERKEN 0.204918 0.409836 0.631148 0.467213 0.286885 0.163934
18-54 V18-54 M18-54 pred_group
0 0.739303 0.380100 0.344279 18-34
1 0.880062 0.503115 0.352025 18-34
2 0.717562 0.384010 0.316514 18-34
3 0.838560 0.548663 0.255410 M25-54
4 0.877049 0.557377 0.303279 25-54