Question

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'channels' : ['EMAIL','FAX','MAIL','PHONE','Marketing','SMS','VISIT','Profiling','Approved_Email','EMAIL','FAX','MAIL','PHONE','Marketing','SMS','VISIT','Profiling','Approved_Email_vod'],
                   'ID' : [1001, 1002, 1003, 1004, 1005, 1006, 1001, 1002, 1003, 1004, 1005, 1006, 1001, 1002, 1003, 1004, 1005, 1006],
                   'INTR_COUNT' : [1,1,1,1,1,1,1,2,3,4,5,6,1,2,3,4,5,6],
                   'PERSONA' : ['A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B','C']})

What I want to do is define a function that takes the df above and creates new columns based on the unique categories in 'channels'.

ID	EMAIL	FAX	MAIL	PHONE	SMS	VISIT	Marketing	Approved_email	Persona
1001	1	0	0	1	0	1	0	0	A

And so on for the remaining IDs.

This is the function I have written so far:

def channel_pivot(df: pd.DataFrame):
    #where df is the df stated above
    x = df
    #subsetting to pivot only on focus cols
    y = df[['channels', 'INTR', 'HCP']] 
    #pivot operation
    y = set_index('ID')
    y1 = y.pivot(columns='channels', values = sum('INTR')).apply(lambda x: pd.Series(x.dropna().values))
    df1 = y1.merge(x, left_index=True, right_on='ID')
    return df1

However, I can't get the sum to work inside the pivot call. How can I sum the interaction counts for a given ID per channel?

Answer 1

You can use .groupby() together with .agg():

x = (
    df.groupby(["ID", "channels"])
    .agg({"INTR_COUNT": "sum", "PERSONA": "first"})
    .set_index("PERSONA", append=True)
    .unstack(level=1)
    .droplevel(0, axis=1)
    .fillna(0)
    .astype(int)
    .reset_index()
)
x.columns.name = None
print(x)

Prints:

	ID	PERSONA	Approved_Email	Approved_Email_vod	EMAIL	FAX	MAIL	Marketing	PHONE	Profiling	SMS	VISIT
0	1001	A	0	0	1	0	0	0	1	0	0	1
1	1002	B	0	0	0	1	0	2	0	2	0	0
2	1003	C	3	0	0	0	1	0	0	0	3	0
3	1004	A	0	0	4	0	0	0	1	0	0	4
4	1005	B	0	0	0	5	0	1	0	5	0	0
5	1006	C	0	6	0	0	6	0	0	0	1	0
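
As an alternative that stays closer to the function in the question, here is a minimal sketch using `DataFrame.pivot_table`. `DataFrame.pivot` only reshapes and takes no aggregation argument, whereas `pivot_table` accepts `aggfunc`, so the sum can be done in the same call. This is not the approach used in the answer above; it assumes each ID maps to exactly one PERSONA (true for the sample data), and the variable name `wide` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'channels' : ['EMAIL','FAX','MAIL','PHONE','Marketing','SMS','VISIT','Profiling','Approved_Email','EMAIL','FAX','MAIL','PHONE','Marketing','SMS','VISIT','Profiling','Approved_Email_vod'],
                   'ID' : [1001, 1002, 1003, 1004, 1005, 1006, 1001, 1002, 1003, 1004, 1005, 1006, 1001, 1002, 1003, 1004, 1005, 1006],
                   'INTR_COUNT' : [1,1,1,1,1,1,1,2,3,4,5,6,1,2,3,4,5,6],
                   'PERSONA' : ['A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B','C']})


def channel_pivot(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per ID with INTR_COUNT summed into one column per channel."""
    wide = (
        df.pivot_table(
            index=["ID", "PERSONA"],  # assumption: one PERSONA per ID
            columns="channels",       # each unique channel becomes a column
            values="INTR_COUNT",
            aggfunc="sum",            # aggregate duplicate (ID, channel) pairs
            fill_value=0,             # channels an ID never used count as 0
        )
        .astype(int)
        .reset_index()
    )
    wide.columns.name = None  # drop the leftover "channels" axis label
    return wide


result = channel_pivot(df)
print(result)
```

With the sample data this should give the same table as the `groupby` version, with ID and PERSONA as regular columns followed by one column per channel.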

python - transpose rows to columns, and calculate new column value as sum of another column

pivot

pandas