python - transpose rows to columns, and calculate new column value as sum of another column

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'channels' : ['EMAIL','FAX','MAIL','PHONE','Marketing','SMS','VISIT','Profiling','Approved_Email','EMAIL','FAX','MAIL','PHONE','Marketing','SMS','VISIT','Profiling','Approved_Email_vod'],
                   'ID' : [1001, 1002, 1003, 1004, 1005, 1006, 1001, 1002, 1003, 1004, 1005, 1006, 1001, 1002, 1003, 1004, 1005, 1006],
                   'INTR_COUNT' : [1,1,1,1,1,1,1,2,3,4,5,6,1,2,3,4,5,6],
                   'PERSONA' : ['A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B','C']})

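What I want to do is define a function that takes the df above and creates one new column for each unique category in 'channels', so that each ID ends up on a single row: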

ID    EMAIL  FAX  MAIL  PHONE  SMS  VISIT  Marketing  Approved_email  Persona
1001  1      0    0     1      0    1      0          0               A

and so on for the remaining IDs.

So far I have written this function:

def channel_pivot(df: pd.DataFrame):
    #where df is the df stated above
    x = df
    #subsetting to pivot only on focus cols
    y = df[['channels', 'INTR', 'HCP']] 
    #pivot operation
    y = set_index('ID')
    y1 = y.pivot(columns='channels', values = sum('INTR')).apply(lambda x: pd.Series(x.dropna().values))
    df1 = y1.merge(x, left_index=True, right_on='ID')
    return df1

However, I cannot get the sum to work inside the pivot call. How can I sum the interaction counts for a given ID for each channel?

You can use .groupby() together with .agg(), then .unstack() the channels level into columns:

x = (
    df.groupby(["ID", "channels"])
    .agg({"INTR_COUNT": "sum", "PERSONA": "first"})
    .set_index("PERSONA", append=True)
    .unstack(level=1)
    .droplevel(0, axis=1)
    .fillna(0)
    .astype(int)
    .reset_index()
)
x.columns.name = None
print(x)

Prints:

     ID  PERSONA  Approved_Email  Approved_Email_vod  EMAIL  FAX  MAIL  Marketing  PHONE  Profiling  SMS  VISIT
0  1001        A               0                   0      1    0     0          0      1          0    0      1
1  1002        B               0                   0      0    1     0          2      0          2    0      0
2  1003        C               3                   0      0    0     1          0      0          0    3      0
3  1004        A               0                   0      4    0     0          0      1          0    0      4
4  1005        B               0                   0      0    5     0          1      0          5    0      0
5  1006        C               0                   6      0    0     6          0      0          0    1      0
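
Since the question asked for a function and for the sum to happen inside the pivot step, here is a minimal alternative sketch using pd.pivot_table with aggfunc="sum". It reuses the name channel_pivot and the column names from the question. It assumes each ID has exactly one PERSONA; if an ID had several, it would appear on one row per (ID, PERSONA) pair. Columns come out in alphabetical order, as in the output above.

```python
import pandas as pd


def channel_pivot(df: pd.DataFrame) -> pd.DataFrame:
    """One row per (ID, PERSONA), one column per channel, summed INTR_COUNT."""
    wide = (
        df.pivot_table(
            index=["ID", "PERSONA"],   # assumption: one PERSONA per ID
            columns="channels",        # each unique channel becomes a column
            values="INTR_COUNT",
            aggfunc="sum",             # add up counts for repeated (ID, channel) pairs
            fill_value=0,              # channels an ID never used become 0
        )
        .astype(int)                   # keep the counts as integers
        .reset_index()                 # turn ID and PERSONA back into columns
    )
    wide.columns.name = None           # drop the leftover "channels" axis label
    return wide
```

Calling channel_pivot(df) on the dataframe from the question should give the same table as the groupby version. Unlike DataFrame.pivot, which raises an error on duplicate (ID, channel) pairs, pivot_table aggregates them, which is why the sum can live inside the call.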