python - transpose rows to columns, and calculate new column value as sum of another column
I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({'channels' : ['EMAIL','FAX','MAIL','PHONE','Marketing','SMS','VISIT','Profiling','Approved_Email','EMAIL','FAX','MAIL','PHONE','Marketing','SMS','VISIT','Profiling','Approved_Email_vod'],
                   'ID' : [1001, 1002, 1003, 1004, 1005, 1006, 1001, 1002, 1003, 1004, 1005, 1006, 1001, 1002, 1003, 1004, 1005, 1006],
                   'INTR_COUNT' : [1,1,1,1,1,1,1,2,3,4,5,6,1,2,3,4,5,6],
                   'PERSONA' : ['A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B','C']})
What I want to do is define a function that takes the df above and creates new columns based on the unique categories in 'channels', like this:
| ID | EMAIL | FAX | MAIL | PHONE | SMS | VISIT | Marketing | Approved_email | Persona |
|---|---|---|---|---|---|---|---|---|---|
| 1001 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | A |
and so on for the remaining IDs.
So far I have written this function:
def channel_pivot(df: pd.DataFrame):
    # where df is the df stated above
    x = df
    # subsetting to pivot only on focus cols
    y = df[['channels', 'INTR', 'HCP']]
    # pivot operation
    y = set_index('ID')
    y1 = y.pivot(columns='channels', values = sum('INTR')).apply(lambda x: pd.Series(x.dropna().values))
    df1 = y1.merge(x, left_index=True, right_on='ID')
    return df1
However, I can't get the sum to work inside the pivot function. How can I sum the interaction counts of a given ID for each channel?
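A likely reason the attempt stalls: `DataFrame.pivot` only reshapes and has no aggregation step. Its `values=` argument expects column labels, so `sum('INTR')` is evaluated by Python itself (and fails on a string) before pandas sees it, and duplicate `(ID, channel)` pairs make `pivot` raise a `ValueError`. `DataFrame.pivot_table` accepts an `aggfunc` instead. The sketch below uses a small hypothetical frame, `demo`, in which ID 1001 has two EMAIL rows; the exact error text may vary between pandas versions.

```python
import pandas as pd

# Hypothetical mini frame: ID 1001 has two EMAIL rows (a duplicated pair).
demo = pd.DataFrame({
    "ID": [1001, 1001, 1002],
    "channels": ["EMAIL", "EMAIL", "FAX"],
    "INTR_COUNT": [1, 2, 5],
})

# pivot() cannot aggregate, so the duplicated (1001, EMAIL) pair is an error.
try:
    demo.pivot(index="ID", columns="channels", values="INTR_COUNT")
except ValueError as err:
    print("pivot failed:", err)

# pivot_table() groups by index + columns first, then applies aggfunc,
# so the two EMAIL rows for 1001 are summed (1 + 2 = 3).
summed = demo.pivot_table(
    index="ID",
    columns="channels",
    values="INTR_COUNT",
    aggfunc="sum",
    fill_value=0,  # ID/channel combinations with no rows become 0
)
print(summed)
```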
You can use .groupby() together with .agg():
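x = (
    df.groupby(["ID", "channels"])
    # one row per (ID, channel): sum the counts and keep the ID's persona
    .agg({"INTR_COUNT": "sum", "PERSONA": "first"})
    # move PERSONA into the index so it is not spread across the new columns
    .set_index("PERSONA", append=True)
    # the index is now (ID, channels, PERSONA); level 1 (channels) becomes the columns
    .unstack(level=1)
    # drop the outer "INTR_COUNT" column level, leaving only the channel names
    .droplevel(0, axis=1)
    # ID/channel combinations with no rows become 0
    .fillna(0)
    .astype(int)
    .reset_index()
)
x.columns.name = None  # remove the leftover "channels" label on the column axis
print(x)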
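Since the question asks for a function, here is a minimal sketch that wraps the chain above as `channel_pivot`. The `channel_order` parameter is a hypothetical addition for reproducing the layout requested in the question (channel columns in a chosen order, `PERSONA` last). It uses the data's own spelling `Approved_Email`; listed channels that do not occur in the data become all-zero columns, and channels left out of the list (here `Profiling` and `Approved_Email_vod`) are dropped.

```python
import pandas as pd


def channel_pivot(df: pd.DataFrame, channel_order=None) -> pd.DataFrame:
    """One row per ID: summed INTR_COUNT for each channel, with PERSONA last.

    channel_order is a hypothetical optional list that fixes which channel
    columns appear and in what order (missing ones are filled with 0).
    """
    wide = (
        df.groupby(["ID", "channels"])
        .agg({"INTR_COUNT": "sum", "PERSONA": "first"})
        .set_index("PERSONA", append=True)
        .unstack(level=1)      # channels -> columns
        .droplevel(0, axis=1)  # drop the outer "INTR_COUNT" level
        .fillna(0)
        .astype(int)
    )
    if channel_order is not None:
        # keep only the listed channels, in that order; absent ones become 0
        wide = wide.reindex(columns=channel_order, fill_value=0)
    channel_cols = list(wide.columns)
    wide = wide.reset_index()  # ID and PERSONA become ordinary columns
    wide.columns.name = None
    # ID first, channels in the chosen order, PERSONA last
    return wide[["ID", *channel_cols, "PERSONA"]]


# Same data as in the question ("* 3" / "* 6" repeat the original patterns).
df = pd.DataFrame({
    "channels": ["EMAIL", "FAX", "MAIL", "PHONE", "Marketing", "SMS", "VISIT",
                 "Profiling", "Approved_Email", "EMAIL", "FAX", "MAIL", "PHONE",
                 "Marketing", "SMS", "VISIT", "Profiling", "Approved_Email_vod"],
    "ID": [1001, 1002, 1003, 1004, 1005, 1006] * 3,
    "INTR_COUNT": [1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
    "PERSONA": ["A", "B", "C"] * 6,
})

result = channel_pivot(
    df,
    channel_order=["EMAIL", "FAX", "MAIL", "PHONE", "SMS", "VISIT",
                   "Marketing", "Approved_Email"],
)
print(result)
```

Calling `channel_pivot(df)` without `channel_order` keeps every channel, in the sorted order pandas produces when unstacking.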
Prints:
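| | ID | PERSONA | Approved_Email | Approved_Email_vod | EMAIL | FAX | MAIL | Marketing | PHONE | Profiling | SMS | VISIT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | A | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 1002 | B | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 2 | 0 | 0 |
| 2 | 1003 | C | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 |
| 3 | 1004 | A | 0 | 0 | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 4 |
| 4 | 1005 | B | 0 | 0 | 0 | 5 | 0 | 1 | 0 | 5 | 0 | 0 |
| 5 | 1006 | C | 0 | 6 | 0 | 0 | 6 | 0 | 0 | 0 | 1 | 0 |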
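If you would rather keep the pivot-style approach from the question, `pivot_table` does the summing inside the pivot via `aggfunc`. This sketch assumes each ID has exactly one PERSONA: with `PERSONA` in the pivot index, an ID with two personas would get two rows, whereas `"first"` in the answer silently keeps one. It checks that both routes give the same table with `pandas.testing.assert_frame_equal`; `check_dtype=False` is only a precaution so the comparison focuses on labels and values.

```python
import pandas as pd

# Same data as in the question ("* 3" / "* 6" repeat the original patterns).
df = pd.DataFrame({
    "channels": ["EMAIL", "FAX", "MAIL", "PHONE", "Marketing", "SMS", "VISIT",
                 "Profiling", "Approved_Email", "EMAIL", "FAX", "MAIL", "PHONE",
                 "Marketing", "SMS", "VISIT", "Profiling", "Approved_Email_vod"],
    "ID": [1001, 1002, 1003, 1004, 1005, 1006] * 3,
    "INTR_COUNT": [1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
    "PERSONA": ["A", "B", "C"] * 6,
})

# The answer's groupby/agg/unstack chain, unchanged.
via_groupby = (
    df.groupby(["ID", "channels"])
    .agg({"INTR_COUNT": "sum", "PERSONA": "first"})
    .set_index("PERSONA", append=True)
    .unstack(level=1)
    .droplevel(0, axis=1)
    .fillna(0)
    .astype(int)
    .reset_index()
)
via_groupby.columns.name = None

# pivot_table groups by index + columns, then applies aggfunc to INTR_COUNT.
via_pivot_table = (
    df.pivot_table(
        index=["ID", "PERSONA"],  # assumes one PERSONA per ID
        columns="channels",
        values="INTR_COUNT",
        aggfunc="sum",
        fill_value=0,             # missing ID/channel combinations -> 0
    )
    .astype(int)
    .reset_index()
)
via_pivot_table.columns.name = None

# Raises AssertionError if the two tables differ in labels or values.
pd.testing.assert_frame_equal(via_groupby, via_pivot_table, check_dtype=False)
print(via_pivot_table)
```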