Merge two dataframes within a group with GroupBy
I have two dataframes that I need to merge on date, but the merge should be done separately for each group (participant_id).
df1:
response_date summary epis_mark participant_id
0 2012-01-04 0.0 False 13
1 2012-01-11 0.0 False 13
2 2012-01-19 0.0 False 13
3 2012-01-29 0.0 False 13
4 2012-02-02 0.0 False 13
0 2012-01-02 8.0 True 14
1 2012-01-10 5.0 False 14
2 2012-01-18 2.0 False 14
3 2012-01-24 1.0 False 14
4 2012-01-31 2.0 False 14
0 2012-01-07 4.0 False 17
1 2012-01-11 NaN False 17
2 2012-01-18 4.0 False 17
3 2012-01-25 NaN False 17
4 2012-02-01 NaN False 17
df2:
response_date summary epis_mark participant_id
0 2012-01-04 17.0 True 13
1 2012-01-11 18.0 True 13
2 2012-01-19 16.0 True 13
3 2012-01-29 15.0 True 13
4 2012-02-02 15.0 True 13
0 2012-01-02 12.0 True 14
1 2012-01-10 8.0 True 14
2 2012-01-18 21.0 True 14
3 2012-01-24 19.0 True 14
4 2012-01-31 20.0 True 14
0 2012-01-04 NaN False 17
1 2012-01-11 NaN False 17
2 2012-01-18 NaN False 17
3 2012-01-25 NaN False 17
4 2012-02-01 NaN False 17
I need to get a single dataframe (wide) in which the merge on response_date is done independently for each participant_id. Something like:
>> pd.merge(df1[df1.participant_id == i], df2[df2.participant_id == i], on='response_date', how='outer')
but without looping over i, using groupby instead.
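Merge on both response_date and participant_id. Because participant_id is part of the join key, rows can only match within the same participant, so no loop or groupby is needed: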
In [75]: pd.merge(df1, df2, on=['response_date', 'participant_id'], how='outer')
Out[75]:
response_date summary_x epis_mark_x participant_id summary_y epis_mark_y
0 2012-01-04 0.0 False 13 17.0 True
1 2012-01-11 0.0 False 13 18.0 True
2 2012-01-19 0.0 False 13 16.0 True
3 2012-01-29 0.0 False 13 15.0 True
4 2012-02-02 0.0 False 13 15.0 True
5 2012-01-02 8.0 True 14 12.0 True
6 2012-01-10 5.0 False 14 8.0 True
7 2012-01-18 2.0 False 14 21.0 True
8 2012-01-24 1.0 False 14 19.0 True
9 2012-01-31 2.0 False 14 20.0 True
10 2012-01-07 4.0 False 17 NaN NaN
11 2012-01-11 NaN False 17 NaN False
12 2012-01-18 4.0 False 17 NaN False
13 2012-01-25 NaN False 17 NaN False
14 2012-02-01 NaN False 17 NaN False
15 2012-01-04 NaN NaN 17 NaN False
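For anyone who wants to reproduce this, here is a minimal sketch on a cut-down version of the data (participants 13 and 17 only, two rows each). The subset, the `suffixes` names `_df1`/`_df2`, and the use of `indicator=True` are my own choices for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Cut-down copies of the question's frames (participants 13 and 17 only).
df1 = pd.DataFrame({
    "response_date": pd.to_datetime(
        ["2012-01-04", "2012-01-11", "2012-01-07", "2012-01-11"]),
    "summary": [0.0, 0.0, 4.0, np.nan],
    "epis_mark": [False, False, False, False],
    "participant_id": [13, 13, 17, 17],
})
df2 = pd.DataFrame({
    "response_date": pd.to_datetime(
        ["2012-01-04", "2012-01-11", "2012-01-04", "2012-01-11"]),
    "summary": [17.0, 18.0, np.nan, np.nan],
    "epis_mark": [True, True, False, False],
    "participant_id": [13, 13, 17, 17],
})

# Outer merge on both keys: a row of df1 can only pair with a row of df2
# that has the same date AND the same participant.
wide = pd.merge(
    df1, df2,
    on=["response_date", "participant_id"],
    how="outer",
    suffixes=("_df1", "_df2"),  # clearer than the default _x / _y
    indicator=True,             # adds a _merge column: both / left_only / right_only
)
print(wide)
```

The `_merge` column makes it easy to spot dates that exist in only one frame for a given participant, such as 2012-01-07 (only in df1) and 2012-01-04 (only in df2) for participant 17.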
I'm not sure I understood you correctly.
You can try the following:
pd.merge(df1, df2, on=['response_date', 'participant_id'], how='outer')
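As a sanity check, here is a rough sketch comparing a single merge on both keys with the per-participant loop from the question. The helper names `merge_per_group` and `normalise` are hypothetical, the data is a small subset of the question's frames, and the `epis_mark` column is left out to keep the comparison simple.

```python
import numpy as np
import pandas as pd

def merge_per_group(left, right, key="participant_id", on="response_date"):
    """Reference version: outer-merge on `on`, one `key` value at a time."""
    pieces = []
    for pid in pd.concat([left[key], right[key]]).unique():
        merged = pd.merge(
            left[left[key] == pid].drop(columns=key),
            right[right[key] == pid].drop(columns=key),
            on=on,
            how="outer",
        )
        merged[key] = pid  # restore the group label dropped before merging
        pieces.append(merged)
    return pd.concat(pieces, ignore_index=True)

def normalise(df, cols):
    """Fix column and row order so the two results can be compared."""
    return (df[cols]
            .sort_values(["participant_id", "response_date"])
            .reset_index(drop=True))

df1 = pd.DataFrame({
    "response_date": pd.to_datetime(
        ["2012-01-04", "2012-01-11", "2012-01-07", "2012-01-11"]),
    "summary": [0.0, 0.0, 4.0, np.nan],
    "participant_id": [13, 13, 17, 17],
})
df2 = pd.DataFrame({
    "response_date": pd.to_datetime(
        ["2012-01-04", "2012-01-11", "2012-01-04", "2012-01-11"]),
    "summary": [17.0, 18.0, np.nan, np.nan],
    "participant_id": [13, 13, 17, 17],
})

cols = ["participant_id", "response_date", "summary_x", "summary_y"]
single = normalise(
    pd.merge(df1, df2, on=["response_date", "participant_id"], how="outer"),
    cols,
)
looped = normalise(merge_per_group(df1, df2), cols)

# Raises AssertionError if the two approaches disagree.
pd.testing.assert_frame_equal(single, looped, check_dtype=False)
```

On this sample the two results match row for row, which is why the one-line merge is usually preferable to an explicit loop or groupby.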