通过在另一个数据框中为标记数据添加列来创建
Create by adding columns for labeled data in another data frame
假设我的数据框
df1
name startdate label
A 14-05-07 1
A 14-05-21 2
A 14-06-01 3
A 14-06-15 4
B 18-10-05 1
B 18-10-19 2
B 18-10-31 3
C 19-03-04 1
C 19-03-17 2
C 19-03-31 3
D 16-04-01 1
D 16-04-21 2
..
df2
name date
A 14-04-05
A 14-05-08
A 15-01-05
B 18-07-05
B 18-10-02
C 19-01-03
C 19-03-30
D 16-04-01
..
有两个数据帧df1和df2有相同的公共密钥'name'
我想将标记为 df1 的 'startdate' 添加到 df2 的列中。
输出我想要的
name date label1 label2 label3 label4 ..
A 14-04-05 14-05-07 14-05-21 14-06-01 14-06-15
A 14-05-08 14-05-07 14-05-21 14-06-01 14-06-15
A 15-01-05 14-05-07 14-05-21 14-06-01 14-06-15
B 18-07-05 18-10-05 18-10-19 18-10-31 nan
B 18-10-02 18-10-05 18-10-19 18-10-31 nan
C 19-01-03 19-03-04 19-03-17 19-03-31 nan
C 19-03-30 19-03-04 19-03-17 19-03-31 nan
D 16-04-01 16-04-01 16-04-21 nan nan
有办法吗?
感谢阅读。
使用DataFrame.join
with DataFrame.pivot
and DataFrame.add_prefix
:
df = df2.join(df1.pivot('name','label','startdate').add_prefix('label'), on='name')
print (df)
name date label1 label2 label3 label4
0 A 14-04-05 14-05-07 14-05-21 14-06-01 14-06-15
1 A 14-05-08 14-05-07 14-05-21 14-06-01 14-06-15
2 A 15-01-05 14-05-07 14-05-21 14-06-01 14-06-15
3 B 18-07-05 18-10-05 18-10-19 18-10-31 NaN
4 B 18-10-02 18-10-05 18-10-19 18-10-31 NaN
5 C 19-01-03 19-03-04 19-03-17 19-03-31 NaN
6 C 19-03-30 19-03-04 19-03-17 19-03-31 NaN
7 D 16-04-01 16-04-01 16-04-21 NaN NaN
假设我的数据框
df1
name startdate label
A 14-05-07 1
A 14-05-21 2
A 14-06-01 3
A 14-06-15 4
B 18-10-05 1
B 18-10-19 2
B 18-10-31 3
C 19-03-04 1
C 19-03-17 2
C 19-03-31 3
D 16-04-01 1
D 16-04-21 2
..
df2
name date
A 14-04-05
A 14-05-08
A 15-01-05
B 18-07-05
B 18-10-02
C 19-01-03
C 19-03-30
D 16-04-01
..
有两个数据帧df1和df2有相同的公共密钥'name'
我想将标记为 df1 的 'startdate' 添加到 df2 的列中。
输出我想要的
name date label1 label2 label3 label4 ..
A 14-04-05 14-05-07 14-05-21 14-06-01 14-06-15
A 14-05-08 14-05-07 14-05-21 14-06-01 14-06-15
A 15-01-05 14-05-07 14-05-21 14-06-01 14-06-15
B 18-07-05 18-10-05 18-10-19 18-10-31 nan
B 18-10-02 18-10-05 18-10-19 18-10-31 nan
C 19-01-03 19-03-04 19-03-17 19-03-31 nan
C 19-03-30 19-03-04 19-03-17 19-03-31 nan
D 16-04-01 16-04-01 16-04-21 nan nan
有办法吗? 感谢阅读。
使用DataFrame.join
with DataFrame.pivot
and DataFrame.add_prefix
:
df = df2.join(df1.pivot('name','label','startdate').add_prefix('label'), on='name')
print (df)
name date label1 label2 label3 label4
0 A 14-04-05 14-05-07 14-05-21 14-06-01 14-06-15
1 A 14-05-08 14-05-07 14-05-21 14-06-01 14-06-15
2 A 15-01-05 14-05-07 14-05-21 14-06-01 14-06-15
3 B 18-07-05 18-10-05 18-10-19 18-10-31 NaN
4 B 18-10-02 18-10-05 18-10-19 18-10-31 NaN
5 C 19-01-03 19-03-04 19-03-17 19-03-31 NaN
6 C 19-03-30 19-03-04 19-03-17 19-03-31 NaN
7 D 16-04-01 16-04-01 16-04-21 NaN NaN