Groupby and ffill specified columns in Python
I want to sort the values by id_, Code and Timestamp (because the time order matters), then group d1 by id_ and Code, and then ffill the NaN values within each group, but only on columns V1 and V2, keeping the other columns unchanged and returning the full table.
d1:
Type_x id_ Timestamp V1 Code Type_y V2
0 abcd 39-38-30-34 2012-09-20 23:46:05.870 35.5 2 NaN 0
1 abcd 39-38-30-34 2012-09-20 23:46:23.870 44.5 0 NaN 1
2 abcd 39-38-30-34 2012-09-20 23:48:07.870 43.5 0 NaN 1
3 abcd 39-38-30-34 2012-09-20 23:49:48.870 42.5 0 NaN NaN
4 abcd 39-38-30-34 2012-09-20 23:50:44.870 34.5 2 NaN NaN
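For reference, d1 can be rebuilt from the table above with something like the sketch below; the exact dtypes (parsed timestamps, integer Code, float V2) are assumptions rather than part of the original post.

import pandas as pd
import numpy as np

# Hypothetical reconstruction of d1 from the displayed table
d1 = pd.DataFrame({
    'Type_x': ['abcd'] * 5,
    'id_': ['39-38-30-34'] * 5,
    'Timestamp': pd.to_datetime([
        '2012-09-20 23:46:05.870',
        '2012-09-20 23:46:23.870',
        '2012-09-20 23:48:07.870',
        '2012-09-20 23:49:48.870',
        '2012-09-20 23:50:44.870',
    ]),
    'V1': [35.5, 44.5, 43.5, 42.5, 34.5],
    'Code': [2, 0, 0, 0, 2],
    'Type_y': [np.nan] * 5,
    'V2': [0, 1, 1, np.nan, np.nan],
})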
Tried:
d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code'])['V1', 'V2'].ffill()
It only returned the two columns:
V1 V2
69659 21.5 NaN
300886 21.5 1.0
300887 21.5 0.0
70086 23.0 0.0
300955 23.0 1.0
How do I do this correctly?
What do you need returned?
d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code']).ffill()
Type_x Timestamp V1 Type_y V2
1 abcd 39-38-30-34 23:46:23.870 44.5 NaN 1.0
2 abcd 39-38-30-34 23:48:07.870 43.5 NaN 1.0
3 abcd 39-38-30-34 23:49:48.870 42.5 NaN 1.0
0 abcd 39-38-30-34 23:46:05.870 35.5 NaN 0.0
4 abcd 39-38-30-34 23:50:44.870 34.5 NaN 0.0
Or
d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code']).ffill().dropna(axis=1)
print(d2)
Type_x Timestamp V1 V2
1 abcd 39-38-30-34 23:46:23.870 44.5 1.0
2 abcd 39-38-30-34 23:48:07.870 43.5 1.0
3 abcd 39-38-30-34 23:49:48.870 42.5 1.0
0 abcd 39-38-30-34 23:46:05.870 35.5 0.0
4 abcd 39-38-30-34 23:50:44.870 34.5 0.0
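In recent pandas versions, groupby(...).ffill() returns only the non-grouping columns, so if you need the complete table back, one option (a sketch, not from the original answer) is to write the filled frame onto a copy of d1; the assignment aligns on the index, and the remaining columns keep their original values:

filled = d1.sort_values(by=['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code']).ffill()
d2 = d1.copy()
d2[filled.columns] = filled  # index-aligned; id_ and Code keep their original values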
If your actual dataframe has columns other than the ones you want to group by and the ones you want to ffill, you can use transform and do it column by column:
d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp'])
d2['V1'] = d2.groupby(['id_', 'Code'])['V1'].transform(lambda x: x.ffill())
d2['V2'] = d2.groupby(['id_', 'Code'])['V2'].transform(lambda x: x.ffill())
d2
Out[1]:
Type_x id_ Timestamp V1 Code Type_y V2
1 abcd 39-38-30-34 2012-09-20 23:46:23.870 44.5 0 NaN 1.0
2 abcd 39-38-30-34 2012-09-20 23:48:07.870 43.5 0 NaN 1.0
3 abcd 39-38-30-34 2012-09-20 23:49:48.870 42.5 0 NaN 1.0
0 abcd 39-38-30-34 2012-09-20 23:46:05.870 35.5 2 NaN 0.0
4 abcd 39-38-30-34 2012-09-20 23:50:44.870 34.5 2 NaN 0.0
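Not part of the original answer, but assuming a reasonably recent pandas, the two transform calls can also be written as a single assignment by selecting both columns on the groupby object; an optional sort_index() restores the original row order:

d2 = d1.sort_values(by=['id_', 'Code', 'Timestamp'])
d2[['V1', 'V2']] = d2.groupby(['id_', 'Code'])[['V1', 'V2']].ffill()
d2 = d2.sort_index()  # optional: back to the original row order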