Python Pandas: How do I perform an operation on a shifted column within a group?
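I have a DataFrame in which I want to compute a time difference on a shifted column within groups of working time. For example, see the following data: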
  driver_id    veh            starttime             stoptime
0     kg123  10010  2018-12-21 15:17:29  2018-12-21 15:18:57
1     kg124  10012  2019-01-01 00:10:16  2019-01-01 00:16:32
2     kg124  10012  2019-01-01 00:27:11  2019-01-01 00:31:38
3     kg214  10012  2019-01-01 00:46:20  2019-01-01 01:04:54
4     kg125  10013  2019-01-01 00:19:06  2019-01-01 00:39:43
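I want to add a column that subtracts the driver's next start time in the same vehicle from the current stop time, so that I can identify the breaks between tasks. I also want the operation confined to groups of my choosing, in this case driver_id and vehicle. The output should look like this: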
  driver_id    veh            starttime             stoptime  break_from_last
0     kg123  10010  2018-12-21 15:17:29  2018-12-21 15:18:57              NaT
1     kg124  10012  2019-01-01 00:10:16  2019-01-01 00:16:32              NaT
2     kg124  10012  2019-01-01 00:27:11  2019-01-01 00:31:38  0 days 00:21:22
3     kg124  10012  2019-01-01 00:46:20  2019-01-01 01:04:54  0 days 00:37:43
4     kg125  10013  2019-01-01 00:19:06  2019-01-01 00:39:43              NaT
In R this is straightforward with data.table:
# starting shift
j = c("driver_id", "veh")
df[, break_from_last := round(
  as.numeric(difftime(starttime, shift(stoptime, 1L, type = "lag"), units = "hours")),
  2), by = j]
How can I do this in Python? I can produce a shifted difference; I just need to add the grouping. See below:
#produce a break
#BUT HOW DO I ADD A GROUP DESIGNATION?
df['break_from_last'] = df['stoptime'] - df['starttime'].shift(1)
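Try this: group by driver_id and shift the starttime column, then let pandas handle the arithmetic through its intrinsic data alignment on the index: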
df['break_from_last'] = df['stoptime'] - df.groupby('driver_id')['starttime'].shift()
df
Output:
  driver_id    veh            starttime             stoptime  break_from_last
0     kg123  10010  2018-12-21 15:17:29  2018-12-21 15:18:57              NaT
1     kg124  10012  2019-01-01 00:10:16  2019-01-01 00:16:32              NaT
2     kg124  10012  2019-01-01 00:27:11  2019-01-01 00:31:38  0 days 00:21:22
3     kg124  10012  2019-01-01 00:46:20  2019-01-01 01:04:54  0 days 00:37:43
4     kg125  10013  2019-01-01 00:19:06  2019-01-01 00:39:43              NaT
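To make the alignment step concrete, here is a minimal self-contained sketch. It assumes the rows of the output table above (with row 3 taken as kg124), and the intermediate name prev_start is introduced only for illustration; it is not part of the answer.

```python
import pandas as pd

# Assumption: sample rows as printed in the output table above (row 3 is kg124).
df = pd.DataFrame({
    "driver_id": ["kg123", "kg124", "kg124", "kg124", "kg125"],
    "veh": [10010, 10012, 10012, 10012, 10013],
    "starttime": pd.to_datetime([
        "2018-12-21 15:17:29", "2019-01-01 00:10:16", "2019-01-01 00:27:11",
        "2019-01-01 00:46:20", "2019-01-01 00:19:06",
    ]),
    "stoptime": pd.to_datetime([
        "2018-12-21 15:18:57", "2019-01-01 00:16:32", "2019-01-01 00:31:38",
        "2019-01-01 01:04:54", "2019-01-01 00:39:43",
    ]),
})

# Shift starttime down by one row inside each driver_id group.
# The result keeps df's original index, so the first row of each group is NaT.
prev_start = df.groupby("driver_id")["starttime"].shift()

# Subtraction aligns on the index, pairing each stoptime with the
# shifted starttime that carries the same row label.
df["break_from_last"] = df["stoptime"] - prev_start
print(df)
```

Because the shifted Series shares the frame's index, no merge or explicit reindexing should be needed before the subtraction.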
Adding veh to the grouping:
df['break_from_last'] = df['stoptime'] - df.groupby(['driver_id', 'veh'])['starttime'].shift()
Output:
  driver_id    veh            starttime             stoptime  break_from_last
0     kg123  10010  2018-12-21 15:17:29  2018-12-21 15:18:57              NaT
1     kg124  10012  2019-01-01 00:10:16  2019-01-01 00:16:32              NaT
2     kg124  10012  2019-01-01 00:27:11  2019-01-01 00:31:38  0 days 00:21:22
3     kg214  10012  2019-01-01 00:46:20  2019-01-01 01:04:54              NaT
4     kg125  10013  2019-01-01 00:19:06  2019-01-01 00:39:43              NaT
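Note that the R snippet in the question computes starttime minus the previous stoptime, in hours rounded to two decimals, whereas the expression above subtracts the previous starttime from stoptime. If the R behaviour is what is wanted, a sketch along the following lines may be closer. The column name break_hours and the sorting step are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample rows as given in the question (row 3 belongs to driver kg214).
df = pd.DataFrame({
    "driver_id": ["kg123", "kg124", "kg124", "kg214", "kg125"],
    "veh": [10010, 10012, 10012, 10012, 10013],
    "starttime": pd.to_datetime([
        "2018-12-21 15:17:29", "2019-01-01 00:10:16", "2019-01-01 00:27:11",
        "2019-01-01 00:46:20", "2019-01-01 00:19:06",
    ]),
    "stoptime": pd.to_datetime([
        "2018-12-21 15:18:57", "2019-01-01 00:16:32", "2019-01-01 00:31:38",
        "2019-01-01 01:04:54", "2019-01-01 00:39:43",
    ]),
})

# Assumption: trips should be in chronological order within each group,
# otherwise "previous" would simply mean "previous row in the frame".
df = df.sort_values(["driver_id", "veh", "starttime"])

# Previous trip's stoptime within each (driver_id, veh) group.
prev_stop = df.groupby(["driver_id", "veh"])["stoptime"].shift(1)

# Gap between the previous stop and the current start, in hours,
# rounded to two decimals like the R code. break_hours is a hypothetical name.
gap = df["starttime"] - prev_stop
df["break_hours"] = (gap.dt.total_seconds() / 3600).round(2)

# Restore the original row order.
df = df.sort_index()
print(df[["driver_id", "veh", "break_hours"]])
```

With this data only row 2 has an earlier trip in the same (driver_id, veh) group, so it is the only row that gets a numeric value; the rest are NaN.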