Python Pandas: How do I perform an operation on a shifted column within a group?

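I have a dataframe in which I want to perform a time-difference operation on a shifted column within groups of working times. For example, see the data below: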

 driver_id    veh                starttime                stoptime
0  kg123     10010      2018-12-21 15:17:29    2018-12-21 15:18:57
1  kg124     10012      2019-01-01 00:10:16    2019-01-01 00:16:32
2  kg124     10012      2019-01-01 00:27:11    2019-01-01 00:31:38
3  kg214     10012      2019-01-01 00:46:20    2019-01-01 01:04:54
4  kg125     10013      2019-01-01 00:19:06    2019-01-01 00:39:43

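I want to add a column that subtracts the preceding row's starttime from the current stoptime for a driver in the same vehicle, in order to identify breaks between duties. But I want the operation to stay within a group of my choosing, in this case driver_id and veh. The output should look like this: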

 driver_id  veh         starttime              stoptime      break_from_last
0  kg123   10010   2018-12-21 15:17:29 2018-12-21 15:18:57               NaT
1  kg124   10012   2019-01-01 00:10:16 2019-01-01 00:16:32               NaT
2  kg124   10012   2019-01-01 00:27:11 2019-01-01 00:31:38   0 days 00:21:22
3  kg124   10012   2019-01-01 00:46:20 2019-01-01 01:04:54   0 days 00:37:43
4  kg125   10013   2019-01-01 00:19:06 2019-01-01 00:39:43               NaT

In R this is simple with data.table, as follows:

 #starting shift

      j = c("driver_id","veh")
      df[,break_from_last:= round(
        as.numeric(difftime(starttime, shift(stoptime, 1L, type = "lag"),units ="hours"))
        ,2),by = j]

How can I do this in Python? I can produce a shifted difference; I just need to add the grouping. See below:

#produce a break
#BUT HOW DO I ADD A GROUP DESIGNATION?
df['break_from_last'] = df['stoptime'] - df['starttime'].shift(1)  

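Try this: group by driver_id, shift the starttime column within each group, and let pandas handle the arithmetic through its intrinsic data alignment on the index: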

df['break_from_last'] = df['stoptime'] - df.groupby('driver_id')['starttime'].shift()
df

Output:

  driver_id    veh           starttime            stoptime break_from_last
0     kg123  10010 2018-12-21 15:17:29 2018-12-21 15:18:57             NaT
1     kg124  10012 2019-01-01 00:10:16 2019-01-01 00:16:32             NaT
2     kg124  10012 2019-01-01 00:27:11 2019-01-01 00:31:38 0 days 00:21:22
3     kg124  10012 2019-01-01 00:46:20 2019-01-01 01:04:54 0 days 00:37:43
4     kg125  10013 2019-01-01 00:19:06 2019-01-01 00:39:43             NaT
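
As a self-contained sketch of the alignment point above: `groupby(...).shift()` returns a Series that carries the same index as `df`, so the subtraction pairs each `stoptime` with the previous `starttime` of the same driver. The sample rows are rebuilt by hand here, assuming row 3 belongs to kg124 as in the question's expected output.

```python
import pandas as pd

# Sample rows following the question's expected output
# (assumption: row 3 is driver kg124).
df = pd.DataFrame({
    "driver_id": ["kg123", "kg124", "kg124", "kg124", "kg125"],
    "veh": [10010, 10012, 10012, 10012, 10013],
    "starttime": pd.to_datetime([
        "2018-12-21 15:17:29", "2019-01-01 00:10:16", "2019-01-01 00:27:11",
        "2019-01-01 00:46:20", "2019-01-01 00:19:06",
    ]),
    "stoptime": pd.to_datetime([
        "2018-12-21 15:18:57", "2019-01-01 00:16:32", "2019-01-01 00:31:38",
        "2019-01-01 01:04:54", "2019-01-01 00:39:43",
    ]),
})

# Shift starttime down by one row inside each driver's group;
# the first row of every group has no predecessor and becomes NaT.
prev_start = df.groupby("driver_id")["starttime"].shift()

# The shifted Series keeps df's index, so the subtraction lines up row by row.
assert prev_start.index.equals(df.index)

df["break_from_last"] = df["stoptime"] - prev_start
print(df)
```

Because the result is indexed like the original frame, no merge or explicit re-ordering is needed before assigning it back as a column.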

Adding veh to the grouping:

df['break_from_last'] = df['stoptime'] - df.groupby(['driver_id', 'veh'])['starttime'].shift()

Output:

  driver_id    veh           starttime            stoptime break_from_last
0     kg123  10010 2018-12-21 15:17:29 2018-12-21 15:18:57             NaT
1     kg124  10012 2019-01-01 00:10:16 2019-01-01 00:16:32             NaT
2     kg124  10012 2019-01-01 00:27:11 2019-01-01 00:31:38 0 days 00:21:22
3     kg214  10012 2019-01-01 00:46:20 2019-01-01 01:04:54             NaT
4     kg125  10013 2019-01-01 00:19:06 2019-01-01 00:39:43             NaT
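
If the goal is the quantity the data.table snippet in the question computes (current `starttime` minus the previous `stoptime`, in hours rounded to two decimals), a rough pandas equivalent might look like the following. The column name `gap_hours` and the explicit sort are assumptions added for illustration. `shift()` only follows row order, so data that is not already ordered by start time would need sorting first.

```python
import pandas as pd

# Sample rows copied from the question's input table.
df = pd.DataFrame({
    "driver_id": ["kg123", "kg124", "kg124", "kg214", "kg125"],
    "veh": [10010, 10012, 10012, 10012, 10013],
    "starttime": pd.to_datetime([
        "2018-12-21 15:17:29", "2019-01-01 00:10:16", "2019-01-01 00:27:11",
        "2019-01-01 00:46:20", "2019-01-01 00:19:06",
    ]),
    "stoptime": pd.to_datetime([
        "2018-12-21 15:18:57", "2019-01-01 00:16:32", "2019-01-01 00:31:38",
        "2019-01-01 01:04:54", "2019-01-01 00:39:43",
    ]),
})

keys = ["driver_id", "veh"]

# Order rows by group and start time so that "previous row" means
# "previous duty" (assumption: the input may arrive unsorted).
df = df.sort_values(keys + ["starttime"]).reset_index(drop=True)

# Previous stop time within the same driver/vehicle group (NaT for the first duty).
prev_stop = df.groupby(keys)["stoptime"].shift(1)

# Gap between the previous stop and the current start, in hours, rounded to 2 places.
# gap_hours is a hypothetical column name.
df["gap_hours"] = ((df["starttime"] - prev_stop).dt.total_seconds() / 3600).round(2)
print(df)
```

Note that this differs from `break_from_last` above, which subtracts the previous starttime from the current stoptime and stays a Timedelta rather than a float.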