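Subtracting datetime value from previous row in pandas dataframe
I have a dataframe with two columns: Category and Datetime
I want to create a new column that shows the difference between the current row's datetime and the previous row's, restarting for each category.
What I have: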
Category Datetime
A 2018-02-01 01:51:04
A 2018-02-01 02:04:04
B 2018-02-01 02:28:34
B 2018-02-01 02:41:34
B 2018-02-01 02:45:34
What I want:
Category Datetime Difference
A 2018-02-01 01:51:04 NaT
A 2018-02-01 02:04:04 00:13:00
B 2018-02-01 02:28:34 NaT
B 2018-02-01 02:41:34 00:13:00
B 2018-02-01 02:45:34 00:04:00
Edit:
@sacul I tried your solution, df['Difference'] = list(by_group.apply(lambda x: x['Datetime']-x['Datetime'].shift())), but it gave me strange results... This is the actual data I am working with:
Category Datetime Difference
A 2/1/18 1:51 NaT
A 2/1/18 2:04 1 days 02:52:00
B 2/1/18 2:28 NaT
C 2/1/18 2:41 NaT
D 2/1/18 6:31 0 days 00:10:30
E 2/1/18 8:26 3 days 23:19:30
F 2/1/18 10:03 0 days 00:21:00
G 2/1/18 11:11 NaT
G 2/1/18 11:11 NaT
G 2/1/18 11:11 0 days 00:00:02
G 2/1/18 11:11 0 days 00:02:30
H 2/1/18 11:12 0 days 00:00:02
H 2/1/18 11:22 0 days 00:02:28
I 2/1/18 15:26 0 days 00:00:02
I 2/1/18 16:01 0 days 00:08:26
I 2/1/18 17:26 0 days 00:00:01
J 2/1/18 17:42 0 days 00:01:31
J 2/1/18 17:42 NaT
Assuming your data is in a dataframe named df:
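import pandas as pd
# In case Datetime is not a datetime column yet (skip if it is):
df.Datetime = pd.to_datetime(df.Datetime)
by_group = df.groupby(df.Category)
# Within each group, subtract the previous row's Datetime (shift() moves values down one row).
# list() drops the index, so this relies on df already being sorted by Category.
df['Difference'] = list(by_group.apply(lambda x: x['Datetime']-x['Datetime'].shift()))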
>>> df
Category Datetime Difference
0 A 2018-02-01 01:51:04 NaT
1 A 2018-02-01 02:04:04 00:13:00
2 B 2018-02-01 02:28:34 NaT
3 B 2018-02-01 02:41:34 00:13:00
4 B 2018-02-01 02:45:34 00:04:00
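This groups by Category and then, within each group, subtracts the previous row's datetime from each row's datetime. The first row of each group has no previous row, so its difference is NaT.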
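For what it's worth, the same per-group difference can also be sketched with groupby(...).diff(). This is not the method used above, just a minimal alternative: its result stays aligned to the original index, so it should not depend on the rows being sorted by Category. The sample frame below reuses the question's values with the rows deliberately interleaved.

```python
import pandas as pd

# Sample values from the question, interleaved so that the
# categories are NOT contiguous.
df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "B"],
    "Datetime": [
        "2018-02-01 01:51:04",
        "2018-02-01 02:28:34",
        "2018-02-01 02:04:04",
        "2018-02-01 02:41:34",
        "2018-02-01 02:45:34",
    ],
})
df["Datetime"] = pd.to_datetime(df["Datetime"])

# diff() subtracts the previous row within each group. The result keeps
# df's index, so it can be assigned back directly without list().
df["Difference"] = df.groupby("Category")["Datetime"].diff()
print(df)
```

Because the assignment is index-aligned, each difference lands on the row it belongs to even when the groups are scattered through the frame.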
Edit:
This also seems to work on your new data, when starting from a Datetime column of strings in the form 2/1/18 1:51 and converting it with pd.to_datetime(df.Datetime):
>>> df1
Category Datetime Difference
0 A 2018-02-01 01:51:00 NaT
1 A 2018-02-01 02:04:00 00:13:00
2 B 2018-02-01 02:28:00 NaT
3 C 2018-02-01 02:41:00 NaT
4 D 2018-02-01 06:31:00 NaT
5 E 2018-02-01 08:26:00 NaT
6 F 2018-02-01 10:03:00 NaT
7 G 2018-02-01 11:11:00 NaT
8 G 2018-02-01 11:11:00 00:00:00
9 G 2018-02-01 11:11:00 00:00:00
10 G 2018-02-01 11:11:00 00:00:00
11 H 2018-02-01 11:12:00 NaT
12 H 2018-02-01 11:22:00 00:10:00
13 I 2018-02-01 15:26:00 NaT
14 I 2018-02-01 16:01:00 00:35:00
15 I 2018-02-01 17:26:00 01:25:00
16 J 2018-02-01 17:42:00 NaT
17 J 2018-02-01 17:42:00 00:00:00
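When the strings look like 2/1/18 1:51, passing an explicit format to pd.to_datetime removes any ambiguity about whether the month or the day comes first. A small sketch, assuming month/day/two-digit-year order, as the output above suggests (2/1/18 shown as 2018-02-01); the variable names raw and parsed are only illustrative:

```python
import pandas as pd

raw = pd.Series(["2/1/18 1:51", "2/1/18 2:04", "2/1/18 11:11"])

# %m/%d/%y: month, day, two-digit year; %H:%M: 24-hour time without seconds.
parsed = pd.to_datetime(raw, format="%m/%d/%y %H:%M")
print(parsed)
```

Strings in this form carry only minute precision, so rows that fall within the same minute get a difference of 00:00:00, as in the G rows above.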
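Alternative solution
import pandas as pd
import numpy as np
# Convert to datetime if the column still holds strings
df.Datetime = pd.to_datetime(df.Datetime)
# Keep the difference only where the previous row has the same Category; otherwise NaT
df['Difference'] = np.where(df.Category == df.Category.shift(), df.Datetime - df.Datetime.shift(), np.timedelta64('NaT'))
Note: this only works if your data is already sorted so that each category's rows are contiguous.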
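If the rows are not already ordered, one way to meet that precondition is to sort by Category and Datetime first. A hedged sketch using the question's sample values in shuffled order; it uses Series.where in place of np.where, so that rows without a matching previous row come out as NaT:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["B", "A", "B", "A", "B"],
    "Datetime": pd.to_datetime([
        "2018-02-01 02:41:34",
        "2018-02-01 02:04:04",
        "2018-02-01 02:28:34",
        "2018-02-01 01:51:04",
        "2018-02-01 02:45:34",
    ]),
})

# Sort so that each category's rows are contiguous and in time order.
df = df.sort_values(["Category", "Datetime"]).reset_index(drop=True)

# Keep the row-to-row difference only where the previous row
# belongs to the same Category; everything else becomes NaT.
same_category = df["Category"] == df["Category"].shift()
df["Difference"] = (df["Datetime"] - df["Datetime"].shift()).where(same_category)
print(df)
```

Note that reset_index(drop=True) discards the original row order; keep the old index instead if the result needs to be mapped back to the unsorted frame.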