Python : 两列之间的时间差,以小时为单位

Python : time difference between two columns in hours

我在这个数据框中有以下两列。

      DATE1                       DATE2

2020-07-08 23:54:17.0   2020-07-09 19:00:56.9970000
2020-07-08 08:22:28.0   2020-07-08 13:23:10.3630000
2020-07-08 10:24:25.0   2020-07-08 13:25:30.8990000
2020-07-08 20:19:35.0   2020-07-09 18:57:07.6900000
2020-07-08 06:07:45.0   2020-07-08 13:20:49.9960000
2020-07-08 10:20:25.0   2020-07-08 13:25:20.0390000
2020-07-08 19:18:23.0   2020-07-09 18:56:06.6550000
2020-07-08 22:12:03.0   2020-07-09 18:59:11.6250000
2020-07-08 09:38:44.0   2020-07-08 13:24:44.9820000
2020-07-08 09:54:44.0   2020-07-08 13:24:45.3750000
2020-07-08 06:23:45.0   2020-07-08 13:21:05.5150000
2020-07-08 18:49:17.0   2020-07-09 18:55:41.9710000
2020-07-08 19:47:23.0   2020-07-09 18:56:37.7690000
2020-07-08 10:48:25.0   2020-07-08 13:25:45.0060000
2020-07-08 05:30:45.0   2020-07-08 13:20:15.8920000
2020-07-08 06:09:45.0   2020-07-08 13:20:54.9810000

我想找出这些时间戳之间的差异,并添加一个布尔值列来说明这两个日期之间的差异是否大于 24 小时。

我尝试了以下代码片段,但出现错误:“不支持的操作数类型 -: 'str' 和 'str'”

df['diff_hours'] = df['DATE2'] - df['DATE1']
df['diff_hours']= df['diff_hours']/np.timedelta64(1,'h')

有人可以帮我解决这个片段,或者有其他方法可以轻松解决这个问题吗?提前致谢!

样本数据没有大于24小时的时间差

In [26]: df = pd.read_csv("a.csv", parse_dates=["DATE1","DATE2"])

In [27]: df
Out[27]:
                 DATE1                   DATE2
0  2020-07-08 23:54:17 2020-07-09 19:00:56.997
1  2020-07-08 08:22:28 2020-07-08 13:23:10.363
2  2020-07-08 10:24:25 2020-07-08 13:25:30.899
3  2020-07-08 20:19:35 2020-07-09 18:57:07.690
4  2020-07-08 06:07:45 2020-07-08 13:20:49.996
5  2020-07-08 10:20:25 2020-07-08 13:25:20.039
6  2020-07-08 19:18:23 2020-07-09 18:56:06.655
7  2020-07-08 22:12:03 2020-07-09 18:59:11.625
8  2020-07-08 09:38:44 2020-07-08 13:24:44.982
9  2020-07-08 09:54:44 2020-07-08 13:24:45.375
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981

In [28]: df["diff_hours"] = (df.DATE2-df.DATE1).astype('timedelta64[h]')

In [29]: df
Out[29]:
                 DATE1                   DATE2  diff_hours
0  2020-07-08 23:54:17 2020-07-09 19:00:56.997        19.0
1  2020-07-08 08:22:28 2020-07-08 13:23:10.363         5.0
2  2020-07-08 10:24:25 2020-07-08 13:25:30.899         3.0
3  2020-07-08 20:19:35 2020-07-09 18:57:07.690        22.0
4  2020-07-08 06:07:45 2020-07-08 13:20:49.996         7.0
5  2020-07-08 10:20:25 2020-07-08 13:25:20.039         3.0
6  2020-07-08 19:18:23 2020-07-09 18:56:06.655        23.0
7  2020-07-08 22:12:03 2020-07-09 18:59:11.625        20.0
8  2020-07-08 09:38:44 2020-07-08 13:24:44.982         3.0
9  2020-07-08 09:54:44 2020-07-08 13:24:45.375         3.0
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515         6.0
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971        24.0
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769        23.0
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006         2.0
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892         7.0
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981         7.0

In [30]: df["status"] = df["diff_hours"] > 24

In [31]: df
Out[31]:
                 DATE1                   DATE2  diff_hours  status
0  2020-07-08 23:54:17 2020-07-09 19:00:56.997        19.0   False
1  2020-07-08 08:22:28 2020-07-08 13:23:10.363         5.0   False
2  2020-07-08 10:24:25 2020-07-08 13:25:30.899         3.0   False
3  2020-07-08 20:19:35 2020-07-09 18:57:07.690        22.0   False
4  2020-07-08 06:07:45 2020-07-08 13:20:49.996         7.0   False
5  2020-07-08 10:20:25 2020-07-08 13:25:20.039         3.0   False
6  2020-07-08 19:18:23 2020-07-09 18:56:06.655        23.0   False
7  2020-07-08 22:12:03 2020-07-09 18:59:11.625        20.0   False
8  2020-07-08 09:38:44 2020-07-08 13:24:44.982         3.0   False
9  2020-07-08 09:54:44 2020-07-08 13:24:45.375         3.0   False
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515         6.0   False
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971        24.0   False
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769        23.0   False
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006         2.0   False
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892         7.0   False
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981         7.0   False

您想将这些列设为日期数据类型。

尝试

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO("""
DATE1                   DATE2
2020-07-08 23:54:17.0   2020-07-09 19:00:56.9970000
2020-07-08 08:22:28.0   2020-07-08 13:23:10.3630000
2020-07-08 10:24:25.0   2020-07-08 13:25:30.8990000
2020-07-08 20:19:35.0   2020-07-09 18:57:07.6900000
2020-07-08 06:07:45.0   2020-07-08 13:20:49.9960000
2020-07-08 10:20:25.0   2020-07-08 13:25:20.0390000
2020-07-08 19:18:23.0   2020-07-09 18:56:06.6550000
2020-07-08 22:12:03.0   2020-07-09 18:59:11.6250000
2020-07-08 09:38:44.0   2020-07-08 13:24:44.9820000
2020-07-08 09:54:44.0   2020-07-08 13:24:45.3750000
2020-07-08 06:23:45.0   2020-07-08 13:21:05.5150000
2020-07-08 18:49:17.0   2020-07-09 18:55:41.9710000
2020-07-08 19:47:23.0   2020-07-09 18:56:37.7690000
2020-07-08 10:48:25.0   2020-07-08 13:25:45.0060000
2020-07-08 05:30:45.0   2020-07-08 13:20:15.8920000
2020-07-08 06:09:45.0   2020-07-08 13:20:54.9810000
"""), sep='\s\s+')

df['ge24'] = pd.to_datetime(df.DATE2) - pd.to_datetime(df.DATE1) > '24 hours'
print(df)

输出

                    DATE1                        DATE2   ge24
0   2020-07-08 23:54:17.0  2020-07-09 19:00:56.9970000  False
1   2020-07-08 08:22:28.0  2020-07-08 13:23:10.3630000  False
2   2020-07-08 10:24:25.0  2020-07-08 13:25:30.8990000  False
3   2020-07-08 20:19:35.0  2020-07-09 18:57:07.6900000  False
4   2020-07-08 06:07:45.0  2020-07-08 13:20:49.9960000  False
5   2020-07-08 10:20:25.0  2020-07-08 13:25:20.0390000  False
6   2020-07-08 19:18:23.0  2020-07-09 18:56:06.6550000  False
7   2020-07-08 22:12:03.0  2020-07-09 18:59:11.6250000  False
8   2020-07-08 09:38:44.0  2020-07-08 13:24:44.9820000  False
9   2020-07-08 09:54:44.0  2020-07-08 13:24:45.3750000  False
10  2020-07-08 06:23:45.0  2020-07-08 13:21:05.5150000  False
11  2020-07-08 18:49:17.0  2020-07-09 18:55:41.9710000   True
12  2020-07-08 19:47:23.0  2020-07-09 18:56:37.7690000  False
13  2020-07-08 10:48:25.0  2020-07-08 13:25:45.0060000  False
14  2020-07-08 05:30:45.0  2020-07-08 13:20:15.8920000  False
15  2020-07-08 06:09:45.0  2020-07-08 13:20:54.9810000  False