Python: time difference between two columns in hours
I have the following two columns in a DataFrame.
DATE1 DATE2
2020-07-08 23:54:17.0 2020-07-09 19:00:56.9970000
2020-07-08 08:22:28.0 2020-07-08 13:23:10.3630000
2020-07-08 10:24:25.0 2020-07-08 13:25:30.8990000
2020-07-08 20:19:35.0 2020-07-09 18:57:07.6900000
2020-07-08 06:07:45.0 2020-07-08 13:20:49.9960000
2020-07-08 10:20:25.0 2020-07-08 13:25:20.0390000
2020-07-08 19:18:23.0 2020-07-09 18:56:06.6550000
2020-07-08 22:12:03.0 2020-07-09 18:59:11.6250000
2020-07-08 09:38:44.0 2020-07-08 13:24:44.9820000
2020-07-08 09:54:44.0 2020-07-08 13:24:45.3750000
2020-07-08 06:23:45.0 2020-07-08 13:21:05.5150000
2020-07-08 18:49:17.0 2020-07-09 18:55:41.9710000
2020-07-08 19:47:23.0 2020-07-09 18:56:37.7690000
2020-07-08 10:48:25.0 2020-07-08 13:25:45.0060000
2020-07-08 05:30:45.0 2020-07-08 13:20:15.8920000
2020-07-08 06:09:45.0 2020-07-08 13:20:54.9810000
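I want to find the difference between these timestamps and add a boolean column indicating whether the difference between the two dates is greater than 24 hours.
I tried the following snippet, but it raises the error "unsupported operand type(s) for -: 'str' and 'str'":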
df['diff_hours'] = df['DATE2'] - df['DATE1']
df['diff_hours']= df['diff_hours']/np.timedelta64(1,'h')
Can someone help me fix this snippet, or is there another, easier way to solve this? Thanks in advance!
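Once the differences are truncated to whole hours, no row in the sample data exceeds 24 hours (row 11 is 24 h 6 min, which truncates to 24.0).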
In [26]: df = pd.read_csv("a.csv", parse_dates=["DATE1","DATE2"])
In [27]: df
Out[27]:
DATE1 DATE2
0 2020-07-08 23:54:17 2020-07-09 19:00:56.997
1 2020-07-08 08:22:28 2020-07-08 13:23:10.363
2 2020-07-08 10:24:25 2020-07-08 13:25:30.899
3 2020-07-08 20:19:35 2020-07-09 18:57:07.690
4 2020-07-08 06:07:45 2020-07-08 13:20:49.996
5 2020-07-08 10:20:25 2020-07-08 13:25:20.039
6 2020-07-08 19:18:23 2020-07-09 18:56:06.655
7 2020-07-08 22:12:03 2020-07-09 18:59:11.625
8 2020-07-08 09:38:44 2020-07-08 13:24:44.982
9 2020-07-08 09:54:44 2020-07-08 13:24:45.375
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981
In [28]: df["diff_hours"] = (df.DATE2-df.DATE1).astype('timedelta64[h]')
In [29]: df
Out[29]:
DATE1 DATE2 diff_hours
0 2020-07-08 23:54:17 2020-07-09 19:00:56.997 19.0
1 2020-07-08 08:22:28 2020-07-08 13:23:10.363 5.0
2 2020-07-08 10:24:25 2020-07-08 13:25:30.899 3.0
3 2020-07-08 20:19:35 2020-07-09 18:57:07.690 22.0
4 2020-07-08 06:07:45 2020-07-08 13:20:49.996 7.0
5 2020-07-08 10:20:25 2020-07-08 13:25:20.039 3.0
6 2020-07-08 19:18:23 2020-07-09 18:56:06.655 23.0
7 2020-07-08 22:12:03 2020-07-09 18:59:11.625 20.0
8 2020-07-08 09:38:44 2020-07-08 13:24:44.982 3.0
9 2020-07-08 09:54:44 2020-07-08 13:24:45.375 3.0
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515 6.0
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971 24.0
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769 23.0
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006 2.0
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892 7.0
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981 7.0
In [30]: df["status"] = df["diff_hours"] > 24
In [31]: df
Out[31]:
DATE1 DATE2 diff_hours status
0 2020-07-08 23:54:17 2020-07-09 19:00:56.997 19.0 False
1 2020-07-08 08:22:28 2020-07-08 13:23:10.363 5.0 False
2 2020-07-08 10:24:25 2020-07-08 13:25:30.899 3.0 False
3 2020-07-08 20:19:35 2020-07-09 18:57:07.690 22.0 False
4 2020-07-08 06:07:45 2020-07-08 13:20:49.996 7.0 False
5 2020-07-08 10:20:25 2020-07-08 13:25:20.039 3.0 False
6 2020-07-08 19:18:23 2020-07-09 18:56:06.655 23.0 False
7 2020-07-08 22:12:03 2020-07-09 18:59:11.625 20.0 False
8 2020-07-08 09:38:44 2020-07-08 13:24:44.982 3.0 False
9 2020-07-08 09:54:44 2020-07-08 13:24:45.375 3.0 False
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515 6.0 False
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971 24.0 False
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769 23.0 False
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006 2.0 False
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892 7.0 False
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981 7.0 False
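The whole-hour cast above truncates, so a difference of 24 h 6 min becomes 24.0 and fails the `> 24` test. As far as I know, newer pandas releases (2.0 and later) also reject a cast to `timedelta64[h]`. Below is a minimal sketch that keeps fractional hours instead, assuming the columns are already datetime64. The two-row frame is a cut-down copy of rows 0 and 11 of the sample data, used only for illustration.

```python
import pandas as pd

# Two rows taken from the sample data (rows 0 and 11), for illustration only.
df = pd.DataFrame({
    "DATE1": pd.to_datetime(["2020-07-08 23:54:17.0", "2020-07-08 18:49:17.0"]),
    "DATE2": pd.to_datetime(["2020-07-09 19:00:56.9970000", "2020-07-09 18:55:41.9710000"]),
})

# Subtracting two datetime64 columns gives a timedelta64 column.
delta = df["DATE2"] - df["DATE1"]

# Convert to fractional hours without truncating to whole hours.
df["diff_hours"] = delta.dt.total_seconds() / 3600

# The second row is about 24.107 hours, so it is flagged True.
df["status"] = df["diff_hours"] > 24
print(df)
```

Whether 24 h 6 min should count as "greater than 24 hours" depends on the requirement. If whole hours are what you want, round or floor `diff_hours` explicitly before comparing.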
You want to make these columns a datetime dtype.
Try:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""
DATE1 DATE2
2020-07-08 23:54:17.0 2020-07-09 19:00:56.9970000
2020-07-08 08:22:28.0 2020-07-08 13:23:10.3630000
2020-07-08 10:24:25.0 2020-07-08 13:25:30.8990000
2020-07-08 20:19:35.0 2020-07-09 18:57:07.6900000
2020-07-08 06:07:45.0 2020-07-08 13:20:49.9960000
2020-07-08 10:20:25.0 2020-07-08 13:25:20.0390000
2020-07-08 19:18:23.0 2020-07-09 18:56:06.6550000
2020-07-08 22:12:03.0 2020-07-09 18:59:11.6250000
2020-07-08 09:38:44.0 2020-07-08 13:24:44.9820000
2020-07-08 09:54:44.0 2020-07-08 13:24:45.3750000
2020-07-08 06:23:45.0 2020-07-08 13:21:05.5150000
2020-07-08 18:49:17.0 2020-07-09 18:55:41.9710000
2020-07-08 19:47:23.0 2020-07-09 18:56:37.7690000
2020-07-08 10:48:25.0 2020-07-08 13:25:45.0060000
2020-07-08 05:30:45.0 2020-07-08 13:20:15.8920000
2020-07-08 06:09:45.0 2020-07-08 13:20:54.9810000
"""), sep=r'\s\s+', engine='python')
df['ge24'] = pd.to_datetime(df.DATE2) - pd.to_datetime(df.DATE1) > '24 hours'
print(df)
Output:
DATE1 DATE2 ge24
0 2020-07-08 23:54:17.0 2020-07-09 19:00:56.9970000 False
1 2020-07-08 08:22:28.0 2020-07-08 13:23:10.3630000 False
2 2020-07-08 10:24:25.0 2020-07-08 13:25:30.8990000 False
3 2020-07-08 20:19:35.0 2020-07-09 18:57:07.6900000 False
4 2020-07-08 06:07:45.0 2020-07-08 13:20:49.9960000 False
5 2020-07-08 10:20:25.0 2020-07-08 13:25:20.0390000 False
6 2020-07-08 19:18:23.0 2020-07-09 18:56:06.6550000 False
7 2020-07-08 22:12:03.0 2020-07-09 18:59:11.6250000 False
8 2020-07-08 09:38:44.0 2020-07-08 13:24:44.9820000 False
9 2020-07-08 09:54:44.0 2020-07-08 13:24:45.3750000 False
10 2020-07-08 06:23:45.0 2020-07-08 13:21:05.5150000 False
11 2020-07-08 18:49:17.0 2020-07-09 18:55:41.9710000 True
12 2020-07-08 19:47:23.0 2020-07-09 18:56:37.7690000 False
13 2020-07-08 10:48:25.0 2020-07-08 13:25:45.0060000 False
14 2020-07-08 05:30:45.0 2020-07-08 13:20:15.8920000 False
15 2020-07-08 06:09:45.0 2020-07-08 13:20:54.9810000 False
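The comparison above relies on pandas coercing the string '24 hours' to a timedelta, and it leaves DATE1 and DATE2 stored as strings. Here is a small sketch of a more explicit variant, assuming you want the columns converted in place and the threshold spelled out as a `pd.Timedelta`. The column name `gt24` and the two-row frame are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Two rows from the sample data, kept as strings as in the question.
df = pd.DataFrame({
    "DATE1": ["2020-07-08 08:22:28.0", "2020-07-08 18:49:17.0"],
    "DATE2": ["2020-07-08 13:23:10.3630000", "2020-07-09 18:55:41.9710000"],
})

# Convert each string column to datetime64 once, so later arithmetic works.
for col in ("DATE1", "DATE2"):
    df[col] = pd.to_datetime(df[col])

# Compare the timedelta column against an explicit 24-hour threshold.
threshold = pd.Timedelta(hours=24)
df["gt24"] = (df["DATE2"] - df["DATE1"]) > threshold
print(df)
```

Converting the columns up front also makes the snippet from the question (`df['DATE2'] - df['DATE1']`) work without the 'str' and 'str' error.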