How to cast time columns and find timedelta with a condition in Python pandas
I have a time column of non-null object dtype, and I can't convert it to timedelta or datetime.
Time msg
12:29:36.306000 Setup
12:29:36.507000 Alerting
12:29:38.207000 Service
12:29:39.194000 Setup
12:30:05.773000 Alerting
12:30:06.205000 Service
12:32:07.315000 Setup
12:32:17.194000 Service
12:32:26.889000 Setup
12:36:06.274000 Alerting
12:36:08.523000 Service
12:37:59.200000 Setup
12:47:10.652000 Alerting
12:47:43.921000 Setup
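When I run df.info(), the 'Time' column is reported as non-null object, and I can't convert it to timedelta or datetime (it's obvious why that doesn't work). So how can I find the difference between consecutive messages (the time delta), skipping any difference below 5 seconds?
Expected output: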
Time msg diff
12:29:36.306000 Setup
12:29:36.507000 Alerting
12:29:38.207000 Service
12:29:39.194000 Setup
12:30:05.773000 Alerting
12:30:06.205000 Service
12:32:07.315000 Setup
12:32:17.194000 Service
12:32:26.889000 Setup
12:36:06.274000 Alerting 6.30***
12:36:08.523000 Service
12:37:59.200000 Setup
12:47:10.652000 Alerting 11.02***
12:47:43.921000 Setup
I tried something like this:
df['diff'] = (df['Time'] - df['Time'].shift()).fillna(0)
But I don't know how to write the condition for the 5-second interval.
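I think you first need to convert the column to str and then call to_timedelta. Then take the diff and compare it with 5s. Finally, use the resulting mask to build the new column: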
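import pandas as pd

# Cast to str first so to_timedelta can parse the object column
df['Time'] = pd.to_timedelta(df['Time'].astype(str))
# Gap between each row and the previous one (first row is NaT)
df['diff'] = df['Time'].diff()
# True where the gap is longer than 5 seconds
df['mask'] = df['diff'] > pd.Timedelta(5, unit='s')
print(df)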
Time msg diff mask
0 12:29:36.306000 Setup NaT False
1 12:29:36.507000 Alerting 00:00:00.201000 False
2 12:29:38.207000 Service 00:00:01.700000 False
3 12:29:39.194000 Setup 00:00:00.987000 False
4 12:30:05.773000 Alerting 00:00:26.579000 True
5 12:30:06.205000 Service 00:00:00.432000 False
6 12:32:07.315000 Setup 00:02:01.110000 True
7 12:32:17.194000 Service 00:00:09.879000 True
8 12:32:26.889000 Setup 00:00:09.695000 True
9 12:36:06.274000 Alerting 00:03:39.385000 True
10 12:36:08.523000 Service 00:00:02.249000 False
11 12:37:59.200000 Setup 00:01:50.677000 True
12 12:47:10.652000 Alerting 00:09:11.452000 True
13 12:47:43.921000 Setup 00:00:33.269000 True
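The question does not say what the object column actually holds. One plausible case, and it is only an assumption here, is that it contains datetime.time objects, which would explain why the cast to str comes first. A minimal sketch with a hypothetical two-row Series named times:

```python
import datetime as dt

import pandas as pd

# Hypothetical object-dtype column of datetime.time values (an assumption,
# the question does not show how the column was created).
times = pd.Series([dt.time(12, 29, 36, 306000), dt.time(12, 29, 36, 507000)])

# str() of a time object gives "HH:MM:SS.ffffff", a format that
# pd.to_timedelta can parse.
as_delta = pd.to_timedelta(times.astype(str))

# Once the column is a timedelta, consecutive differences are available.
print(as_delta.diff())
```

If the column already holds plain strings, the astype(str) call is harmless, so the same line should work in either case.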
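# Cast to str first, as above, so an object column can be parsed
df['Time'] = pd.to_timedelta(df['Time'].astype(str))
diff = df['Time'].diff()
mask = diff > pd.Timedelta(5, unit='s')
# Keep the gap only where it exceeds 5 seconds; other rows become NaT
df['new'] = diff.where(mask)
print(df)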
Time msg new
0 12:29:36.306000 Setup NaT
1 12:29:36.507000 Alerting NaT
2 12:29:38.207000 Service NaT
3 12:29:39.194000 Setup NaT
4 12:30:05.773000 Alerting 00:00:26.579000
5 12:30:06.205000 Service NaT
6 12:32:07.315000 Setup 00:02:01.110000
7 12:32:17.194000 Service 00:00:09.879000
8 12:32:26.889000 Setup 00:00:09.695000
9 12:36:06.274000 Alerting 00:03:39.385000
10 12:36:08.523000 Service NaT
11 12:37:59.200000 Setup 00:01:50.677000
12 12:47:10.652000 Alerting 00:09:11.452000
13 12:47:43.921000 Setup 00:00:33.269000
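For anyone who wants to try this without the original data, here is a self-contained sketch using the first six rows from the question. It assumes the Time column holds plain strings, and the diff_s column (gaps as float seconds via dt.total_seconds()) is an extra that is not part of the answer above:

```python
import pandas as pd

# First six rows copied from the question; storing Time as strings is an
# assumption about how the column was loaded.
df = pd.DataFrame({
    "Time": ["12:29:36.306000", "12:29:36.507000", "12:29:38.207000",
             "12:29:39.194000", "12:30:05.773000", "12:30:06.205000"],
    "msg": ["Setup", "Alerting", "Service", "Setup", "Alerting", "Service"],
})

# Parse the strings as elapsed time since midnight.
elapsed = pd.to_timedelta(df["Time"].astype(str))

# Gap to the previous row; the first row has no predecessor and is NaT.
gap = elapsed.diff()

# Keep only gaps longer than 5 seconds; everything else becomes NaT.
threshold = pd.Timedelta(seconds=5)
df["diff"] = gap.where(gap > threshold)

# Optional: the same gaps as float seconds (NaT turns into NaN).
df["diff_s"] = df["diff"].dt.total_seconds()

print(df)
```

Comparing NaT with the threshold evaluates to False, so the first row drops out without any special handling.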