Identifying events from time series

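I have a visibility time series containing a visibility measurement every half hour. A fog event starts when visibility drops below 1 km and ends when visibility rises above 1 km. My code is below. I want to find the number of such fog events and the duration of each one.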

from IPython.display import display
import pandas as pd

import matplotlib.pyplot as plt

from google.colab import files
uploaded = files.upload()

import io
df = pd.read_csv(io.BytesIO(uploaded['visibility.csv']))

df.set_index('Unnamed: 0',inplace=True)
df.index = pd.to_datetime(df.index)

df=df.interpolate(method='linear', limit_direction='forward')
display(df)


Unnamed: 0          visibility_km
2016-01-01 00:00:00 0.595456
2016-01-01 00:30:00 0.595456
2016-01-01 01:00:00 0.595456
2016-01-01 01:30:00 0.595456
2016-01-01 02:00:00 0.595456
... ...
2020-12-31 21:30:00 0.925370
2020-12-31 22:00:00 0.901230
2020-12-31 22:30:00 0.804670
2020-12-31 23:00:00 0.804670
2020-12-31 23:30:00 0.692016

# FOG Events

fog_events=df[df<1.0].count()
print('no. of fog events',fog_events)
no. of fog events 10318

But this only gives the number of readings where visibility was below 1 km, not the number of fog events.
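
The difference can be seen on a tiny hand-made sequence (hypothetical values, not taken from the question's data). Counting readings below the threshold counts every foggy half hour, whereas an event should only be counted when visibility crosses from 1 km or more to below 1 km. This sketch assumes that a series which starts in fog counts as one event.

```python
# Hypothetical half-hourly visibility readings in km
readings = [0.5, 0.6, 1.2, 1.3, 0.8, 0.9, 0.7, 1.5]
threshold = 1.0

# What df[df < 1.0].count() measures: the number of foggy readings
foggy_samples = sum(1 for v in readings if v < threshold)

# What is actually wanted: the number of transitions into fog
fog_events = 0
previous_foggy = False  # assumption: the period before the series is clear
for v in readings:
    foggy = v < threshold
    if foggy and not previous_foggy:
        fog_events += 1  # a new fog event starts at this reading
    previous_foggy = foggy

print(foggy_samples)  # 5
print(fog_events)     # 2
```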

You can create sample time series data like this:

import pandas as pd

tdf = pd.DataFrame({'Time':pd.date_range(start='1/1/2016', periods=11, freq='30s'),
                   'Visibility_km': [0.56, 0.75, 0.99, 1.01, 1.1, 1.3, 0.5, 0.6, 0.7, 1.2, 1.3]})

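Data in this format makes it much easier for others to copy and paste from your question. To get the total number of fog events and their durations, first create a column that flags foggy rows and a column that marks where each event starts and ends: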

# Create column to mark duration of events
tdf['fog_event'] = (tdf['Visibility_km'] < 1.).astype(int)
# Create column to mark event start and end
tdf['event_diff'] = tdf['fog_event'] != tdf['fog_event'].shift(1)
print(tdf)

               Time     Visibility_km   fog_event   event_diff
0   2016-01-01 00:00:00     0.56              1     True
1   2016-01-01 00:00:30     0.75              1     False
2   2016-01-01 00:01:00     0.99              1     False
3   2016-01-01 00:01:30     1.01              0     True
4   2016-01-01 00:02:00     1.10              0     False
5   2016-01-01 00:02:30     1.30              0     False
6   2016-01-01 00:03:00     0.50              1     True
7   2016-01-01 00:03:30     0.60              1     False
8   2016-01-01 00:04:00     0.70              1     False
9   2016-01-01 00:04:30     1.20              0     True
10  2016-01-01 00:05:00     1.30              0     False
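
The two helper columns already contain the event count: an event starts on every row where event_diff is True and fog_event is 1. A minimal check on the same sample data (rebuilt here so the snippet runs on its own):

```python
import pandas as pd

tdf = pd.DataFrame({'Time': pd.date_range(start='1/1/2016', periods=11, freq='30s'),
                    'Visibility_km': [0.56, 0.75, 0.99, 1.01, 1.1, 1.3, 0.5, 0.6, 0.7, 1.2, 1.3]})
tdf['fog_event'] = (tdf['Visibility_km'] < 1.).astype(int)
tdf['event_diff'] = tdf['fog_event'] != tdf['fog_event'].shift(1)

# A fog run begins where a change point lands on a foggy row
event_starts = tdf['event_diff'] & (tdf['fog_event'] == 1)
total_fog_events = int(event_starts.sum())

print(tdf.loc[event_starts, 'Time'].tolist())  # start time of each event
print(total_fog_events)  # 2
```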

Now you can get the events in two ways.

The first way doesn't use Pandas; it is how I originally grouped the events.

import numpy as np
from itertools import groupby

# Split the 0/1 flags into runs of consecutive equal values
groups = [list(g) for _, g in groupby(tdf.fog_event.values)]
# Summing a run of 1s gives its length; runs of 0s sum to 0
fog_durations = np.array([sum(g) for g in groups])

duration_each_event = fog_durations[fog_durations != 0]
total_fog_events = sum(fog_durations != 0)

print(duration_each_event)
[3 3]
print(total_fog_events)
2
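
itertools.groupby only merges consecutive equal values, which is exactly why it splits the series into separate runs. A small variant (my own sketch, not part of the approach above) keeps the group key, so fog runs can be picked out directly instead of relying on the clear runs summing to 0:

```python
from itertools import groupby

fog_event = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0]  # same flags as tdf.fog_event

# groupby yields (key, iterator) for each run of consecutive equal values
runs = [(key, len(list(group))) for key, group in groupby(fog_event)]
print(runs)  # [(1, 3), (0, 3), (1, 3), (0, 2)]

# Keep only the lengths of runs whose key is 1 (fog)
duration_each_event = [length for key, length in runs if key == 1]
total_fog_events = len(duration_each_event)
print(duration_each_event)  # [3, 3]
print(total_fog_events)     # 2
```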

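To use Pandas, you can group by the cumulative sum of event_diff, which gives every run of identical fog_event values its own id: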

# Each change point starts a new run, so the cumulative sum acts as a run id
fdf = tdf.groupby([tdf['event_diff'].cumsum(), 'fog_event']).size()
fdf = fdf.reset_index(name = 'duration').rename(columns = {'event_diff': 'index'})

# Keep only the fog runs
duration_each_event = fdf.loc[fdf['fog_event'] == 1, 'duration'].values
total_fog_events = fdf.loc[fdf['fog_event'] == 1, 'fog_event'].sum()

print(duration_each_event)
[3 3]
print(total_fog_events)
2
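
If you also need to know when each event happened, the same cumulative-sum run id can drive a named aggregation. This is a sketch assuming pandas 0.25 or later (for the keyword form of agg); the column names run_id, start, end, samples and fog are my own choices:

```python
import pandas as pd

tdf = pd.DataFrame({'Time': pd.date_range(start='1/1/2016', periods=11, freq='30s'),
                    'Visibility_km': [0.56, 0.75, 0.99, 1.01, 1.1, 1.3, 0.5, 0.6, 0.7, 1.2, 1.3]})
tdf['fog_event'] = (tdf['Visibility_km'] < 1.).astype(int)
tdf['event_diff'] = tdf['fog_event'] != tdf['fog_event'].shift(1)

# Each run of identical fog_event values gets its own integer id
tdf['run_id'] = tdf['event_diff'].cumsum()

# One row per run: first and last timestamp, number of readings, fog flag
runs = tdf.groupby('run_id').agg(start=('Time', 'first'),
                                 end=('Time', 'last'),
                                 samples=('Time', 'count'),
                                 fog=('fog_event', 'first'))

fog_runs = runs[runs['fog'] == 1]
print(fog_runs[['start', 'end', 'samples']])
print(len(fog_runs))  # 2
```

Note that end here is the timestamp of the last foggy reading, not of the first clear one.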

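Assuming the interval between measurements is constant (here, measurements are always 30 seconds apart), you can multiply duration_each_event by 30 (seconds) or 0.5 (minutes) to get the durations in time units. For the half-hourly data in the question, multiply by 30 (minutes) or 0.5 (hours) instead.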
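
A minimal sketch of that conversion, assuming a constant 30-minute interval as in the question's data; the sample counts below are hypothetical stand-ins for duration_each_event:

```python
import numpy as np
import pandas as pd

# Hypothetical number of readings in each fog event
# (e.g. duration_each_event from either method above)
duration_each_event = np.array([3, 3, 7])

# Assumed constant sampling interval: 30 minutes for the question's series
interval_minutes = 30

# Convert reading counts to Timedelta durations
durations = pd.to_timedelta(duration_each_event * interval_minutes, unit='min')
print(durations)

# The same durations as plain numbers of hours
hours = np.asarray(durations.total_seconds()) / 3600
print(hours)  # [1.5 1.5 3.5]
```

If the real series has gaps (missing half hours), counting readings underestimates durations; in that case the difference between actual start and end timestamps is the safer measure.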