Identifying events from time series
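I have a visibility time series containing half-hourly visibility measurements. A fog event is defined as starting when visibility drops below 1 km and ending when visibility rises above 1 km. Please find the code below. I want to find the number of such fog events and the duration of each one.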
from IPython.display import display
import pandas as pd
import matplotlib.pyplot as plt
from google.colab import files
uploaded = files.upload()
import io
df = pd.read_csv(io.BytesIO(uploaded['visibility.csv']))
df.set_index('Unnamed: 0',inplace=True)
df.index = pd.to_datetime(df.index)
df=df.interpolate(method='linear', limit_direction='forward')
display(df)
Unnamed: 0 visibility_km
2016-01-01 00:00:00 0.595456
2016-01-01 00:30:00 0.595456
2016-01-01 01:00:00 0.595456
2016-01-01 01:30:00 0.595456
2016-01-01 02:00:00 0.595456
... ...
2020-12-31 21:30:00 0.925370
2020-12-31 22:00:00 0.901230
2020-12-31 22:30:00 0.804670
2020-12-31 23:00:00 0.804670
2020-12-31 23:30:00 0.692016
# FOG Events
fog_events=df[df<1.0].count()
print('no. of fog events',fog_events)
no. of fog events 10318
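But this only gives the number of readings where visibility is below 1 km, not the number of fog events.
You can create sample time series data like this: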
import pandas as pd
tdf = pd.DataFrame({'Time':pd.date_range(start='1/1/2016', periods=11, freq='30s'),
'Visibility_km': [0.56, 0.75, 0.99, 1.01, 1.1, 1.3, 0.5, 0.6, 0.7, 1.2, 1.3]})
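Having the data in this format makes it easier to copy and paste into your question. To get the total number of fog events and their durations, first create a column that flags the foggy readings, then a column that marks where an event starts or ends: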
# Create column to mark duration of events
tdf['fog_event'] = (tdf['Visibility_km'] < 1.).astype(int)
# Create column to mark event start and end
tdf['event_diff'] = tdf['fog_event'] != tdf['fog_event'].shift(1)
print(tdf)
Time Visibility_km fog_event event_diff
0 2016-01-01 00:00:00 0.56 1 True
1 2016-01-01 00:00:30 0.75 1 False
2 2016-01-01 00:01:00 0.99 1 False
3 2016-01-01 00:01:30 1.01 0 True
4 2016-01-01 00:02:00 1.10 0 False
5 2016-01-01 00:02:30 1.30 0 False
6 2016-01-01 00:03:00 0.50 1 True
7 2016-01-01 00:03:30 0.60 1 False
8 2016-01-01 00:04:00 0.70 1 False
9 2016-01-01 00:04:30 1.20 0 True
10 2016-01-01 00:05:00 1.30 0 False
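As a quick sanity check before any grouping, the two columns above are already enough to count the events: a fog event starts on any row where fog_event is 1 and event_diff is True. The following is a minimal sketch on the same sample data; the names event_starts and n_events are mine and only for illustration.

```python
import pandas as pd

tdf = pd.DataFrame({
    'Time': pd.date_range(start='1/1/2016', periods=11, freq='30s'),
    'Visibility_km': [0.56, 0.75, 0.99, 1.01, 1.1, 1.3, 0.5, 0.6, 0.7, 1.2, 1.3],
})
tdf['fog_event'] = (tdf['Visibility_km'] < 1.).astype(int)
tdf['event_diff'] = tdf['fog_event'] != tdf['fog_event'].shift(1)

# An event starts on a foggy row whose flag differs from the previous row.
# The first row counts as a start if it is foggy, because shift(1) yields NaN there.
event_starts = (tdf['fog_event'] == 1) & tdf['event_diff']
n_events = int(event_starts.sum())

print(n_events)
print(tdf.loc[event_starts, 'Time'])
```

This only gives the count and the start times; the durations still need one of the grouping approaches.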
Now you can get the events in two ways:
The first way does not use Pandas and is how I originally grouped the events.
from itertools import groupby
import numpy as np
groups = [list(g) for _, g in groupby(tdf.fog_event.values)]
fog_durations = np.array([sum(g) for g in groups])
duration_each_event = fog_durations[fog_durations != 0]
total_fog_events = sum(fog_durations != 0)
print(duration_each_event)
[3 3]
print(total_fog_events)
2
To use Pandas, you can group by the cumulative sum of the event differences:
fdf = tdf.groupby([tdf['event_diff'].cumsum(), 'fog_event']).size()
fdf = fdf.reset_index(name = 'duration').rename(columns = {'event_diff': 'index'})
duration_each_event = fdf.loc[fdf['fog_event'] == 1, 'duration'].values
total_fog_events = fdf.loc[fdf['fog_event'] == 1, 'fog_event'].sum()
print(duration_each_event)
[3 3]
print(total_fog_events)
2
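If you also want to know when each event happened, the same cumulative-sum idea can be applied to the timestamps. This is only a sketch of one possible variant, not part of the approach above: run_id, events, start, end and n_readings are names I made up, and it assumes the Time column has no missing values.

```python
import pandas as pd

tdf = pd.DataFrame({
    'Time': pd.date_range(start='1/1/2016', periods=11, freq='30s'),
    'Visibility_km': [0.56, 0.75, 0.99, 1.01, 1.1, 1.3, 0.5, 0.6, 0.7, 1.2, 1.3],
})

fog = tdf['Visibility_km'] < 1.0

# run_id increases by one every time the fog flag flips, so each
# uninterrupted stretch of fog (or of clear air) gets its own id.
run_id = fog.astype(int).diff().ne(0).cumsum().rename('run_id')

# Keep only the foggy rows, then summarise each run by its timestamps.
events = (
    tdf[fog]
    .groupby(run_id[fog])['Time']
    .agg(start='min', end='max', n_readings='count')
    .reset_index(drop=True)
)

print(events)
print(len(events))
```

Here start and end are the first and last foggy readings of each event, so end minus start is one sampling interval shorter than n_readings times the interval. Which of the two you call the duration is a convention you have to pick.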
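Assuming the interval between measurements is constant (i.e. readings are always taken 30 seconds apart), you can multiply duration_each_event by 30 (seconds) or 0.5 (minutes) to convert the counts into units of time.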
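The conversion can be sketched with pd.to_timedelta, which keeps the units explicit. The counts below are the ones obtained for the sample data; the half-hourly case is my assumption about how the same counts would translate to data sampled every 30 minutes, as described in the question. Both cases assume evenly spaced readings with no gaps.

```python
import numpy as np
import pandas as pd

# Readings per fog event, as computed for the sample data.
duration_each_event = np.array([3, 3])

# Sample data: one reading every 30 seconds.
sample_durations = pd.to_timedelta(duration_each_event * 30, unit='s')
print(sample_durations)

# Half-hourly data: the same counts with a 30-minute step.
half_hourly_durations = pd.to_timedelta(duration_each_event * 30, unit='min')
print(half_hourly_durations)
```

If the real series has gaps or irregular spacing, counting readings will misstate the duration, and working from the timestamps themselves is probably safer.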