Identifying extremes in pandas timeseries

Question

I am looking for a way to identify local extremes in a pandas timeseries.

An MWE would be

import math
import matplotlib.pyplot as plt
import pandas as pd

sin_list = []
for i in range(200):
    sin_list.append(math.sin(i / 10) + i / 100)

idx = pd.date_range('2018-01-01', periods=200, freq='H')

ts = pd.Series(sin_list, index=idx)

ts.plot(style='.')
plt.show()

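The red lines would mark the timestamps I want to identify. Note that the series, of course, consists of a finite number of steps.

One possible solution would be to fit a curve to the data, differentiate it, and then determine exactly where the gradient is 0. That seems like a lot of effort to program myself, and I assume such an implementation already exists somewhere.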
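
As a rough discrete stand-in for the "differentiate and find where the gradient is 0" idea, one can look for sign changes in the first difference instead of fitting a curve. The following is only a sketch on the noise-free MWE data. It assumes that pandas and NumPy are available, and the names `step_sign`, `next_sign`, `local_max` and `local_min` are illustrative. A noisy series would probably need smoothing first.

```python
import numpy as np
import pandas as pd

# Same synthetic series as in the MWE, built with NumPy.
i = np.arange(200)
idx = pd.date_range('2018-01-01', periods=200, freq=pd.Timedelta(hours=1))
ts = pd.Series(np.sin(i / 10) + i / 100, index=idx)

# Sign of each step: +1 when rising, -1 when falling (NaN for the first point).
step_sign = np.sign(ts.diff())

# shift(-1) gives the sign of the step that follows each sample.
next_sign = step_sign.shift(-1)

# A turning point is a sample where the step before and the step after
# have opposite signs. Comparisons with NaN are False, so both ends are skipped.
local_max = ts[(step_sign > 0) & (next_sign < 0)]
local_min = ts[(step_sign < 0) & (next_sign > 0)]

print(local_max)
print(local_min)
```

On this series the sketch yields three local maxima and three local minima, each located to the nearest sample rather than at the exact zero of the underlying gradient.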

Answer 1

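I developed a solution based on the .diff() method. The key parameter here is the p factor of the get_percentile function. Because the series has only a finite number of values, the gradient never reaches exactly 0, so the solution space has to be somewhat fuzzy. This means that the fewer values there are, the higher the p factor must be. In my solution, 0.05 proved large enough to identify the extremes, yet small enough to locate them with reasonable accuracy.

Here is the code: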

import copy
import math

import matplotlib.pyplot as plt
import pandas as pd


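def get_percentile(data, p: float):
    # Return the value below which roughly a fraction p of the data lies.
    # Sort a copy so the caller's array is left untouched. NumPy sorts NaN
    # values to the end, so the leading NaN from .diff() does not interfere.
    _data = copy.copy(data)
    _data.sort()
    # Nearest-rank lookup without interpolation; clamp at 0 so a very small p
    # does not wrap around to the last element.
    result = _data[max(math.floor(len(_data) * p) - 1, 0)]
    return result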


sin_list = []
for i in range(200):
    sin_list.append(math.sin(i / 10) + i / 100)

idx = pd.date_range('2018-01-01', periods=200, freq='H')

ts = pd.Series(sin_list, index=idx)

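# Absolute step-to-step change; the first value is NaN (no predecessor).
gradient_ts = ts.diff().abs()

# Threshold below which a step counts as "flat", i.e. close to an extremum.
percentile = get_percentile(gradient_ts.values, p=0.05)

# 1 where the series is nearly flat, 0 everywhere else (including the NaN).
binary_ts = (gradient_ts < percentile).astype(int)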

fig, ax = plt.subplots()
binary_ts.plot(drawstyle="steps", ax=ax)
ax.fill_between(binary_ts.index, binary_ts, facecolor='green', alpha=0.5, step='pre')

ts.plot(secondary_y=True, style='.')

plt.show()
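
The plot marks whole regions around each extremum rather than single timestamps. One possible follow-up, which is not part of the answer above, is to collapse each run of flagged points into the one timestamp with the smallest gradient. The sketch below makes a few assumptions. It uses `Series.quantile` in place of `get_percentile`, so the threshold differs slightly because of interpolation. The names `flat`, `run_id` and `extreme_times` are illustrative.

```python
import numpy as np
import pandas as pd

# Same synthetic series as above, built with NumPy.
i = np.arange(200)
idx = pd.date_range('2018-01-01', periods=200, freq=pd.Timedelta(hours=1))
ts = pd.Series(np.sin(i / 10) + i / 100, index=idx)

gradient_ts = ts.diff().abs()

# Assumption: Series.quantile (NaN skipped, linear interpolation)
# stands in for get_percentile here.
threshold = gradient_ts.quantile(0.05)
flat = gradient_ts < threshold

# Give every run of consecutive equal flags its own id.
run_id = flat.astype(int).diff().ne(0).cumsum()

# Within each flagged run, keep the timestamp with the smallest gradient.
extreme_times = gradient_ts[flat].groupby(run_id[flat]).idxmin()

print(extreme_times)
```

For this series the result contains six timestamps, one per extremum. Because `diff()` describes the step between a sample and its predecessor, a reported timestamp can sit one step away from the sample that actually holds the extreme value.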


Tags: python, math, pandas, data-science