Fastest way to average sign-normalized segments of data with NumPy?

What is the fastest way to collect segments of data from a NumPy array at every point in the dataset, normalize each segment by the sign (+ve/-ve) of its first sample, and then average all the segments together?

Currently I have:

import numpy as np

x0 = np.random.normal(0,1,5000) # Dataset to be analysed

l0 = 100 # Length of segment to be averaged

def average_seg(x,l):
    return np.mean([x[i:i+l]*np.sign(x[i]) for i in range(len(x)-l+1)],axis=0)

av_seg = average_seg(x0,l0)
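
To make the intent concrete, here is a tiny worked example (toy values, chosen only for illustration) of what the sign-normalized segment average computes:

```python
import numpy as np

# Toy input: the sign of each segment's first sample decides
# whether that segment is flipped before averaging.
x = np.array([1.0, -2.0, 3.0, -4.0, 5.0, -6.0])
l = 3

# Segments starting at i = 0..len(x)-l, each multiplied by sign(x[i]):
# i=0: sign(+1)*[ 1,-2, 3] = [ 1,-2, 3]
# i=1: sign(-2)*[-2, 3,-4] = [ 2,-3, 4]
# i=2: sign(+3)*[ 3,-4, 5] = [ 3,-4, 5]
# i=3: sign(-4)*[-4, 5,-6] = [ 4,-5, 6]
segs = [x[i:i+l] * np.sign(x[i]) for i in range(len(x) - l + 1)]
print(np.mean(segs, axis=0))  # [ 2.5 -3.5  4.5]
```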

Timing this gives:

%timeit average_seg(x0,l0)
22.2 ms ± 362 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

This works, but is there a faster way?

The code above bogs down when the length of x0 is large and the value of l0 is large. We are looking at looping this code millions of times, so even incremental improvements would help!

We can leverage 1D convolution -

np.convolve(x,np.sign(x[:-l+1][::-1]),'valid')/(len(x)-l+1)

The idea is to perform the windowed summations as a convolution with a flipped kernel, following the convolution definition.
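
A minimal sketch of why the kernel is reversed (toy values, illustrative only): `np.convolve` flips its second argument, so feeding it the sign vector pre-reversed recovers the plain windowed sum `sum_i x[k+i]*sign(x[i])`:

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0, -4.0, 5.0, -6.0])
l = 3
n_seg = len(x) - l + 1  # number of full segments: 4

# Direct windowed sum: out[k] = sum_i x[k+i] * sign(x[i]), averaged.
direct = np.array([sum(x[k + i] * np.sign(x[i]) for i in range(n_seg))
                   for k in range(l)]) / n_seg

# Convolution flips its kernel, so pass the reversed sign vector.
conv = np.convolve(x, np.sign(x[:n_seg][::-1]), 'valid') / n_seg

print(np.allclose(direct, conv))  # True
```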

Timings -

In [150]: x = np.random.normal(0,1,5000) # Dataset to be analysed
     ...: l = 100 # Length of segment to be averaged

In [151]: %timeit average_seg(x,l)
17.2 ms ± 689 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [152]: %timeit np.convolve(x,np.sign(x[:-l+1][::-1]),'valid')/(len(x)-l+1)
149 µs ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [153]: av_seg = average_seg(x,l)
     ...: out = np.convolve(x,np.sign(x[:-l+1][::-1]),'valid')/(len(x)-l+1)
     ...: print(np.allclose(out, av_seg))
True

100x+ speedup!