Fourier transformation (fft) for Time Series, but both ends of cleaned data move towards each other

Question

I have a time series representing the X and Z coordinates in a virtual environment.

X = np.array(df["X"])
Z = np.array(df["Z"])

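Both the X and Z coordinates contain noise from different sources. To filter out the noise, I would like to use a Fourier transform. After some research, I used the code from https://medium.com/swlh/5-tips-for-working-with-time-series-in-python-d889109e676d to denoise my data.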

def fft_denoiser(x, n_components, to_real=True):
    n = len(x)

    # compute the fft
    fft = np.fft.fft(x, n)

    # compute power spectrum density
    # squared magnitude of each fft coefficient
    PSD = fft * np.conj(fft) / n

    # keep only the components whose power exceeds the threshold
    _mask = PSD > n_components
    fft = _mask * fft

    # inverse fourier transform
    clean_data = np.fft.ifft(fft)

    if to_real:
        clean_data = clean_data.real

    return clean_data

After setting n_components, I want to use the cleaned data. This works well for the most part, as shown in my plot of the X coordinate:

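Only at the beginning and at the end do the cleaned values suddenly move towards each other. Can someone help or explain what is causing this, and how I can overcome it?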

Answer 1

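You are running into this issue because the FFT implicitly assumes that the provided input signal is periodic. If you repeat your raw data, you will see a large discontinuity at each period (as the signal goes from ~20 back down to ~5). Once some of the higher frequency components are removed, that discontinuity becomes slightly less sharp, and the smoothing shows up at the edges (a few samples at the start and a few samples at the end).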
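
The effect can be reproduced on a noise-free ramp. The sketch below is only an illustration: the ramp from 5 to 20, its length, and the number of retained bins (`keep`) are assumptions chosen for the demonstration, not values taken from the question.

```python
import numpy as np

# Hypothetical test signal: a clean ramp from 5 to 20, so its periodic
# extension jumps from ~20 back down to ~5 at every period.
n = 256
ramp = np.linspace(5.0, 20.0, n)

# Crude low-pass in the frequency domain: keep the DC bin and the
# `keep` lowest positive and negative frequencies, zero everything else.
keep = 5  # illustrative choice
spectrum = np.fft.fft(ramp)
mask = np.zeros(n, dtype=bool)
mask[:keep + 1] = True
mask[-keep:] = True
smoothed = np.fft.ifft(spectrum * mask).real

print("raw ends:     ", ramp[0], ramp[-1])
print("smoothed ends:", smoothed[0], smoothed[-1])
```

Both smoothed end values land near 12.5, the midpoint of the jump, which is the same "ends moving towards each other" behaviour seen in the plot.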

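To avoid this, you can do the filtering in the time domain with a linear FIR filter, which processes the data sequence without the periodicity assumption.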

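For the purpose of this answer I constructed a synthetic test signal (which you can use to recreate the same conditions), but you can obviously use your own data instead: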

import numpy as np

# Generate synthetic signal for testing purposes
fs = 1 # Hz
f0 = 0.002/fs
f1 = 0.01/fs
dt = 1/fs
t = np.arange(200, 901)*dt
m = (25-5)/(t[-1]-t[0])
phi = 4.2
x = 5 + m*(t-t[0]) + 2*np.sin(2*np.pi*f0*t) + 1*np.sin(2*np.pi*f1*t+phi) + 0.2*np.random.randn(len(t))

Now, to design the filter, we can take the inverse transform of _mask (instead of applying the mask):

import numpy as np

# Design denoising filter
def freq_sampling_filter(x, threshold):
  n = len(x)

  # compute the fft
  fft = np.fft.fft(x, n)

  # compute power spectrum density
  # squared magnitude of each fft coefficient
  PSD = fft * np.conj(fft) / n

  # keep frequencies with large contributions
  _mask = PSD > threshold
  _coff = np.fft.fftshift(np.real(np.fft.ifft(_mask)))
  return _coff

threshold = 10  # tunable power threshold; 10 is the value used for the illustration in this answer
coff = freq_sampling_filter(x, threshold)

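threshold is a tunable parameter. Choose it so that enough of the frequency components you want are kept while the unwanted ones are removed. This is of course highly subjective.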
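
One way to get a feel for the threshold is to count how many frequency bins survive for a few candidate values. This is a minimal sketch: the stand-in signal, the fixed random seed, and the candidate list are assumptions for illustration, and the PSD uses the same definition as freq_sampling_filter.

```python
import numpy as np

# Hypothetical stand-in for the data: a slow sine plus noise (fixed seed).
rng = np.random.default_rng(0)
t = np.arange(700)
x = 2 * np.sin(2 * np.pi * 0.002 * t) + 0.2 * rng.standard_normal(len(t))

# Squared magnitude of each FFT coefficient, scaled by the signal length
fft = np.fft.fft(x)
psd = np.abs(fft) ** 2 / len(x)

# Count how many frequency bins each candidate threshold would keep
candidates = [0.1, 1, 10, 100]  # illustrative values only
kept = {thr: int(np.count_nonzero(psd > thr)) for thr in candidates}
for thr, count in kept.items():
    print(f"threshold={thr:>5}: keeps {count} of {len(psd)} bins")
```

A larger threshold keeps fewer bins and therefore smooths more aggressively; plotting psd on a logarithmic scale is another way to see where the noise floor sits.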

We can then simply apply the filter with scipy.signal.filtfilt:

from scipy.signal import filtfilt

# apply the denoising filter
cleaned = filtfilt(coff, 1, x, padlen=len(x)-1, padtype='constant')

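For illustration purposes, using a threshold of 10 with the synthetic signal generated above yields the following raw data (variable x) and cleaned data (variable cleaned):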

Setting padtype to 'constant' ensures that the filtered values start and end at the start and end values of the unfiltered data.

Alternative

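As noted in the comments, filtfilt can be expensive for longer datasets. As an alternative, the filtering can be performed with FFT-based convolution using scipy.signal.fftconvolve. Note that in this case there is no equivalent of filtfilt's padtype argument, so we need to pad the signal manually to avoid edge effects at the start and end.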

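from scipy.signal import fftconvolve

n = len(x)
# Manually pad signal to avoid edge effects
x_padded = np.concatenate((x[0]*np.ones(n-1), x, x[-1]*np.ones(n//2)))
# Filter using FFT-based convolution
cleaned = fftconvolve(x_padded, coff, mode='same')
# Extract result (remove data from padding); the extra n//2 - (n-1)//2 term
# keeps the output aligned with x when n is even (it is 0 when n is odd)
start = (n-1) + n//2 - (n-1)//2
cleaned = cleaned[start:start+n]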
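
The pad, convolve, and trim bookkeeping is easy to get off by one, so it can be worth checking with a filter that should leave the signal unchanged. The sketch below is an assumption-laden sanity check, not part of the original answer: the helper name fftconvolve_denoise is hypothetical, and the trim offset is written so that it stays aligned for both even and odd lengths.

```python
import numpy as np
from scipy.signal import fftconvolve

def fftconvolve_denoise(x, coff):
    """Hypothetical helper: pad with edge values, convolve, trim back to len(x)."""
    n = len(x)
    x_padded = np.concatenate((np.full(n - 1, x[0]), x, np.full(n // 2, x[-1])))
    out = fftconvolve(x_padded, coff, mode='same')
    # After fftshift the zero-lag tap of coff sits at index n//2, while
    # mode='same' drops (n-1)//2 leading samples of the full convolution;
    # the difference is 0 for odd n and 1 for even n.
    start = (n - 1) + n // 2 - (n - 1) // 2
    return out[start:start + n]

checks = {}
for n in (700, 701):  # one even and one odd length
    x = np.linspace(5.0, 25.0, n)
    # An all-pass mask becomes a unit impulse at index n//2 after fftshift,
    # i.e. a filter that should return the input unchanged.
    identity = np.fft.fftshift(np.real(np.fft.ifft(np.ones(n))))
    y = fftconvolve_denoise(x, identity)
    checks[n] = (len(y) == n) and bool(np.allclose(y, x))
    print(n, checks[n])
```

If the identity filter reproduces the input for both lengths, the same helper can be called with the coff returned by freq_sampling_filter.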

For reference, here are some benchmark comparisons for the above signal of length 700 (timings in seconds, smaller is better):

filtfilt    : 0.3831593
fftconvolve : 0.00028040000000029153

Note that the relative performance will vary, but FFT-based convolution is expected to perform comparatively better as the signal gets longer.
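
To repeat the comparison on your own machine, something along these lines could be used. It is a rough sketch under assumptions: the test signal, random seed, and repeat count are made up here, the timings will differ from the figures quoted above, and the trim offset in run_fftconvolve is written to stay aligned for even and odd lengths.

```python
import timeit
import numpy as np
from scipy.signal import filtfilt, fftconvolve

# Hypothetical test signal: noisy ramp with 701 samples (fixed seed)
rng = np.random.default_rng(0)
n = 701
x = np.linspace(5.0, 25.0, n) + 0.2 * rng.standard_normal(n)

# Filter taps built as in freq_sampling_filter, with a threshold of 10
psd = np.abs(np.fft.fft(x)) ** 2 / n
coff = np.fft.fftshift(np.real(np.fft.ifft(psd > 10)))

def run_filtfilt():
    return filtfilt(coff, 1, x, padlen=n - 1, padtype='constant')

def run_fftconvolve():
    x_padded = np.concatenate((np.full(n - 1, x[0]), x, np.full(n // 2, x[-1])))
    out = fftconvolve(x_padded, coff, mode='same')
    start = (n - 1) + n // 2 - (n - 1) // 2
    return out[start:start + n]

# Average wall-clock time per call over a few repetitions
repeats = 5
t_filtfilt = timeit.timeit(run_filtfilt, number=repeats) / repeats
t_fftconv = timeit.timeit(run_fftconvolve, number=repeats) / repeats
print(f"filtfilt    : {t_filtfilt:.6f} s")
print(f"fftconvolve : {t_fftconv:.6f} s")
```

The two approaches are not numerically identical, since filtfilt applies the filter forward and backward while the convolution applies it once, so only the timings are meant to be compared here.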
