How to estimate Gaussian distributions behind a noise layer?

Question

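I have a histogram of one-dimensional data containing transition times in seconds. The data contains a lot of noise, but hidden behind the noise are some peaks/Gaussians that describe the correct time values. (See images.)

The data are transition times of people walking between two locations at different speeds, with the speeds drawn from a normal walking-speed distribution (mean 1.4 m/s). Sometimes there may be multiple paths between the two locations, which can produce multiple Gaussians.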

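I want to extract the underlying Gaussians that show up above the noise. However, since the data can come from different scenarios with an arbitrary number (say 0-3 or so) of correct paths/'Gaussians', I can't really use a GMM (Gaussian Mixture Model), because that would require me to know the number of Gaussian components?

I assume/know that the correct transition-time distributions are Gaussian, while the noise comes from some other distribution (chi-squared?). I am quite new to this topic, so I may be completely wrong.

Since I know the ground-truth distance between the two points beforehand, I know where the means should be located.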

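This image has two correct Gaussians, with means at 250 s and 640 s. (The variance grows with longer times.)

This image has one correct Gaussian, with its mean at 428 s.

Question: Is there a good approach for retrieving the Gaussians, or at least significantly reducing the noise, given data like the above? I don't expect to catch Gaussians that are drowned in the noise.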

Answer 1

I would approach this problem using Kernel Density Estimation. It allows you to estimate the probability density directly from the data, without too many assumptions about the underlying distribution. By changing the kernel bandwidth you can control how much smoothing is applied, which I assume could be tuned manually by visual inspection until you get something that meets your expectations. An example of a KDE implementation in Python using scikit-learn can be found here.

Example:

import numpy as np
from sklearn.neighbors import KernelDensity

# x is your original data
x = ...
# Adjust bandwidth to get the smoothness to your liking
bandwidth = ...

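# scikit-learn expects a 2-D array of shape (n_samples, n_features)
kde = KernelDensity(kernel='gaussian', bandwidth=bandwidth).fit(np.asarray(x).reshape(-1, 1))
support = np.linspace(np.min(x), np.max(x), 1000)
# score_samples returns the log-density, so exponentiate to get the density
density = np.exp(kde.score_samples(support.reshape(-1, 1)))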

Once you have estimated the filtered distribution, you can analyze it and identify the peaks using something like this.

from scipy.signal import find_peaks

# You can tweak with the other arguments of the 'find_peaks' function
# in order to fine-tune the extracted peaks according to your PDF
# find_peaks returns a tuple: (indices of the peaks, dict of peak properties)
peaks, properties = find_peaks(density)
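
To make the two steps above concrete, here is a minimal end-to-end sketch on synthetic data. Everything about the data is an assumption made for illustration (two Gaussians near 250 s and 640 s on top of uniform noise, loosely mimicking the first image in the question), as are the bandwidth of 15 s and the prominence threshold of 25% of the tallest peak; real data will likely need different values.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Synthetic stand-in for the transition times (assumed values, not the asker's data)
x = np.concatenate([
    rng.normal(250, 15, 1500),    # first "correct" path
    rng.normal(640, 40, 1500),    # second path, larger variance
    rng.uniform(0, 1000, 3000),   # uniform background noise
])

# Fit the KDE; scikit-learn needs a column vector
kde = KernelDensity(kernel="gaussian", bandwidth=15.0).fit(x.reshape(-1, 1))
support = np.linspace(0, 1000, 1000)
density = np.exp(kde.score_samples(support.reshape(-1, 1)))

# Keep only peaks that stand out clearly from their surroundings
# (the 25% threshold is an arbitrary choice for this sketch)
peaks, properties = find_peaks(density, prominence=0.25 * density.max())
peak_times = support[peaks]
print("Estimated peak locations (s):", peak_times)
```

Because the true distances are known in advance, the reported peak locations could then be compared against the expected mean times, and peaks far from any expected value discarded.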

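Disclaimer: this is a more or less high-level answer, since your question is also high-level. I assume you know what you are doing code-wise and are just looking for ideas. If you need help with anything specific, please show us some code and what you have tried so far so that we can be more specific.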

Answer 2

I would suggest taking a look at Gaussian mixture estimation:

https://scikit-learn.org/stable/modules/mixture.html#gmm

"A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters."

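The question's concern is that the number of components is unknown. One common way around this, sketched below, is to fit several mixtures and keep the one with the lowest BIC. This is only an illustration under assumptions: the synthetic data and the candidate range of 1 to 6 components are made up, and because the background noise is not Gaussian, the mixture will usually spend extra, broad components on it, so the fitted components still need to be filtered (for example by width, or by closeness to the expected means).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for the transition times (assumed values)
x = np.concatenate([
    rng.normal(250, 15, 1500),
    rng.normal(640, 40, 1500),
    rng.uniform(0, 1000, 3000),   # non-Gaussian background noise
]).reshape(-1, 1)

# Fit mixtures with 1..6 components and score each with BIC (lower is better)
models = [GaussianMixture(n_components=k, random_state=0).fit(x) for k in range(1, 7)]
bics = [m.bic(x) for m in models]
best = models[int(np.argmin(bics))]

print("Selected number of components:", best.n_components)
for weight, mean, var in zip(best.weights_, best.means_.ravel(), best.covariances_.ravel()):
    print(f"weight={weight:.2f}  mean={mean:.1f}  std={np.sqrt(var):.1f}")
```
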
Answer 3

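As @Pasa pointed out, you can use Kernel Density Estimation for this. scipy.stats.gaussian_kde does it easily. The syntax is shown in the example below, which generates 3 Gaussian distributions, superimposes them, adds some noise, uses gaussian_kde to estimate the Gaussian curves, and then plots everything for demonstration.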

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

# Create three Gaussian curves and add some noise behind them
norm1 = np.random.normal(loc=10.0, size=5000, scale=1.1)
norm2 = np.random.normal(loc=5.0, size=3000)
norm3 = np.random.normal(loc=14.0, size=1000)
noise = np.random.rand(8000)*18
norm = np.concatenate((norm1, norm2, norm3, noise))

# The plotting is purely for demonstration
fig = plt.figure(dpi=300, figsize=(10,6))
plt.hist(norm, facecolor=(0, 0.4, 0.8), bins=200, rwidth=0.8, density=True, alpha=0.3)
plt.xlim([0.0, 18.0])

# This is the relevant part: modifier is passed as bw_method and scales the bandwidth;
# lower values follow the data more closely, higher values more loosely
modifier = 0.03
kde = gaussian_kde(norm, bw_method=modifier)

# Plots the KDE output for demonstration
kde_x = np.linspace(0, 18, 10000)
plt.plot(kde_x, kde(kde_x), 'k--', linewidth = 1.0)
plt.title("KDE example", fontsize=17)
plt.show()

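You will notice that, as you would expect, the estimate is strongest for the most pronounced Gaussian peak, the one centered at 10.0. The 'sharpness' of the estimate can be changed through the modifier variable passed to the gaussian_kde constructor (which, in the example, modifies the kernel bandwidth). Lower values produce a 'sharper' estimate and higher values a 'smoother' one. Also note that gaussian_kde returns normalized values.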
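
As a rough illustration of the bandwidth trade-off, the sketch below counts the local maxima of the KDE for a few bandwidth factors. The data mirror the example above; the specific factors and the grid size are arbitrary choices for this sketch. A scalar bw_method is used by gaussian_kde directly as its bandwidth factor, which is then scaled by the spread of the data.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Same construction as the example above: three Gaussians plus uniform noise
data = np.concatenate([
    rng.normal(10.0, 1.1, 5000),
    rng.normal(5.0, 1.0, 3000),
    rng.normal(14.0, 1.0, 1000),
    rng.uniform(0, 18, 8000),
])
grid = np.linspace(0, 18, 2000)

# Count local maxima of the estimated density for several bandwidth factors
peak_counts = {}
for factor in (0.01, 0.03, 0.1, 0.3):
    density = gaussian_kde(data, bw_method=factor)(grid)
    peaks, _ = find_peaks(density)
    peak_counts[factor] = len(peaks)
    print(f"bw_method={factor}: {len(peaks)} local maxima")
```

Very small factors tend to turn noise fluctuations into spurious peaks, while very large ones can merge neighbouring Gaussians, so a value in between usually has to be found by inspection.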

python

statistics

signal-processing

distribution

matplotlib