Getting single value from the N dim histogram in NumPy or SciPy

Question

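Suppose I have data like this:

x = np.random.randn(4, 100000)

and I fit a histogram to it:

hist = np.histogramdd(x, density=True)

What I want is the probability of a number g, for example g = 0.1. Assume some hypothetical function foo, so that: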

g = 0.1
prob = foo(hist, g)
print(prob)
>> 0.2223124214

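How can I do something like this, where I get the probability of a single value or a vector of values from the fitted histogram? I am especially interested in the N-dimensional case.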

Answer 1

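histogramdd takes O(r^D) memory, where r is the number of bins per dimension and D is the number of dimensions. Unless you have a very large dataset or a very small number of dimensions, your estimate will be poor. Consider your example data, 100k points in 4-D space: the default histogram is 10 x 10 x 10 x 10, so it has 10k bins.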

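import numpy as np

x = np.random.randn(4, 100000)  # 4 dimensions, 100k samples
hist = np.histogramdd(x.transpose(), density=True)  # histogramdd expects shape (n_samples, n_dims)
np.mean(hist[0] == 0)  # fraction of bins that are empty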

This gives something around 0.77, which means that about 77% of the bins in the histogram contain no points.
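If a value read directly off the histogram is still wanted, the hypothetical foo from the question could be sketched roughly as below. The helper name hist_density_at is made up for illustration and is not part of NumPy. It assumes the point is given as one coordinate per dimension, and it returns a density, so it is multiplied by the bin volume to get the probability mass of that bin.

```python
import numpy as np

def hist_density_at(hist, edges, point):
    """Density of the histogram bin containing `point` (0.0 outside the range).

    Hypothetical helper, not a NumPy function.
    """
    idx = []
    for coord, e in zip(point, edges):
        # Outside the histogram range the estimated density is zero.
        if coord < e[0] or coord > e[-1]:
            return 0.0
        # Index of the bin whose left edge is <= coord.
        i = np.searchsorted(e, coord, side="right") - 1
        # A point exactly on the rightmost edge belongs to the last bin.
        idx.append(min(i, len(e) - 2))
    return float(hist[tuple(idx)])

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 100000))             # d x n array
hist, edges = np.histogramdd(x.T, density=True)  # histogramdd wants (n, d)

p = hist_density_at(hist, edges, [0.1, 0.1, 0.1, 0.1])

# Default bins are equally wide, so one bin volume applies everywhere.
bin_volume = np.prod([e[1] - e[0] for e in edges])
prob = p * bin_volume  # probability mass of the bin containing the point
print(p, prob)
```

Because most bins are empty, many query points return exactly 0.0 with this approach, which is the weakness described above.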

You probably want to smooth the distribution. Unless you have a good reason not to, I suggest using a Gaussian kernel density estimate:

import numpy as np
import scipy.stats

x = np.random.randn(4, 100000)  # d x n array
f = scipy.stats.gaussian_kde(x)  # d-dimensional PDF
f([1, 2, 3, 4])  # evaluate the PDF at a given point
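For the "vector of values" part of the question, the same KDE object can be evaluated at several points in one call. This is a minimal sketch that assumes the query points are stacked as a d x m array, matching the d x n layout of the data. The sample size is reduced here only to keep the example fast, and the seed is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5000))  # d x n array (smaller n for speed)
f = stats.gaussian_kde(x)           # d-dimensional density estimate

# Two query points, one per column after the transpose: shape (d, m).
points = np.array([
    [0.1, 0.1, 0.1, 0.1],
    [1.0, 2.0, 3.0, 4.0],
]).T

dens = f(points)  # array of m density values, one per query point
print(dens)
```

The returned values are densities rather than probabilities. For a small region around a point, density times the region's volume gives an approximate probability.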
