Histograms Matplotlib vs Numpy

Question

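I am trying to get the 1-sigma (or 2-sigma) values from a 1D histogram. I don't need the plot, but to make sure the result is right I decided to plot it anyway. Compared to the matplotlib histogram, my plot is a bit off. Here is a simple MWE. Is this correct?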

import numpy as np
import matplotlib.pyplot as plt

a = np.array([-100, 100, 9, -9, 2, -2, 3, -3, 5, -5])

# matplotlib histogram
plt.figure()
plt.hist(a)

# numpy histogram
ha, ba = np.histogram(a)
plt.figure()
plt.plot(ba[:-1], ha)
sigma = 1
sigmaleft  = ha.mean() - sigma * ha.std()
sigmaright = ha.mean() + sigma * ha.std()
print([sigmaleft, sigmaright])

Answer 1

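The variable ba returned by ha, ba = np.histogram(a) holds the bin edges, which is why it has one more element than the histogram counts in ha. To plot it "correctly", set the x-axis to the bin centers: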

bc = 0.5 * (ba[:-1] + ba[1:])
plt.bar(bc, ha)

Just for fun, you could also write this as

bc = np.lib.stride_tricks.as_strided(ba, strides=ba.strides * 2, shape=(2, ba.size - 1)).mean(0)
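As a quick check of the edges-versus-centers point, the following sketch reuses the question's array and confirms that both ways of computing the bin centers agree. It assumes the default of 10 equal-width bins; the name bc_strided is introduced here for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([-100, 100, 9, -9, 2, -2, 3, -3, 5, -5])
ha, ba = np.histogram(a)  # default: 10 equal-width bins

# One more edge than there are counts
print(ha.size, ba.size)

# Bin centers: midpoint of each pair of neighboring edges
bc = 0.5 * (ba[:-1] + ba[1:])

# Same result via a strided 2-row view: row 0 = left edges, row 1 = right edges
bc_strided = as_strided(
    ba, strides=ba.strides * 2, shape=(2, ba.size - 1)
).mean(0)

print(bc)
print(np.allclose(bc, bc_strided))
```

The slicing version is the one to prefer in practice; as_strided builds a view without bounds checking, so a wrong shape or stride can read memory outside the array.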

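All that being said, ha.std is not a good approximation of the standard deviation of the data, nor is ha.mean an approximation of its mean. The bin counts are just weights, while the data values are encoded on the x-axis. The mean can be approximated as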

approxMean = (bc * ha).sum() / ha.sum()

Similarly, you can approximate the standard deviation with:

approxStd = np.sqrt(((bc - approxMean)**2 * ha).sum() / ha.sum())

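You can also use the alternative formula, which computes the variance as the weighted mean of the squares minus the square of the weighted mean: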

approxStd = np.sqrt((bc**2 * ha).sum() / ha.sum() - ((bc * ha) / ha.sum()).sum()**2)
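The two standard-deviation formulas can be checked against each other. The sketch below is one possible way to write them, using np.average with its weights argument in place of the explicit sums; the names approx_mean, std_def and std_alt are chosen here for illustration.

```python
import numpy as np

a = np.array([-100, 100, 9, -9, 2, -2, 3, -3, 5, -5])
ha, ba = np.histogram(a)
bc = 0.5 * (ba[:-1] + ba[1:])

# Weighted mean of the bin centers, with the counts as weights
approx_mean = np.average(bc, weights=ha)

# Definition form: weighted mean of squared deviations from the mean
std_def = np.sqrt(np.average((bc - approx_mean) ** 2, weights=ha))

# Shortcut form: weighted mean of squares minus squared weighted mean
std_alt = np.sqrt(np.average(bc ** 2, weights=ha) - approx_mean ** 2)

print(approx_mean, std_def, std_alt)
print(np.isclose(std_def, std_alt))
```

Both forms are algebraically identical. The shortcut form subtracts two potentially large numbers, so the definition form is usually the safer choice when the mean is large compared to the spread.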

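In all cases, only do this if you do not have access to the real data. Computing the mean and standard deviation directly from the data will be much more accurate than estimating them from the histogram.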
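To see how large the difference can be, this sketch compares the histogram-based estimates with the values computed directly from the question's array, and then builds the 1-sigma interval from the real data. It assumes the default 10 bins and the population standard deviation (ddof=0), which is what the weighted formula corresponds to.

```python
import numpy as np

a = np.array([-100, 100, 9, -9, 2, -2, 3, -3, 5, -5])
ha, ba = np.histogram(a)
bc = 0.5 * (ba[:-1] + ba[1:])

# Estimates from the histogram alone (bin centers weighted by counts)
approx_mean = (bc * ha).sum() / ha.sum()
approx_std = np.sqrt(((bc - approx_mean) ** 2 * ha).sum() / ha.sum())

# Values computed directly from the data
true_mean = a.mean()
true_std = a.std()  # ddof=0, matching the weighted formula above

# 1-sigma interval around the mean, from the real data
sigma = 1
sigmaleft = true_mean - sigma * true_std
sigmaright = true_mean + sigma * true_std

print("histogram estimate:", approx_mean, approx_std)
print("from the data:     ", true_mean, true_std)
print([sigmaleft, sigmaright])
```

For this sample the means coincide because the data are symmetric, but the histogram estimate of the standard deviation comes out at roughly 41.2 against roughly 45.0 from the data, since every value is treated as if it sat at the center of its 20-unit-wide bin.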

