The sum of a density function (based on a histogram) is not equal to 1
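I am trying to generate a density function, but the components of the resulting histogram do not seem to sum to anything close to 1.
What is the reason for this, and how can I make the density function sum to something close to (if not exactly) 1?
Minimal example: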
import numpy as np
x = np.random.normal(0, 0.5, 1000) # mu, sigma, num
bins = np.linspace(min(x), max(x), num=50) # lower and upper bounds
hist, hist_bins = np.histogram(x, bins=bins, density = True)
print(np.sum(hist))
# --> 10.4614 (varies from run to run, since no seed is set)
If I do not specify the bin edges, the output is smaller but still greater than 1:
import numpy as np
x = np.random.normal(0, 0.5, 1000) # mu, sigma, num
hist, hist_bins = np.histogram(x, density = True)
print(np.sum(hist))
# --> 3.1332 (varies from run to run, since no seed is set)
The reason for this behavior is stated in the docs:
density: bool, optional
If False, the result will contain the number
of samples in each bin. If True, the result is the value of the
probability density function at the bin, normalized such that the
integral over the range is 1. Note that the sum of the histogram
values will not be equal to 1 unless bins of unity width are chosen;
it is not a probability mass function.
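To make the note in the docs concrete: with equal-width bins, each density value is count / (n * width), so the values add up to 1 / width rather than 1. The following is only a minimal sketch of that relationship; the seeded generator and the choice of 20 bins are my own assumptions for reproducibility, not part of the question.

```python
import numpy as np

# Seed and bin count are illustrative assumptions, not from the question.
rng = np.random.default_rng(0)
x = rng.normal(0, 0.5, 1000)  # mu, sigma, num

hist, edges = np.histogram(x, bins=20, density=True)
width = edges[1] - edges[0]  # all 20 bins share this width

density_sum = hist.sum()
print(density_sum, 1 / width)  # the two values agree
print(density_sum * width)     # rescaling by the width recovers 1 (up to rounding)
```

This is why narrower bins (more of them over the same range) make the plain sum of the density values larger.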
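In addition, here is a sample showing that the sum of the histogram values is not equal to 1.0, while the integral (value times bin width) is: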
import numpy as np
a = np.arange(5)
hist, bin_edges = np.histogram(a, density=True)
print(hist)
# hist --> [0.5, 0. , 0.5, 0. , 0. , 0.5, 0. , 0.5, 0. , 0.5]
print(hist.sum())
# --> 2.4999999999999996
print(np.sum(hist * np.diff(bin_edges)))
# --> 1.0
So we can apply this to your code snippet:
import numpy as np
x = np.random.normal(0, 0.5, 1000) # mu, sigma, num
bins = np.linspace(min(x), max(x), num=50) # lower and upper bounds
hist, hist_bins = np.histogram(x, bins=bins, density=True)
print(hist)
print(np.sum(hist))
print(np.sum(hist * np.diff(hist_bins)))
# --> 1.0
Also, you should consider how you are choosing your bins, and make sure that using .linspace() is a reasonable way to do it.
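If what you actually want is a set of per-bin values that sum to 1, one option is to work with the fraction of samples per bin instead of the density. The sketch below is an illustration under my own assumptions (a seeded generator, and NumPy's built-in "auto" rule as one possible alternative to hand-built linspace edges); it is not the only reasonable way to pick bins.

```python
import numpy as np

# Seed is an illustrative assumption, used only for reproducibility.
rng = np.random.default_rng(0)
x = rng.normal(0, 0.5, 1000)  # mu, sigma, num

# Let NumPy choose the edges with its "auto" rule instead of np.linspace.
counts, edges = np.histogram(x, bins="auto")

# Fraction of samples in each bin: a probability mass per bin.
prob = counts / counts.sum()
print(prob.sum())

# The same quantity obtained from the density: density * bin width.
density, _ = np.histogram(x, bins=edges, density=True)
prob_from_density = density * np.diff(edges)
print(np.allclose(prob, prob_from_density))
```

The density values are still the right thing to plot against a theoretical PDF; the per-bin fractions are the ones that behave like a probability mass function.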