Plotting histograms in Python using Matplotlib or Pandas

Question

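I have gone through different posts on this forum, but I could not find an answer to the behaviour I am seeing.

I have a csv file whose header has many entries, each with 300 points. For each field (column of the csv file), I would like to plot a histogram. The x-axis contains the elements of that column, and the y-axis should contain the number of samples that fall inside each bin. As I have 300 points, the samples in all bins added together should total 300, so the y-axis should go from 0 to, let's say, 50 (just an example). However, the values are huge (400e8), which makes no sense.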

sample point | mydata
1 | 250.23e-9
2 | 250.123e-9
... | ...
300 | 251.34e-9

Please check my code below. I am using pandas to open the csv and Matplotlib for the rest.


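    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.mlab as mlab
    
    df=pd.read_csv("/home/pcardoso/raw_data/myData.csv")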
    
    # Figure parameters
    figPath='/home/pcardoso/scripts/python/matplotlib/figures/'
    figPrefix='hist_'           # Prefix to the name of the file.
    figSuffix='_something'      # Suffix to the name of the file.
    figString=''    # Full string passed as the figure name to be saved
    
    precision=3
    num_bins = 50
    
    columns=list(df)
    
    for fieldName in columns:
    
        vectorData=df[fieldName]
        
        # statistical data
        mu = np.mean(vectorData)  # mean of distribution
        sigma = np.std(vectorData)  # standard deviation of distribution
    
        # Create plot instance
        fig, ax = plt.subplots()
    
        # Histogram
        n, bins, patches = ax.hist(vectorData, num_bins, density='True',alpha=0.75,rwidth=0.9, label=fieldName)
        ax.legend()
        
        # Best-fit curve
        y=mlab.normpdf(bins, mu, sigma)
        ax.plot(bins, y, '--')
        
        # Setting axis names, grid and title
        ax.set_xlabel(fieldName)
        ax.set_ylabel('Number of points')
        ax.set_title(fieldName + ': $\mu=$' + eng_notation(mu,precision) + ', $\sigma=$' + eng_notation(sigma,precision))
        ax.grid(True, alpha=0.2)
        
        fig.tight_layout()      # Tweak spacing to prevent clipping of ylabel
        
        # Saving figure
        figString=figPrefix + fieldName +figSuffix
        fig.savefig(figPath + figString)
    
    plt.show()
    
    plt.close(fig)

In summary, I would like to know how to set the y-axis values correctly.

Edit: July 6, 2020

Edit, June 8, 2020: I would like the density estimator to follow the plot like this:

Thanks in advance. Best regards, Pedro

Answer 1

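Do not use density='True'. With that option, the value displayed for each bin is the number of members in the bin divided by the total number of samples and by the width of the bin, so that the histogram integrates to 1. If that width is small (as in your case, where the x-values are quite small), the values become large.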
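A quick numerical check of that normalization, as a sketch only: the synthetic samples, the seed, and the variable names below are assumptions chosen to resemble the scale of the data in the question, not the asker's actual file. It uses `numpy.histogram`, which to my knowledge performs the same binning that Matplotlib's `hist` relies on.

```python
import numpy as np

# Hypothetical stand-in for one csv column: 300 samples near 250e-9
rng = np.random.default_rng(0)
samples = rng.normal(250e-9, 0.5e-9, 300)

# Plain counts per bin versus the density-normalized values
counts, edges = np.histogram(samples, bins=50)
density, _ = np.histogram(samples, bins=50, density=True)
widths = np.diff(edges)

# The counts add up to the number of samples
print("sum of counts:", counts.sum())

# The density values are counts / (N * bin width); with bin widths of the
# order of 1e-11 they become very large, while their integral is still 1
print("largest density value:", density.max())
print("matches counts / (N * width):",
      np.allclose(density, counts / (counts.sum() * widths)))
print("integral of density:", (density * widths).sum())
```

So the large y-axis values are consistent with a correctly normalized density; they are simply not counts.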

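Edit: OK, to un-normalize the curve, you need to multiply it by the number of points and by the width of one bin. I made a more simplified example: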

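    from numpy.random import normal
    from scipy.stats import norm
    import pylab

    N = 300        # number of samples
    sigma = 10.0   # standard deviation of the test data
    B = 30         # number of bins

    def main():
        x = normal(0, sigma, N)

        # Plain counts (no density option), so the bar heights sum to N
        h, bins, _ = pylab.hist(x, bins=B, rwidth=0.8)
        bin_width = bins[1] - bins[0]

        # Un-normalize the pdf: multiply by the number of points and the bin width.
        # Evaluate it at the bin centers so the curve lines up with the bars.
        centers = bins[:-1] + bin_width / 2
        h_n = norm.pdf(centers, 0, sigma) * N * bin_width
        pylab.plot(centers, h_n)
        pylab.show()

    if __name__ == "__main__":
        main()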
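The same idea could be carried over to the per-column loop in the question. The following is a minimal sketch under stated assumptions: `expected_counts` and `plot_counts_with_fit` are hypothetical helper names, `ax` is assumed to be a Matplotlib `Axes` such as the one returned by `plt.subplots()`, and the normal pdf is written out with NumPy so the block needs no other dependency (`scipy.stats.norm.pdf`, as used above, would work equally well; `mlab.normpdf` is no longer available in recent Matplotlib releases).

```python
import numpy as np


def expected_counts(x, mu, sigma, n_points, bin_width):
    """Normal pdf at x, rescaled from density units to counts per bin."""
    pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return pdf * n_points * bin_width


def plot_counts_with_fit(ax, values, num_bins=50):
    """Draw a count histogram on `ax` and overlay the rescaled normal curve.

    `ax` is assumed to be a Matplotlib Axes; `values` is one data column.
    """
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std()

    # No density option: the bar heights are counts and sum to len(values)
    counts, edges, _ = ax.hist(values, bins=num_bins, alpha=0.75, rwidth=0.9)

    # Evaluate the curve at the bin centers so it lines up with the bars
    bin_width = edges[1] - edges[0]
    centers = 0.5 * (edges[:-1] + edges[1:])
    ax.plot(centers, expected_counts(centers, mu, sigma, values.size, bin_width), "--")
    ax.set_ylabel("Number of points")
    return counts, edges
```

Inside the loop, a call such as `plot_counts_with_fit(ax, df[fieldName], num_bins)` would then stand in for the `ax.hist(..., density='True', ...)` and `mlab.normpdf` lines, leaving the labels, title, and saving code unchanged.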
