Why am I getting multiple lines while plotting distributions?

Question

I have some data to which I am trying to fit a normal and a log-normal distribution. Output of df.head(10):

year    Q
1885     7241
1886     9164
1887     7407
1888     6870
1889     9855
1890    11887
1891     8827
1892     7546
1893     8498
1894    16757
Name: Q, dtype: int64

Fitting the distributions

import matplotlib.pyplot as plt
from scipy import stats
mean = df['Q'].mean()
std = df['Q'].std()
print(mean, std)
6636.172413793103 3130.779541854595

#Fitting
distnormal = stats.norm.pdf(df['Q'], loc = mean, scale = std)
distlognormal = stats.pearson3.pdf(df['Q'], skew = 1, loc = mean, scale = std)

# Plotting
df.hist(bins=10, edgecolor='#4aaaaa', density = True)
plt.plot(df['Q'], distnormal, color = 'red')
plt.plot(df['Q'], distlognormal, color = 'blue')
plt.show()

But the plot I get looks like this, with far too many lines. How can I fit the distributions correctly?

Answer 1

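You are passing df['Q'] as the x argument to plt.plot. As your data snippet shows, the values in df['Q'] are not sorted, and that is the cause of the problem: plt.plot connects the points in the order they are given, so unsorted x values make the line jump back and forth across the axes. Sort the DataFrame by the Q column before plotting.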
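As a minimal sketch of that fix: the full DataFrame is not shown in the question, so the example below rebuilds a stand-in df from the ten rows of df.head(10). The names q_sorted, dist_normal and dist_pearson3 are illustrative choices, not taken from the question. The distribution parameters are kept exactly as in the question; only the ordering of the x values changes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Stand-in data: only the ten rows shown by df.head(10) in the question.
df = pd.DataFrame({
    "year": range(1885, 1895),
    "Q": [7241, 9164, 7407, 6870, 9855, 11887, 8827, 7546, 8498, 16757],
})

mean = df["Q"].mean()
std = df["Q"].std()

# Sort the x values so that plt.plot connects the points from left to right.
q_sorted = df["Q"].sort_values().to_numpy()

# Evaluate both densities on the sorted values (same parameters as the question).
dist_normal = stats.norm.pdf(q_sorted, loc=mean, scale=std)
dist_pearson3 = stats.pearson3.pdf(q_sorted, skew=1, loc=mean, scale=std)

fig, ax = plt.subplots()
ax.hist(df["Q"], bins=10, edgecolor="#4aaaaa", density=True)
ax.plot(q_sorted, dist_normal, color="red", label="normal")
ax.plot(q_sorted, dist_pearson3, color="blue", label="Pearson III (skew=1)")
ax.legend()
```

Calling plt.show() afterwards displays the figure with a single red and a single blue curve. Evaluating the densities on an evenly spaced grid such as np.linspace(df["Q"].min(), df["Q"].max(), 200) would also work and gives smoother curves. Note that stats.pearson3 is the Pearson type III distribution, not a log-normal; scipy provides stats.lognorm for the latter.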

Tags: python, statistics, distribution, matplotlib, scipy