Spread out data on the histogram matplotlib jupyter

Question

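Please note that I am new to matplotlib. I am trying to spread out the data in my histogram, as shown below. This is the result of my code:

What I want to achieve is this:

I tried increasing the number of bins, but that only lowers the frequencies; it does not spread out the plot. Here is my code: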

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data
url = 'https://raw.githubusercontent.com/diggledoot/dataset/master/uber-raw-data-apr14.csv'
latlong = pd.read_csv(url)

# Round the coordinates to n decimal places for more focused results
n = 2
latlong['Lon'] = latlong['Lon'].round(n)
latlong['Lat'] = latlong['Lat'].round(n)

# Plot a histogram of the latitudes
plt.figure(figsize=(8, 6))
plt.title('Rides based on latitude')
plt.hist(latlong['Lat'], bins=100, color='cyan')
plt.xlabel('Latitude')
plt.ylabel('Frequency')
# Place an x-tick every 0.1 degrees between the smallest and largest latitude
plt.xticks(np.arange(round(latlong.Lat.min(), 1), round(latlong.Lat.max(), 1), 0.1), rotation=45)
plt.show()

How can I space out the x-ticks so that the output looks like the histogram I want to achieve?

Answer 1

If you run

frequency, bins = np.histogram(latlong['Lat'], bins=20)
print(frequency)
print(bins)

you get

[     1      7     12     18    301  35831 504342  22081   1256    580
     63     12      8      1      2      0      0      0      0      1]
[40.07   40.1725 40.275  40.3775 40.48   40.5825 40.685  40.7875 40.89
 40.9925 41.095  41.1975 41.3    41.4025 41.505  41.6075 41.71   41.8125
 41.915  42.0175 42.12  ]

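You can see that almost all rides fall into a few bins around 40.7, while a handful of counts sit in bins far from the mean. Those far-away values stretch the x-axis and squeeze the bulk of the data into a narrow spike.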
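One way to pick sensible bounds from such output is to look at the share of observations per bin. The following is a minimal sketch, not part of the original answer: it uses synthetic data as a stand-in for latlong['Lat'], and the 0.1% threshold is an arbitrary assumption chosen for illustration.

```python
import numpy as np

# Synthetic stand-in for latlong['Lat'] (assumption: NOT the real Uber data).
rng = np.random.default_rng(0)
lat = np.concatenate([
    rng.normal(40.74, 0.04, size=10_000),  # dense core of the data
    [40.07, 41.3, 42.12],                  # a few far-away outliers
])

frequency, bins = np.histogram(lat, bins=20)

# Share of all observations that falls into each bin.
share = frequency / frequency.sum()

# Keep only bins holding at least 0.1% of the data (arbitrary threshold).
busy = np.flatnonzero(share >= 0.001)

# Left edge of the first busy bin and right edge of the last busy bin.
lower, upper = bins[busy[0]], bins[busy[-1] + 1]
print(f"Most data lies between {lower:.3f} and {upper:.3f}")
```

The resulting lower and upper values can then serve as the minimum and maximum for the plot, instead of hand-picked numbers.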

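You can keep those far-from-the-mean bins from stretching the plot by clipping the variable of interest between a specified minimum and maximum and then plotting the histogram, like this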

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#Loading data
url = 'https://raw.githubusercontent.com/diggledoot/dataset/master/uber-raw-data-apr14.csv'
latlong = pd.read_csv(url)

#Plot
plt.figure(figsize=(8,6))
plt.title('Rides based on latitude')
plt.hist(np.clip(latlong['Lat'], 40.6, 40.9),bins=50,color='cyan')
plt.xlabel('Latitude')
plt.ylabel('Frequency')
plt.show()

This produces the following result
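A point worth noting about the approach above: np.clip does not drop the outliers, it moves them onto the boundary values, so they are counted in the first and last bins. As an alternative sketch (not the original answerer's method), the range argument of np.histogram and plt.hist leaves values outside the range out of the count entirely. The example below again uses synthetic data in place of latlong['Lat'], and the 0.05 tick step is an assumption chosen for readability.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for latlong['Lat'] (assumption: NOT the real Uber data).
rng = np.random.default_rng(0)
lat = np.concatenate([
    rng.normal(40.74, 0.04, size=10_000),  # dense core of the data
    [40.07, 41.3, 42.12],                  # a few far-away outliers
])

lo, hi = 40.6, 40.9  # same bounds as in the answer above

# Clipping keeps every observation; outliers pile up in the edge bins.
clipped_counts, _ = np.histogram(np.clip(lat, lo, hi), bins=50)

# range=(lo, hi) ignores observations outside the bounds.
ranged_counts, edges = np.histogram(lat, bins=50, range=(lo, hi))

print(clipped_counts.sum(), ranged_counts.sum())

# Plot with the same range and evenly spaced, rotated x-ticks.
fig, ax = plt.subplots(figsize=(8, 6))
ax.hist(lat, bins=50, range=(lo, hi), color='cyan')
ax.set_title('Rides based on latitude')
ax.set_xlabel('Latitude')
ax.set_ylabel('Frequency')
ax.set_xticks(np.arange(lo, hi + 0.01, 0.05))  # a tick every 0.05 degrees
ax.tick_params(axis='x', labelrotation=45)
plt.show()
```

Which variant is preferable depends on whether the out-of-range rides should remain visible as spikes at the edges (clip) or be left out of the picture (range).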


python

matplotlib

histogram