Seaborn distplot y-axis normalisation wrong ticklabels
Please note that I have already checked the related questions on this.
So, I am using distplot to plot some histograms on separate subplots:
import numpy as np
#import netCDF4 as nc # used to get p0_dict
import matplotlib.pyplot as plt
from collections import OrderedDict
import seaborn.apionly as sns
import cPickle as pickle
'''
LINK TO PICKLE
https://drive.google.com/file/d/0B8Xks3meeDq0aTFYcTZEZGFFVk0/view?usp=sharing
'''
p0_dict = pickle.load(open('/path/to/pickle/test.dat', 'r'))
fig = plt.figure(figsize = (15,10))
ax = plt.gca()
j=1
for region, val in p0_dict.iteritems():
    val = np.asarray(val)
    subax = plt.subplot(5,5,j)
    print region
    try:
        sns.distplot(val, bins=11, hist=True, kde=True, rug=True,
                     ax = subax, color = 'k', norm_hist=True)
    except Exception as Ex:
        print Ex
    subax.set_title(region)
    subax.set_xlim(0, 1) # the data varies from 0 to 1
    j+=1
plt.subplots_adjust(left = 0.06, right = 0.99, bottom = 0.07,
                    top = 0.92, wspace = 0.14, hspace = 0.6)
fig.text(0.5, 0.02, r'$ P(W) = 0,1 $', ha ='center', fontsize = 15)
fig.text(0.02, 0.5, '% occurrence', ha ='center',
         rotation='vertical', fontsize = 15)
# obviously I'd multiply the fractional ticklabels by 100 to get
# the percentage...
plt.show()
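I would expect the area under the KDE curve to sum to 1, and the y-axis ticklabels to reflect this. However, I get the following:
As you can see, the y-axis ticklabels are not in the range [0,1], as I would expect. Turning norm_hist or kde on or off does not change this. For reference, the output with both turned off:
Just to verify: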
aus = np.asarray(p0_dict['AUS'])
aus_bins = np.histogram(aus, bins=11)[0]
plt.subplot(121)
plt.hist(aus,11)
plt.subplot(122)
plt.bar(range(0,11),aus_bins.astype(float)/np.sum(aus_bins))
plt.show()
The y ticklabels in this case properly reflect those of a normalised histogram.
What am I doing wrong?
Thank you for your help.
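The y-axis is a density, not a probability. I think you are expecting the normalised histogram to show a probability mass function, where the sum of the bar heights equals 1. But that is wrong; the normalisation ensures that the sum of the bar heights times the bar widths equals 1. This is what makes the normalised histogram comparable to the kernel density estimate, which is normalised so that the area under the curve equals 1.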
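To make the distinction concrete, here is a minimal sketch using only NumPy. The data are synthetic (a hypothetical stand-in for one region's values, assumed to lie in [0, 1]), and the names `values`, `density`, `proportions` and `weights` are illustrative, not taken from the question. It shows that density-normalised bar heights can exceed 1 while the total area is still 1, and one possible way to get per-bin proportions instead.

```python
import numpy as np

# Synthetic stand-in for one region's data (assumption: values lie in [0, 1]).
rng = np.random.default_rng(0)
values = rng.beta(2.0, 5.0, size=500)

# Density normalisation: each bar height is count / (N * bin_width).
density, edges = np.histogram(values, bins=11, density=True)
widths = np.diff(edges)

# With bins narrower than 1, individual bar heights can be larger than 1 ...
print("max bar height:", density.max())

# ... but the heights times the widths (the total area) sum to 1.
area = np.sum(density * widths)
print("total area:", area)

# Per-bin proportions: weight every observation by 1/N so the heights sum to 1.
weights = np.full(values.shape, 1.0 / values.size)
proportions, _ = np.histogram(values, bins=edges, weights=weights)
print("sum of proportions:", proportions.sum())
```

If proportions are what you want on the y-axis, passing the same `weights` array to `plt.hist(values, bins=11, weights=weights)` should draw bars whose heights sum to 1. Those bars would no longer be on the same scale as a KDE curve, so overlaying the two would not be meaningful.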