Seaborn: how to add the number of samples per hue in sns.catplot
I have a catplot drawn with:
s = sns.catplot(x="type", y="val", hue="Condition", kind='box', data=df)
However, the hue groups of "Condition" are not the same size: blue has n=8 samples and green has n=11.
What is the best way to add this information to the plot?
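The sample sizes quoted in the question can be read directly from the data with pandas. The sketch below uses a small, made-up DataFrame with the question's column names (`type`, `Condition`, `val`) and hypothetical hue levels `A`/`B`; `hue_counts` is an illustrative helper, not part of seaborn. It returns both the per-hue totals and the per-box (x, hue) counts, which is what a box-by-box annotation needs.

```python
import pandas as pd


def hue_counts(df, x="type", hue="Condition"):
    """Hypothetical helper: return (totals per hue level, counts per (x, hue) box)."""
    per_hue = df.groupby(hue).size()
    per_cell = df.groupby([x, hue]).size()
    return per_hue, per_cell


# Made-up data: 8 rows for condition A and 11 rows for condition B
df = pd.DataFrame({
    "type": ["t1"] * 4 + ["t2"] * 4 + ["t1"] * 5 + ["t2"] * 6,
    "Condition": ["A"] * 8 + ["B"] * 11,
    "val": range(19),
})

per_hue, per_cell = hue_counts(df)
print(per_hue)   # A has 8 rows, B has 11
print(per_cell)  # one count per box: (t1, A)=4, (t1, B)=5, (t2, A)=4, (t2, B)=6
```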
This is essentially the same solution as , which I have simplified a little, because:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('tips')
x_col='day'
y_col='total_bill'
order=['Thur','Fri','Sat','Sun']
hue_col='smoker'
hue_order=['Yes','No']
width=0.8
g = sns.catplot(kind="box", x=x_col, y=y_col, order=order, hue=hue_col, hue_order=hue_order, data=df)
ax = g.axes[0,0]
# get the offsets used by boxplot when hue-nesting is used
# (copied from the seaborn commit below; other seaborn versions may position boxes differently)
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
n_levels = len(hue_order)
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()
pos = [x+o for x in np.arange(len(order)) for o in offsets]
counts = df.groupby([x_col,hue_col])[y_col].size()
counts = counts.reindex(pd.MultiIndex.from_product([order,hue_order]))
medians = df.groupby([x_col,hue_col])[y_col].median()
medians = medians.reindex(pd.MultiIndex.from_product([order,hue_order]))
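# write "N=<count>" just above the median line of each box
for p, n, m in zip(pos, counts, medians):
    if not np.isnan(m):  # skip (day, smoker) combinations that have no data
        ax.annotate('N={:.0f}'.format(n), xy=(p, m), xycoords='data', ha='center', va='bottom')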
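The only non-obvious part of the code above is where each box is drawn along the x axis. The sketch below isolates that arithmetic in a hypothetical `dodge_positions` function so it can be checked without plotting: every category sits on an integer tick, and the hue boxes are spread over `width` around it. It reproduces the offsets taken from the linked seaborn commit; newer seaborn releases (for example, ones whose `boxplot` accepts a `gap` argument) may lay boxes out differently, so compare against your installed version.

```python
import numpy as np


def dodge_positions(n_x, n_hue, width=0.8):
    """Hypothetical helper: x-centre of every box, ordered (x0, h0), (x0, h1), ..."""
    each_width = width / n_hue
    offsets = np.linspace(0, width - each_width, n_hue)
    offsets -= offsets.mean()  # centre the group of hue boxes on the integer tick
    return np.array([x + o for x in range(n_x) for o in offsets])


# 4 days (Thur..Sun) x 2 smoker levels, as in the tips example
pos = dodge_positions(n_x=4, n_hue=2)
print(np.round(pos, 2))
```

With two hue levels each box is 0.4 wide, so the boxes sit 0.2 to the left and right of each tick. This order matches `pd.MultiIndex.from_product([order, hue_order])`, which is why the loop can zip positions with counts and medians directly.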
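If one total per hue level is enough (as in "blue n=8, green n=11"), a different technique from the annotation approach in this answer is to put the counts into the hue labels themselves, so they appear in the legend. A minimal pandas-only sketch; `add_counts_to_hue` and the toy data are hypothetical:

```python
import pandas as pd


def add_counts_to_hue(df, hue, new_col=None):
    """Hypothetical helper: copy df and add a column whose values read 'level (n=k)'."""
    new_col = new_col or f"{hue} (n)"
    counts = df[hue].value_counts()  # rows per hue level (NaN is not counted)
    labels = {level: f"{level} (n={k})" for level, k in counts.items()}
    out = df.copy()
    out[new_col] = out[hue].map(labels)  # NaN hue values stay NaN
    return out, new_col


# Made-up data with the question's column name
df = pd.DataFrame({"Condition": ["A"] * 8 + ["B"] * 11, "val": range(19)})
labelled, col = add_counts_to_hue(df, "Condition")
print(labelled[col].unique())  # the two relabelled levels, A with n=8 and B with n=11
```

The returned column can then be passed as `hue=col` to `sns.catplot`; if you also pass `hue_order`, it must list the relabelled strings. This shows one count per hue level, not per box; per-box counts still need annotations like the ones above.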