Seaborn: how to add the number of samples per hue in sns.catplot

Question

I have a catplot created with:

s = sns.catplot(x="type", y="val", hue="Condition", kind='box', data=df)

However, the hue levels of "Condition" are not the same size: blue has n=8 samples and green has n=11 samples.

What is the best way to add this information to the plot?

Answer 1

This is essentially the same solution, simplified somewhat:

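import numpy as np
import pandas as pd
import seaborn as sns

df = sns.load_dataset('tips')
x_col = 'day'
y_col = 'total_bill'
order = ['Thur', 'Fri', 'Sat', 'Sun']
hue_col = 'smoker'
hue_order = ['Yes', 'No']
width = 0.8  # total width shared by the hue boxes of one category (seaborn's default)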


# pass width explicitly so the plot and the offsets computed below agree
g = sns.catplot(kind="box", x=x_col, y=y_col, order=order, hue=hue_col, hue_order=hue_order, width=width, data=df)
ax = g.axes[0, 0]

# get the offsets used by boxplot when hue-nesting is used
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
n_levels = len(hue_order)  # number of hue boxes drawn per category
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()

# x position of every box, category by category, hue levels in hue_order
pos = [x + o for x in np.arange(len(order)) for o in offsets]

# count and median per (category, hue) pair, reordered to match pos
# observed=False keeps empty category combinations (they become NaN medians)
index = pd.MultiIndex.from_product([order, hue_order])
counts = df.groupby([x_col, hue_col], observed=False)[y_col].size().reindex(index)
medians = df.groupby([x_col, hue_col], observed=False)[y_col].median().reindex(index)

for p,n,m in zip(pos,counts,medians):
    if not np.isnan(m):
        ax.annotate('N={:.0f}'.format(n), xy=(p, m), xycoords='data', ha='center', va='bottom')
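
The offset arithmetic above can be pulled out into a small standalone helper, which makes it easier to check the label positions without drawing anything. This is only a sketch: the name dodge_positions is hypothetical (it is not part of seaborn), and it assumes the boxes are laid out as in the linked categorical.py revision, with the default total width of 0.8 split evenly between hue levels.

```python
def dodge_positions(n_categories, n_hue_levels, width=0.8):
    """Return the x centre of every hue-nested box, category by category.

    Hypothetical helper, not a seaborn function. It reproduces the
    linspace/mean arithmetic from the answer in plain Python.
    """
    each_width = width / n_hue_levels
    # Centre of hue box i relative to its category tick: the boxes sit
    # side by side and the group as a whole is centred on the tick.
    offsets = [each_width * (i - (n_hue_levels - 1) / 2)
               for i in range(n_hue_levels)]
    # Category ticks are at 0, 1, 2, ...; hue levels vary fastest.
    return [x + o for x in range(n_categories) for o in offsets]


# Layout used in the answer: 4 days, 2 smoker levels.
positions = dodge_positions(4, 2)
print(positions)
```

The returned list is ordered the same way as pd.MultiIndex.from_product([order, hue_order]), so it can be zipped directly with the reindexed counts and medians.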

