How to add percentages on top of bars in seaborn
Given the following count plot, how do I place percentages on top of the bars?
import seaborn as sns
sns.set(style="darkgrid")
titanic = sns.load_dataset("titanic")
ax = sns.countplot(x="class", hue="who", data=titanic)
For example, for "First" I want total First men/total First, total First women/total First, and total First children/total First on top of their respective bars.
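The seaborn.catplot organizing function returns a FacetGrid, which gives you access to the fig, the ax, and its patches. If you add the labels when nothing else has been plotted, you know which bar patches came from which variables. From @LordZsolt's answer I picked up the order argument to catplot: I like making that explicit, because now we aren't relying on the barplot function using the order we think of as default.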
import seaborn as sns
from itertools import product

titanic = sns.load_dataset("titanic")

class_order = ['First', 'Second', 'Third']
hue_order = ['child', 'man', 'woman']
# Bars are drawn one hue level at a time, so the patches are hue-major:
# every 'child' bar first, then every 'man' bar, then every 'woman' bar.
bar_order = product(hue_order, class_order)

catp = sns.catplot(data=titanic, kind='count',
                   x='class', hue='who',
                   order=class_order,
                   hue_order=hue_order)

# As long as we haven't plotted anything else into this axis,
# we know the rectangles in it are our barplot bars
# and we know the order, so we can match up graphic and calculations:
spots = zip(catp.ax.patches, bar_order)
for patch, (who, cls) in spots:
    class_total = len(titanic[titanic['class'] == cls])
    class_who_total = len(titanic[(titanic['class'] == cls) &
                                  (titanic['who'] == who)])
    height = patch.get_height()
    catp.ax.text(patch.get_x(), height + 3,
                 '{:1.2f}'.format(class_who_total / class_total))
    # checking the patch order, not for final:
    # catp.ax.text(patch.get_x(), -3, cls[0] + who[0])
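This produces a count plot in which each bar is labeled with its fraction of the class total.
An alternate approach is to do the sub-summing explicitly, e.g. with the excellent pandas, plot with matplotlib, and also do the styling yourself. (Though you can get quite a lot of styling from the sns context even when using matplotlib plotting functions. Try it out.)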
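The alternate approach can be sketched roughly as follows. This is only an illustration under assumptions: the small DataFrame is a hypothetical stand-in for the titanic columns (so nothing is downloaded), and the bar layout arithmetic (`width`, the per-hue offset) is my own choice rather than anything seaborn prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows with the same two columns as the titanic example
df = pd.DataFrame({
    "class": ["First", "First", "First", "First", "Second", "Second",
              "Third", "Third", "Third", "Third"],
    "who": ["man", "man", "woman", "child", "man", "woman",
            "man", "man", "man", "woman"],
})

# Sub-sums: counts per (class, who), and each count as a fraction of its class
counts = pd.crosstab(df["class"], df["who"])
fractions = counts.div(counts.sum(axis=1), axis=0)

fig, ax = plt.subplots()
n_hues = len(counts.columns)
width = 0.8 / n_hues
for j, who in enumerate(counts.columns):
    # Offset each hue's bars so they sit side by side around the integer tick
    xs = [i + (j - (n_hues - 1) / 2) * width for i in range(len(counts.index))]
    bars = ax.bar(xs, counts[who], width=width, label=who)
    for bar, frac in zip(bars, fractions[who]):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
                f"{frac:.2f}", ha="center", va="bottom")

ax.set_xticks(range(len(counts.index)))
ax.set_xticklabels(counts.index)
ax.legend(title="who")
```

Because the fractions come straight from the crosstab, the labels do not depend on guessing the order of the patches.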
With the help of cphlewis's solution, I managed to put the correct percentages on top of the chart, so that the classes sum up to one.
# `categorical` (a list of column names), `plot_count` and `data` are defined elsewhere
for index, category in enumerate(categorical):
    plt.subplot(plot_count, 1, index + 1)

    order = sorted(data[category].unique())
    # pass the column as x=...; recent seaborn versions reject it as a positional argument
    ax = sns.countplot(x=category, data=data, hue="churn", order=order)
    ax.set_ylabel('')

    bars = ax.patches
    half = int(len(bars)/2)
    left_bars = bars[:half]
    right_bars = bars[half:]

    for left, right in zip(left_bars, right_bars):
        height_l = left.get_height()
        height_r = right.get_height()
        total = height_l + height_r

        ax.text(left.get_x() + left.get_width()/2., height_l + 40, '{0:.0%}'.format(height_l/total), ha="center")
        ax.text(right.get_x() + right.get_width()/2., height_r + 40, '{0:.0%}'.format(height_r/total), ha="center")
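However, this solution assumes there are 2 options (man, woman) as opposed to 3 (man, woman, child).
Since Axes.patches is ordered in a weird way (first all the blue bars, then all the green bars, then all the red bars), you would have to split them up and zip them back together accordingly.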
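The split-and-regroup step can be sketched for any number of hue levels. This minimal example works on a plain list of bar heights rather than real patches; `group_by_x` is a hypothetical helper name, the heights are made up, and it assumes the sequence really is hue-major as described above (some seaborn versions may add extra patches, which would need filtering out first).

```python
def group_by_x(items, n_x):
    """Regroup a hue-major sequence into one list per x category."""
    return [list(items[i::n_x]) for i in range(n_x)]

# Made-up heights in patch order for 3 x categories and 3 hues:
# all bars of hue 0 first, then all bars of hue 1, then all bars of hue 2.
heights = [1, 2, 3, 5, 4, 9, 4, 4, 8]

groups = group_by_x(heights, n_x=3)
# Each inner list now holds the bars that share one x category,
# so dividing by its sum gives shares that add up to 1 per category.
shares = [[h / sum(group) for h in group] for group in groups]

for group, share in zip(groups, shares):
    print(group, [f"{s:.0%}" for s in share])
```

The same slicing works on `ax.patches` itself, since it only relies on the position of each bar in the sequence.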
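The with_hue function plots percentages on the bars if your plot uses the 'hue' parameter. It takes the actual graph, the feature, Number_of_categories in the feature, and hue_categories (the number of categories in the hue feature) as parameters.
The without_hue function plots percentages on the bars if you have a normal plot without hue. It takes the actual graph and the feature as parameters.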
def with_hue(ax, feature, Number_of_categories, hue_categories):
    a = [p.get_height() for p in ax.patches]
    patch = [p for p in ax.patches]
    for i in range(Number_of_categories):
        total = feature.value_counts().values[i]
        for j in range(hue_categories):
            percentage = '{:.1f}%'.format(100 * a[(j*Number_of_categories + i)]/total)
            x = patch[(j*Number_of_categories + i)].get_x() + patch[(j*Number_of_categories + i)].get_width() / 2 - 0.15
            y = patch[(j*Number_of_categories + i)].get_y() + patch[(j*Number_of_categories + i)].get_height()
            ax.annotate(percentage, (x, y), size = 12)

def without_hue(ax, feature):
    total = len(feature)
    for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_height()/total)
        x = p.get_x() + p.get_width() / 2 - 0.05
        y = p.get_y() + p.get_height()
        ax.annotate(percentage, (x, y), size = 12)
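One thing to watch in with_hue: `value_counts()` returns counts sorted by frequency, so `values[i]` lines up with the i-th group of bars only when the x-axis order happens to match that sorting. A minimal sketch of aligning the totals with the plot order instead; `plot_order` is a hypothetical list standing in for whatever `order` the plot was drawn with, and the Series is made-up data.

```python
import pandas as pd

# Made-up feature column: 'Third' is the most frequent value
feature = pd.Series(["Third"] * 5 + ["First"] * 3 + ["Second"] * 2)

# Hypothetical x-axis order used when drawing the plot
plot_order = ["First", "Second", "Third"]

# Sorted by frequency: Third, First, Second
by_frequency = feature.value_counts()

# Re-indexed so that position i matches the i-th x category on the plot
totals_in_plot_order = feature.value_counts().reindex(plot_order)

print(by_frequency.index.tolist())
print(totals_in_plot_order.tolist())
```

Using `totals_in_plot_order.values[i]` as the denominator keeps each percentage tied to the right category regardless of how frequent it is.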
This answer is inspired by jrjc's and cphlewis's answers, but is simpler and easier to understand.
sns.set(style="whitegrid")
plt.figure(figsize=(8,5))
total = float(len(train_df))
ax = sns.countplot(x="event", hue="event", data=train_df)
plt.title('Data provided for each event', fontsize=20)
for p in ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_height()/total)
    x = p.get_x() + p.get_width()
    y = p.get_height()
    ax.annotate(percentage, (x, y),ha='center')
plt.show()
I could not get these methods to work when there were more than 2 hue categories.
I used @Lord Zsolt's method, extended to any number of hue categories.
def barPerc(df,xVar,ax):
    '''
    barPerc(): Add percentage for hues to bar plots
    args:
        df: pandas dataframe
        xVar: (string) X variable
        ax: Axes object (for Seaborn Countplot/Bar plot or
                         pandas bar plot)
    '''
    # 1. how many X categories
    ##    check for NaN and remove
    numX=len([x for x in df[xVar].unique() if x==x])

    # 2. The bars are created in hue order, organize them
    bars = ax.patches
    ## 2a. For each X variable
    for ind in range(numX):
        ## 2b. Get every hue bar
        ##     ex. 8 X categories, 4 hues =>
        ##     [0, 8, 16, 24] are hue bars for 1st X category
        hueBars=bars[ind:][::numX]
        ## 2c. Get the total height (for percentages)
        total = sum([x.get_height() for x in hueBars])

        # 3. Print the percentage on the bars
        for bar in hueBars:
            ax.text(bar.get_x() + bar.get_width()/2.,
                    bar.get_height(),
                    f'{bar.get_height()/total:.0%}',
                    ha="center",va="bottom")
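As you can see, this method fulfills the original poster's requirement:
I want total First men/total First, total First women/total First, and total First children/total First on top of their respective bars.
That is, the values added are the percentage of each hue within each X category, so that for each X category the percentages add up to 100%.
(This also works with Seaborn's .barplot().)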
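- As of matplotlib 3.4.2, the easiest option is to use matplotlib.pyplot.bar_label.
- For more options and information about using .bar_label, see this answer.
- The list comprehension for labels uses an assignment expression (:=), which requires python >= 3.8. This can be rewritten as a standard for loop.
  - labels = [f'{v.get_height()/data.who.count()*100:0.1f}' for v in c] works without the assignment expression.
- Annotations for horizontal bars should use v.get_width().
- The annotations in the example are a percent of the total. To add annotations based on the total for the group, see this answer.
- See also How to plot percentage with seaborn distplot / histplot / displot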
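The for-loop rewrite and the get_width() note for horizontal bars can be sketched together. This is a minimal illustration using plain matplotlib with made-up counts (the `counts` dict is hypothetical), not the titanic data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Made-up category counts; 'b' is zero to show the empty-label case
counts = {"a": 30, "b": 0, "c": 70}
total = sum(counts.values())

fig, ax = plt.subplots()
ax.barh(list(counts.keys()), list(counts.values()))

for c in ax.containers:
    # Standard for loop instead of a list comprehension with :=
    labels = []
    for v in c:
        w = v.get_width()  # horizontal bars: the value is the width
        labels.append(f"{w / total * 100:0.1f}%" if w > 0 else "")
    ax.bar_label(c, labels=labels, label_type="edge")
```

For vertical bars the only change is reading `v.get_height()` instead of `v.get_width()`.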
Imports and Sample DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
# load the data
data = sns.load_dataset('titanic')[['survived', 'class', 'who']]
   survived  class    who
0         0  Third    man
1         1  First  woman
2         1  Third  woman
Axes-level plot
- Works for seaborn.countplot or seaborn.barplot
# plot
ax = sns.countplot(x="class", hue="who", data=data)
ax.set(ylabel='Bar Count', title='Bar Count and Percent of Total')

# add annotations
for c in ax.containers:
    # custom label calculates percent and add an empty string so 0 value bars don't have a number
    labels = [f'{h/data.who.count()*100:0.1f}%' if (h := v.get_height()) > 0 else '' for v in c]
    ax.bar_label(c, labels=labels, label_type='edge')

plt.show()
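The loop above labels each bar with its share of all rows. For the per-class shares asked for in the question, one possibility is to divide each bar by the total of its x category. The sketch below is an assumption-laden illustration: it uses a made-up DataFrame and pandas' own bar plot as a stand-in for countplot (to avoid downloading a dataset), relying on both putting one container per hue level into ax.containers, with the bars inside each container in x-axis order.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows with the same columns as the example data
df = pd.DataFrame({
    "class": ["First"] * 4 + ["Second"] * 2 + ["Third"] * 4,
    "who": ["man", "man", "woman", "child", "man", "woman",
            "man", "man", "man", "woman"],
})

counts = pd.crosstab(df["class"], df["who"])   # rows: class, columns: who
class_totals = counts.sum(axis=1).to_numpy()   # one total per x category

fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)                 # one container per 'who' level

all_labels = []
for c in ax.containers:
    # Bar k in every container belongs to x category k, so divide by class_totals[k]
    labels = [f"{v.get_height() / t:.0%}" if v.get_height() > 0 else ""
              for v, t in zip(c, class_totals)]
    ax.bar_label(c, labels=labels, label_type="edge")
    all_labels.append(labels)
```

With this denominator the labels within each class add up to 100%, rather than all labels on the axes adding up to 100%.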
Figure-level plot
fg = sns.catplot(data=data, kind='count', x='class', hue='who', col='survived')
fg.fig.subplots_adjust(top=0.9)
fg.fig.suptitle('Bar Count and Percent of Total')

for ax in fg.axes.ravel():
    # add annotations
    for c in ax.containers:
        # custom label calculates percent and add an empty string so 0 value bars don't have a number
        labels = [f'{h/data.who.count()*100:0.1f}%' if (h := v.get_height()) > 0 else '' for v in c]
        ax.bar_label(c, labels=labels, label_type='edge')

plt.show()