Frequency and percentage for uneven groups in a seaborn (sns) barplot
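I am trying to show relative percentages by group, as well as total frequencies, in a seaborn (sns) barplot. The two groups I am comparing differ greatly in size, which is why the function below shows percentages within each group.
Here is the code for a sample dataframe I created. Its groups ('groups') have relative sizes similar to those in my data, across the categorical target variable ('item'). 'rand' is just a helper variable I used to build the df.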
# import numpy, pandas and seaborn
import pandas as pd
import seaborn as sns
import numpy as np
# create dataframe
foobar = pd.DataFrame(np.random.randn(100, 3), columns=('groups', 'item', 'rand'))
# 'groups' and 'item' will hold strings, so give them object dtype
# (writing strings into float columns is deprecated in recent pandas versions)
foobar = foobar.astype({'groups': object, 'item': object})
# get relative group sizes
for row, val in enumerate(foobar.rand):
    if val > -1.2:
        foobar.loc[row, 'groups'] = 'A'
    else:
        foobar.loc[row, 'groups'] = 'B'
    # assign categories that I am comparing graphically
    if row < 20:
        foobar.loc[row, 'item'] = 'Z'
    elif row < 40:
        foobar.loc[row, 'item'] = 'Y'
    elif row < 60:
        foobar.loc[row, 'item'] = 'X'
    elif row < 80:
        foobar.loc[row, 'item'] = 'W'
    else:
        foobar.loc[row, 'item'] = 'V'
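The row-by-row loop above can also be written with vectorized column assignments. This is a minimal sketch of an alternative, not the original code: it assumes NumPy's `default_rng` with a fixed seed (0, chosen arbitrarily) so the group sizes are reproducible, and it applies the same rules as the loop (values above -1.2 go to group A, and every block of 20 rows gets one item letter).

```python
import numpy as np
import pandas as pd

# seeded generator so the example gives the same groups every run (assumption)
rng = np.random.default_rng(0)

foobar = pd.DataFrame({"rand": rng.standard_normal(100)})
# same rule as the loop: values above -1.2 go to group A, the rest to group B
foobar["groups"] = np.where(foobar["rand"] > -1.2, "A", "B")
# rows 0-19 -> 'Z', 20-39 -> 'Y', 40-59 -> 'X', 60-79 -> 'W', 80-99 -> 'V'
foobar["item"] = np.repeat(["Z", "Y", "X", "W", "V"], 20)

# absolute size of each group
print(foobar["groups"].value_counts())
```

Because `np.where` and `np.repeat` build whole columns at once, the string columns get object dtype from the start and no per-row `.loc` writes are needed.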
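Here is the function I wrote to compare relative frequencies by group. It has some default arguments, but I have reassigned them for this question.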
def percent_categorical(item, df, grouper='Active Status'):
    # plot categorical responses to an item ('column name')
    # as percentages within each group ('other column with categorical data')
    # df must be passed in: the original default (df=IA) refers to my own
    # dataframe, which is not defined in this example
    # 'Active Status' is the default grouper
    # create df of item grouped by status
    grouped = (df.groupby(grouper)[item]
               # convert to percentage by group rather than total count
               .value_counts(normalize=True)
               # name the resulting column
               .rename('percentage')
               # multiply by 100 for easier interpretation
               .mul(100)
               # move the group/item index levels back into columns
               .reset_index()
               # sort by item so the bars appear in a consistent order
               .sort_values(item))
    # create plot
    PercPlot = sns.barplot(x=item,
                           y='percentage',
                           hue=grouper,
                           data=grouped,
                           palette='RdBu')
    # rotate the existing tick labels; overwriting them with
    # value_counts().index could attach labels to the wrong bars
    PercPlot.tick_params(axis='x', labelrotation=90)
    # return the Axes so the plot can be shown or modified
    return PercPlot
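To see what the `groupby(...).value_counts(normalize=True)` chain in the function hands to `sns.barplot`, here is a small sketch on a hypothetical six-row dataframe (not the question's data): group A has four rows and group B has two. Each percentage is relative to its own group, so the values within each group add up to 100.

```python
import pandas as pd

# tiny hypothetical data: group A has 4 rows, group B has 2
df = pd.DataFrame({
    "groups": ["A", "A", "A", "A", "B", "B"],
    "item":   ["X", "X", "X", "Y", "X", "Y"],
})

grouped = (df.groupby("groups")["item"]
             .value_counts(normalize=True)  # share of each item within its group
             .rename("percentage")          # avoids a name clash on reset_index
             .mul(100)                      # proportions -> percentages
             .reset_index()                 # 'groups' and 'item' become columns
             .sort_values("item"))
print(grouped)
```

Here A/X is 3 of 4 rows (75%) while B/X is 1 of 2 rows (50%), which is exactly the per-group normalization that makes unevenly sized groups comparable.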
The function call and the resulting plot:
percent_categorical('item', df=foobar, grouper='groups')
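This works well because it lets me show relative percentages by group. However, I would also like to show the absolute number in each group, preferably in the legend. In this case, I would like it to show that group A has 89 members in total and group B has 11.
Thanks in advance for your help.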
I solved this by splitting the groupby operation: one to get the percentages and one to count the number of objects in each group.
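The split can be sketched on its own with a hypothetical six-row dataframe: one `SeriesGroupBy` object is reused for both the absolute counts and the within-group percentages.

```python
import pandas as pd

# tiny hypothetical data: group A has 4 rows, group B has 2
df = pd.DataFrame({
    "groups": ["A", "A", "A", "A", "B", "B"],
    "item":   ["X", "X", "X", "Y", "X", "Y"],
})

groupbase = df.groupby("groups")["item"]
# absolute size of each group (non-missing 'item' values only)
groupcount = groupbase.count()
# percentage of each item within its own group
share = groupbase.value_counts(normalize=True) * 100

print(groupcount)
print(share)
```

`count()` skips missing values in the item column; if rows with a missing item should still count toward the group total, `groupbase.size()` counts all rows instead.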
I adapted your percent_categorical function as follows:
import matplotlib.pyplot as plt
import seaborn as sns

def percent_categorical(item, df, grouper='Active Status'):
    # plot categorical responses to an item ('column name')
    # as percentages within each group ('other column with categorical data')
    # df must be passed in: the default IA from the question is not defined here
    # 'Active Status' is the default grouper
    # create groupby of item grouped by status
    groupbase = df.groupby(grouper)[item]
    # count the number of occurrences in each group
    groupcount = groupbase.count()
    # convert to percentage by group rather than total count
    groupper = (groupbase.value_counts(normalize=True)
                # name the resulting column
                .rename('percentage')
                # multiply by 100 for easier interpretation
                .mul(100)
                # move the group/item index levels back into columns
                .reset_index()
                # sort by item so the bars appear in a consistent order
                .sort_values(item))
    # create plot
    fig, ax = plt.subplots()
    brplt = sns.barplot(x=item,
                        y='percentage',
                        hue=grouper,       # the column name, not the dataframe
                        data=groupper,
                        palette='RdBu',
                        ax=ax)
    # rotate the existing tick labels so each stays attached to its bar
    ax.tick_params(axis='x', labelrotation=90)
    # get the handles and the labels of the legend
    # these are the bars and the corresponding text in the legend
    thehandles, thelabels = ax.get_legend_handles_labels()
    # for each label, add the total number of occurrences
    # the legend labels are the group values of the grouper column, so they
    # can be looked up in groupcount (this assumes the group values are strings)
    for counter, label in enumerate(thelabels):
        # the new label looks like this (dummy name and value)
        # 'XYZ (42)'
        thelabels[counter] = label + ' ({})'.format(groupcount[label])
    # add the new legend to the figure
    ax.legend(thehandles, thelabels)
    # return the figure and axes so the caller can show or save them
    return fig, ax, brplt
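The legend step in the function relies only on Matplotlib's `Axes.get_legend_handles_labels` and `Axes.legend`, so it can be tried without seaborn. This is a minimal sketch under stated assumptions: plain `ax.bar` calls stand in for the bars seaborn draws per hue level, the off-screen Agg backend is used so no window opens, and the group sizes 89 and 11 are hypothetical values taken from the question's description.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical group sizes, standing in for groupcount in the answer
groupcount = pd.Series({"A": 89, "B": 11})

fig, ax = plt.subplots()
# one labelled bar series per group, similar to what seaborn's hue creates
ax.bar([0, 1], [60, 40], width=0.4, label="A")
ax.bar([0.4, 1.4], [30, 70], width=0.4, label="B")

# same relabelling loop as in the answer
thehandles, thelabels = ax.get_legend_handles_labels()
for counter, label in enumerate(thelabels):
    thelabels[counter] = label + " ({})".format(groupcount[label])
legend = ax.legend(thehandles, thelabels)

legend_texts = [t.get_text() for t in legend.get_texts()]
print(legend_texts)
```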
To get your figure:
fig, ax, brplt = percent_categorical('item', df=foobar, grouper='groups')
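The resulting figure looks like this:
You can change the appearance of this legend however you like; I only added the parentheses as an example.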
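A different technique from the answer above, sketched here as an alternative rather than a replacement: put the counts into the hue values themselves before plotting, so seaborn's default legend already shows them and no handle/label editing is needed. The six-row dataframe, the helper column name `group_label`, and the `'A (n=4)'` label format are all hypothetical choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# small hypothetical data set: group A has 4 rows, group B has 2
df = pd.DataFrame({
    "groups": ["A", "A", "A", "A", "B", "B"],
    "item":   ["X", "Y", "X", "Y", "X", "X"],
})

# map every group to a label that already carries its size, e.g. 'A (n=4)'
sizes = df["groups"].value_counts()
df["group_label"] = df["groups"].map(lambda g: f"{g} (n={sizes[g]})")

# percentages within each labelled group, as in the original function
pct = (df.groupby("group_label")["item"]
         .value_counts(normalize=True)
         .rename("percentage")
         .mul(100)
         .reset_index()
         .sort_values("item"))

fig, ax = plt.subplots()
sns.barplot(x="item", y="percentage", hue="group_label", data=pct,
            palette="RdBu", ax=ax)
legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_texts)
```

The trade-off is that the counts become part of the data labels, so anything else keyed on the group names (for example a custom `hue_order`) has to use the new labels as well.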