How to make a stacked bar chart in Python, ordered by the percentage of one category?
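I am trying to make a bar chart from this dataset. I have got this far, but I would like the bars sorted in descending order of the percentage of "m". This is my code so far.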
import pandas as pd

cars = {'Column_1': ['A','A','C','B','D','B','C','A','C','D','D','D','B','B','D','D','D'],
'Column_2': ['m','f','m','m','f','f','m','m','f','m','f','f','m','m','m','m','m']
}
df = pd.DataFrame(cars, columns = ['Column_1', 'Column_2'])
# Count rows per (Column_1, Column_2) pair, then pivot Column_2 into columns
df3 = df.groupby(['Column_1', 'Column_2'])['Column_1'].count().unstack('Column_2')
df3[['f','m']].plot(kind='bar', stacked=True)
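What I want, though, is for B to come first (because 75% of its total (4) is "m"), followed by A or C (both about 67%), and D last (because about 57% of its total (7) is "m"). How can I do that? Is it also possible to write the percentage label on each bar?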
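The target order described above can be checked by computing the share of "m" per category directly. This is only a minimal sketch: it assumes `pd.crosstab` with `normalize='index'`, and the names `shares` and `m_share` are illustrative, not part of the original code.

```python
import pandas as pd

cars = {'Column_1': ['A','A','C','B','D','B','C','A','C','D','D','D','B','B','D','D','D'],
        'Column_2': ['m','f','m','m','f','f','m','m','f','m','f','f','m','m','m','m','m']}
df = pd.DataFrame(cars)

# Row-normalised contingency table: every row (A, B, C, D) sums to 1
shares = pd.crosstab(df['Column_1'], df['Column_2'], normalize='index')

# Share of "m" per category, largest first (illustrative name)
m_share = shares['m'].sort_values(ascending=False)
print((m_share * 100).round(1))
```

B comes out on top at 75%, A and C tie at about 66.7%, and D is last at about 57.1%. Because A and C tie, their relative order is not determined by the share alone.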
Update (solution almost complete)
import pandas as pd
cars = {'Column_1': ['A','A','C','B','D','B','C','A','C','D','D','D','B','B','D','D','D'],
'Column_2': ['m','f','m','m','f','f','m','m','f','m','f','f','m','m','m','m','m']
}
df = pd.DataFrame(cars, columns = ['Column_1', 'Column_2'])
df3 = df.groupby(['Column_1', 'Column_2'])['Column_1'].count().unstack('Column_2')
# Take a copy so the new columns are not written to a slice of df3
temp_df = df3[['f','m']].copy()
temp_df["fraction_m"] = temp_df["m"]/(temp_df["m"]+temp_df["f"])
temp_df["fraction_f"] = temp_df["f"]/(temp_df["m"]+temp_df["f"])
# Descending, so the category with the largest share of "m" comes first
temp_df = temp_df.sort_values(by=["fraction_m"], ascending=False)
temp_df[["fraction_f","fraction_m"]].plot(kind='bar', stacked=True)
Now the only thing missing is the percentage label on each bar. Can anyone help?
I just used a temporary DataFrame and added a few lines to your base code.
Code
# your base code
import pandas as pd
cars = {'Column_1': ['A','A','C','B','D','B','C','A','C','D','D','D','B','B','D','D','D'],
'Column_2': ['m','f','m','m','f','f','m','m','f','m','f','f','m','m','m','m','m']
}
df = pd.DataFrame(cars, columns = ['Column_1', 'Column_2'])
df3 = df.groupby(['Column_1', 'Column_2'])['Column_1'].count().unstack('Column_2')
# your base code finished here,
# What I add to your code:
temp_df = df3[['f','m']].copy()  # copy so the new column is not written to a slice of df3
temp_df["fraction"] = temp_df["m"]/(temp_df["m"]+temp_df["f"])
temp_df = temp_df.sort_values(by=["fraction"], ascending=False)
plot = temp_df[['f','m']].plot(kind='bar', stacked=True)
to_annotate = temp_df["f"] + temp_df["m"]  # total height of each stacked bar
# plot.patches holds all 'f' rectangles first, then all 'm' rectangles,
# so the second half lines up with the sorted rows of temp_df
n_bars = len(plot.patches) // 2
for i in range(n_bars, len(plot.patches)):
    p = plot.patches[i]
    # .iloc: the Series are indexed by category label, so look up by position
    fraction = str(round(temp_df["fraction"].iloc[i - n_bars] * 100)) + "%"
    annotate_height = to_annotate.iloc[i - n_bars]
    plot.annotate(fraction, (p.get_x(), annotate_height))
Result
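As a possible alternative to indexing `plot.patches` by hand, the labels could be attached with `Axes.bar_label`, which I believe requires matplotlib 3.4 or newer. The following is a sketch under that assumption, not a replacement for the code above; `counts`, `fraction_m`, `order`, `labels`, and `texts` are illustrative names, and the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

cars = {'Column_1': ['A','A','C','B','D','B','C','A','C','D','D','D','B','B','D','D','D'],
        'Column_2': ['m','f','m','m','f','f','m','m','f','m','f','f','m','m','m','m','m']}
df = pd.DataFrame(cars)

# Counts per category, with 'f' plotted first and 'm' stacked on top
counts = pd.crosstab(df['Column_1'], df['Column_2'])[['f', 'm']]
fraction_m = counts['m'] / counts.sum(axis=1)

# Reorder rows so the largest share of 'm' comes first
order = fraction_m.sort_values(ascending=False, kind='mergesort').index
counts = counts.loc[order]

ax = counts.plot(kind='bar', stacked=True)

# One label per bar, formatted as a whole percentage
labels = [f"{fraction_m[cat]:.0%}" for cat in counts.index]

# ax.containers has one BarContainer per plotted column; the last one is 'm',
# whose top edge is the top of the whole stacked bar
texts = ax.bar_label(ax.containers[-1], labels=labels, label_type='edge')
plt.tight_layout()
```

Passing `label_type='center'` instead would place each label in the middle of the "m" segment rather than above the bar.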