Stacked bar plot with only the three best

Question

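I want to draw a stacked bar plot with seaborn or matplotlib. I want to get all the information I need via .pivot_table and then keep only the three neighbourhoods with the highest counts. But I get a KeyError: 'neighbourhood', because neighbourhood is the index of df_new.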

How can I produce a stacked bar plot that contains only the top three neighbourhoods from my df_new (which has to come from df.pivot_table)?

import pandas as pd

d = {'host_id': [1, 1, 2, 3, 3],
     'listing_id': [1, 2, 3, 4, 5],
     'neighbourhood': ['Sofia', 'New York', 'Berlin', 'London', 'London'],
     'price': [50.0, 60.0, 50.0, 80.0, 90.0],
     'room_type': ['Private', 'Private', 'Shared', 'Private', 'Shared']}
df = pd.DataFrame(data=d)
print(df)

[OUT]

   host_id  listing_id neighbourhood  price room_type
0        1           1         Sofia   50.0   Private
1        1           2      New York   60.0   Private
2        2           3        Berlin   50.0    Shared
3        3           4        London   80.0   Private
4        3           5        London   90.0    Shared

df_new = df.pivot_table(index='neighbourhood', columns='room_type',
                        values='price', aggfunc='mean',
                        fill_value=0.0)
print(df_new)
[OUT]

room_type      Private  Shared
neighbourhood                 
Berlin               0      50
London              80      90
New York            60       0
Sofia               50       0

df_Best = df.groupby(["neighbourhood"])["room_type"].count().reset_index(
    name="count").sort_values(
    by=['count'], ascending=False).head(3)
print(df_Best)
[OUT]
  neighbourhood  count
1        London      2
0        Berlin      1
2      New York      1

df_new.loc[df_new['neighbourhood'].isin(df_Best['neighbourhood'].head(1).values[0])]
print(df_new)

[OUT]
KeyError: 'neighbourhood'

# Because neighbourhood is the index of df_new
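Since the KeyError comes from 'neighbourhood' being the index of df_new rather than a column, a minimal sketch of the attempted filter can test the index instead. It also passes the whole df_Best['neighbourhood'] column to isin(), because .values[0] would hand it a single string. The names mask and df_top are only illustrative. A boolean mask keeps df_new's own (alphabetical) row order, not the count order, and since Berlin, New York and Sofia each have one listing, which two of them join London depends on how the sort breaks ties.

```python
import pandas as pd

d = {'host_id': [1, 1, 2, 3, 3],
     'listing_id': [1, 2, 3, 4, 5],
     'neighbourhood': ['Sofia', 'New York', 'Berlin', 'London', 'London'],
     'price': [50.0, 60.0, 50.0, 80.0, 90.0],
     'room_type': ['Private', 'Private', 'Shared', 'Private', 'Shared']}
df = pd.DataFrame(data=d)

df_new = df.pivot_table(index='neighbourhood', columns='room_type',
                        values='price', aggfunc='mean', fill_value=0.0)
df_Best = (df.groupby('neighbourhood')['room_type'].count()
             .reset_index(name='count')
             .sort_values(by='count', ascending=False)
             .head(3))

# 'neighbourhood' is the index of df_new, not a column, so test the index.
# isin() expects a list-like, so pass the whole column, not a single string.
mask = df_new.index.isin(df_Best['neighbourhood'])
df_top = df_new[mask]  # rows stay in df_new's alphabetical order
print(df_top)
```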

What I want in the end is:

Answer 1

You can index df_new directly with the "neighbourhood" column of df_Best, e.g. df_new.loc[df_Best['neighbourhood'].head(1)].

from matplotlib import pyplot as plt
import pandas as pd

d = {'host_id': [1, 1, 2, 3, 3],
     'listing_id': [1, 2, 3, 4, 5],
     'neighbourhood': ['Sofia', 'New York', 'Berlin', 'London', 'London'],
     'price': [50.0, 60.0, 50.0, 80.0, 90.0],
     'room_type': ['Private', 'Private', 'Shared', 'Private', 'Shared']}
df = pd.DataFrame(data=d)
df_new = df.pivot_table(index='neighbourhood', columns='room_type',
                        values='price', aggfunc='mean',
                        fill_value=0.0)
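# Top three neighbourhoods by number of listings
df_Best = df.groupby(["neighbourhood"])["room_type"].count().reset_index(
    name="count").sort_values(
    by=['count'], ascending=False).head(3)
# Select those rows of the pivot table by index label and draw stacked bars
df_new.loc[df_Best['neighbourhood']].plot.bar(stacked=True, rot=0)
plt.show()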
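If the script runs without a display (e.g. on a server), plt.show() has nothing to show. A minimal sketch, assuming the non-interactive Agg backend is acceptable, renders the same stacked chart into an in-memory PNG instead; the buffer name buf and the y-axis label are illustrative additions, not part of the answer above. With three neighbourhoods and two room types, the stacked plot should contain 3 × 2 = 6 bar rectangles.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window is opened
from matplotlib import pyplot as plt
import pandas as pd

d = {'host_id': [1, 1, 2, 3, 3],
     'listing_id': [1, 2, 3, 4, 5],
     'neighbourhood': ['Sofia', 'New York', 'Berlin', 'London', 'London'],
     'price': [50.0, 60.0, 50.0, 80.0, 90.0],
     'room_type': ['Private', 'Private', 'Shared', 'Private', 'Shared']}
df = pd.DataFrame(data=d)
df_new = df.pivot_table(index='neighbourhood', columns='room_type',
                        values='price', aggfunc='mean', fill_value=0.0)
df_Best = (df.groupby('neighbourhood')['room_type'].count()
             .reset_index(name='count')
             .sort_values(by='count', ascending=False)
             .head(3))

# Same selection and stacked bar plot as in the answer
ax = df_new.loc[df_Best['neighbourhood']].plot.bar(stacked=True, rot=0)
ax.set_ylabel('mean price')

# Write the figure to an in-memory PNG instead of showing it
buf = io.BytesIO()
ax.figure.savefig(buf, format='png')
plt.close(ax.figure)
```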

Note that df_new is created by df.pivot_table(...). If you really want to build the plot purely from df.pivot_table, a less readable form is:

df.pivot_table(index='neighbourhood', columns='room_type', values='price', aggfunc='mean',
               fill_value=0.0).loc[
    df.groupby(["neighbourhood"])["room_type"].count().sort_values(ascending=False).head(3).index
].plot.bar(stacked=True, rot=0)
plt.show()
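A slightly shorter variant swaps the groupby/count/sort chain for Series.value_counts(), which counts rows per neighbourhood (equivalent here because room_type has no missing values). This is a minimal sketch, not the answer's method; top3 and df_top are illustrative names. Passing the resulting Index to .loc returns the pivot-table rows in count order, and on ties nlargest keeps the first occurrences, so which one-listing neighbourhoods survive depends on the tie order from value_counts.

```python
import pandas as pd

d = {'host_id': [1, 1, 2, 3, 3],
     'listing_id': [1, 2, 3, 4, 5],
     'neighbourhood': ['Sofia', 'New York', 'Berlin', 'London', 'London'],
     'price': [50.0, 60.0, 50.0, 80.0, 90.0],
     'room_type': ['Private', 'Private', 'Shared', 'Private', 'Shared']}
df = pd.DataFrame(data=d)

# Labels of the three neighbourhoods with the most listings
top3 = df['neighbourhood'].value_counts().nlargest(3).index

# Pivot, then select rows by label; .loc keeps the order of top3
df_top = (df.pivot_table(index='neighbourhood', columns='room_type',
                         values='price', aggfunc='mean', fill_value=0.0)
            .loc[top3])
print(df_top)
```

Calling df_top.plot.bar(stacked=True, rot=0) then draws the same chart.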


Tags: python, matplotlib, dataframe, pandas, seaborn