Stacked bar plot with only the three best
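I want to draw a stacked bar plot with seaborn or matplotlib. I want to get all the information I need via .pivot_table, and then keep only the three neighbourhoods with the highest counts. But I get KeyError: 'neighbourhood', because neighbourhood is the index in df_new.
How can I generate a stacked bar plot containing only the top three neighbourhoods from my df_new (it has to come from df.pivot_table)?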
import pandas as pd

d = {'host_id': [1, 1, 2, 3, 3],
     'listing_id': [1, 2, 3, 4, 5],
     'neighbourhood': ['Sofia', 'New York', 'Berlin', 'London', 'London'],
     'price': [50.0, 60.0, 50.0, 80.0, 90.0],
     'room_type': ['Private', 'Private', 'Shared', 'Private', 'Shared']}
df = pd.DataFrame(data=d)
print(df)
[OUT]
   host_id  listing_id neighbourhood  price room_type
0        1           1         Sofia   50.0   Private
1        1           2      New York   60.0   Private
2        2           3        Berlin   50.0    Shared
3        3           4        London   80.0   Private
4        3           5        London   90.0    Shared
df_new = df.pivot_table(index='neighbourhood', columns='room_type',
                        values='price', aggfunc='mean',
                        fill_value=0.0)
print(df_new)
[OUT]
room_type      Private  Shared
neighbourhood
Berlin               0      50
London              80      90
New York            60       0
Sofia               50       0
df_Best = df.groupby(["neighbourhood"])["room_type"].count().reset_index(
    name="count").sort_values(
    by=['count'], ascending=False).head(3)
print(df_Best)
[OUT]
  neighbourhood  count
1        London      2
0        Berlin      1
2      New York      1
df_new.loc[df_new['neighbourhood'].isin(df_Best['neighbourhood'].head(1).values[0])]
print(df_new)
[OUT]
KeyError: 'neighbourhood'
# Because neighbourhood is index in df_new
What I want in the end is a stacked bar plot showing only those three neighbourhoods.
You can index df_new directly with the "neighbourhood" column of df_Best, e.g. df_new.loc[df_Best['neighbourhood'].head(1)].
from matplotlib import pyplot as plt
import pandas as pd
d = {'host_id': [1, 1, 2, 3, 3],
     'listing_id': [1, 2, 3, 4, 5],
     'neighbourhood': ['Sofia', 'New York', 'Berlin', 'London', 'London'],
     'price': [50.0, 60.0, 50.0, 80.0, 90.0],
     'room_type': ['Private', 'Private', 'Shared', 'Private', 'Shared']}
df = pd.DataFrame(data=d)
df_new = df.pivot_table(index='neighbourhood', columns='room_type',
                        values='price', aggfunc='mean',
                        fill_value=0.0)
df_Best = df.groupby(["neighbourhood"])["room_type"].count().reset_index(
    name="count").sort_values(
    by=['count'], ascending=False).head(3)
df_new.loc[df_Best['neighbourhood']].plot.bar(stacked=True, rot=0)
plt.show()
Note that df_new is created via df.pivot_table(...).
If you really want to create the plot purely from df.pivot_table, a less readable form is:
df.pivot_table(index='neighbourhood', columns='room_type', values='price', aggfunc='mean',
               fill_value=0.0).loc[
    df.groupby(["neighbourhood"])["room_type"].count().sort_values(ascending=False).head(3).index
].plot.bar(stacked=True, rot=0)
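If you would rather keep the isin-style filter from the question, here is a minimal sketch of two other ways around the KeyError. It is only one possible approach: the names top3, by_mask, flat and by_column are made up for this example, and nlargest(3) is swapped in for the sort_values(...).head(3) chain used above.

```python
import pandas as pd

d = {'host_id': [1, 1, 2, 3, 3],
     'listing_id': [1, 2, 3, 4, 5],
     'neighbourhood': ['Sofia', 'New York', 'Berlin', 'London', 'London'],
     'price': [50.0, 60.0, 50.0, 80.0, 90.0],
     'room_type': ['Private', 'Private', 'Shared', 'Private', 'Shared']}
df = pd.DataFrame(data=d)

df_new = df.pivot_table(index='neighbourhood', columns='room_type',
                        values='price', aggfunc='mean', fill_value=0.0)

# Count listings per neighbourhood and keep the labels of the three largest groups.
# (top3 is a hypothetical helper name, not part of the original post.)
top3 = df.groupby('neighbourhood').size().nlargest(3).index

# Option 1: 'neighbourhood' is the index of df_new, so test the index itself.
by_mask = df_new[df_new.index.isin(top3)]

# Option 2: turn the index back into a column, filter on that column as in
# the question, then restore the index so the result can be plotted the same way.
flat = df_new.reset_index()
by_column = flat[flat['neighbourhood'].isin(top3)].set_index('neighbourhood')
```

Either result can be passed to .plot.bar(stacked=True, rot=0) in the same way as above. Both keep the pivot table's own row order, whereas df_new.loc[df_Best['neighbourhood']] orders the bars by count.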