Barplot in seaborn with several columns
I have the following code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("https://raw.githubusercontent.com/norhther/datasets/main/ccaa_vacunas.csv", decimal = ",", thousands = ".")
df2 = df.dropna()
df2.columns = df2.columns.str.replace(' ', '_')
df2.columns = df2.columns.str.replace("-", "_")
mask = (df2["CCAA"] != "España") & (df2["CCAA"] != "Fuerzas Armadas")
sns.barplot(y = "CCAA", x = "Porcentaje_de_dosis_administradas_por_100_habitantes", data = df2[mask])
I want to plot the variable CCAA on one axis and, on the other axis, several bars for each CCAA: Dosis_entregadas_Pfizer, Dosis_entregadas_Moderna, Dosis_entregadas_AstraZeneca and Dosis_entregadas_Janssen. However, I don't know how to approach this.
You can use melt to convert the desired columns to "long form", then pass the newly created column to hue=.
Here is an example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Spanish number format: "," is the decimal mark, "." the thousands separator
df = pd.read_csv("https://raw.githubusercontent.com/norhther/datasets/main/ccaa_vacunas.csv",
                 decimal=",", thousands=".")
# Drop the national total and the armed forces, then rows with missing values
mask = (df["CCAA"] != "España") & (df["CCAA"] != "Fuerzas Armadas")
df = df[mask].dropna()
# Wide to long: each selected column becomes rows of (CCAA, Fabricante, Dosis)
df_long = df.melt(id_vars=["CCAA"],
                  value_vars=["Dosis entregadas Pfizer", "Dosis entregadas Moderna", "Dosis entregadas AstraZeneca",
                              "Dosis entregadas Janssen"],
                  value_name="Dosis", var_name="Fabricante")
fig, ax = plt.subplots(figsize=(12, 10))
# ci=None turns off the error bars; seaborn 0.12+ prefers errorbar=None
sns.barplot(y="CCAA", x="Dosis",
            estimator=np.sum, ci=None,
            hue="Fabricante", palette='Set2', data=df_long, ax=ax)
# Show plain numbers on the x axis instead of offset/scientific notation
ax.ticklabel_format(axis='x', useOffset=False, style='plain')
sns.despine()
plt.tight_layout()
plt.show()
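To make the reshaping step concrete, here is a minimal sketch of melt on a tiny, made-up frame. The regions "A" and "B" and their numbers are hypothetical and not taken from the real CSV; only the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical wide-form data: one row per region, one column per manufacturer
toy_wide = pd.DataFrame({
    "CCAA": ["A", "B"],
    "Dosis entregadas Pfizer": [10, 20],
    "Dosis entregadas Moderna": [1, 2],
})

# melt stacks the value columns: the old column name goes into "Fabricante"
# and the cell value into "Dosis", while "CCAA" is repeated as the identifier
toy_long = toy_wide.melt(
    id_vars=["CCAA"],
    value_vars=["Dosis entregadas Pfizer", "Dosis entregadas Moderna"],
    var_name="Fabricante",
    value_name="Dosis",
)
print(toy_long)
```

Each (region, manufacturer) pair ends up on its own row, which is the shape seaborn needs in order to split the bars with hue=.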
PS: df_long looks like:
            CCAA                Fabricante      Dosis
0      Andalucía   Dosis entregadas Pfizer  1653300.0
1         Aragón   Dosis entregadas Pfizer   332595.0
2       Asturias   Dosis entregadas Pfizer   319145.0
3       Baleares   Dosis entregadas Pfizer   184050.0
4       Canarias   Dosis entregadas Pfizer   362670.0
             ...                       ...        ...
9267      Murcia  Dosis entregadas Janssen    68300.0
9268     Navarra  Dosis entregadas Janssen    23750.0
9269  País Vasco  Dosis entregadas Janssen   101300.0
9270       Ceuta  Dosis entregadas Janssen     4150.0
9271     Melilla  Dosis entregadas Janssen     5750.0
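The row index running past 9000 shows that each (CCAA, Fabricante) pair appears many times in df_long. By default barplot would draw the mean of those repeated rows; estimator=np.sum makes it draw their total instead. A rough sketch of the equivalent aggregation, using hypothetical toy values rather than the real data:

```python
import pandas as pd

# Hypothetical long-form data with repeated (CCAA, Fabricante) pairs
toy_long = pd.DataFrame({
    "CCAA": ["A", "A", "B", "B"],
    "Fabricante": ["Pfizer", "Pfizer", "Pfizer", "Pfizer"],
    "Dosis": [10.0, 5.0, 20.0, 1.0],
})

# One total per (CCAA, Fabricante) pair, i.e. one bar length per pair
totals = toy_long.groupby(["CCAA", "Fabricante"], as_index=False)["Dosis"].sum()
print(totals)
```

Whether a sum is the right aggregate depends on what each row of the CSV represents, so it is worth checking a few totals like this before trusting the chart.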