Grouped boxplot with 2 y axes, 2 variables per x tick
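I am trying to draw boxplots of cost (in rupees) and installed capacity (in MW), with the x axis showing the share of renewable energy (in percent). That is, each x tick should carry two boxplots, one for cost and one for installed capacity. I have 3 x tick values (20%, 40%, 60%).
I tried the following, but I get the error attached at the bottom. I need two boxplots for each x tick.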
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
plt.rcParams["font.family"] = "Times New Roman"
plt.style.use('seaborn-ticks')
plt.grid(color='w', linestyle='solid')
data1 = pd.read_csv('RES_cap.csv')
df=pd.DataFrame(data1, columns=['per','cap','cost'])
cost= df['cost']
cap=df['cap']
per_res=df['per']
fig, ax1 = plt.subplots()
xticklabels = 3
ax1.set_xlabel('Percentage of RES integration')
ax1.set_ylabel('Production Capacity (MW)')
res1 = ax1.boxplot(cost, widths=0.4,patch_artist=True)
for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
    plt.setp(res1[element])
for patch in res1['boxes']:
    patch.set_facecolor('tab:blue')
ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
ax2.set_ylabel('Costs', color='tab:orange')
res2 = ax2.boxplot(cap, widths=0.4,patch_artist=True)
for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
    plt.setp(res2[element], color='k')
for patch in res2['boxes']:
    patch.set_facecolor('tab:orange')
ax1.set_xticklabels(['20%','40%','60%'])
fig.tight_layout()
plt.show()
Sample data: data attached
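Testing your code and comparing it with the example you referred to, a few things stand out:
- The data you pass to the x parameter is one-dimensional rather than two-dimensional. You pass in a single variable, so you get one box instead of the three you actually want;
- The positions parameter is not defined, so the boxes of the two boxplots overlap;
- In the first for loop, the one for res1, the color argument is missing from plt.setp;
- You set the x tick labels without first setting the x ticks (as the documentation for set_xticklabels cautions), which causes the error message.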
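To make the first, second, and fourth points concrete, here is a minimal sketch with made-up data. The names groups, res_1d, and res_grouped and the normally distributed values are assumptions for illustration only; they are not taken from your CSV file.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
# Hypothetical data: three groups of values, one per x tick (20%, 40%, 60%)
groups = [rng.normal(loc=m, scale=10, size=30) for m in (100, 200, 300)]

fig, (ax_a, ax_b) = plt.subplots(1, 2)

# A single 1D array is treated as one variable, so only one box is drawn
res_1d = ax_a.boxplot(np.concatenate(groups))
n_boxes_1d = len(res_1d["boxes"])

# A list of arrays gives one box per array; positions shifts the boxes
# slightly to the left of each tick, leaving room for a second set
ticks = np.arange(len(groups))
res_grouped = ax_b.boxplot(groups, positions=ticks - 0.15, widths=0.2)
n_boxes_grouped = len(res_grouped["boxes"])

# Fix the tick locations first, then label them
ax_b.set_xticks(ticks)
ax_b.set_xticklabels(["20%", "40%", "60%"])
tick_locations = list(ax_b.get_xticks())
```

Here n_boxes_1d is 1 while n_boxes_grouped is 3, which is the difference between what your code draws and what you want.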
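I offer the following alternative solution. It solves the problem of shaping the data correctly and uses dictionaries to define the many parameters that are shared by multiple objects in the figure. This makes it easier to adjust the formatting to your liking and also makes the code cleaner, because it avoids for loops (over the boxplot elements and the res objects) and avoids repeating arguments in functions that share the same parameters.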
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create a random dataset similar to the one in the image you shared
rng = np.random.default_rng(seed=123) # random number generator
data = dict(per = np.repeat([20, 40, 60], [60, 30, 10]),
            cap = rng.choice([70, 90, 220, 240, 320, 330, 340, 360, 410], size=100),
            cost = rng.integers(low=2050, high=2250, size=100))
df = pd.DataFrame(data)
# Pivot table according to the 'per' categories so that the cap and
# cost variables are grouped by them:
df_pivot = df.pivot(columns=['per'])
# Create a list of the cap and cost grouped variables to be plotted
# in each (twinned) boxplot: note that the NaN values must be removed
# for the plotting function to work.
cap = [df_pivot['cap'][var].dropna() for var in df_pivot['cap']]
cost = [df_pivot['cost'][var].dropna() for var in df_pivot['cost']]
# Create figure and dictionary containing boxplot parameters that are
# common to both boxplots (according to my style preferences):
# note that I define the whis parameter so that values below the 5th
# percentile and above the 95th percentile are shown as outliers
nb_groups = df['per'].nunique()
fig, ax1 = plt.subplots(figsize=(9,6))
box_param = dict(whis=(5, 95), widths=0.2, patch_artist=True,
                 flierprops=dict(marker='.', markeredgecolor='black',
                                 fillstyle=None),
                 medianprops=dict(color='black'))
# Create boxplots for 'cap' variable: note the double asterisk used
# to unpack the dictionary of boxplot parameters
space = 0.15
ax1.boxplot(cap, positions=np.arange(nb_groups)-space,
            boxprops=dict(facecolor='tab:blue'), **box_param)
# Create boxplots for 'cost' variable on twin Axes
ax2 = ax1.twinx()
ax2.boxplot(cost, positions=np.arange(nb_groups)+space,
            boxprops=dict(facecolor='tab:orange'), **box_param)
# Format x ticks
labelsize = 12
ax1.set_xticks(np.arange(nb_groups))
ax1.set_xticklabels([f'{label}%' for label in df['per'].unique()])
ax1.tick_params(axis='x', labelsize=labelsize)
# Format y ticks
yticks_fmt = dict(axis='y', labelsize=labelsize)
ax1.tick_params(colors='tab:blue', **yticks_fmt)
ax2.tick_params(colors='tab:orange', **yticks_fmt)
# Format axes labels
label_fmt = dict(size=12, labelpad=15)
ax1.set_xlabel('Percentage of RES integration', **label_fmt)
ax1.set_ylabel('Production Capacity (MW)', color='tab:blue', **label_fmt)
ax2.set_ylabel('Costs (Rupees)', color='tab:orange', **label_fmt)
plt.show()
Matplotlib documentation: boxplot demo, boxplot function parameters, marker symbols for fliers, label text formatting parameters
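As a side note, the pivot-and-dropna step in the code above is one way to get a list of arrays per category; a groupby can do the same job. The following is only a sketch of that alternative, reusing the same random dataset. The variable names grouped and group_sizes are mine and do not appear in the solution above.

```python
import numpy as np
import pandas as pd

# Same random dataset as in the solution above
rng = np.random.default_rng(seed=123)
df = pd.DataFrame(dict(
    per=np.repeat([20, 40, 60], [60, 30, 10]),
    cap=rng.choice([70, 90, 220, 240, 320, 330, 340, 360, 410], size=100),
    cost=rng.integers(low=2050, high=2250, size=100),
))

# Iterating over a grouped column yields (category, values) pairs in
# sorted category order, so no NaN padding is created and no dropna is needed
grouped = df.groupby("per")
cap = [values.to_numpy() for _, values in grouped["cap"]]
cost = [values.to_numpy() for _, values in grouped["cost"]]

# Number of observations in each 'per' category
group_sizes = [len(values) for values in cap]
```

The resulting cap and cost lists can be passed to ax1.boxplot and ax2.boxplot in the same way as the lists built from the pivot table.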
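Considering that this takes quite some effort to set up, if I were doing this for myself I would go for side-by-side subplots rather than twinned Axes. This can be done quite easily in seaborn with the catplot function, which takes care of a lot of the formatting automatically. Since there are only three categories per variable, it is relatively easy to compare the boxplots side by side with a different color for each percentage category, as illustrated in this example based on the same data: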
import seaborn as sns # v 0.11.0
# Convert dataframe to long format with 'per' set aside as a grouping variable
df_melt = df.melt(id_vars='per')
# Create side-by-side boxplots of each variable: note that the boxes
# are colored by default
g = sns.catplot(kind='box', data=df_melt, x='per', y='value', col='variable',
                height=4, palette='Blues', sharey=False, saturation=1,
                width=0.3, fliersize=2, linewidth=1, whis=(5, 95))
g.fig.subplots_adjust(wspace=0.4)
g.set_titles(col_template='{col_name}', size=12, pad=20)
# Format Axes labels
label_fmt = dict(size=10, labelpad=10)
for ax in g.axes.flatten():
    ax.set_xlabel('Percentage of RES integration', **label_fmt)
g.axes.flatten()[0].set_ylabel('Production Capacity (MW)', **label_fmt)
g.axes.flatten()[1].set_ylabel('Costs (Rupees)', **label_fmt)
plt.show()