Grouped boxplots in pandas and seaborn

I have this DataFrame:

     season           A         B         C         D
0   current   26.978912  0.039233  1.248607  0.025874
1   current   26.978912  0.039233  0.836786  0.025874
2   current   26.978912  0.039233  3.047536  0.025874
3   current   26.978912  0.039233  3.726964  0.025874
4   current   26.978912  0.039233  1.171393  0.025874
5   current   26.978912  0.039233  0.180929  0.025874
6   current   26.978912  0.039233  0.000000  0.025874
7   current   34.709560  0.039233  0.700893  0.025874
8   current  111.140200  0.306142  3.068286  0.169244
9   current  111.140200  0.306142  2.931107  0.169244
10  current  111.140200  0.306142  2.121893  0.169244
11  current  111.140200  0.306142  1.479464  0.169244
12  current  111.140200  0.306142  2.186821  0.169244
13  current  111.140200  0.306142  9.542714  0.169244
14  current  111.140200  0.306142  9.890750  0.169244
15  current  111.140200  0.306142  8.864857  0.169244
16     past   88.176415  0.257901  3.416059  0.141809
17     past   88.176415  0.257901  4.835357  0.141809
18     past   88.176415  0.257901  5.238097  0.141809
19     past   88.176415  0.257901  5.535355  0.141809
20     past   88.176415  0.257901  6.479523  0.141809
21     past   88.176415  0.257901  7.727862  0.141809
22     past   88.176415  0.257901  8.046811  0.141809
23     past   94.037913  0.308439  8.541000  0.163651
24     past  101.630141  0.363136  8.416895  0.192256
25     past  101.630141  0.363136  6.531005  0.192256
26     past  101.630141  0.363136  6.397497  0.192256
27     past  101.630141  0.363136  6.500077  0.192256
28     past  101.630141  0.363136  7.088469  0.192256
29     past  101.630141  0.363136  7.821852  0.192256
30     past  101.630141  0.363136  8.011082  0.192256
31     past  101.037817  0.417099  8.279735  0.212376
32     past   88.176415  0.257901  3.416059  0.141809
33     past   88.176415  0.257901  4.835357  0.141809
34     past   88.176415  0.257901  5.238097  0.141809
35     past   88.176415  0.257901  5.535355  0.141809
36     past   88.176415  0.257901  6.479523  0.141809
37     past   88.176415  0.257901  7.727862  0.141809
38     past   88.176415  0.257901  8.046811  0.141809
39     past   94.037913  0.308439  8.541000  0.163651
40     past  101.630141  0.363136  8.416895  0.192256
41     past  101.630141  0.363136  6.531005  0.192256
42     past  101.630141  0.363136  6.397497  0.192256
43     past  101.630141  0.363136  6.500077  0.192256
44     past  101.630141  0.363136  7.088469  0.192256
45     past  101.630141  0.363136  7.821852  0.192256
46     past  101.630141  0.363136  8.011082  0.192256
47     past  101.037817  0.417099  8.279735  0.212376

I plot it like this:

df.boxplot(by='season')

How can I make sure the different panels get different y-axis minimum and maximum values? Also, how can I do this in seaborn?

OK, so the first thing you need is long-format data. Suppose you start with this:

import numpy
import pandas
import seaborn
numpy.random.seed(0)

N = 100
seasons = ['winter', 'spring', 'summer', 'autumn']
df = pandas.DataFrame({
    'season': numpy.random.choice(seasons, size=N),
    'A': numpy.random.normal(4, 1.75, size=N),
    'B': numpy.random.normal(4, 4.5, size=N),
    'C': numpy.random.lognormal(0.5, 0.05, size=N),
    'D': numpy.random.beta(3, 1, size=N)
})

print(df.sample(7))

           A         B         C         D  season
85  7.236212  5.044815  1.845659  0.550943  autumn
13  4.749581  1.014348  1.707000  0.630618  autumn
0   1.014027  4.750031  1.637803  0.285781  winter
3   3.233370  8.250158  1.516189  0.973797  winter
44  6.062864 -0.969725  1.564768  0.954225  autumn
43  7.317806 -3.209259  1.699684  0.968950  spring
39  5.576446 -2.187281  1.735002  0.436692  winter

You can use the pandas.melt function to convert it to long-format data, with one row per (season, variable, value) combination.

lf = pandas.melt(df, value_vars=['A', 'B', 'C', 'D'], id_vars='season')
print(lf.sample(7))

     season variable     value
399  winter        D  0.238061
227  spring        C  1.656770
322  autumn        D  0.933299
121  autumn        B  4.393981
6    autumn        A  1.175679
5    autumn        A  5.360608
51   spring        A  5.709118
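To make the reshaping concrete, here is a minimal sketch on a made-up two-row frame (the name `small` and its values are purely illustrative, not part of the data above). Each wide column becomes a block of rows, with the old column name in `variable`:

```python
import pandas

# Hypothetical toy frame: one id column and two measurement columns.
small = pandas.DataFrame({
    'season': ['winter', 'summer'],
    'A': [1.0, 2.0],
    'B': [3.0, 4.0],
})

# One output row per (season, variable) pair: 2 rows x 2 columns -> 4 rows.
long_form = pandas.melt(small, id_vars='season', value_vars=['A', 'B'])
print(long_form)
#    season variable  value
# 0  winter        A    1.0
# 1  summer        A    2.0
# 2  winter        B    3.0
# 3  summer        B    4.0
```

This layout is what lets seaborn facet on `variable` while plotting `value` against `season`.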

Then you can feed everything straight into seaborn.catplot (called seaborn.factorplot before seaborn 0.9, where the ordering argument was also spelled differently in early releases):

fg = (
    pandas.melt(df, value_vars=['A', 'B', 'C', 'D'], id_vars='season')
        .pipe(
            (seaborn.catplot, 'data'),    # (<fxn>, <dataframe var>)
            kind='box',                   # type of plot we want
            x='season', order=seasons,    # x-values of the plots and their order
            y='value', palette='BrBG_r',  # y-values and colors
            col='variable', col_wrap=2,   # 'A-D' in columns, wrap at 2nd col
            sharey=False,                 # tailor y-axes for each group
            notch=True, width=0.75,       # kwargs passed to boxplot
        )
)

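This gives me a 2×2 grid of boxplots, one panel per variable (A through D), each with its own y-axis range.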
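For the pandas-only half of the question, one possible approach (a sketch, not the only way) is to create the axes yourself with matplotlib and draw one column per axes, so each panel autoscales its own y-axis. The 2×2 layout, the figure size, and the `Agg` backend are assumptions made for this example; the data is the same random frame as above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy
import pandas

numpy.random.seed(0)
N = 100
seasons = ['winter', 'spring', 'summer', 'autumn']
df = pandas.DataFrame({
    'season': numpy.random.choice(seasons, size=N),
    'A': numpy.random.normal(4, 1.75, size=N),
    'B': numpy.random.normal(4, 4.5, size=N),
    'C': numpy.random.lognormal(0.5, 0.05, size=N),
    'D': numpy.random.beta(3, 1, size=N),
})

columns = ['A', 'B', 'C', 'D']

# Independent y-axes: sharey=False means each panel keeps its own limits.
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharey=False)
for ax, col in zip(axes.flat, columns):
    df.boxplot(column=col, by='season', ax=ax)  # one variable per panel
    ax.set_title(col)
    ax.set_xlabel('')

fig.suptitle('')   # drop the automatic "Boxplot grouped by season" title
fig.tight_layout()

# Collect the resulting limits to confirm the panels really differ.
ylims = [ax.get_ylim() for ax in axes.flat]
print(ylims)
```

If a panel needs explicit bounds rather than autoscaled ones, calling `ax.set_ylim(low, high)` inside the loop would do it.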