How to obtain a nested seaborn boxplot from a 3D numpy array
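I am trying (and failing) to obtain a nested boxplot starting from a numpy array of dimension 3, for example A = np.random.uniform(size=(4, 100, 2)).
The kind of plot I am referring to is the nested boxplot (boxes grouped by a hue variable) shown in the seaborn boxplot docs.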
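You can use np.meshgrid() to generate three index arrays, each with the same 3D shape as the data. Raveling these arrays, together with the data, makes them suitable as input for seaborn. Optionally, the arrays can be converted to a dataframe, which helps with automatic label generation.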
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
A = np.random.normal(0.02, 1, size=(4, 100, 2)).reshape(-1).cumsum().reshape(4, -1, 2)
x_names = ['A', 'B', 'C', 'D']
hue_names = ['x', 'y']
dim1, dim2, dim3 = np.meshgrid(x_names, np.arange(A.shape[1]), hue_names, indexing='ij')
sns.boxplot(x=dim1.ravel(), y=A.ravel(), hue=dim3.ravel())
plt.tight_layout()
plt.show()
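To see why raveling keeps labels and values in step, it can help to look at a tiny array. The following is a minimal sketch, assuming a hypothetical 2x3x2 array named `small` (it is not part of the original question); it checks that every raveled value lines up with the index triple produced by np.meshgrid(..., indexing='ij').

```python
import numpy as np

# Hypothetical toy data: 2 groups, 3 observations, 2 hue levels.
small = np.arange(12).reshape(2, 3, 2)

g, obs, h = np.meshgrid(
    np.arange(small.shape[0]),
    np.arange(small.shape[1]),
    np.arange(small.shape[2]),
    indexing='ij',  # 'ij' keeps the output shape equal to small.shape
)

# After ravel(), position n in every array refers to the same cell of `small`.
flat_values = small.ravel()
aligned = all(
    small[i, j, k] == v
    for i, j, k, v in zip(g.ravel(), obs.ravel(), h.ravel(), flat_values)
)
print(g.shape, aligned)
```

With the default indexing='xy', the first two axes of the index arrays would be swapped, so their shape would be (3, 2, 2) and the raveled labels would no longer match the raveled data.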
To create a dataframe, the code could look as follows. Note that the boxplot does not need the numeric second dimension explicitly.
df = pd.DataFrame({'dim1': dim1.ravel(),
                   'dim2': dim2.ravel(),
                   'dim3': dim3.ravel(),
                   'values': A.ravel()})
# some tests to be sure that the 3D array has been interpreted well
# (np.isclose guards against floating-point differences in summation order)
assert np.isclose(A[0, :, 0].sum(), df[(df['dim1'] == 'A') & (df['dim3'] == 'x')]['values'].sum())
assert np.isclose(A[2, :, 1].sum(), df[(df['dim1'] == 'C') & (df['dim3'] == 'y')]['values'].sum())
sns.boxplot(data=df, x='dim1', y='values', hue='dim3')
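As a variation on the dataframe approach (not the method used in the answer above), the same long-format table could probably be built without meshgrid by using pd.MultiIndex.from_product. This is only a sketch under stated assumptions: `B` is a hypothetical small stand-in for the data array, and `long_df` is an illustrative name.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 5, 2))  # hypothetical small stand-in for the 3D array

# One index level per array axis; from_product varies the last level fastest,
# which matches the C-order used by ravel().
index = pd.MultiIndex.from_product(
    [['A', 'B', 'C', 'D'], np.arange(B.shape[1]), ['x', 'y']],
    names=['dim1', 'dim2', 'dim3'],
)
long_df = pd.Series(B.ravel(), index=index, name='values').reset_index()

# Check that one (dim1, dim3) slice maps back to the right part of the array.
subset = long_df[(long_df['dim1'] == 'C') & (long_df['dim3'] == 'y')]
matches = np.allclose(subset['values'].to_numpy(), B[2, :, 1])
print(long_df.shape, matches)
```

The resulting `long_df` could then be passed to sns.boxplot with x='dim1' and hue='dim3', in the same way as `df` above.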
If the array or the names are very long, working entirely with numbers uses less memory and speeds things up:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
A = np.random.normal(0.01, 1, size=(10, 1000, 2)).reshape(-1).cumsum().reshape(10, -1, 2)
dim1, dim2, dim3 = np.meshgrid(np.arange(A.shape[0]), np.arange(A.shape[1]), np.arange(A.shape[2]), indexing='ij')
sns.set_style('whitegrid')
ax = sns.boxplot(x=dim1.ravel(), y=A.ravel(), hue=dim3.ravel(), palette='spring')
ax.set_xticklabels(["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta", "iota", "kappa"])
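# get_legend_handles_labels() avoids the legendHandles attribute, which newer matplotlib releases renamed to legend_handles
handles, _ = ax.get_legend_handles_labels()
ax.legend(handles=handles, labels=['2019-2020', '2020-2021'], title='Year')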
sns.despine()
plt.tight_layout()
plt.show()
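The memory argument can be checked roughly with nbytes. The sketch below assumes hypothetical label names and an observation count chosen only for illustration; it compares an index array built from strings with one built from integers.

```python
import numpy as np

names = ["alpha", "beta", "gamma", "delta", "epsilon"]  # hypothetical labels
n_obs = 1000  # hypothetical number of observations per group

# Index array built from strings: every element reserves room for the longest name.
str_idx, _ = np.meshgrid(names, np.arange(n_obs), indexing='ij')
# Index array built from integers.
int_idx, _ = np.meshgrid(np.arange(len(names)), np.arange(n_obs), indexing='ij')

print(str_idx.dtype, str_idx.nbytes)
print(int_idx.dtype, int_idx.nbytes)
```

NumPy stores fixed-width unicode strings at 4 bytes per character, so the saving depends on the name length: with single-character names such as 'A' or 'x', the string index array is not larger than a default integer one.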