Using a multiindex dataframe to produce a complicated seaborn pointplot

Question

Sample data:

table1
   c1  c2
r1 1   3
r2 2   2
r3 3   1

table2
   c1  c2
r1 4   6
r2 5   5
r3 6   4

table3
   c1  c2
r1 7   9
r2 8   8
r3 9   7

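I have arranged the data into a dataframe like the one below, where the rows are the analysis categories, the upper column level is the individual being analyzed, and the second column level holds the replicates.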

      table1    table2    table3
      r1 r2 r3  r1 r2 r3  r1 r2 r3
c1     1  2  3   4  5  6   7  8  9
c2     3  2  1   6  5  4   9  8  7
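To make the example reproducible without copying tables around, here is a minimal sketch that builds the two-level column frame shown above with `pd.MultiIndex.from_product`. The level names `table`, `rep` and `c` are labels assumed for illustration; the frame in the question does not name its levels.

```python
import pandas as pd

# Upper column level = table (the individual), lower level = replicate
columns = pd.MultiIndex.from_product(
    [["table1", "table2", "table3"], ["r1", "r2", "r3"]],
    names=["table", "rep"],
)

# Rows are the analysis categories c1 and c2
wide = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, 9],
     [3, 2, 1, 6, 5, 4, 9, 8, 7]],
    index=pd.Index(["c1", "c2"], name="c"),
    columns=columns,
)
print(wide)
```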

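I would like to turn this into a pointplot where the mean of each set of replicates is the point, the remaining values are used to build the confidence interval, and a line is drawn for each table. In other words, I want the values passed to pointplot to be x=[table1, table2, table3], y=mean(all_r_values), hue=[c1, c2].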

I'm not sure how to do this, or how to reshape my table into a suitable form.

Answer 1

Seaborn prefers data in long ("tidy") format, which you can read more about in the documentation:

It is easiest and best to invoke these functions with a DataFrame that is in “tidy” format, although the lower-level functions also accept wide-form DataFrames or simple vectors of observations.

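In essence, this means you want as much of the information as possible to live in the rows of the dataframe rather than in its columns. In your case, that means transforming the data into this format: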

rep  table   c   value
r1   table1  c1  1
r2   table1  c1  2
r3   table1  c1  3
...

I copied your sample data and modified it slightly to get this:

rep c1 c2 table
r1 1  3 table1
r2 2  2 table1
r3 3  1 table1
r1 4  6 table2
r2 5  5 table2
r3 6  4 table2
r1 7  9 table3
r2 8  8 table3
r3 9  7 table3

Copy it to the clipboard and read it into pandas with:

import pandas as pd
import seaborn as sns

df = pd.read_clipboard()
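`pd.read_clipboard` needs a working system clipboard. Where that is not available (for example on a headless machine), a sketch that reads the same text from a string should behave the same way, since `read_clipboard` splits on whitespace (`sep=r"\s+"`) by default. The variable name `text` is only illustrative.

```python
import io

import pandas as pd

# Same whitespace-separated table that would otherwise be copied to the clipboard
text = """\
rep c1 c2 table
r1 1 3 table1
r2 2 2 table1
r3 3 1 table1
r1 4 6 table2
r2 5 5 table2
r3 6 4 table2
r1 7 9 table3
r2 8 8 table3
r3 9 7 table3
"""

# Split on runs of whitespace, as read_clipboard does by default
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(df)
```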

Then you can "melt" the data into long format and plot it with Seaborn:

df_long = df.melt(id_vars=['rep', 'table'], var_name='c')
sns.pointplot(x='table', y='value', hue='c', data=df_long, join=False, dodge=0.2)
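To check what `pointplot` is drawing, the point heights can be reproduced with a groupby, because its default estimator is the mean (the confidence interval is then computed from the individual values in each group). This sketch rebuilds `df` from a string instead of the clipboard so it runs on its own. In this sample, c1 and c2 have identical means in every table, so without `dodge` their points would sit on top of each other.

```python
import io

import pandas as pd

text = """\
rep c1 c2 table
r1 1 3 table1
r2 2 2 table1
r3 3 1 table1
r1 4 6 table2
r2 5 5 table2
r3 6 4 table2
r1 7 9 table3
r2 8 8 table3
r3 9 7 table3
"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Same reshape as above: one row per (rep, table, c) observation
df_long = df.melt(id_vars=["rep", "table"], var_name="c")

# Mean per (table, c): these are the heights of the plotted points
means = df_long.groupby(["table", "c"])["value"].mean().unstack("c")
print(means)
```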

Getting into and back out of your hierarchical column format is a bit messier, but it can be done like this:

# Get sample data into the hierarchical column format
df_long_temp = df.melt(id_vars=['rep', 'table'], value_vars=['c1', 'c2'], var_name='c')
df_multi_cols = df_long_temp.set_index(['table', 'rep', 'c']).unstack(level=[0,1])

# Reshape from hierarchical column to long-form data
df_long = df_multi_cols.stack(level=[1,2]).reset_index()
sns.pointplot(x='table', y='value', hue='c', data=df_long, join=False, dodge=0.2)
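If your frame has exactly the two column levels from the question (table on top, replicate below, with no extra `value` level), a minimal sketch of the reshape is to stack both column levels into a Series, name it, and reset the index. The level names `table`, `rep` and `c` are assumptions set when building the frame here; naming the levels is what gives the resulting long-form columns meaningful names.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["table1", "table2", "table3"], ["r1", "r2", "r3"]],
    names=["table", "rep"],
)
wide = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, 9],
     [3, 2, 1, 6, 5, 4, 9, 8, 7]],
    index=pd.Index(["c1", "c2"], name="c"),
    columns=columns,
)

# Stacking both column levels leaves a Series indexed by (c, table, rep);
# naming it "value" makes reset_index produce a tidy four-column frame
df_long = wide.stack(["table", "rep"]).rename("value").reset_index()
print(df_long)
```

The resulting `df_long` has the same `table`, `c` and `value` columns as before, so it can be passed to the same `sns.pointplot` call.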
