Pandas MultiIndex: Selecting a column knowing only the second index?

Question

I'm working with the following DataFrame:

   age  height  weight  shoe_size
0  8.0     6.0     2.0        1.0
1  8.0     NaN     2.0        1.0
2  6.0     1.0     4.0        NaN
3  5.0     1.0     NaN        0.0
4  5.0     NaN     1.0        NaN
5  3.0     0.0     1.0        0.0

I added a second header row to df like this:

import pandas as pd

zipped = list(zip(df.columns, ["RHS", "height", "weight", "shoe_size"]))
df.columns = pd.MultiIndex.from_tuples(zipped)
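To make the question reproducible, here is a minimal sketch that rebuilds the frame from the printed table and applies the same two-level header. The values are transcribed from the table above; numpy is imported only for NaN.

```python
import numpy as np
import pandas as pd

# Rebuild the original single-level frame from the printed table
df = pd.DataFrame(
    {
        "age": [8.0, 8.0, 6.0, 5.0, 5.0, 3.0],
        "height": [6.0, np.nan, 1.0, 1.0, np.nan, 0.0],
        "weight": [2.0, 2.0, 4.0, np.nan, 1.0, 1.0],
        "shoe_size": [1.0, 1.0, np.nan, 0.0, np.nan, 0.0],
    }
)

# Pair each existing column name with its new second-level label
zipped = list(zip(df.columns, ["RHS", "height", "weight", "shoe_size"]))
df.columns = pd.MultiIndex.from_tuples(zipped)

print(df.columns.tolist())
```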

This is the resulting DataFrame:

   age height weight shoe_size
   RHS height weight shoe_size
0  8.0    6.0    2.0       1.0
1  8.0    NaN    2.0       1.0
2  6.0    1.0    4.0       NaN
3  5.0    1.0    NaN       0.0
4  5.0    NaN    1.0       NaN
5  3.0    0.0    1.0       0.0

I know how to select the first column using its full tuple ("age", "RHS"):

df[("age", "RHS")]

But how can I do this using only the second-level label "RHS"? Ideally something like:

df[(any, "RHS")]

Answer 1

Pass slice(None) as the first element of the column tuple in .loc, after first sorting the columns with df.sort_index:

In [325]: df.sort_index(axis=1).loc[:, (slice(None), 'RHS')]
Out[325]: 
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0
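A self-contained sketch of the same selection, assuming a reasonably recent pandas. It passes axis=1 by keyword because newer pandas releases do not accept the axis positionally in sort_index.

```python
import numpy as np
import pandas as pd

# The question's frame, built directly with a two-level column index
columns = pd.MultiIndex.from_tuples(
    [("age", "RHS"), ("height", "height"), ("weight", "weight"), ("shoe_size", "shoe_size")]
)
df = pd.DataFrame(
    [
        [8.0, 6.0, 2.0, 1.0],
        [8.0, np.nan, 2.0, 1.0],
        [6.0, 1.0, 4.0, np.nan],
        [5.0, 1.0, np.nan, 0.0],
        [5.0, np.nan, 1.0, np.nan],
        [3.0, 0.0, 1.0, 0.0],
    ],
    columns=columns,
)

# Sort the columns first, as the answer recommends for MultiIndex slicing
sorted_df = df.sort_index(axis=1)

# slice(None) matches every level-0 label; "RHS" filters level 1
rhs = sorted_df.loc[:, (slice(None), "RHS")]
print(rhs)
```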

You can also use pd.IndexSlice with df.loc:

In [332]: idx = pd.IndexSlice

In [333]: df.sort_index(axis=1).loc[:, idx[:, 'RHS']]
Out[333]: 
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0

With IndexSlice you don't need to pass slice(None) explicitly, because the ':' inside idx[...] produces it for you.
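A small check, on a two-row cut-down version of the frame, that pd.IndexSlice simply returns the (slice(None), 'RHS') tuple, so both spellings select the same columns.

```python
import pandas as pd

# Two-row excerpt of the question's frame
columns = pd.MultiIndex.from_tuples(
    [("age", "RHS"), ("height", "height"), ("weight", "weight"), ("shoe_size", "shoe_size")]
)
df = pd.DataFrame([[8.0, 6.0, 2.0, 1.0], [3.0, 0.0, 1.0, 0.0]], columns=columns)
sorted_df = df.sort_index(axis=1)

idx = pd.IndexSlice
# idx[:, "RHS"] evaluates to the plain tuple (slice(None, None, None), "RHS")
key = idx[:, "RHS"]

via_indexslice = sorted_df.loc[:, key]
via_slice = sorted_df.loc[:, (slice(None), "RHS")]
print(key, via_indexslice.equals(via_slice))
```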

If you don't sort the columns first, you get:

UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

If the second level contains more than one RHS column, all of them are returned.
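To illustrate, here is a hypothetical frame in which two top-level columns carry the second-level label RHS (labels and values are made up for the example).

```python
import pandas as pd

# Hypothetical frame: "age" and "height" both have "RHS" on level 1
columns = pd.MultiIndex.from_tuples(
    [("age", "RHS"), ("height", "RHS"), ("weight", "weight")]
)
df = pd.DataFrame([[8.0, 6.0, 2.0], [5.0, 1.0, 4.0]], columns=columns)

idx = pd.IndexSlice
# Every column whose level-1 label is "RHS" is selected
rhs = df.sort_index(axis=1).loc[:, idx[:, "RHS"]]
print(rhs.columns.tolist())
```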

Answer 2

You can build a boolean column mask with get_level_values:

In [700]: df.loc[:, df.columns.get_level_values(1) == 'RHS']
Out[700]:
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0
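A runnable sketch of the mask approach on a two-row excerpt of the frame. It also shows DataFrame.xs with level=1, which is not part of the original answers but is a built-in way to take a cross-section on a single level; drop_level=False keeps both header rows. Neither call sorts the columns here.

```python
import pandas as pd

# Two-row excerpt of the question's frame (columns left unsorted)
columns = pd.MultiIndex.from_tuples(
    [("age", "RHS"), ("height", "height"), ("weight", "weight"), ("shoe_size", "shoe_size")]
)
df = pd.DataFrame([[8.0, 6.0, 2.0, 1.0], [3.0, 0.0, 1.0, 0.0]], columns=columns)

# Boolean array over the columns, True where the level-1 label is "RHS"
mask = df.columns.get_level_values(1) == "RHS"
by_mask = df.loc[:, mask]

# Cross-section on level 1; drop_level=False keeps the two-level header
by_xs = df.xs("RHS", axis=1, level=1, drop_level=False)

print(by_mask.equals(by_xs))
```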
