Finding occurrences of the DataFrame column with the max value in a multi-index case

Question

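I have a set of data and I am trying to evaluate the influence of each parameter. My first idea is to compute the probability that a given parameter value produces the best result when all other parameters are held fixed, or more generally that it lands in the best x%. An example should make this clearer: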

My data looks like this (but with many more levels):

import pandas as pd
import numpy as np

iterables = [['a','b','c'], [1,2,3]]
np.random.seed(123)

columns_index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(data= np.random.rand(2,9), columns = columns_index, index=['feature1', 'feature2'])

This should produce the following:

first            a                             b                      \
second           1         2         3         1         2         3   
feature1  0.696469  0.286139  0.226851  0.551315  0.719469  0.423106   
feature2  0.392118  0.343178  0.729050  0.438572  0.059678  0.398044   
first            c                      
second           1         2         3  
feature1  0.980764  0.684830  0.480932  
feature2  0.737995  0.182492  0.175452

Now, if I am interested in 'feature2' and want to check the influence of 'first', I can do:

df.loc['feature2'].groupby('second').max()
Out[272]: 
second
1    0.737995
2    0.343178
3    0.729050

Now, the question is: how can I get the following?

The maximum is obtained with:

'first' = c for 'second' = 1
'first' = a for 'second' = 2
'first' = a for 'second' = 3

So I would like to compute: a: 66.66%, b: 0%, c: 33.33%

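I hope this is clear enough. If you have a better idea for checking the influence of the different parameters, I would also be glad to hear it.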

Thanks!

Answer 1

Use .idxmax to get the index labels of the maxima, i.e.:

df.loc['feature2'].groupby(level=1).idxmax()

second
1    (c, 1)
2    (a, 2)
3    (a, 3)
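
To get from these (first, second) tuples to the percentages asked for in the question, one possible continuation is to keep only the 'first' part of each tuple and count how often each value wins. This is a minimal sketch rather than part of the original answer; the names `winners`, `first_of_winner` and `share` are made up for illustration, and the DataFrame is rebuilt exactly as in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
iterables = [['a', 'b', 'c'], [1, 2, 3]]
np.random.seed(123)
columns_index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.rand(2, 9), columns=columns_index,
                  index=['feature1', 'feature2'])

# For each value of 'second', the (first, second) label of the maximum.
winners = df.loc['feature2'].groupby(level='second').idxmax()

# Keep only the 'first' component of each winning label.
first_of_winner = winners.map(lambda label: label[0])

# Share of wins per 'first' value, in percent; values that never win get 0.
all_first = df.columns.get_level_values('first').unique()
share = (first_of_winner.value_counts(normalize=True)
         .reindex(all_first, fill_value=0.0) * 100)

print(share.round(2))
```

The `reindex` step is there so that values of 'first' that never produce a maximum (here 'b') still show up with 0% instead of being dropped by `value_counts`.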

Answer 2

Or you can try this:

df.stack().loc['feature2'].stack().groupby(level='second').apply(lambda x : x[x==x.max()])
Out[805]: 
second  second  first
1       1       c        0.737995
2       2       a        0.343178
3       3       a        0.729050
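
A variation on the same idea, sketched here as an assumption rather than as part of the original answer: unstacking 'first' into columns gives one row per 'second' value, so `idxmax(axis=1)` returns the winning 'first' label directly, without the duplicated index level. Wrapping that in a helper (the hypothetical `win_share` below) lets the win share be computed for every feature at once.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
iterables = [['a', 'b', 'c'], [1, 2, 3]]
np.random.seed(123)
columns_index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.rand(2, 9), columns=columns_index,
                  index=['feature1', 'feature2'])


def win_share(row, level='first'):
    """Fraction of groups in which each value of `level` holds the maximum."""
    # Rows: the remaining level(s); columns: the values of `level`.
    table = row.unstack(level)
    # Winning `level` value for each row of the table.
    winners = table.idxmax(axis=1)
    # Normalised counts, with 0 for values that never win.
    return winners.value_counts(normalize=True).reindex(table.columns, fill_value=0.0)


# One row per feature, one column per value of 'first'.
shares = df.apply(win_share, axis=1)
print((shares * 100).round(2))
```

With only two column levels this is equivalent to the groupby version; with more levels, `row.unstack(level)` keeps all remaining levels as the row index, which matches the "all other parameters held fixed" reading of the question.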
