Using IndexSlice to filter MultiIndex DataFrames with Pandas

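Question: how do I filter the rows so that I get back only the rows where an injection is not equal to 0 or NaN, without losing the values in the other columns?

I have a DataFrame created with the following code: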

import pandas as pd

df = pd.DataFrame(
    [
        [5777, 100, 5385, 200, 5419, 4887, 100, 200],
        [4849, 0, 4539, 0, 3381, 0, 0, ],  # only 7 values, so the last column is NaN on this row
        [4971, 0, 3824, 0, 4645, 3424, 0, 0, ],
        [4827, 200, 3459, 300, 4552, 3153, 100, 200, ],
        [5207, 0, 3670, 0, 4876, 3358, 0, 0, ],
    ],
    index=pd.to_datetime(['2010-01-01',
                          '2010-01-02',
                          '2010-01-03',
                          '2010-01-04',
                          '2010-01-05']),
    columns=pd.MultiIndex.from_tuples(
        [('Portfolio A', 'GBP', 'amount'),
         ('Portfolio A', 'GBP', 'injection'),
         ('Portfolio B', 'EUR', 'amount'),
         ('Portfolio B', 'EUR', 'injection'),
         ('Portfolio C', 'USD', 'amount'),
         ('Portfolio C', 'USD', 'injection'),
         ('Portfolio D', 'JPY', 'amount'),
         ('Portfolio D', 'JPY', 'injection')])
).sort_index(axis=1)  # sortlevel() was removed from pandas; sort_index() replaces it

Next I can create a DataFrame from a slice of the data (in this case, all of the data):

df1=df.loc[pd.IndexSlice[:], pd.IndexSlice[:,:, ['amount', 'injection']]]

Next I create a new DataFrame where the injection is != 0:

df2=df1[df1.loc[pd.IndexSlice[:], pd.IndexSlice[:, :, 'injection']]!=0]

Question: why does this reset all values in the 'amount' columns to NaN?

Once the amounts are available, the next step is to drop the rows that are all NaN:

df3=df2.dropna(axis=0, how='all', thresh=None, subset=None, inplace=False)

The desired output is all of the data for the following row index values:

2010-01-01
2010-01-03
2010-01-04
2010-01-05

I think you need to add fillna (so that NaN counts as 0), plus any to check for at least one True value per row if you want boolean indexing. That works with a mask that is a boolean Series:

print (df1.loc[:, pd.IndexSlice[:, :, 'injection']].fillna(0) != 0)
           Portfolio A Portfolio B Portfolio C Portfolio D
                   GBP         EUR         USD         JPY
             injection   injection   injection   injection
2010-01-01        True        True        True        True
2010-01-02       False       False       False       False
2010-01-03       False       False        True       False
2010-01-04        True        True        True        True
2010-01-05       False       False        True       False

mask = (df1.loc[:, pd.IndexSlice[:, :, 'injection']].fillna(0) != 0).any(axis=1)
print (mask)
2010-01-01     True
2010-01-02    False
2010-01-03     True
2010-01-04     True
2010-01-05     True
dtype: bool

print (df1[mask])
           Portfolio A           Portfolio B           Portfolio C            \
                   GBP                   EUR                   USD             
                amount injection      amount injection      amount injection   
2010-01-01        5777       100        5385       200        5419      4887   
2010-01-03        4971         0        3824         0        4645      3424   
2010-01-04        4827       200        3459       300        4552      3153   
2010-01-05        5207         0        3670         0        4876      3358   

           Portfolio D            
                   JPY            
                amount injection  
2010-01-01         100     200.0  
2010-01-03           0       0.0  
2010-01-04         100     200.0  
2010-01-05           0       0.0  
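
For anyone who wants to try the row mask in isolation, here is a minimal self-contained sketch of the same idea. The `toy` frame, its values, and the names `injections`, `row_mask` and `filtered` are made up for illustration and are not from the question; `.ne(0)` is just the method form of `!= 0`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two portfolios, three dates (not the data from the question).
columns = pd.MultiIndex.from_tuples([
    ('Portfolio A', 'GBP', 'amount'),
    ('Portfolio A', 'GBP', 'injection'),
    ('Portfolio B', 'EUR', 'amount'),
    ('Portfolio B', 'EUR', 'injection'),
])
toy = pd.DataFrame(
    [
        [5777, 100, 5385, 200],     # both injections non-zero -> keep
        [4849, 0, 4539, np.nan],    # injections are 0 and NaN -> drop
        [4971, 0, 3824, 50],        # one injection non-zero   -> keep
    ],
    index=pd.to_datetime(['2010-01-01', '2010-01-02', '2010-01-03']),
    columns=columns,
).sort_index(axis=1)  # slicing a MultiIndex with IndexSlice needs sorted columns

idx = pd.IndexSlice

# Select only the 'injection' columns (third level of the column index).
injections = toy.loc[:, idx[:, :, 'injection']]

# Treat NaN as 0, then flag rows with at least one non-zero injection.
row_mask = injections.fillna(0).ne(0).any(axis=1)

# A boolean Series filters whole rows, so the 'amount' values are untouched.
filtered = toy[row_mask]
print(filtered)
```

Because `row_mask` is a Series indexed by date, `toy[row_mask]` keeps or drops complete rows and never blanks individual cells.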

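If you use the mask as a boolean DataFrame instead, you get NaN wherever the mask is False. The 'amount' columns are not part of the mask at all, so they are treated as False, which is why every 'amount' value turned into NaN in your df2.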
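
The difference between the two kinds of mask can be seen side by side in a small sketch. The two-row `toy` frame and the names `frame_mask`, `series_mask`, `by_frame` and `by_series` are hypothetical, chosen only to illustrate the behaviour described above.

```python
import pandas as pd

idx = pd.IndexSlice

# Hypothetical toy data: one portfolio, two dates.
columns = pd.MultiIndex.from_tuples([
    ('Portfolio A', 'GBP', 'amount'),
    ('Portfolio A', 'GBP', 'injection'),
])
toy = pd.DataFrame(
    [[5777, 100], [4849, 0]],
    index=pd.to_datetime(['2010-01-01', '2010-01-02']),
    columns=columns,
).sort_index(axis=1)

# Boolean DataFrame: it only has the 'injection' column.
frame_mask = toy.loc[:, idx[:, :, 'injection']] != 0
# Boolean Series: one True/False per row.
series_mask = frame_mask.any(axis=1)

# Element-wise masking: cells where the mask is False or missing become NaN,
# so the whole 'amount' column is lost.
by_frame = toy[frame_mask]
# Row-wise filtering: complete rows are kept or dropped.
by_series = toy[series_mask]

print(by_frame)
print(by_series)
```

Reducing the boolean DataFrame with `any(axis=1)` is what turns the element-wise mask into a row filter.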