Using IndexSlice to filter MultiIndex DataFrames with pandas
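Question: How can I filter the rows so that only rows where injection is not 0 or NaN are returned, without losing the values in the other columns?
I have a DataFrame created with the following code: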
import pandas as pd

df = pd.DataFrame(
    [
        [5777, 100, 5385, 200, 5419, 4887, 100, 200],
        # This row has only 7 values; pandas pads the missing last one with NaN.
        [4849, 0, 4539, 0, 3381, 0, 0],
        [4971, 0, 3824, 0, 4645, 3424, 0, 0],
        [4827, 200, 3459, 300, 4552, 3153, 100, 200],
        [5207, 0, 3670, 0, 4876, 3358, 0, 0],
    ],
    index=pd.to_datetime(['2010-01-01',
                          '2010-01-02',
                          '2010-01-03',
                          '2010-01-04',
                          '2010-01-05']),
    columns=pd.MultiIndex.from_tuples(
        [('Portfolio A', 'GBP', 'amount'), ('Portfolio A', 'GBP', 'injection'),
         ('Portfolio B', 'EUR', 'amount'), ('Portfolio B', 'EUR', 'injection'),
         ('Portfolio C', 'USD', 'amount'), ('Portfolio C', 'USD', 'injection'),
         ('Portfolio D', 'JPY', 'amount'), ('Portfolio D', 'JPY', 'injection')])
).sort_index(axis=1)  # sortlevel() is no longer available in current pandas
Next, I can create a DataFrame from a slice of the data (in this case, all of the data):
df1 = df.loc[pd.IndexSlice[:], pd.IndexSlice[:, :, ['amount', 'injection']]]
Then I create a new DataFrame where injection != 0:
df2 = df1[df1.loc[pd.IndexSlice[:], pd.IndexSlice[:, :, 'injection']] != 0]
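Question: Why does this reset all the values in the 'amount' columns to NaN?
Once the amounts are available, the next step is to drop the rows that are entirely NaN:
df3 = df2.dropna(axis=0, how='all')  # recent pandas raises if how and thresh are both passed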
The desired output is all of the data for these row index values:
2010-01-01
2010-01-03
2010-01-04
2010-01-05
I think you need to add fillna, together with any to check for at least one True value per row, if you need boolean indexing. Boolean indexing selects whole rows when the mask is a boolean Series:
print(df1.loc[:, pd.IndexSlice[:, :, 'injection']].fillna(0) != 0)
           Portfolio A Portfolio B Portfolio C Portfolio D
                   GBP         EUR         USD         JPY
             injection   injection   injection   injection
2010-01-01        True        True        True        True
2010-01-02       False       False       False       False
2010-01-03       False       False        True       False
2010-01-04        True        True        True        True
2010-01-05       False       False        True       False
mask = (df1.loc[:, pd.IndexSlice[:, :, 'injection']].fillna(0) != 0).any(axis=1)
print (mask)
2010-01-01 True
2010-01-02 False
2010-01-03 True
2010-01-04 True
2010-01-05 True
dtype: bool
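The fillna(0) step matters because NaN != 0 evaluates to True, so a row whose only "non-zero" injection is a missing value would otherwise pass the filter. A minimal sketch on a toy Series (the name injection and its values are made up for illustration, not taken from the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical injection column: one real injection, one zero, one missing value.
injection = pd.Series([100.0, 0.0, np.nan], index=["a", "b", "c"])

# NaN != 0 is True, so the missing value would be kept by the filter.
without_fill = injection != 0

# Replacing NaN with 0 first makes the missing value fail the filter.
with_fill = injection.fillna(0) != 0

print(without_fill.tolist())  # [True, False, True]
print(with_fill.tolist())     # [True, False, False]
```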
print(df1[mask])
           Portfolio A           Portfolio B           Portfolio C           \
                   GBP                   EUR                   USD
                amount injection      amount injection      amount injection
2010-01-01        5777       100        5385       200        5419      4887
2010-01-03        4971         0        3824         0        4645      3424
2010-01-04        4827       200        3459       300        4552      3153
2010-01-05        5207         0        3670         0        4876      3358

           Portfolio D
                   JPY
                amount injection
2010-01-01         100     200.0
2010-01-03           0       0.0
2010-01-04         100     200.0
2010-01-05           0       0.0
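If the mask is used as a boolean DataFrame instead, you get NaN wherever the mask is False. That is why the 'amount' columns were reset in the question: the DataFrame mask only contains the 'injection' columns, so when it is aligned to df1 the 'amount' columns have no entry, are treated as False, and come back as NaN.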
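The difference between the two kinds of mask can be sketched on a smaller frame. The frame small, its two shortened portfolio names, and its values are hypothetical and chosen only to keep the example short; the indexing calls are the same ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical two-portfolio frame with already sorted MultiIndex columns.
columns = pd.MultiIndex.from_tuples([
    ("A", "GBP", "amount"), ("A", "GBP", "injection"),
    ("B", "EUR", "amount"), ("B", "EUR", "injection"),
])
small = pd.DataFrame(
    [[10, 1, 20, 2], [30, 0, 40, np.nan], [50, 0, 60, 3]],
    index=pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03"]),
    columns=columns,
)

idx = pd.IndexSlice
# Boolean DataFrame that covers only the injection columns.
nonzero = small.loc[:, idx[:, :, "injection"]].fillna(0) != 0

# DataFrame mask: element-wise masking. The amount columns are absent
# from the mask, so every amount value becomes NaN.
masked = small[nonzero]

# Series mask: row selection. Rows with at least one non-zero injection
# are kept whole, amounts included.
kept = small[nonzero.any(axis=1)]

print(masked.loc[:, idx[:, :, "amount"]].isna().all().all())  # True
print(kept.index.strftime("%Y-%m-%d").tolist())  # ['2010-01-01', '2010-01-03']
```

Reducing the mask to one boolean per row with any(axis=1) is what switches pandas from element-wise masking to row filtering.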