Slicing specific records from a pandas Panel (pandas.core.panel.Panel)
I have a dict of pandas Panels (data_r3000) that holds stock data for several industry sectors...
{'capital_goods': <class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 13820 (major_axis) x 423 (minor_axis)
Items axis: OPEN to ADJ_CLOSE
Major_axis axis: 1962-01-02 00:00:00 to 2016-11-18 00:00:00
Minor_axis axis: A to ZEUS, 'consumer': <class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 11832 (major_axis) x 94 (minor_axis)
Items axis: OPEN to ADJ_CLOSE
Major_axis axis: 1970-01-02 00:00:00 to 2016-11-18 00:00:00
Minor_axis axis: ABG to WSO, 'consumer_non_durables': <class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 13819 (major_axis) x 138 (minor_axis)
Once I have isolated a sector, I want to modify some of the values in its df.
x = data_r3000['capital_goods'].to_frame().unstack(level=1)
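This produces a df with MultiIndex columns.
I have little experience with multi-indexing in pandas, and I am having trouble isolating the 'CLOSE' and 'ADJ_CLOSE' records for 'AA'. How can I isolate these records so that I can create an AA_df containing the OPEN and ADJ_CLOSE time series?
I have tried x.xs(['CLOSE','ADJ_CLOSE'], axis=1),
which correctly isolates the two features I am looking for, but I still can't figure out how to isolate 'AA'.
Thanks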
I think you can use slicers:
idx = pd.IndexSlice
print (df.loc[:, idx[['CLOSE','ADJ_CLOSE'], 'AA']])
Or:
print (df.loc[:, (['CLOSE','ADJ_CLOSE'],'AA')])
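Sample:
import numpy as np
import pandas as pd

# two-level columns: field name on the outer level, ticker on the inner level
cols = pd.MultiIndex.from_product((['ADJ','ADJ_CLOSE', 'CLOSE'],
                                   ['A','AA','AEPI']))
df = pd.DataFrame(np.arange(27).reshape(3,9),columns=cols)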
print (df)
  ADJ         ADJ_CLOSE         CLOSE
    A  AA AEPI         A  AA AEPI     A  AA AEPI
0   0   1    2         3   4    5     6   7    8
1   9  10   11        12  13   14    15  16   17
2  18  19   20        21  22   23    24  25   26
idx = pd.IndexSlice
print (df.loc[:, idx[['CLOSE','ADJ_CLOSE'], 'AA']])
  ADJ_CLOSE CLOSE
         AA    AA
0         4     7
1        13    16
2        22    25
print (df.loc[:, (['CLOSE','ADJ_CLOSE'],'AA')])
  ADJ_CLOSE CLOSE
         AA    AA
0         4     7
1        13    16
2        22    25
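To end up with the flat AA_df that the question asks for, one option is to take the cross-section for 'AA' on the inner column level first, which drops that level, and then pick the wanted fields. This is only a minimal sketch on the sample frame above. The name AA_df comes from the question, AA_all is a made-up intermediate name, and xs is just one alternative to the slicers shown.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product((['ADJ', 'ADJ_CLOSE', 'CLOSE'],
                                   ['A', 'AA', 'AEPI']))
df = pd.DataFrame(np.arange(27).reshape(3, 9), columns=cols)

# Cross-section on the ticker level (level 1 of the columns).
# The selected level is dropped, leaving one flat column per field.
AA_all = df.xs('AA', axis=1, level=1)

# Keep only the fields of interest, in the order listed here.
AA_df = AA_all[['CLOSE', 'ADJ_CLOSE']]
print(AA_df)
```

Because the ticker level is gone, AA_df['CLOSE'] is an ordinary column lookup, which may be easier to work with than the two-level result of the slicer approach.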
Solution for Panel:
np.random.seed(1234)
rng = pd.date_range('1/1/2013',periods=10,freq='D')
data = np.random.randn(10, 4)
cols = ['A','AA','AAON','ABAX']
# parentheses are needed so the tuple can span several lines
df1, df2, df3 = (pd.DataFrame(data, rng, cols),
                 pd.DataFrame(data, rng, cols),
                 pd.DataFrame(data, rng, cols))
# note: pd.Panel is only available in pandas versions before 0.25
pf = pd.Panel({'OPEN':df1,'ADJ':df2,'ADJ_CLOSE':df3})
print (pf)
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 10 (major_axis) x 4 (minor_axis)
Items axis: ADJ to OPEN
Major_axis axis: 2013-01-01 00:00:00 to 2013-01-10 00:00:00
Minor_axis axis: A to ABAX
print (pf.loc[['OPEN', 'ADJ_CLOSE'], :,'AA'])
                OPEN  ADJ_CLOSE
2013-01-01 -1.190976  -1.190976
2013-01-02  0.887163   0.887163
2013-01-03 -2.242685  -2.242685
2013-01-04 -2.021255  -2.021255
2013-01-05  0.289092   0.289092
2013-01-06 -0.655969  -0.655969
2013-01-07 -0.469305  -0.469305
2013-01-08  1.058969   1.058969
2013-01-09  1.045938   1.045938
2013-01-10 -0.322795  -0.322795
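Panel was deprecated in pandas 0.20 and removed in 0.25, so the Panel code above only runs on older versions. On a current pandas, a similar layout can probably be built as a DataFrame with MultiIndex columns, for example with pd.concat over a dict of frames. The following is a sketch under that assumption, reusing the same sample data; the names frames and wide are made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(1234)
rng = pd.date_range('1/1/2013', periods=10, freq='D')
data = np.random.randn(10, 4)
cols = ['A', 'AA', 'AAON', 'ABAX']

# One frame per field, standing in for the items of the Panel
frames = {name: pd.DataFrame(data, rng, cols)
          for name in ['OPEN', 'ADJ', 'ADJ_CLOSE']}

# Dict keys become the outer column level, tickers the inner level
wide = pd.concat(frames, axis=1)

# Select the 'AA' ticker on the inner level, then the two fields
AA_df = wide.xs('AA', axis=1, level=1)[['OPEN', 'ADJ_CLOSE']]
print(AA_df.head())
```

The result is a plain date-indexed frame with OPEN and ADJ_CLOSE columns, the same shape as what pf.loc[['OPEN', 'ADJ_CLOSE'], :, 'AA'] returns for the Panel.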