Pandas column MultiIndex: subtracting columns from each other
pandas DataFrame:
Constructor:
import pandas as pd

# Two-level columns: ticker on level 0, field on level 1
c = pd.MultiIndex.from_product([['AAPL','AMZN'],['price','custom']])
i = pd.date_range(start='2017-01-01',end='2017-01-06')
df1 = pd.DataFrame(index=i,columns=c)
df1.loc[:,('AAPL','price')] = list(range(51,57))
df1.loc[:,('AMZN','price')] = list(range(101,107))
df1.loc[:,('AAPL','custom')] = list(range(1,7))
df1.loc[:,('AMZN','custom')] = list(range(17,23))
df1.index.set_names('Dates',inplace=True)
df1.sort_index(axis=1,level=0,inplace=True) # needed for pd.IndexSlice[]
df1
Produces (not sure how to format Jupyter Notebook output):
             AAPL         AMZN
           custom price custom price
Dates
2017-01-01      1    51     17   101
2017-01-02      2    52     18   102
2017-01-03      3    53     19   103
2017-01-04      4    54     20   104
2017-01-05      5    55     21   105
2017-01-06      6    56     22   106
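Question:
How do I create a third column at the second level of the MultiIndex that holds the difference between price and custom? It should be computed separately for each top-level column, i.e. separately for AAPL and AMZN.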
Attempted solution:
I tried using pd.IndexSlice in two ways; both give me NaNs:
df1.loc[:,pd.IndexSlice[:,'price']].sub(df1.loc[:,pd.IndexSlice[:,'custom']])
df1.loc[:,pd.IndexSlice[:,'price']] - df1.loc[:,pd.IndexSlice[:,'custom']]
Returns:
             AAPL         AMZN
           custom price custom price
Dates
2017-01-01    NaN   NaN    NaN   NaN
2017-01-02    NaN   NaN    NaN   NaN
2017-01-03    NaN   NaN    NaN   NaN
2017-01-04    NaN   NaN    NaN   NaN
2017-01-05    NaN   NaN    NaN   NaN
2017-01-06    NaN   NaN    NaN   NaN
How can I add a third column with the difference?
Thanks.
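The NaNs come from label alignment: pandas matches columns by label, and ('AAPL', 'price') never matches ('AAPL', 'custom'). You could subtract the underlying values instead, which skips alignment: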
df1.loc[:, pd.IndexSlice[:, 'price']] - df1.loc[:,pd.IndexSlice[:,'custom']].values
To join the result back to the original, you can use pd.concat:
In [221]: df2 = (df1.loc[:, pd.IndexSlice[:, 'price']] - df1.loc[:,pd.IndexSlice[:,'custom']].values)\
.rename(columns={'price' : 'new'})
In [222]: pd.concat([df1, df2], axis=1)
Out[222]:
             AAPL         AMZN       AAPL AMZN
           custom price custom price  new  new
Dates
2017-01-01      1    51     17   101   50   84
2017-01-02      2    52     18   102   50   84
2017-01-03      3    53     19   103   50   84
2017-01-04      4    54     20   104   50   84
2017-01-05      5    55     21   105   50   84
2017-01-06      6    56     22   106   50   84
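To make the difference between the two behaviours concrete, here is a minimal, self-contained sketch. It rebuilds the frame from the question with integer data (the names df, price, custom, aligned and positional are illustrative, not from the original post) and compares label-aligned subtraction with positional subtraction. Note that the positional version is only correct when both slices list the tickers in the same order, which holds here because the columns are sorted.

```python
import pandas as pd

# Rebuild the question's frame directly with integer data (illustrative names).
idx = pd.date_range('2017-01-01', '2017-01-06', name='Dates')
df = pd.DataFrame(
    {
        ('AAPL', 'custom'): list(range(1, 7)),
        ('AAPL', 'price'): list(range(51, 57)),
        ('AMZN', 'custom'): list(range(17, 23)),
        ('AMZN', 'price'): list(range(101, 107)),
    },
    index=idx,
)

price = df.loc[:, pd.IndexSlice[:, 'price']]
custom = df.loc[:, pd.IndexSlice[:, 'custom']]

# Label-aligned subtraction: the two slices share no column label, so the
# result is the union of all four columns, filled entirely with NaN.
aligned = price - custom

# Positional subtraction: the NumPy array carries no labels, so the values
# are paired column by column in their current order.
positional = price - custom.to_numpy()

print(aligned.isna().all().all())
print(positional)
```

Here .to_numpy() plays the same role as .values in the answer above; either one drops the labels before the subtraction.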
You can subtract by values, then rename, and finally join the result to the original:
a = (df1.loc[:, pd.IndexSlice[:, 'price']]
        .sub(df1.loc[:, pd.IndexSlice[:, 'custom']].values, axis=1)
        .rename(columns={'price': 'sub'}))
df1 = df1.join(a).sort_index(axis=1)
print(df1)
             AAPL             AMZN
           custom price sub custom price sub
Dates
2017-01-01      1    51  50     17   101  84
2017-01-02      2    52  50     18   102  84
2017-01-03      3    53  50     19   103  84
2017-01-04      4    54  50     20   104  84
2017-01-05      5    55  50     21   105  84
2017-01-06      6    56  50     22   106  84
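As a possible alternative to subtracting raw values (this is a different technique from the answers above, not something they proposed), DataFrame.xs can drop the second level so that only the ticker labels remain. Ordinary label alignment then pairs AAPL with AAPL and AMZN with AMZN regardless of column order. A minimal sketch, assuming the same data as the question; the level name 'diff' and the variable names are hypothetical choices for illustration:

```python
import pandas as pd

# Same data as the question, built directly with integers (illustrative names).
idx = pd.date_range('2017-01-01', '2017-01-06', name='Dates')
df = pd.DataFrame(
    {
        ('AAPL', 'custom'): list(range(1, 7)),
        ('AAPL', 'price'): list(range(51, 57)),
        ('AMZN', 'custom'): list(range(17, 23)),
        ('AMZN', 'price'): list(range(101, 107)),
    },
    index=idx,
)

# xs selects one label on level 1 and drops that level, leaving the tickers
# as plain column labels, so the subtraction aligns ticker to ticker.
diff = df.xs('price', axis=1, level=1) - df.xs('custom', axis=1, level=1)

# Restore the second level under the hypothetical name 'diff'.
diff.columns = pd.MultiIndex.from_product([diff.columns, ['diff']])

# Attach the new columns and sort so each ticker's columns sit together.
result = pd.concat([df, diff], axis=1).sort_index(axis=1)
print(result)
```

The trade-off is an extra step to rebuild the column level, in exchange for not depending on the two slices being in the same positional order.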