Pandas MultiIndex
I currently have a pandas DataFrame with a MultiIndex structured like this (ticker and field are the index levels):
                                         value
ticker            field
DE0001141174 Govt CASH_FLOW_DATE    2000-11-21
                  CASH_FLOW_AMOUNT       51250
                  PRINCIPAL_AMOUNT       1e+06
DE0001141232 Govt CASH_FLOW_DATE    2000-05-17
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2001-05-17
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2002-05-17
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT       1e+06
DE0001141380 Govt CASH_FLOW_DATE    2002-08-18
                  CASH_FLOW_AMOUNT     67808.2
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2003-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2004-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2005-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2006-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT       1e+06
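To make the question reproducible, here is a minimal sketch that rebuilds the input frame above. The dtypes are an assumption, since the listing does not show them: dates are stored as pd.Timestamp and amounts as floats, all in one object-dtype "value" column.

```python
import pandas as pd

# Cash flows per bond, copied from the listing: (date, amount, principal).
# Storing dates as Timestamps and amounts as floats is an assumption.
flows = {
    "DE0001141174 Govt": [("2000-11-21", 51250.0, 1e6)],
    "DE0001141232 Govt": [
        ("2000-05-17", 45000.0, 0.0),
        ("2001-05-17", 45000.0, 0.0),
        ("2002-05-17", 45000.0, 1e6),
    ],
    "DE0001141380 Govt": [
        ("2002-08-18", 67808.2, 0.0),
        ("2003-08-18", 45000.0, 0.0),
        ("2004-08-18", 45000.0, 0.0),
        ("2005-08-18", 45000.0, 0.0),
        ("2006-08-18", 45000.0, 1e6),
    ],
}

rows = []
for ticker, cash_flows in flows.items():
    for date, amount, principal in cash_flows:
        # Each cash flow becomes three consecutive (ticker, field, value) rows.
        rows.append((ticker, "CASH_FLOW_DATE", pd.Timestamp(date)))
        rows.append((ticker, "CASH_FLOW_AMOUNT", amount))
        rows.append((ticker, "PRINCIPAL_AMOUNT", principal))

# Long layout with (ticker, field) as a non-unique MultiIndex.
df = pd.DataFrame(rows, columns=["ticker", "field", "value"]).set_index(["ticker", "field"])
print(df)
```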
I would like to transform it into the following structure, where ticker and CASH_FLOW_DATE are the index:
ticker            CASH_FLOW_DATE  CASH_FLOW_AMOUNT  PRINCIPAL_AMOUNT
DE0001141174 Govt 2000-11-21                 51250             1e+06
DE0001141232 Govt 2000-05-17                 45000                 0
                  2001-05-17                 45000                 0
                  2002-05-17                 45000             1e+06
DE0001141380 Govt 2002-08-18               67808.2                 0
                  2003-08-18                 45000                 0
                  2004-08-18                 45000                 0
                  2005-08-18                 45000                 0
                  2006-08-18                 45000             1e+06
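I suspect the problem is that pandas has no natural way of knowing that the two rows following each 'CASH_FLOW_DATE' belong to that date.
I could probably do this with a lot of ugly loops, but I wonder whether there is a more pythonic way.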
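One way to tell pandas explicitly that "the two rows below belong to that date" is to number the records with a running count of CASH_FLOW_DATE rows. This is a sketch of an alternative technique, not the answer's method; it assumes every record starts with a CASH_FLOW_DATE row (as in the listing) and pandas 1.1 or later, where DataFrame.pivot accepts a list for index.

```python
import pandas as pd

# Small excerpt of the question's data: one bond with two cash flows.
df = pd.DataFrame(
    {
        "ticker": ["DE0001141232 Govt"] * 6,
        "field": ["CASH_FLOW_DATE", "CASH_FLOW_AMOUNT", "PRINCIPAL_AMOUNT"] * 2,
        "value": [pd.Timestamp("2000-05-17"), 45000.0, 0.0,
                  pd.Timestamp("2001-05-17"), 45000.0, 0.0],
    }
).set_index(["ticker", "field"])

flat = df.reset_index()
# Each CASH_FLOW_DATE row opens a new record, so the running count of those
# rows gives the date, amount and principal of one cash flow a shared id.
flat["record"] = (flat["field"] == "CASH_FLOW_DATE").cumsum()

# One row per (ticker, record), one column per field.
wide = flat.pivot(index=["ticker", "record"], columns="field", values="value")
# Drop the helper id and use ticker + CASH_FLOW_DATE as the index.
wide = wide.reset_index("record", drop=True).set_index("CASH_FLOW_DATE", append=True)
print(wide)
```

Unlike a per-(ticker, field) counter, this relies on row order, so it breaks if the rows of one record are ever shuffled.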
You need cumcount to build a new index level, append it to the original index with set_index(..., append=True), and then call unstack:
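# number the repeats of each (ticker, field) pair 0, 1, 2, ... and append
# that counter to the index as a third level
df = df.set_index(df.groupby(level=[0, 1]).cumcount(), append=True)
# move 'field' into the columns, then drop the helper counter level
df = df['value'].unstack(level=1, fill_value=0).reset_index(level=1, drop=True).reset_index()
print(df)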
field             ticker  CASH_FLOW_AMOUNT  CASH_FLOW_DATE  PRINCIPAL_AMOUNT
0      DE0001141174 Govt             51250      2000-11-21             1e+06
1      DE0001141232 Govt             45000      2000-05-17                 0
2      DE0001141232 Govt             45000      2001-05-17                 0
3      DE0001141232 Govt             45000      2002-05-17             1e+06
4      DE0001141380 Govt           67808.2      2002-08-18                 0
5      DE0001141380 Govt             45000      2003-08-18                 0
6      DE0001141380 Govt             45000      2004-08-18                 0
7      DE0001141380 Govt             45000      2005-08-18                 0
8      DE0001141380 Govt             45000      2006-08-18             1e+06
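The answer's result is a flat frame with a default integer index, while the question asked for ticker and CASH_FLOW_DATE as the index. The sketch below runs the answer's two steps on a subset of the data, prints the intermediate counter to show why it pairs the rows, and then adds one extra set_index step (not part of the answer). Converting the remaining columns to float is an assumption that suits the values shown.

```python
import pandas as pd

# A subset of the question's data (two bonds) in the same long layout.
records = [
    ("DE0001141174 Govt", "2000-11-21", 51250.0, 1e6),
    ("DE0001141232 Govt", "2000-05-17", 45000.0, 0.0),
    ("DE0001141232 Govt", "2001-05-17", 45000.0, 0.0),
    ("DE0001141232 Govt", "2002-05-17", 45000.0, 1e6),
]
rows = []
for ticker, date, amount, principal in records:
    rows.append((ticker, "CASH_FLOW_DATE", pd.Timestamp(date)))
    rows.append((ticker, "CASH_FLOW_AMOUNT", amount))
    rows.append((ticker, "PRINCIPAL_AMOUNT", principal))
df = pd.DataFrame(rows, columns=["ticker", "field", "value"]).set_index(["ticker", "field"])

# The counter numbers each repeat of a (ticker, field) pair, so the n-th date,
# n-th amount and n-th principal of one bond all receive the same n.
counter = df.groupby(level=[0, 1]).cumcount()
print(counter.tolist())  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2]

# The answer's two steps.
df = df.set_index(counter, append=True)
df = df["value"].unstack(level=1, fill_value=0).reset_index(level=1, drop=True).reset_index()

# Extra step, not in the answer: index by ticker and CASH_FLOW_DATE as the
# question requested, and store the remaining columns as floats.
out = df.set_index(["ticker", "CASH_FLOW_DATE"]).astype(float)
print(out)
```

Because (ticker, field, counter) is unique after the append, unstack never has to aggregate; fill_value=0 only matters if a bond is missing one of the three fields for some cash flow.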