Value from a past, potentially missing month in a DataFrame

Suppose I have a DataFrame like this:

Month,   Gender, State, Value
2010-01, M,      S1,    10
2010-02, M,      S1,    20
2010-05, M,      S1,    26
2010-03, F,      S2,    11

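For each Gender and State, I want to add another column containing the Value from the previous month (or from X months ago), if that month exists, i.e.: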

Month,   Gender, State, Value, Last Value
2010-01, M,      S1,    10,    NaN
2010-02, M,      S1,    20,    10 
2010-05, M,      S1,    26,    NaN (there is no 2010-04 for M, S1)
2010-03, F,      S2,    11,    NaN

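I know I have to groupby(['Gender', 'State']), but shift() doesn't work: it moves the data by a number of rows and knows nothing about the periods themselves, so the result is wrong whenever a month is missing.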
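Here is a small reproduction of that problem on the sample data. The DataFrame construction is only an illustration; the column names come from the question. A plain group-wise shift() returns 20 for the 2010-05 row, although the expected answer is NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'Month': ['2010-01', '2010-02', '2010-05', '2010-03'],
    'Gender': ['M', 'M', 'M', 'F'],
    'State': ['S1', 'S1', 'S1', 'S2'],
    'Value': [10, 20, 26, 11],
})

# shift(1) inside each group moves values by one *row*, not one month,
# so the 2010-05 row picks up the 2010-02 value even though 2010-04 is missing
naive = df.groupby(['Gender', 'State'])['Value'].shift(1)
print(naive.tolist())
```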

I found a way to do it, although I'm not very happy with it:

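import pandas as pd

# all_genders, all_states and all_months must list every value to cover,
# including months that never appear in the data (e.g. 2010-04)
full_index = pd.MultiIndex.from_product(
    [all_genders, all_states, all_months], names=['Gender', 'State', 'Month'])
df = df.set_index(['Gender', 'State', 'Month'])
df = df.reindex(full_index) # fill in all missing rows with NaN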
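A self-contained sketch of the same reindex-then-shift idea on the sample data. Several details are assumptions that are not in the question: Month is converted to a monthly Period, all_months is generated with pd.period_range from the first to the last observed month (so 2010-04 is included), genders and states are taken from the data, and the filler rows are dropped at the end, which assumes Value itself is never NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'Month': ['2010-01', '2010-02', '2010-05', '2010-03'],
    'Gender': ['M', 'M', 'M', 'F'],
    'State': ['S1', 'S1', 'S1', 'S2'],
    'Value': [10, 20, 26, 11],
})
# Assumption: treat Month as a monthly period so consecutive months are easy to enumerate
df['Month'] = pd.to_datetime(df['Month'], format='%Y-%m').dt.to_period('M')

# Every month from the first to the last observed one, including 2010-04
all_months = pd.period_range(df['Month'].min(), df['Month'].max(), freq='M')
full_index = pd.MultiIndex.from_product(
    [df['Gender'].unique(), df['State'].unique(), all_months],
    names=['Gender', 'State', 'Month'])

# Create the missing rows (filled with NaN) so that one row back == one month back
full = df.set_index(['Gender', 'State', 'Month']).reindex(full_index)
# Shift within each (Gender, State) block so values never leak across groups
full['Last Value'] = full.groupby(level=['Gender', 'State'])['Value'].shift(1)

# Drop the filler rows again (assumes the real data has no NaN in Value)
result = full.dropna(subset=['Value']).reset_index()
print(result)
```

With every month present, a group-wise shift(1) reads exactly one month back. The cost is materializing one row per gender × state × month combination, which can be large for sparse data.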

So basically, instead of handling the missing rows in the data, we create them, and then shift() works as expected.

That is:

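# shift within each (Gender, State) block; a plain df.shift(1) would carry the
# last month of one group into the first month of the next
df['Last Value'] = df.groupby(level=[0, 1])['Value'].shift(1)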
...
df = df.reset_index() # go back to tabular format from this hierarchy
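A different technique avoids creating filler rows altogether. It is a substitute for the approach above, not the question's method: move the Month key itself forward by X months and merge the result back on Gender, State and Month. The helper name add_value_months_back is hypothetical. It assumes Month is a monthly Period column and that each (Gender, State, Month) combination appears at most once.

```python
import pandas as pd

def add_value_months_back(df, months_back=1):
    """Hypothetical helper: attach the Value from `months_back` months earlier
    for the same Gender/State, or NaN if that month is absent."""
    past = df[['Gender', 'State', 'Month', 'Value']].copy()
    # Move each row forward in time so it lines up with the month that should see it
    past['Month'] = past['Month'] + months_back
    past = past.rename(columns={'Value': 'Last Value'})
    # Left merge keeps every original row; rows without a match get NaN
    return df.merge(past, on=['Gender', 'State', 'Month'], how='left')

df = pd.DataFrame({
    'Month': ['2010-01', '2010-02', '2010-05', '2010-03'],
    'Gender': ['M', 'M', 'M', 'F'],
    'State': ['S1', 'S1', 'S1', 'S2'],
    'Value': [10, 20, 26, 11],
})
df['Month'] = pd.to_datetime(df['Month'], format='%Y-%m').dt.to_period('M')

out = add_value_months_back(df, months_back=1)
print(out)
```

Because the lookup matches on the month value rather than on row position, gaps need no special handling, and passing months_back=X covers the "X months ago" variant.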