Value from a past, potentially missing month in a DataFrame

Question

Suppose I have a DataFrame like the following:

Month,   Gender, State, Value
2010-01, M,      S1,    10
2010-02, M,      S1,    20
2010-05, M,      S1,    26
2010-03, F,      S2,    11

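I would like to add another column holding the value from the previous month (or from X months back) for the same Gender and State, if such a row exists, i.e.: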

Month,   Gender, State, Value, Last Value
2010-01, M,      S1,    10,    NaN
2010-02, M,      S1,    20,    10 
2010-05, M,      S1,    26,    NaN (there is no 2010-04 for M, S1)
2010-03, F,      S2,    11,    NaN

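I know I have to groupby(['Gender', 'State']), but shift() doesn't work because it only moves the data by a number of rows; it is not aware of the period itself, so it cannot tell when a month is missing.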
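To make the problem concrete, here is a minimal sketch that rebuilds the sample data above (with Month assumed to be stored as 'YYYY-MM' strings) and applies the naive per-group shift():

```python
import pandas as pd

df = pd.DataFrame({
    'Month':  ['2010-01', '2010-02', '2010-05', '2010-03'],
    'Gender': ['M', 'M', 'M', 'F'],
    'State':  ['S1', 'S1', 'S1', 'S2'],
    'Value':  [10, 20, 26, 11],
})

# shift() moves values by row position within each group, not by calendar month
naive = df.groupby(['Gender', 'State'])['Value'].shift(1)
print(naive.tolist())
```

The 2010-05 row for (M, S1) picks up 20, the value from 2010-02, whereas the wanted result there is NaN because 2010-04 is missing.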

Answer 1

I have found a way of doing this, although I'm not too happy with it:

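import pandas as pd

# Build every (Gender, State, Month) combination, including months absent from the data.
# all_genders, all_states and all_months are the lists of values to cover;
# all_months must contain every consecutive month, in chronological order.
full_index = pd.MultiIndex.from_product(
    [all_genders, all_states, all_months],
    names=['Gender', 'State', 'Month'])
df = df.set_index(['Gender', 'State', 'Month'])
df = df.reindex(full_index)  # add the missing rows, filled with NaN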

So basically, instead of working around the rows that are missing from the data, we create them, and then shift() works as expected.

That is:

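# shift within each (Gender, State) group so values do not leak from one group into the next
df['Last Value'] = df.groupby(level=['Gender', 'State'])['Value'].shift(1)
...
df = df.reset_index()  # go back to tabular format from this hierarchy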
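Put together, the approach might look like the sketch below when run on the sample data from the question. A few things here are assumptions rather than part of the original answer: Month is stored as 'YYYY-MM' strings, the months to cover run from the earliest to the latest month present in the data, and the filler rows are dropped again at the end with dropna (which would also drop any original row whose Value was already NaN).

```python
import pandas as pd

df = pd.DataFrame({
    'Month':  ['2010-01', '2010-02', '2010-05', '2010-03'],
    'Gender': ['M', 'M', 'M', 'F'],
    'State':  ['S1', 'S1', 'S1', 'S2'],
    'Value':  [10, 20, 26, 11],
})

# Assumption: cover every month between the earliest and latest month in the data.
all_months = pd.period_range(df['Month'].min(), df['Month'].max(), freq='M').strftime('%Y-%m')

# Every (Gender, State, Month) combination, with months in chronological order.
full_index = pd.MultiIndex.from_product(
    [df['Gender'].unique(), df['State'].unique(), all_months],
    names=['Gender', 'State', 'Month'],
)

# Reindexing creates the missing rows, filled with NaN.
full = df.set_index(['Gender', 'State', 'Month']).reindex(full_index)

# One row now equals one month, so a per-group shift(1) means "previous month".
full['Last Value'] = full.groupby(level=['Gender', 'State'])['Value'].shift(1)

# Keep only the rows that were present in the original data.
result = full.dropna(subset=['Value']).reset_index()
print(result)
```

For the value from X months back, shift(X) can be used in place of shift(1). Note that the full index grows with genders × states × months, so this may get expensive on large data.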

python

time-series

pandas