Resample 10D but until end of months

Question

I want to resample a DataFrame at a 10D frequency, but always cut the last ten-day period of each month at the month end. For example:

print(df)
            data
index
2010-01-01  145.08
2010-01-02  143.69
2010-01-03  101.06
2010-01-04  57.63
2010-01-05  65.46
...
2010-02-24  48.06
2010-02-25  87.41
2010-02-26  71.97
2010-02-27  73.1
2010-02-28  41.43

Applying something like df.resample('10DM').mean() should give:

           data
index
2010-01-10  97.33
2010-01-20  58.58
2010-01-31  41.43
2010-02-10  35.17
2010-02-20  32.44
2010-02-28  55.44

Note that the 1st and 2nd ten-day periods are normal 10D resamples, but the 3rd can span 8, 9, 10, or 11 days depending on the month and year.

Thanks in advance.

Answer 1

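Sample data (values chosen so the result is easy to check):

# dti = pd.date_range("2010-01-01", "2010-02-28", freq="D")
# df = pd.DataFrame({"value": np.arange(1, len(dti)+1)}, index=dti)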
>>> df
            value
2010-01-01      1
2010-01-02      2
2010-01-03      3
2010-01-04      4
2010-01-05      5
...
2010-02-24     55
2010-02-25     56
2010-02-26     57
2010-02-27     58
2010-02-28     59

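You need to create groups by (day, month, year):

# Day-of-month bins are right-closed: (0, 10], (10, 20], (20, 31].
# On pandas >= 2.2, use freq='ME' and freq='YE'; 'M' and 'Y' are deprecated aliases.
grp = df.groupby([pd.cut(df.index.day, [0, 10, 20, 31]),
                  pd.Grouper(freq='M'),
                  pd.Grouper(freq='Y')])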
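
To see why the third bin absorbs the variable-length tail of each month, here is a small sketch that counts how many days fall into each bin. The `days` frame and the `dekad` column name are made up for illustration, and the date range is assumed from the sample data above.

```python
import pandas as pd

# Assumed sample range, matching the data shown above.
dti = pd.date_range("2010-01-01", "2010-02-28", freq="D")

# labels=False returns the integer bin position (0, 1, 2) instead of intervals.
codes = pd.cut(dti.day, bins=[0, 10, 20, 31], labels=False)

# Hypothetical helper frame: one row per day, tagged with month and bin number.
days = pd.DataFrame({"month": dti.month, "dekad": codes + 1})

# Number of days per (month, dekad) pair.
counts = days.groupby(["month", "dekad"]).size()
print(counts)
```

The first two bins always hold 10 days, while the last one holds whatever remains (11 days in January 2010, 8 in February 2010).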

Now you can compute the mean of each group, labelling it with the last date in the group:

out = grp['value'].apply(lambda x: (x.index.max(), x.mean())).apply(pd.Series) \
                  .reset_index(drop=True).rename(columns={0:'date', 1:'value'}) \
                  .set_index('date').sort_index()

Output:

>>> out
            value
date
2010-01-10    5.5
2010-01-20   15.5
2010-01-31   26.0
2010-02-10   36.5
2010-02-20   46.5
2010-02-28   55.5
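
As an alternative sketch (not the approach used above), the same result can be reached with a single grouping key by mapping every timestamp to the calendar end of its ten-day period. The helper name `dekad_end` is hypothetical. One difference: it labels each group with the calendar end date (10th, 20th, or month end) rather than the last date actually present in the data, which only matters if the data has gaps at period ends.

```python
import numpy as np
import pandas as pd

# Same sample data as above.
dti = pd.date_range("2010-01-01", "2010-02-28", freq="D")
df = pd.DataFrame({"value": np.arange(1, len(dti) + 1)}, index=dti)


def dekad_end(index):
    """Map each timestamp to the last day of its period: the 10th, the 20th, or month end."""
    day = index.day
    # First day of the month for every timestamp.
    month_start = index.to_period("M").to_timestamp()
    # days_in_month handles 28/29/30/31-day months, including leap years.
    end_day = np.where(day <= 10, 10, np.where(day <= 20, 20, index.days_in_month))
    return month_start + pd.to_timedelta(end_day - 1, unit="D")


out = df.groupby(dekad_end(df.index)).mean()
out.index.name = "date"
print(out)
```

Because the key is a plain array of dates, any other aggregation (`sum`, `agg`, and so on) can be swapped in for `mean` without changing the grouping logic.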

pandas

pandas-resample