How to subtract the mean of past calendar weeks from the current value?

I have a data frame df_pct_Max with the following shape:

    Date    Value1  Value2
01.01.2015   5        6
08.01.2015   3        2
...          ...      ...
28.01.2017   7        8

I want to calculate the mean for each calendar week and then subtract it from the actual values in that calendar week.

I created a data frame with the mean of each calendar week like this:

df_weekly_avg_Max = df_pct_Max.groupby(df_pct_Max.index.week).mean()

This results in the data frame df_weekly_avg_Max:

    KW  Value1  Value2
      1   3.5   4.3
      2    4    3
      …    …    …
     52    8.33  6.2

Now I am trying to subtract df_weekly_avg_Max from df_pct_Max, and I want the subtraction to be matched by calendar week.

I tried adding a column 'KW' and then

dfresult = df_pct_Max.sub(df_weekly_avg_Max, axis='KW')

but I get an error there.

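Is there also a rolling way to do this (for example, subtract the mean of calendar week 1 over the past 3 years from the calendar week 1 values of 2015, and likewise for 2016...)? Can anyone help with this?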

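This answer isn't clean, since it doesn't make good use of pandas, but I don't think it will be slow either (depending on how large your data frame is). The basic idea is to build up a list of the means, repeated once for each day, so that you can then simply subtract.

Code: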

from collections import Counter
import pandas as pd
import numpy as np

# Build up example data frame
num_days = 15
dates = pd.date_range('1/1/2015', periods=num_days, freq='D')
# randint replaces the deprecated np.random.random_integers; its upper bound is exclusive
val1s = np.random.randint(1, 31, num_days)
val2s = np.random.randint(1, 31, num_days)

df_pct_MAX = pd.DataFrame({'Date':dates, 'Value1':val1s, 'Value2':val2s})
# .dt.day_name() and .dt.isocalendar().week replace .dt.weekday_name and .dt.week,
# which newer pandas versions no longer provide
df_pct_MAX['Day'] = df_pct_MAX['Date'].dt.day_name()
df_pct_MAX['Week'] = df_pct_MAX['Date'].dt.isocalendar().week

# OP's logic to get means, restricted to the numeric value columns
mean_fields = ['Value1','Value2'] #<-- only hardcoded portion
df_weekly_avg_Max = df_pct_MAX.groupby('Week')[mean_fields].mean()

# Build up a list of the means repeated once for each day in that week
means_dict = {k:list(df_weekly_avg_Max[k]) for k in mean_fields} #<-- convert means into lists keyed by field
# Count how many days are represented in each week.
# This relies on the weeks appearing in ascending order in the data frame.
week_counts = Counter(df_pct_MAX['Week']).values()

# Build up a dict keyed by field with the means repeated the correct number of times
means = {k:[means_dict[k][i] for i,count in enumerate(week_counts)
         for x in range(count)] for k in mean_fields}

# Assign a new column to the means for each field (not necessary, just to show done correctly)
for k in mean_fields:
    df_pct_MAX[k+'Mean'] = means[k]

print(df_pct_MAX)

Output:

         Date  Value1  Value2        Day  Week  Value1Mean  Value2Mean
0  2015-01-01      12      19   Thursday     1    9.000000   19.250000
1  2015-01-02      15      27     Friday     1    9.000000   19.250000
2  2015-01-03       2      30   Saturday     1    9.000000   19.250000
3  2015-01-04       7       1     Sunday     1    9.000000   19.250000
4  2015-01-05       6      20     Monday     2   17.571429   14.142857
5  2015-01-06       9      24    Tuesday     2   17.571429   14.142857
6  2015-01-07      25      17  Wednesday     2   17.571429   14.142857
7  2015-01-08      22       8   Thursday     2   17.571429   14.142857
8  2015-01-09      30       7     Friday     2   17.571429   14.142857
9  2015-01-10      10       1   Saturday     2   17.571429   14.142857
10 2015-01-11      21      22     Sunday     2   17.571429   14.142857
11 2015-01-12      23      29     Monday     3   23.750000   19.750000
12 2015-01-13      23      16    Tuesday     3   23.750000   19.750000
13 2015-01-14      21      17  Wednesday     3   23.750000   19.750000
14 2015-01-15      28      17   Thursday     3   23.750000   19.750000
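
The subtraction itself is not shown above. As a minimal sketch, the weekly mean table can also be looked up row by row with the week number, which avoids repeating the means by hand. The names value_cols, weekly_avg, row_means and the "Diff" suffix are illustrative assumptions, and the data is random example data rather than the asker's.

```python
import numpy as np
import pandas as pd

# Example data in the same shape as above (random values, fixed seed)
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=15, freq="D")
df = pd.DataFrame({
    "Date": dates,
    "Value1": rng.integers(1, 31, size=len(dates)),
    "Value2": rng.integers(1, 31, size=len(dates)),
})
df["Week"] = df["Date"].dt.isocalendar().week.astype(int)

value_cols = ["Value1", "Value2"]  # assumed names of the columns to adjust

# One row per calendar week, as in the question's df_weekly_avg_Max
weekly_avg = df.groupby("Week")[value_cols].mean()

# Look up the weekly mean for every row by its week number.
# The result has one row per day, in the same order as df.
row_means = weekly_avg.loc[df["Week"].to_numpy()].to_numpy()

# Subtract the matching weekly mean from each value
diff = (df[value_cols] - row_means).add_suffix("Diff")
result = df.join(diff)
print(result)
```

Because the lookup is done by label, this does not depend on the weeks appearing in sorted order in the data frame.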

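I found a solution for the whole data frame. I added a column 'KW' for the calendar week and then ran a groupby on it with a lambda function that, for example, subtracts the mean of calendar week "1" from each current value in calendar week "1"...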

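# .index.isocalendar().week replaces .index.week, which newer pandas versions no longer provide
df_pct_Max['KW'] = df_pct_Max.index.isocalendar().week
dfresult = df_pct_Max.groupby(by='KW').transform(lambda x: x - x.mean())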

This works for me.
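
For reference, a small self-contained sketch of this approach on made-up data, assuming a DatetimeIndex as in the question. It also shows an equivalent form that uses the built-in "mean" transform instead of a lambda, which may be faster on large frames. The variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up daily data with a DatetimeIndex
idx = pd.date_range("2015-01-01", periods=21, freq="D")
rng = np.random.default_rng(1)
df = pd.DataFrame(
    {"Value1": rng.random(len(idx)), "Value2": rng.random(len(idx))},
    index=idx,
)

# ISO calendar week for every row, aligned with df.index
kw = df.index.isocalendar().week

# Lambda version: subtract each group's mean inside the transform
via_lambda = df.groupby(kw).transform(lambda x: x - x.mean())

# Equivalent version: broadcast the group means, then subtract once
via_builtin = df - df.groupby(kw).transform("mean")

print(via_lambda.head())
```

Both results keep the shape and index of the original data frame.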

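It would be even better to be able to adjust the time range of the mean, for example to subtract the mean of calendar week "1" over the past 3 years or so from the current calendar week "1" values. But that looks rather complicated, and this solution is sufficient for the current analysis.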
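
One possible way to get such a rolling baseline, as a rough sketch rather than a tested solution for the real data: compute one mean per (calendar week, year), then for each calendar week average the previous three years and join that back to the daily rows. The column names Year, KW, and the "_base" and "_adj" suffixes are assumptions, as are the 3-year window and the use of ISO years. The data is random.

```python
import numpy as np
import pandas as pd

# Made-up daily data covering five full ISO years
idx = pd.date_range("2012-01-02", "2016-12-25", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"Value1": rng.normal(size=len(idx)), "Value2": rng.normal(size=len(idx))},
    index=idx,
)

# ISO year and ISO calendar week for every row
iso = df.index.isocalendar()
df["Year"] = iso["year"].astype(int)
df["KW"] = iso["week"].astype(int)

value_cols = ["Value1", "Value2"]

# One row per (KW, Year): the mean of that calendar week in that year
per_year = df.groupby(["KW", "Year"])[value_cols].mean()

# For each calendar week, average the up to 3 preceding years.
# shift(1) excludes the current year; min_periods=1 allows shorter history.
baseline = per_year.groupby(level="KW").transform(
    lambda s: s.shift(1).rolling(3, min_periods=1).mean()
)

# Attach the baseline to the daily rows and subtract it
out = df.join(baseline.add_suffix("_base"), on=["KW", "Year"])
for col in value_cols:
    out[col + "_adj"] = out[col] - out[col + "_base"]

print(out.loc["2015-03-02":"2015-03-08"])
```

Rows in the first year have no history, so their baseline and adjusted values are NaN. The shift works on rows, not on year numbers, so it assumes every calendar week is present in every year. That does not hold for week 53, which only exists in some years and would need separate handling.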