How to subtract the mean of past calendar weeks from the current value?
I have a DataFrame df_pct_Max with the following shape:
Date        Value1  Value2
01.01.2015  5       6
08.01.2015  3       2
...         ...     ...
28.01.2017  7       8
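I want to compute the mean for each calendar week and then subtract it from the actual values in that calendar week.
I created a DataFrame with the mean of each calendar week as follows: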
df_weekly_avg_Max = df_pct_Max.groupby(df_pct_Max.index.week).mean()
This produces the DataFrame df_weekly_avg_Max:
KW  Value1  Value2
1   3.5     4.3
2   4       3
…   …       …
52  8.33    6.2
Now I am trying to subtract df_weekly_avg_Max from df_pct_Max, matching the rows by calendar week.
I tried adding a column 'KW' and then calling
dfresult = df_pct_Max.sub(df_weekly_avg_Max, axis='KW')
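but I get an error there.
Is there also a rolling way to do this (e.g. subtracting the mean of calendar week 1 over the past 3 years from calendar week 1 of 2015, of 2016, ...)?
Can anyone help with this?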
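This answer is not clean, because it doesn't make good use of pandas, but I don't expect it to be slow either (depending on how large your DataFrame is). The basic idea is to build a list of the weekly means, repeated once for each day in that week, so that you can simply subtract.
Code: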
from collections import Counter
import pandas as pd
import numpy as np

# Build up an example data frame
num_days = 15
dates = pd.date_range('1/1/2015', periods=num_days, freq='D')
val1s = np.random.randint(1, 31, num_days)  # random integers in [1, 30]; replaces the deprecated random_integers
val2s = np.random.randint(1, 31, num_days)
df_pct_MAX = pd.DataFrame({'Date': dates, 'Value1': val1s, 'Value2': val2s})
df_pct_MAX['Day'] = df_pct_MAX['Date'].dt.day_name()  # replaces the removed .dt.weekday_name
df_pct_MAX['Week'] = df_pct_MAX['Date'].dt.isocalendar().week  # replaces the removed .dt.week

mean_fields = ['Value1', 'Value2']  # <-- only hardcoded portion

# OP's logic to get the means, restricted to the numeric value columns
df_weekly_avg_Max = df_pct_MAX.groupby('Week')[mean_fields].mean()

# Build up a list of the means repeated once for each day in that week
means_dict = {k: list(df_weekly_avg_Max[k]) for k in mean_fields}  # <-- convert means into lists keyed by field
# Count how many days are represented in each week. This relies on the rows being
# sorted by date with ascending week numbers, so that the Counter order matches
# the sorted groupby order.
week_counts = Counter(df_pct_MAX['Week']).values()

# Build up a dict keyed by field with the means repeated the correct number of times
means = {k: [means_dict[k][i] for i, count in enumerate(week_counts)
             for x in range(count)] for k in mean_fields}

# Assign a new column with the means for each field (not necessary, just to show it is done correctly)
for k in mean_fields:
    df_pct_MAX[k + 'Mean'] = means[k]

print(df_pct_MAX)
Output (the values differ between runs because the data is random):
          Date  Value1  Value2        Day  Week  Value1Mean  Value2Mean
 0  2015-01-01      12      19   Thursday     1    9.000000   19.250000
 1  2015-01-02      15      27     Friday     1    9.000000   19.250000
 2  2015-01-03       2      30   Saturday     1    9.000000   19.250000
 3  2015-01-04       7       1     Sunday     1    9.000000   19.250000
 4  2015-01-05       6      20     Monday     2   17.571429   14.142857
 5  2015-01-06       9      24    Tuesday     2   17.571429   14.142857
 6  2015-01-07      25      17  Wednesday     2   17.571429   14.142857
 7  2015-01-08      22       8   Thursday     2   17.571429   14.142857
 8  2015-01-09      30       7     Friday     2   17.571429   14.142857
 9  2015-01-10      10       1   Saturday     2   17.571429   14.142857
10  2015-01-11      21      22     Sunday     2   17.571429   14.142857
11  2015-01-12      23      29     Monday     3   23.750000   19.750000
12  2015-01-13      23      16    Tuesday     3   23.750000   19.750000
13  2015-01-14      21      17  Wednesday     3   23.750000   19.750000
14  2015-01-15      28      17   Thursday     3   23.750000   19.750000
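As a more pandas-native variant of the same idea, the weekly means can be joined back onto the rows by week number instead of being repeated by hand. The following is only a sketch under assumptions: the seeded random data and the `Diff` column suffix are made up for illustration and are not part of the original answer. Because the lookup goes through the week number, it does not depend on the rows being sorted.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with the same layout as in the answer above
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=15, freq="D")
df = pd.DataFrame({
    "Date": dates,
    "Value1": rng.integers(1, 31, len(dates)),
    "Value2": rng.integers(1, 31, len(dates)),
})
df["Week"] = df["Date"].dt.isocalendar().week.astype(int)

mean_fields = ["Value1", "Value2"]

# One row of means per calendar week, indexed by week number
weekly_avg = df.groupby("Week")[mean_fields].mean()

# Attach each row's weekly mean by joining on the week number
df = df.join(weekly_avg.add_suffix("Mean"), on="Week")

# Subtract the weekly mean from the actual value ("Diff" is an illustrative name)
for f in mean_fields:
    df[f + "Diff"] = df[f] - df[f + "Mean"]

print(df)
```

By construction, the `Diff` columns sum to zero within each week, which is a quick way to check the result.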
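I found a solution that works on the whole DataFrame.
I added a column 'KW' for the calendar week and then ran a groupby on it with a lambda function that subtracts each calendar week's mean from the values in that week (for example, the mean of calendar week 1 from every value in calendar week 1):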
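# On pandas >= 1.1, use df_pct_Max.index.isocalendar().week instead; .index.week was removed in pandas 2.0
df_pct_Max['KW'] = df_pct_Max.index.week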
dfresult = df_pct_Max.groupby(by='KW').transform(lambda x: x-x.mean())
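This works for me.
It would be even better to be able to adjust the time window of the mean, for example subtracting from the current calendar week 1 value the mean of calendar week 1 over roughly the past 3 years. But that looks fairly complicated, and this solution is sufficient for my current analysis.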
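One possible way to get such an adjustable window, sketched here under assumptions rather than as a tested solution for the real data: compute one mean per calendar week and year, then for each calendar week average the previous `n_years` yearly means and subtract that baseline. The sample data, `n_years`, and the `Year`, `Baseline` and `Adj` names are all hypothetical. Note that this averages yearly means rather than all daily values, and that the window counts the years present in the data, so gaps are skipped rather than treated as missing years.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data over several years, standing in for df_pct_Max
idx = pd.date_range("2012-01-02", "2016-12-25", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"Value1": rng.normal(size=len(idx)), "Value2": rng.normal(size=len(idx))},
    index=idx,
)

n_years = 3  # assumed window length: how many past years enter the baseline
value_cols = ["Value1", "Value2"]

# ISO year and ISO calendar week of every row
iso = df.index.isocalendar()
df["Year"] = iso["year"].astype(int)
df["KW"] = iso["week"].astype(int)

# Mean per (calendar week, year); groupby sorts the keys, so years are ascending
per_year = df.groupby(["KW", "Year"])[value_cols].mean()

# For each calendar week: mean of the previous n_years yearly means.
# shift(1) drops the current year, so only past years enter the baseline.
baseline = per_year.groupby(level="KW").transform(
    lambda s: s.shift(1).rolling(n_years, min_periods=1).mean()
)

# Attach the baseline to every row and subtract it
df = df.join(baseline.add_suffix("Baseline"), on=["KW", "Year"])
for c in value_cols:
    df[c + "Adj"] = df[c] - df[c + "Baseline"]

print(df.tail())
```

Rows from the first year in the data have no past years to compare against, so their baseline and adjusted values are NaN. With `min_periods=1`, the second and third years use however many past years are available; raising `min_periods` to `n_years` would require a full window instead.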