RcppRoll or CumSum to lag with dynamic window
There must be a simple, possibly recursive, solution to the following problem. I would be grateful if anyone could help:
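I use data.table and RcppRoll to compute, for each product, the weekly sales over the qualified weeks within the past 26 weeks. With a window of 26 this works fine as long as the current week number is > 26. However, when the current week number is <= 26, I would like the window to shrink to the weeks that are available, i.e. use a window of size 26, 25, ... and so on.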
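The formula is: baseline sales = sum of sales over the past 26 weeks or fewer (weeks before the current week, qualified weeks only) divided by the number of qualified weeks in that window.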
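To pin the formula down, here is a minimal base-R sketch that evaluates it with an explicit loop for a single product. It assumes the rows are already ordered by week and that countweek is 0/1; the function name baseline_sales is hypothetical and not part of data.table or RcppRoll. Week 1 has no earlier weeks, and a window without qualified weeks gives 0/0, so both come out as NaN.

```r
# Hypothetical helper: baseline for ONE product, rows ordered by week.
# Assumes countweek is 0/1 and the window holds at most `window` weeks
# strictly before the current week.
baseline_sales <- function(sales, countweek, window = 26L) {
  n <- length(sales)
  out <- rep(NaN, n)
  for (i in seq_len(n)) {
    if (i == 1L) next                       # no earlier weeks: baseline undefined
    past <- max(1L, i - window):(i - 1L)    # up to `window` weeks before week i
    qualified_sales <- sum(sales[past] * countweek[past])
    qualified_weeks <- sum(countweek[past])
    out[i] <- qualified_sales / qualified_weeks   # 0/0 yields NaN
  }
  out
}
```

For example, baseline_sales(c(4, 2, 6, 8), c(1, 0, 1, 1), window = 2L) returns NaN, 4, 4, 6. The loop is only meant to make the definition explicit; it is too slow to be the real solution for many products.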
Here is some code to create test data:
library("data.table")
library("RcppRoll")
products <- 1:10                            # grouping variable
weeks <- 1:100                              # weeks
sales <- round(rchisq(1000, 2), 0)          # sales
countweek <- round(runif(1000, 0, 1), 0)    # 1 if the week is qualified, else 0
# merge() on two vectors with no common column returns every week x product combination
data <- as.data.table(cbind(merge(weeks, products, all = TRUE), sales, countweek))
names(data) <- c("week", "product", "sales", "countweek")
data <- data[order(product, week)]
# 26-week rolling sums per product, lagged one week so the current week is excluded;
# roll_sumr() pads the first 25 (incomplete) windows with fill = 0
data[, pastsales := shift(RcppRoll::roll_sumr(sales * countweek, 26L, fill = 0), 1L, 0, "lag"), by = .(product)]
data[, rollweekcount := shift(RcppRoll::roll_sumr(countweek, 26L, fill = 0), 1L, 0, "lag"), by = .(product)]
data[, baseline := pastsales / rollweekcount]
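For product 1 you can see the break at week 26: up to and including week 26, pastsales and rollweekcount are 0 and baseline is NaN. From week 27 on I get the desired result: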
> data[product == 1]
     week product sales countweek pastsales rollweekcount baseline
...
 20:   20       1     1         0         0             0      NaN
 21:   21       1     2         0         0             0      NaN
 22:   22       1     1         1         0             0      NaN
 23:   23       1     0         0         0             0      NaN
 24:   24       1     3         1         0             0      NaN
 25:   25       1     5         1         0             0      NaN
 26:   26       1     5         1         0             0      NaN
 27:   27       1     1         1        44            13 3.384615
 28:   28       1     0         1        45            14 3.214286
 29:   29       1     5         0        44            14 3.142857
 30:   30       1     0         1        44            14 3.142857
 31:   31       1     3         1        44            14 3.142857
 32:   32       1     4         0        42            14 3.000000
...
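The break comes from the fixed width: a right-aligned rolling sum of width 26 has no complete window for the first 25 rows, so roll_sumr(..., fill = 0) writes 0 there, and the one-week lag extends that to the first 26 rows. The base-R sketch below emulates this behaviour on a toy vector; it does not call RcppRoll, and fixed_lagged_sum is a hypothetical helper.

```r
# Hypothetical base-R emulation of the fixed-window approach in the question:
# right-aligned rolling sum of width n, incomplete windows set to `fill`,
# then a one-step lag padded with 0 (like shift(..., 1L, 0, "lag")).
fixed_lagged_sum <- function(x, n, fill = 0) {
  len <- length(x)
  roll <- rep(fill, len)
  if (len >= n) {
    for (i in n:len) roll[i] <- sum(x[(i - n + 1L):i])  # complete windows only
  }
  c(0, head(roll, -1L))                                  # lag by one week
}

fixed_lagged_sum(1:6, n = 3L)
```

This gives 0, 0, 0, 6, 9, 12: the first n values are 0 regardless of the data, which is the same pattern as rows 1 to 26 above with n = 26.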
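You need an "adaptive" window width. I'm not sure about RcppRoll, but newer versions of data.table have frollsum(), which can do this: with adaptive = TRUE the window argument is a vector holding one width per row, so capping the row index at 26 gives widths 1, 2, ..., 26, 26, ... within each product: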
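# window widths 1, 2, ..., 26, 26, ... per product; lag by one week to exclude the current week
data[, pastsales := shift(frollsum(sales * countweek, pmin(seq_len(.N), 26L), adaptive = TRUE),
                          1L, 0, "lag"),
     by = .(product)]
data[, rollweekcount := shift(frollsum(countweek, pmin(seq_len(.N), 26L), adaptive = TRUE),
                              1L, 0, "lag"),
     by = .(product)]
# recompute the baseline (still NaN for week 1 or when no qualified past weeks exist)
data[, baseline := pastsales / rollweekcount]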
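Since the title also asks about cumsum: the same adaptive, lagged window can be computed without a rolling-window function by differencing cumulative sums. This is an alternative technique, not what frollsum() does internally; lagged_window_sum and the toy data are hypothetical. For week i it sums weeks max(1, i - n) to i - 1, which is the window the frollsum() code above produces with n = 26.

```r
# Hypothetical helper: sum of x over weeks max(1, i - n) .. i - 1 for each week i
# (0 for the first week), via differences of cumulative sums.
lagged_window_sum <- function(x, n = 26L) {
  cs <- c(0, cumsum(x))              # cs[k + 1] = sum of x[1..k]
  i <- seq_along(x)
  start <- pmax(0L, i - 1L - n)      # weeks that have dropped out of the window
  cs[i] - cs[start + 1L]             # sum of x[(start + 1) .. (i - 1)]
}

# Toy data for two hypothetical products, ordered by week within product
toy <- data.frame(
  product   = rep(1:2, each = 4),
  sales     = c(4, 2, 6, 8, 1, 3, 5, 7),
  countweek = c(1, 0, 1, 1, 1, 1, 0, 1)
)
# ave() applies the helper separately within each product
toy$pastsales <- ave(toy$sales * toy$countweek, toy$product,
                     FUN = function(v) lagged_window_sum(v, n = 2L))
toy$rollweekcount <- ave(toy$countweek, toy$product,
                         FUN = function(v) lagged_window_sum(v, n = 2L))
toy$baseline <- toy$pastsales / toy$rollweekcount
toy$baseline   # NaN 4 4 6 NaN 1 2 3
```

In the question's data.table code it could replace the rolling calls, e.g. data[, pastsales := lagged_window_sum(sales * countweek, 26L), by = .(product)]. Differences of cumulative sums are exact for whole-number sales like these; on long series of non-integer values they can accumulate floating-point rounding error.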