Calculate Mean for Each Unique Value up to a Certain Date

Question

Data for my example:

# Monthly date sequences: the first starts in January 2019, the others in February 2019
date1 <- seq(as.Date("2019/01/01"), by = "month", length.out = 48)
date2 <- seq(as.Date("2019/02/01"), by = "month", length.out = 48)
date3 <- seq(as.Date("2019/02/01"), by = "month", length.out = 48)
date4 <- seq(as.Date("2019/02/01"), by = "month", length.out = 48)
date <- c(date1, date2, date3, date4)

# One label per subproduct, repeated for each of its 48 months
subproducts1 <- rep("1", 48)
subproducts2 <- rep("2", 48)
subproductsx <- rep("x", 48)
subproductsy <- rep("y", 48)

# Random actuals centred on 5 (no seed is set, so values differ between runs)
b1 <- rnorm(48, 5)
b2 <- rnorm(48, 5)
b3 <- rnorm(48, 5)
b4 <- rnorm(48, 5)

dfone <- data.frame(
  date = date,
  subproduct = c(subproducts1, subproducts2, subproductsx, subproductsy),
  actuals = c(b1, b2, b3, b4)
)

This adds a January 2019 row with a value of 0 for the subproducts whose dates start in February 2019 (date2, date3, and date4):

library(dplyr)
library(tidyr)

dfone <- dfone %>%
  complete(date = seq.Date(from = min(date), to = as.Date("2021-06-01"), by = "month"),
           nesting(subproduct), fill = list(actuals = 0))
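To make the effect of the `complete()` step easier to see, here is a minimal sketch on a tiny data frame. The `toy` data and the shortened date range are hypothetical and only for illustration; the call itself mirrors the one above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: subproduct "1" starts in January, subproduct "2" in February
toy <- data.frame(
  date       = as.Date(c("2019-01-01", "2019-02-01", "2019-02-01")),
  subproduct = c("1", "1", "2"),
  actuals    = c(5.1, 4.9, 5.3)
)

# Add every missing date/subproduct combination, filling actuals with 0
toy_completed <- toy %>%
  complete(date = seq.Date(from = min(date), to = as.Date("2019-02-01"), by = "month"),
           nesting(subproduct), fill = list(actuals = 0))

# The only combination that was missing is January 2019 for subproduct "2"
added <- toy_completed %>%
  filter(date == as.Date("2019-01-01"), subproduct == "2")
print(added)
```

Rows that already exist keep their original `actuals`; only the newly created combination gets the fill value of 0.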

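Question: The code below computes the mean for each unique subproduct and replaces the 0s with that subproduct's mean. How can I set a hard cutoff so that the mean is based only on January 2019 through December 2020, rather than January 2019 through December 2022?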

library(dplyr)
dfone_new <- dfone %>%
  group_by(subproduct) %>%
  mutate(actuals = replace(actuals, actuals == 0,
                           mean(actuals[actuals != 0], na.rm = TRUE))) %>%
  ungroup()

Answer 1

When subsetting 'actuals', we may also need a logical condition requiring 'date' to be between January 2019 and December 2020 when computing the mean:

library(dplyr)
library(tidyr)
dfone %>%
  group_by(subproduct) %>%
  # The replacement mean uses only non-zero values dated inside the cutoff window
  mutate(actuals = replace(actuals, actuals == 0,
                           mean(actuals[actuals != 0 &
                                  between(date, as.Date("2019-01-01"), as.Date("2020-12-31"))],
                                na.rm = TRUE)))
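As a quick sanity check of the cutoff logic, the same pattern can be run on a small deterministic data frame. This is only a sketch: the `toy` data and the `cutoff_start` / `cutoff_end` names are hypothetical and not part of the original question.

```r
library(dplyr)

# Hypothetical toy data: one subproduct, one placeholder 0,
# two values inside the cutoff window and one large value outside it
toy <- data.frame(
  date       = as.Date(c("2019-01-01", "2019-06-01", "2020-06-01", "2021-02-01")),
  subproduct = "x",
  actuals    = c(0, 4, 6, 100)
)

# Assumed cutoff window (inclusive on both ends, as between() is)
cutoff_start <- as.Date("2019-01-01")
cutoff_end   <- as.Date("2020-12-31")

toy_filled <- toy %>%
  group_by(subproduct) %>%
  mutate(actuals = replace(
    actuals, actuals == 0,
    # Mean of the non-zero values dated inside the window: (4 + 6) / 2
    mean(actuals[actuals != 0 & between(date, cutoff_start, cutoff_end)],
         na.rm = TRUE)
  )) %>%
  ungroup()

print(toy_filled$actuals)
```

The value of 100 from February 2021 lies outside the window, so it does not affect the replacement mean, but the row itself is left unchanged in the output.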

r

data-manipulation

dplyr