Calculate median value over the past X months, for each row and ID in R

Question

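For each ID, I need to create a new column containing the median of the values over the past 6 months (180 days), including the current row's value. If there is no earlier information, or the earlier records are more than 6 months old, the median must be the row's own value.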

Input data

I have this:

structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601, 
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380, 
16079), class = "Date")), .Names = c("id", "value", "date"), row.names = c(NA, -11L), class = "data.frame")
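The answer further down refers to this data frame as df. As a quick inspection step, the sketch below stores the data under that name and lists, per id, how many days separate each record from the previous one of the same id. It assumes the rows are already sorted by date within each id; gap_days is a name introduced here for illustration.

```r
# Input data from the question, stored as df (the name the answer uses)
df <- structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
                     value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986),
                     date = structure(c(15601, 17075, 10965, 11068, 11243, 14610,
                                        15248, 15342, 15344, 15380, 16079),
                                      class = "Date")),
                row.names = c(NA, -11L), class = "data.frame")

# Days since the previous record of the same id (NA for each id's first row);
# assumes dates are sorted ascending within each id
gap_days <- ave(as.numeric(df$date), df$id,
                FUN = function(d) c(NA, diff(d)))
print(cbind(df, gap_days))
```

Gaps of 180 days or less (103 and 175 for id 3; 94, 2 and 36 for id 4) mark the rows whose window reaches back to at least one earlier value.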

What I want to achieve is:

structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601, 
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380, 
16079), class = "Date"), median = c(956,986,995,995,990.5,700,600,797.5,956,975.5, 986)), .Names = c("id", "value", "date", "median"), row.names = c(NA, -11L), class = "data.frame")

I tried to follow the answer given in the post Finding Cumulative Sum In R Using Conditions, using rollapplyr and rollmedian from the zoo package.

But I did not get good results.

Thanks in advance.
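A likely reason rollapplyr and rollmedian do not fit directly is that a plain integer width counts observations, not days, while the dates here are irregular. The sketch below shows the idea in base R, assuming rows are sorted by date within an id: for every row, count how many observations fall in the inclusive window [date - 180, date], then take the median of that many trailing values. zoo's rollapply documentation describes width as a numeric vector or list, so per-row widths like these could in principle be passed to rollapplyr; the sketch avoids zoo so it runs with base R only. The helper names trailing_width and rolling_median_by_width are hypothetical.

```r
# Hypothetical helper: for each position i (dates sorted ascending), count how
# many of dates[1..i] fall in the inclusive window [dates[i] - days, dates[i]]
trailing_width <- function(dates, days = 180) {
  vapply(seq_along(dates),
         function(i) sum(dates[seq_len(i)] >= dates[i] - days),
         integer(1))
}

# Hypothetical helper: median of the last w[i] values ending at position i
rolling_median_by_width <- function(values, w) {
  vapply(seq_along(values),
         function(i) median(values[(i - w[i] + 1):i]),
         numeric(1))
}

# Rows of id 4 from the question
d4 <- as.Date(c("2010-01-01", "2011-10-01", "2012-01-03",
                "2012-01-05", "2012-02-10", "2014-01-09"))
v4 <- c(700, 600, 995, 956, 1000, 986)

w4 <- trailing_width(d4)
print(w4)
print(rolling_median_by_width(v4, w4))
```

For id 4 this reproduces the desired medians 700, 600, 797.5, 956, 975.5 and 986; running it per id (for example via split or ave) would fill the whole column.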

Answer 1

Try this solution:

Split the data.frame by id using the function split:

list_df<-split(df,f=df$id)
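split returns one data frame per id, in the order of the sorted factor levels of id, so concatenating per-group results later follows group order rather than the original row order. The sketch below uses a small hypothetical data frame, toy, with interleaved ids to show that difference; with the question's data, which is already sorted by id, the two orders coincide.

```r
# Hypothetical unsorted data: ids interleaved
toy <- data.frame(id = c(2, 1, 2), value = c(10, 20, 30))

parts <- split(toy, f = toy$id)
print(names(parts))        # groups come out in level order: "1", "2"

# Concatenating per-group results follows group order, not row order
by_group <- unlist(lapply(parts, function(db) db$value))
print(unname(by_group))    # 20 10 30, whereas toy$value is 10 20 30
```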

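A function that returns the median of a single id's values, restricted by the date condition:

f_median<-function(i,db)
{
  # keep this id's rows dated within the 180 days up to and including row i
  in_window <- db[,"date"] >= db[i,"date"] - 180 & db[,"date"] <= db[i,"date"]
  return(median(db[in_window,"value"]))
}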

Apply f_median to every row of one id's data.frame:

f<-function(db)
{
   # one median per row of this id's data.frame
   return(sapply(seq_len(nrow(db)),f_median,db))
}
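For a single id, f returns one median per row. The sketch below restates the two functions with the same inclusive 180-day window so it runs on its own, and applies f to the id-3 rows of the question (dates as shown in the answer's output). The third row's window reaches back 175 days to 2000-04-21, so its median is that of 995 and 986, i.e. 990.5.

```r
# Same window rule as f_median: values dated within [date_i - 180, date_i]
f_median <- function(i, db) {
  in_window <- db[, "date"] >= db[i, "date"] - 180 & db[, "date"] <= db[i, "date"]
  median(db[in_window, "value"])
}

# One median per row of a single id's data frame
f <- function(db) sapply(seq_len(nrow(db)), f_median, db)

# Rows of id 3 from the question
id3 <- data.frame(id = 3,
                  value = c(995, 995, 986),
                  date = as.Date(c("2000-01-09", "2000-04-21", "2000-10-13")))
print(f(id3))
```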

The output you want:

 median<-unlist(lapply(list_df,f))
 cbind(df,median)
   id value       date median
1   1   956 2012-09-18  956.0
2   2   986 2016-10-01  986.0
31  3   995 2000-01-09  995.0
32  3   995 2000-04-21  995.0
33  3   986 2000-10-13  990.5
41  4   700 2010-01-01  700.0
42  4   600 2011-10-01  600.0
43  4   995 2012-01-03  797.5
44  4   956 2012-01-05  956.0
45  4  1000 2012-02-10  975.5
46  4   986 2014-01-09  986.0
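cbind(df, median) lines up correctly here because df is sorted by id, and unlist(lapply(list_df, f)) returns medians group by group. A sketch of an order-preserving variant, assuming the same inclusive 180-day window: base R's ave applies a function within each id group and writes the results back in the original row order. It is fed the row indices so the function can look up both value and date; the data below is a shuffled subset of the question's rows.

```r
# Shuffled subset of the question's rows (not grouped by id)
df <- data.frame(
  id    = c(4, 3, 4, 1, 3, 4),
  value = c(995, 986, 600, 956, 995, 956),
  date  = as.Date(c("2012-01-03", "2000-10-13", "2011-10-01",
                    "2012-09-18", "2000-04-21", "2012-01-05"))
)

rows <- as.numeric(seq_len(nrow(df)))  # numeric so ave can hold fractional medians
df$median <- ave(rows, df$id, FUN = function(idx) {
  # idx: row numbers of one id; for each, take the median over its 180-day window
  vapply(idx, function(i) {
    in_window <- df$date[idx] >= df$date[i] - 180 & df$date[idx] <= df$date[i]
    median(df$value[idx][in_window])
  }, numeric(1))
})
print(df)
```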
