Calculate if the change is greater than .25

Consider this data.table:

library(data.table)
dt1 <- data.table(id = c("01","01","01","01","01", "02","02","02","02","02"),
                 change_total = c(.00,.90,-.10,.8,.3,.00,.90,-.10,.8,.3))

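How can I calculate, per id, whether the change between rows is greater than 25% of change_total (change_greater_than_25_percent), and then whether those "yes" values are greater than 25 percent by row per id? The result would look like this: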

dt2 <- data.table(id = c("01","01","01","01", "01","02","02","02","02","02"),
                 change_total = c(.00,.90,-.10,.8,.3,.00,.90,-.10,.8,.3),
                 change_greater_than_25_percent = c("no","yes","no","no","no","no","yes","no","no","no"),
                 change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id = c("no","yes","yes","yes","no","no","yes","yes","yes","no"))

I was able to reproduce the expected result for the change_greater_than_25_percent column:

library(data.table)
library(magrittr) # piping used to improve readability
dt1[, change_greater_than_25_percent := 
      (change_total / shift(change_total, fill = Inf) - 1. > 0.25) %>%  
      fifelse("yes", "no"), by = id][]
    id change_total change_greater_than_25_percent
 1: 01          0.0                             no
 2: 01          0.9                            yes
 3: 01         -0.1                             no
 4: 01          0.8                             no
 5: 01          0.3                             no
 6: 02          0.0                             no
 7: 02          0.9                            yes
 8: 02         -0.1                             no
 9: 02          0.8                             no
10: 02          0.3                             no
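
As a rough sketch of what the expression above computes, the intermediate values for a single id can be laid out step by step. The object demo and the column names previous, relative_change and flag are made up for this illustration and do not appear in the question.

```r
library(data.table)

# Illustration only: the change_total values of one id from the question
demo <- data.table(change_total = c(0, 0.9, -0.1, 0.8, 0.3))

# previous: value of the preceding row; fill = Inf turns the first ratio into 0 / Inf = 0
demo[, previous := shift(change_total, fill = Inf)]

# relative_change: change relative to the preceding row
demo[, relative_change := change_total / previous - 1]

# flag rows whose relative change exceeds 25 %
demo[, flag := fifelse(relative_change > 0.25, "yes", "no")]

print(demo)
```

Note that the second row is flagged only because dividing 0.9 by a preceding value of 0 yields Inf in R.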

Unfortunately, I do not understand the exact definition of the column change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id:

and then if those "yes" are greater than 25 percent by row per id

Edit 1

If I understand correctly, the second column indicates where the cumulative mean of change_greater_than_25_percent (taken as 0, 1 values for "no", "yes") is greater than 0.25.

This is easier to compute if "no" and "yes" are replaced by the logical values FALSE and TRUE, respectively, because according to help("TRUE"):

Logical vectors are coerced to integer vectors in contexts where a numerical value is required, with TRUE being mapped to 1L, FALSE to 0L and NA to NA_integer_.
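
A minimal illustration of that coercion, using a made-up vector named flags that mirrors the pattern of one id:

```r
# Logical vectors behave like 0/1 values in arithmetic
flags <- c(FALSE, TRUE, FALSE, FALSE, FALSE)

sum(flags)      # number of TRUE values
mean(flags)     # share of TRUE values
cumsum(flags)   # running count of TRUE values
```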

library(data.table)
dt1[, change_greater_than_25_percent := 
      change_total / shift(change_total, fill = Inf) - 1. > 0.25, by = id][
        , change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id := 
          cumsum(change_greater_than_25_percent) / (1:.N) >= 0.25, by = id][]
    id change_total change_greater_than_25_percent change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id
 1: 01          0.0                          FALSE                                                                  FALSE
 2: 01          0.9                           TRUE                                                                   TRUE
 3: 01         -0.1                          FALSE                                                                   TRUE
 4: 01          0.8                          FALSE                                                                   TRUE
 5: 01          0.3                          FALSE                                                                  FALSE
 6: 02          0.0                          FALSE                                                                  FALSE
 7: 02          0.9                           TRUE                                                                   TRUE
 8: 02         -0.1                          FALSE                                                                   TRUE
 9: 02          0.8                          FALSE                                                                   TRUE
10: 02          0.3                          FALSE                                                                  FALSE

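Remarks

  1. To reproduce the expected result, greater than or equal to 25% (>= 0.25) must be used instead of greater than 25 percent (> 0.25). I have not amended the already long column name.
  2. Using logical values instead of characters simplifies the computation, so the magrittr pipe is no longer needed.
  3. data.table chaining is used to create both columns in one expression.
  4. To compute the cumulative mean, dplyr::cummean() or an adaptive frollmean() can be used alternatively.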
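
The alternatives for the cumulative mean can be compared in a small sketch. The vector flags and the result names cm_base, cm_froll and cm_dplyr are made up for illustration; the dplyr variant only runs if that package happens to be installed.

```r
library(data.table)

flags <- c(FALSE, TRUE, FALSE, FALSE, FALSE)

# Base R: running count of TRUE divided by the running row count
cm_base <- cumsum(flags) / seq_along(flags)

# data.table: adaptive rolling mean whose window grows by one row at each step
cm_froll <- frollmean(as.numeric(flags), n = seq_along(flags), adaptive = TRUE)

# dplyr: only evaluated if the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  cm_dplyr <- dplyr::cummean(as.numeric(flags))
  print(cm_dplyr)
}

print(cm_base)
print(cm_froll)
```

The base R form is the one used in the answer's code and needs no additional package.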

Edit 2

The code above requires dt1 to be of class data.table. To make sure, call

setDT(dt1)

beforehand.
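
A short sketch of that conversion, assuming the data starts out as a plain data.frame (the object df1 is hypothetical and only mirrors the question's data):

```r
library(data.table)

# Hypothetical starting point: the same data held in a plain data.frame
df1 <- data.frame(id = rep(c("01", "02"), each = 5),
                  change_total = rep(c(0, 0.9, -0.1, 0.8, 0.3), times = 2))

setDT(df1)           # converts by reference, so no assignment is needed
is.data.table(df1)   # check the class after the conversion
```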