Calculate if the change is greater than .25

Question

Consider this data.table:

library(data.table)
dt1 <- data.table(id = c("01","01","01","01","01", "02","02","02","02","02"),
                 change_total = c(.00,.90,-.10,.8,.3,.00,.90,-.10,.8,.3))

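How can I calculate whether the change between consecutive rows within each id is greater than 25 percent of change_total (change_greater_than_25_percent), and then whether those "yes" are greater than 25 percent by row per id? The result should look like this: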

dt2 <- data.table(id = c("01","01","01","01", "01","02","02","02","02","02"),
                 change_total = c(.00,.90,-.10,.8,.3,.00,.90,-.10,.8,.3),
                 change_greater_than_25_percent = c("no","yes","no","no","no","no","yes","no","no","no"),
                 change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id = c("no","yes","yes","yes","no","no","yes","yes","yes","no"))

Answer 1

I was able to reproduce the expected result for the column change_greater_than_25_percent:

library(data.table)
library(magrittr) # piping used to improve readability
dt1[, change_greater_than_25_percent := 
      (change_total / shift(change_total, fill = Inf) - 1. > 0.25) %>%  
      fifelse("yes", "no"), by = id][]

    id change_total change_greater_than_25_percent
 1: 01          0.0                             no
 2: 01          0.9                            yes
 3: 01         -0.1                             no
 4: 01          0.8                             no
 5: 01          0.3                             no
 6: 02          0.0                             no
 7: 02          0.9                            yes
 8: 02         -0.1                             no
 9: 02          0.8                             no
10: 02          0.3                             no
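
To see what the expression does, the intermediate values for a single id can be spelled out. This is only an illustrative sketch: the names x, prev, rel_change and flag are made up here and are not part of the answer.

```r
library(data.table)

# change_total values of one id, as in the question
x <- c(0.00, 0.90, -0.10, 0.80, 0.30)

# previous value within the group; Inf for the first row so that 0 / Inf = 0
prev <- shift(x, fill = Inf)

# relative change with respect to the previous row
rel_change <- x / prev - 1

# flag rows whose relative change exceeds 25 percent
flag <- fifelse(rel_change > 0.25, "yes", "no")

print(data.table(x, prev, rel_change, flag))
```

Row 2 is flagged because dividing by the previous value 0 gives Inf. The step from -0.1 to 0.8 gives a relative change of -9 because the previous value is negative, so it is not flagged.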

Unfortunately, I do not understand the exact definition of the column change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id:

and then if those "yes" are greater than 25 percent by row per id

Edit 1

If I understand correctly, the second column indicates where the cumulative mean of change_greater_than_25_percent (taken as 0, 1 values for "no", "yes") is greater than 0.25.

This is easier to compute if "no" and "yes" are replaced by the logical values FALSE and TRUE, respectively, because according to help("TRUE")

Logical vectors are coerced to integer vectors in contexts where a numerical value is required, with TRUE being mapped to 1L, FALSE to 0L and NA to NA_integer_.

library(data.table)
dt1[, change_greater_than_25_percent := 
      change_total / shift(change_total, fill = Inf) - 1. > 0.25, by = id][
        , change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id := 
          cumsum(change_greater_than_25_percent) / (1:.N) >= 0.25, by = id][]

    id change_total change_greater_than_25_percent change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id
 1: 01          0.0                          FALSE                                                                  FALSE
 2: 01          0.9                           TRUE                                                                   TRUE
 3: 01         -0.1                          FALSE                                                                   TRUE
 4: 01          0.8                          FALSE                                                                   TRUE
 5: 01          0.3                          FALSE                                                                  FALSE
 6: 02          0.0                          FALSE                                                                  FALSE
 7: 02          0.9                           TRUE                                                                   TRUE
 8: 02         -0.1                          FALSE                                                                   TRUE
 9: 02          0.8                          FALSE                                                                   TRUE
10: 02          0.3                          FALSE                                                                  FALSE
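
The cumulative mean behind the second column can be checked by hand for one id. The following is a small sketch in base R; the vector name flag and the variable cum_mean are chosen for illustration only.

```r
# flags of one id, taken from the output above
flag <- c(FALSE, TRUE, FALSE, FALSE, FALSE)

# logical values are coerced to 0/1, so cumsum() counts the TRUEs seen so far
cum_mean <- cumsum(flag) / seq_along(flag)
print(cum_mean)

# row 4 sits exactly at 0.25, so only >= marks it as TRUE
print(cum_mean >= 0.25)
print(cum_mean > 0.25)
```

The two comparisons differ only in row 4, where the cumulative mean is 1/4.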

Remarks

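To reproduce the expected result, greater than or equal to 25 percent (>= 0.25) must be used instead of greater than 25 percent (> 0.25). I have not amended the already long column name.
Using logical values instead of characters simplifies the computation, so the magrittr pipe is no longer needed.
data.table chaining is used.
To compute the cumulative mean, dplyr::cummean() or an adaptive frollmean() could be used instead.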
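
As a rough sketch of the adaptive frollmean() alternative, the window for row i can be set to i, so that it always reaches back to the first row. The example works on a plain vector rather than on dt1, and the names cm_cumsum and cm_froll are illustrative assumptions.

```r
library(data.table)

flag <- c(FALSE, TRUE, FALSE, FALSE, FALSE)

# cumulative mean via cumsum(), as in the answer
cm_cumsum <- cumsum(flag) / seq_along(flag)

# the same via an adaptive rolling mean: the window for row i spans rows 1..i
cm_froll <- frollmean(as.numeric(flag), n = seq_along(flag), adaptive = TRUE)

print(all.equal(cm_cumsum, cm_froll))
```

Inside dt1[, ..., by = id], seq_len(.N) would play the role of seq_along(flag).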

Edit 2

The code above requires dt1 to be of class data.table. To make sure, call

setDT(dt1)

beforehand.
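
For illustration, here is a minimal sketch of the case where the data arrive as a plain data.frame. The object df and its shortened contents are hypothetical and not taken from the question.

```r
library(data.table)

# hypothetical case: the data arrive as a plain data.frame
df <- data.frame(id = c("01", "01", "02", "02"),
                 change_total = c(0.0, 0.9, 0.0, 0.9))

setDT(df)  # converts by reference, so no reassignment is needed

print(is.data.table(df))

# := now works as in the answer
df[, change_greater_than_25_percent :=
     change_total / shift(change_total, fill = Inf) - 1 > 0.25, by = id]
print(df)
```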
