Calculate if the change is greater than .25
Consider this data.table:
library(data.table)
dt1 <- data.table(id = c("01","01","01","01","01", "02","02","02","02","02"),
change_total = c(.00,.90,-.10,.8,.3,.00,.90,-.10,.8,.3))
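How can I calculate, by id, whether the change between rows is greater than 25 percent of change_total (change_greater_than_25_percent), and then whether those "yes" values are greater than 25 percent by row per id?
It should look like this: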
dt2 <- data.table(id = c("01","01","01","01", "01","02","02","02","02","02"),
change_total = c(.00,.90,-.10,.8,.3,.00,.90,-.10,.8,.3),
change_greater_than_25_percent = c("no","yes","no","no","no","no","yes","no","no","no"),
change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id = c("no","yes","yes","yes","no","no","yes","yes","yes","no"))
I was able to reproduce the expected result for the first column, change_greater_than_25_percent:
library(data.table)
library(magrittr) # piping used to improve readability
dt1[, change_greater_than_25_percent :=
(change_total / shift(change_total, fill = Inf) - 1. > 0.25) %>%
fifelse("yes", "no"), by = id][]
id change_total change_greater_than_25_percent
1: 01 0.0 no
2: 01 0.9 yes
3: 01 -0.1 no
4: 01 0.8 no
5: 01 0.3 no
6: 02 0.0 no
7: 02 0.9 yes
8: 02 -0.1 no
9: 02 0.8 no
10: 02 0.3 no
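To see why fill = Inf handles the first row of each id, here is the same ratio computed by hand for the change_total values of id 01. This is a small illustrative sketch; x and prev are hypothetical names, not part of the question. Dividing the first row by Inf gives 0, so its relative change is -1 and it can never be flagged, while the second row divides by 0 and yields Inf.

```r
library(data.table)

# change_total values of id "01" (hypothetical name x)
x <- c(0.00, 0.90, -0.10, 0.80, 0.30)

prev <- shift(x, fill = Inf)   # previous value; Inf replaces the missing one in row 1
rel_change <- x / prev - 1     # relative change versus the previous row

prev                 # Inf, 0.0, 0.9, -0.1, 0.8
rel_change           # about -1, Inf, -1.11, -9, -0.625
rel_change > 0.25    # FALSE TRUE FALSE FALSE FALSE
```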
Unfortunately, I do not understand the exact definition of column change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id, which is described as:
and then if those "yes" are greater than 25 percent by row per id
Edit 1
If I understand correctly, the second column indicates where the cumulative mean of change_greater_than_25_percent (taken as 0, 1 values for "no", "yes") is greater than 0.25.
If "no" and "yes" are replaced by the logical values FALSE and TRUE, respectively, this becomes much easier to compute because, according to help("TRUE"):
Logical vectors are coerced to integer vectors in contexts where a numerical value is required, with TRUE being mapped to 1L, FALSE to 0L and NA to NA_integer_.
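A quick base R illustration of that coercion rule; flags is a hypothetical stand-in for the logical column, not a name from the question:

```r
flags <- c(FALSE, TRUE, FALSE, NA)

as.integer(flags)          # 0 1 0 NA
sum(flags, na.rm = TRUE)   # 1L: each TRUE counts as 1
cumsum(c(FALSE, TRUE, FALSE, FALSE, FALSE))  # 0 1 1 1 1 (integer)
```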
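library(data.table)
# Step 1: flag rows whose relative change versus the previous row (per id) exceeds 25%.
#         fill = Inf makes the first row of each id evaluate to x / Inf - 1 = -1, i.e. FALSE.
dt1[, change_greater_than_25_percent :=
      change_total / shift(change_total, fill = Inf) - 1. > 0.25, by = id][
  # Step 2: running share of TRUE values per id (cumulative mean), compared with 0.25
  , change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id :=
      cumsum(change_greater_than_25_percent) / (1:.N) >= 0.25, by = id][]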
id change_total change_greater_than_25_percent change_greater_than_25_percent_greaterthan_25_percent_ofthe_time_by_id
1: 01 0.0 FALSE FALSE
2: 01 0.9 TRUE TRUE
3: 01 -0.1 FALSE TRUE
4: 01 0.8 FALSE TRUE
5: 01 0.3 FALSE FALSE
6: 02 0.0 FALSE FALSE
7: 02 0.9 TRUE TRUE
8: 02 -0.1 FALSE TRUE
9: 02 0.8 FALSE TRUE
10: 02 0.3 FALSE FALSE
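The code compares with >= 0.25 rather than > 0.25. The running share for id 01 shows why: in the fourth row it is exactly 1/4, which a strict comparison would reject. A base R sketch of that arithmetic, where flag is a hypothetical name for the logical change_greater_than_25_percent values of id 01:

```r
flag <- c(FALSE, TRUE, FALSE, FALSE, FALSE)  # change_greater_than_25_percent for id "01"

# cumulative mean: running count of TRUE divided by the number of rows so far
running_share <- cumsum(flag) / seq_along(flag)
running_share           # 0, 0.5, 0.333..., 0.25, 0.2

running_share >= 0.25   # FALSE TRUE TRUE TRUE FALSE  (matches "no yes yes yes no")
running_share >  0.25   # FALSE TRUE TRUE FALSE FALSE (row 4 would become "no")
```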
Remarks
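- To reproduce the expected result, greater than or equal to 25 percent (>= 0.25) had to be used instead of greater than 25 percent (> 0.25). I have not modified the already lengthy column names.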
- Using logical values instead of character simplifies the computation, so the magrittr pipe is not needed; data.table chaining is used instead.
- To compute the cumulative mean, dplyr::cummean() or an adaptive frollmean() can be used as alternatives.
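A sketch of the frollmean() variant, assuming a data.table version that provides frollmean() with the adaptive argument. An adaptive window of width 1, 2, ..., .N per group is an expanding, i.e. cumulative, mean. The table dt and its column flag are hypothetical shorthand for dt1 and change_greater_than_25_percent; the result is checked against the cumsum() / (1:.N) form used in the answer.

```r
library(data.table)

dt <- data.table(
  id   = c("01", "01", "01", "01", "01", "02", "02", "02", "02", "02"),
  flag = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Cumulative mean per id via an adaptive window whose width grows 1, 2, ..., .N
dt[, share_froll := frollmean(as.numeric(flag), n = seq_len(.N), adaptive = TRUE), by = id]

# Same quantity with the cumsum() / (1:.N) form from the answer
dt[, share_cumsum := cumsum(flag) / (1:.N), by = id]

dt[, all.equal(share_froll, share_cumsum)]  # TRUE
```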
Edit 2
The code above requires dt1 to be of class data.table. To make sure, call setDT(dt1) beforehand.
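A minimal sketch of that conversion, using a small hypothetical data.frame df rather than the question's data. setDT() converts in place (by reference), so no assignment is needed:

```r
library(data.table)

df <- data.frame(id = c("01", "01", "02"), change_total = c(0, 0.9, 0))
is.data.table(df)   # FALSE

setDT(df)           # converts df to a data.table by reference
is.data.table(df)   # TRUE
```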