How to use dplyr mutate to perform an operation on a column when a lagged value of that column and another column are involved
Suppose I have a data frame like this:
> dat
  a          b         c
1 1  0.3321008 0.3321008
2 2 -0.2946729        NA
3 3 -0.1447266        NA
4 4 -0.9415429        NA
5 5 -1.0165080        NA
Here is the dput:
structure(list(a = 1:5, b = c(0.332100835317822, -0.294672931641969,
-0.144726592564241, -0.941542877670977, -1.0165079846083), c = c(0.332100835317822,
NA, NA, NA, NA)), .Names = c("a", "b", "c"), row.names = c(NA,
-5L), class = "data.frame")
I want to perform an operation on column c such that c = lag(c)*b (except for the first element of c, which stays as it is).
I can do this with a simple for loop like the one below:
for (i in 1:4) {
  dat$c[i+1] <- dat$c[i]*dat$b[i+1]
}
Output:
> dat
  a          b           c
1 1  0.3321008  0.33210084
2 2 -0.2946729 -0.09786113
3 3 -0.1447266  0.01416311
4 4 -0.9415429 -0.01333517
5 5 -1.0165080  0.01355531
How can I do this with dplyr mutate? Or with one of the apply functions?
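Because each value is the previous one times b, c[i] equals c[1] times the product of b[2] through b[i], so a cumulative product lets us avoid the iterative calculation. Here replace(b, 1, 1) sets the first element of b to 1 so that the running product starts at c[1]: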
library(dplyr)
dat %>% mutate(c = cumprod(replace(b, 1, 1))*c[1])
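Since the question also asks about apply-style functions, here is a minimal base R sketch of the same recurrence written as a fold with Reduce(). It is not part of the answer above, just one possible alternative; the names running, prev and b_i are made up for illustration, and the data frame is rebuilt with data.frame() rather than the dput output.

```r
# Same data as in the question, rebuilt with data.frame() for brevity.
dat <- data.frame(
  a = 1:5,
  b = c(0.332100835317822, -0.294672931641969, -0.144726592564241,
        -0.941542877670977, -1.0165079846083),
  c = c(0.332100835317822, NA, NA, NA, NA)
)

# Fold over b[2..n], starting from c[1]. With accumulate = TRUE, Reduce()
# returns the initial value plus every intermediate product, so the result
# has one element per row of dat.
running <- Reduce(
  function(prev, b_i) prev * b_i,  # hypothetical argument names
  dat$b[-1],
  init = dat$c[1],
  accumulate = TRUE
)

dat$c <- running
dat
```

This form may be handy when the update rule is not a plain product and has no closed-form equivalent such as cumprod(); for the rule in the question, the cumprod() version should give the same values with less code.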