Filling "missing values" (NAs) backward and forward using another column as support
Suppose I have the following data:
input = tibble::tibble(
group = c(rep("A", 5), rep("B", 5), rep("C", 5)),
value = c(10, 15, 17, NA, NA, NA, NA, 12, 16, 13, 12, NA, 15, NA, 19),
gr = c(0.1, 0.05, 0.03, 0.02, 0.05, 0.04, 0.02, 0.6, 0.03, 0.4, 0.01, 0.09, 0.05, -0.03, 0.04)
)
It looks like this:
> input
# A tibble: 15 x 3
group value gr
<chr> <dbl> <dbl>
1 A 10 0.1
2 A 15 0.05
3 A 17 0.03
4 A NA 0.02
5 A NA 0.05
6 B NA 0.04
7 B NA 0.02
8 B 12 0.6
9 B 16 0.03
10 B 13 0.4
11 C 12 0.01
12 C NA 0.09
13 C 15 0.05
14 C NA -0.03
15 C 19 0.04
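I want to fill the missing values in each group using an auxiliary variable (gr in this case), and the fill should work differently for each group. For group A it should go forward, i.e. value_filled = lag(value) * (1 + gr). For group B it should go backward, i.e. value_filled = lead(value) / (1 + lead(gr)). For group C (where the missing values sit between observed values) it should go forward.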
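Written as recurrences, the two rules look like this. This is a sketch in my own notation, not the asker's: within one group, $v_t$ is value, $g_t$ is gr and $\hat{v}_t$ is value_filled, with $\hat{v}_t = v_t$ wherever $v_t$ is observed.

```latex
\hat{v}_t = \hat{v}_{t-1}\,(1 + g_t) \quad \text{(forward)}
\qquad
\hat{v}_t = \frac{\hat{v}_{t+1}}{1 + g_{t+1}} \quad \text{(backward)}
```

Checked against the desired output: in group A, 17 × 1.02 = 17.34 and 17.34 × 1.05 = 18.207; in group B, 12 / 1.6 = 7.5 and 7.5 / 1.02 ≈ 7.353; in group C, 12 × 1.09 = 13.08 and 15 × 0.97 = 14.55. The desired output lists these only approximately. Read this way, every NA that has an observed value before it is filled forward, and only the leading NAs of group B, which have nothing before them, need the backward rule.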
The desired output looks like this:
desired_output = tibble::tibble(
group = c(rep("A", 5), rep("B", 5), rep("C", 5)),
value = c(10, 15, 17, NA, NA, NA, NA, 12, 16, 13, 12, NA, 15, NA, 19),
gr = c(0.1, 0.05, 0.03, 0.02, 0.05, 0.04, 0.02, 0.6, 0.03, 0.4, 0.01, 0.09, 0.05, -0.03, 0.04),
value_filled = c(10, 15, 17, 17.3, 18.2, 7.3, 7.5, 12, 16, 13, 12, 13, 15, 14.5, 19)
)
> desired_output
# A tibble: 15 x 4
group value gr value_filled
<chr> <dbl> <dbl> <dbl>
1 A 10 0.1 10
2 A 15 0.05 15
3 A 17 0.03 17
4 A NA 0.02 17.3
5 A NA 0.05 18.2
6 B NA 0.04 7.3
7 B NA 0.02 7.5
8 B 12 0.6 12
9 B 16 0.03 16
10 B 13 0.4 13
11 C 12 0.01 12
12 C NA 0.09 13
13 C 15 0.05 15
14 C NA -0.03 14.5
15 C 19 0.04 19
Ideally I'd like to do this the dplyr way.
You could do:
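library(tidyverse)
input %>%
  group_by(group) %>%
  mutate(
    # Forward pass: an NA becomes the previous (filled) value * (1 + its own gr)
    v1 = unlist(accumulate2(value, tail(gr, -1), ~if (is.na(..2)) ..1 * (1 + ..3) else ..2)),
    # Backward pass on the reversed vector: remaining leading NAs become the next value / (1 + that next row's gr)
    v1 = rev(unlist(accumulate2(rev(v1), head(rev(gr), -1), ~if (is.na(..2)) ..1 / (1 + ..3) else ..2)))
  )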
# A tibble: 15 x 4
# Groups: group [3]
group value gr v1
<chr> <dbl> <dbl> <dbl>
1 A 10 0.1 10
2 A 15 0.05 15
3 A 17 0.03 17
4 A NA 0.02 17.3
5 A NA 0.05 18.2
6 B NA 0.04 7.35
7 B NA 0.02 7.5
8 B 12 0.6 12
9 B 16 0.03 16
10 B 13 0.4 13
11 C 12 0.01 12
12 C NA 0.09 13.1
13 C 15 0.05 15
14 C NA -0.03 14.6
15 C 19 0.04 19
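If the `accumulate2()` formulas are hard to follow, the same two passes can be written as plain loops. The sketch below is an illustration, not part of the answer: `fill_with_growth()` is a hypothetical helper name, and, like the answer, it picks the direction from where the NA sits (forward if a value precedes it, otherwise backward) rather than from the group label. It uses only base R so it runs without the tidyverse.

```r
# Hypothetical helper (not from the answer): fill the NAs of one group's
# values `v` using that group's growth rates `g`, forward first, then backward.
fill_with_growth <- function(v, g) {
  n <- length(v)
  out <- v
  if (n < 2) return(out)
  # Forward pass: an NA right after a known (or already filled) value
  # grows from it by (1 + its own growth rate).
  for (i in 2:n) {
    if (is.na(out[i]) && !is.na(out[i - 1])) {
      out[i] <- out[i - 1] * (1 + g[i])
    }
  }
  # Backward pass: NAs still left (the leading ones) are discounted from
  # the next value using that next row's growth rate.
  for (i in (n - 1):1) {
    if (is.na(out[i]) && !is.na(out[i + 1])) {
      out[i] <- out[i + 1] / (1 + g[i + 1])
    }
  }
  out
}

input <- data.frame(
  group = c(rep("A", 5), rep("B", 5), rep("C", 5)),
  value = c(10, 15, 17, NA, NA, NA, NA, 12, 16, 13, 12, NA, 15, NA, 19),
  gr = c(0.1, 0.05, 0.03, 0.02, 0.05, 0.04, 0.02, 0.6, 0.03, 0.4, 0.01, 0.09, 0.05, -0.03, 0.04)
)

# Apply the helper group by group, writing results back in the original row order.
input$value_filled <- NA_real_
for (grp in unique(input$group)) {
  idx <- input$group == grp
  input$value_filled[idx] <- fill_with_growth(input$value[idx], input$gr[idx])
}

print(input)
```

For this data the result agrees with the `v1` column above. Inside a dplyr pipeline the helper could be called as `input %>% group_by(group) %>% mutate(value_filled = fill_with_growth(value, gr))`; the loops are more verbose than `accumulate2()`, but they make the "forward, then backward" order explicit.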