在 dplyr 中每隔 x 行创建区间变量
Create interval variable every x rows in dplyr
创建每 x 行后增加 x 个单位的新变量的最有效方法是什么?比如我有一个数据框:
d <- data.frame(group_var = c('a', 'b', 'c'),
y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21))
我想创建一个以 x 开头并每 x 行增加 x 的新变量,这样我就会得到这样的数据框:
d <- data.frame(group_var = c('a', 'b', 'c'),
y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21),
z = c(5,5,5,5,5,10,10,10,10,10,15,15,15,15,15,20,20,20,20,20,20,5,5,5,5,5,10,10,10,10,10,15,15,15,15,15,20,20,20,20,20,20,5,5,5,5,5,10,10,10,10,10,15,15,15,15,15,20,20,20,20,20,20))
此外,当我有余数时,就像之前的数据帧一样,我希望这些与之前的组一起排序(所以 y=11 将是 z=10)。请注意,我的目标数据框保留了与原始数据框相同的行数。
我们可以根据 'y' 上的 diff
创建一个分组变量,然后用 gl
创建 'z' 并乘以 5
library(dplyr)
library(tidyr)
d1 <- d %>%
group_by(grp = cumsum(c(TRUE, diff(y) < 0))) %>%
mutate(z = as.integer(gl(n(), 5, n())) * 5,
z = replace(z, ave(z, z, FUN = length) < 5, NA)) %>%
ungroup %>%
fill(z) %>%
select(-grp)
-输出
as.data.frame(d1)
group_var y z
1 a 1 5
2 b 2 5
3 c 3 5
4 a 4 5
5 b 5 5
6 c 6 10
7 a 7 10
8 b 8 10
9 c 9 10
10 a 10 10
11 b 11 15
12 c 12 15
13 a 13 15
14 b 14 15
15 c 15 15
16 a 16 20
17 b 17 20
18 c 18 20
19 a 19 20
20 b 20 20
21 c 21 20
22 a 1 5
23 b 2 5
24 c 3 5
25 a 4 5
26 b 5 5
27 c 6 10
28 a 7 10
29 b 8 10
30 c 9 10
31 a 10 10
32 b 11 15
33 c 12 15
34 a 13 15
35 b 14 15
36 c 15 15
37 a 16 20
38 b 17 20
39 c 18 20
40 a 19 20
41 b 20 20
42 c 21 20
43 a 1 5
44 b 2 5
45 c 3 5
46 a 4 5
47 b 5 5
48 c 6 10
49 a 7 10
50 b 8 10
51 c 9 10
52 a 10 10
53 b 11 15
54 c 12 15
55 a 13 15
56 b 14 15
57 c 15 15
58 a 16 20
59 b 17 20
60 c 18 20
61 a 19 20
62 b 20 20
63 c 21 20
创建每 x 行后增加 x 个单位的新变量的最有效方法是什么?比如我有一个数据框:
d <- data.frame(group_var = c('a', 'b', 'c'),
y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21))
我想创建一个以 x 开头并每 x 行增加 x 的新变量,这样我就会得到这样的数据框:
d <- data.frame(group_var = c('a', 'b', 'c'),
y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21),
z = c(5,5,5,5,5,10,10,10,10,10,15,15,15,15,15,20,20,20,20,20,20,5,5,5,5,5,10,10,10,10,10,15,15,15,15,15,20,20,20,20,20,20,5,5,5,5,5,10,10,10,10,10,15,15,15,15,15,20,20,20,20,20,20))
此外,当我有余数时,就像之前的数据帧一样,我希望这些与之前的组一起排序(所以 y=11 将是 z=10)。请注意,我的目标数据框保留了与原始数据框相同的行数。
我们可以根据 'y' 上的 diff
创建一个分组变量,然后用 gl
创建 'z' 并乘以 5
library(dplyr)
library(tidyr)
d1 <- d %>%
group_by(grp = cumsum(c(TRUE, diff(y) < 0))) %>%
mutate(z = as.integer(gl(n(), 5, n())) * 5,
z = replace(z, ave(z, z, FUN = length) < 5, NA)) %>%
ungroup %>%
fill(z) %>%
select(-grp)
-输出
as.data.frame(d1)
group_var y z
1 a 1 5
2 b 2 5
3 c 3 5
4 a 4 5
5 b 5 5
6 c 6 10
7 a 7 10
8 b 8 10
9 c 9 10
10 a 10 10
11 b 11 15
12 c 12 15
13 a 13 15
14 b 14 15
15 c 15 15
16 a 16 20
17 b 17 20
18 c 18 20
19 a 19 20
20 b 20 20
21 c 21 20
22 a 1 5
23 b 2 5
24 c 3 5
25 a 4 5
26 b 5 5
27 c 6 10
28 a 7 10
29 b 8 10
30 c 9 10
31 a 10 10
32 b 11 15
33 c 12 15
34 a 13 15
35 b 14 15
36 c 15 15
37 a 16 20
38 b 17 20
39 c 18 20
40 a 19 20
41 b 20 20
42 c 21 20
43 a 1 5
44 b 2 5
45 c 3 5
46 a 4 5
47 b 5 5
48 c 6 10
49 a 7 10
50 b 8 10
51 c 9 10
52 a 10 10
53 b 11 15
54 c 12 15
55 a 13 15
56 b 14 15
57 c 15 15
58 a 16 20
59 b 17 20
60 c 18 20
61 a 19 20
62 b 20 20
63 c 21 20