Why doesn't nest_by replicate this typical group_by & nest pipeline?
A typical use of group_by followed by nest is to estimate a series of models:
library(tidyverse)
mpg %>%
  group_by(manufacturer) %>%
  nest() %>%
  mutate(
    mods = data %>%
      map(\(i) lm(cty ~ displ, data = i))
  )
returns
# A tibble: 15 x 3
# Groups: manufacturer [15]
manufacturer data mods
<chr> <list> <list>
1 audi <tibble [18 x 10]> <lm>
2 chevrolet <tibble [19 x 10]> <lm>
3 dodge <tibble [37 x 10]> <lm>
4 ford <tibble [25 x 10]> <lm>
But trying to be more concise with nest_by results in an error:
mpg %>%
  nest_by(manufacturer) %>%
  mutate(
    mods = data %>%
      map(\(i) lm(cty ~ displ, data = i))
  )
The error:
Error: Problem with `mutate()` column `mods`.
i `mods = data %>% map(function(i) lm(cty ~ displ, data = i))`.
x 'data' must be a data.frame, environment, or list
i The error occurred in row 1.
How can nest_by be used to replicate the sequential use of group_by and nest?
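We can add ungroup in between, because nest_by returns a tibble with the rowwise attribute, and that conflicts with map: in a rowwise mutate, data refers to the current row's tibble rather than to the whole list-column.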
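library(dplyr)
library(purrr)
library(ggplot2)  # provides the mpg data set
out1 <- mpg %>%
  nest_by(manufacturer) %>%
  ungroup() %>%  # drop the rowwise attribute so `data` is the full list-column
  mutate(
    mods = data %>%
      map(\(i) lm(cty ~ displ, data = i))
  )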
-output
out1
# A tibble: 15 x 3
manufacturer data mods
<chr> <list<tibble[,10]>> <list>
1 audi [18 × 10] <lm>
2 chevrolet [19 × 10] <lm>
3 dodge [37 × 10] <lm>
4 ford [25 × 10] <lm>
5 honda [9 × 10] <lm>
6 hyundai [14 × 10] <lm>
7 jeep [8 × 10] <lm>
8 land rover [4 × 10] <lm>
9 lincoln [3 × 10] <lm>
10 mercury [4 × 10] <lm>
11 nissan [13 × 10] <lm>
12 pontiac [5 × 10] <lm>
13 subaru [14 × 10] <lm>
14 toyota [34 × 10] <lm>
15 volkswagen [27 × 10] <lm>
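To see why the rowwise attribute gets in the way of map, here is a small sketch (the object names nested, per_row and per_col are only illustrative). It checks that nest_by returns a rowwise data frame and that, inside a rowwise mutate, data is a single data frame. Under that reading, map would iterate over the columns of one group's tibble, which would explain the "'data' must be a data.frame, environment, or list" message from lm.

```r
library(dplyr)
library(purrr)
library(ggplot2)  # provides the mpg data set

nested <- mpg %>% nest_by(manufacturer)

# nest_by() returns a rowwise data frame
is_rowwise <- inherits(nested, "rowwise_df")

# Inside a rowwise mutate(), `data` is the current row's tibble,
# not the list-column of all tibbles
per_row <- nested %>%
  mutate(
    data_is_df = is.data.frame(data),
    n_rows = nrow(data)
  )

# After ungroup(), `data` is the whole list-column again,
# so an explicit map over its elements is needed
per_col <- nested %>%
  ungroup() %>%
  mutate(n_rows = map_int(data, nrow))
```

Both approaches give the same row counts per manufacturer; the only difference is who does the iteration, rowwise or map.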
Also, with nest_by there is no need for map, because each row is already processed on its own, i.e.
out2 <- mpg %>%
  nest_by(manufacturer) %>%
  mutate(mods = list(lm(cty ~ displ, data = data)))
-output
out2
# A tibble: 15 x 3
# Rowwise: manufacturer
manufacturer data mods
<chr> <list<tibble[,10]>> <list>
1 audi [18 × 10] <lm>
2 chevrolet [19 × 10] <lm>
3 dodge [37 × 10] <lm>
4 ford [25 × 10] <lm>
5 honda [9 × 10] <lm>
6 hyundai [14 × 10] <lm>
7 jeep [8 × 10] <lm>
8 land rover [4 × 10] <lm>
9 lincoln [3 × 10] <lm>
10 mercury [4 × 10] <lm>
11 nissan [13 × 10] <lm>
12 pontiac [5 × 10] <lm>
13 subaru [14 × 10] <lm>
14 toyota [34 × 10] <lm>
15 volkswagen [27 × 10] <lm>
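The same rowwise behaviour can be used on the fitted models themselves. As a rough sketch (the slope column and the slopes object are hypothetical names, not part of the original answer), a second rowwise mutate can pull one coefficient out of each model without map:

```r
library(dplyr)
library(ggplot2)  # provides the mpg data set

slopes <- mpg %>%
  nest_by(manufacturer) %>%
  # fit one model per row; list() keeps the lm object in a list-column
  mutate(mods = list(lm(cty ~ displ, data = data))) %>%
  # still rowwise: `mods` is the single lm object for the current row
  mutate(slope = coef(mods)[["displ"]]) %>%
  ungroup() %>%
  select(manufacturer, slope)
```

The slope can be NA for a manufacturer whose cars all share one displ value, since lm cannot estimate that coefficient.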
The outputs are the same, except for the call component:
> out1$mods[[1]]
Call:
lm(formula = cty ~ displ, data = i)
Coefficients:
(Intercept) displ
22.066 -1.751
> out2$mods[[1]]
Call:
lm(formula = cty ~ displ, data = data)
Coefficients:
(Intercept) displ
22.066 -1.751
> all.equal(out1$mods[[1]], out2$mods[[1]], check.attributes = FALSE)
[1] "Component “call”: target, current do not match when deparsed"
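Since the stored call differs only because of the argument name (i versus data), one way to confirm the fits agree is to compare the estimated coefficients directly. This is a minimal sketch that rebuilds out1 and out2 as above; coefs_match is an illustrative name.

```r
library(dplyr)
library(purrr)
library(ggplot2)  # provides the mpg data set

out1 <- mpg %>%
  nest_by(manufacturer) %>%
  ungroup() %>%
  mutate(mods = map(data, \(i) lm(cty ~ displ, data = i)))

out2 <- mpg %>%
  nest_by(manufacturer) %>%
  mutate(mods = list(lm(cty ~ displ, data = data)))

# Compare coefficients model by model, ignoring the stored call
coefs_match <- map2_lgl(
  out1$mods, out2$mods,
  \(a, b) isTRUE(all.equal(coef(a), coef(b)))
)

all(coefs_match)
```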