rlang: pass multiple groups with ... to gather()
Suppose I want to compute the mean, min, and max for an arbitrary number of groups inside a custom function.
The toy data looks like this:
library(tidyverse)
df <- tibble(
  Gender = c("m", "f", "f", "m", "m",
             "f", "f", "f", "m", "f"),
  IQ = rnorm(10, 100, 15),
  Other = runif(10),
  Test = rnorm(10),
  group2 = c("A", "A", "A", "A", "A",
             "B", "B", "B", "B", "B")
)
To do this for two groups (Gender, group2), I can use
df %>%
  gather(Variable, Value, -c(Gender, group2)) %>%
  group_by(Gender, group2, Variable) %>%
  summarise(mean = mean(Value),
            min = min(Value),
            max = max(Value))
which can be wrapped in a function using the new curly-curly operator from rlang:
descriptive_by <- function(data, group1, group2) {
  data %>%
    gather(Variable, Value, -c({{ group1 }}, {{ group2 }})) %>%
    group_by({{ group1 }}, {{ group2 }}, Variable) %>%
    summarise(mean = mean(Value),
              min = min(Value),
              max = max(Value))
}
Normally I would assume that I could replace the named groups with ..., but it does not seem to work that way:
descriptive_by <- function(data, ...) {
  data %>%
    gather(Variable, Value, -c(...)) %>%
    group_by(..., Variable) %>%
    summarise(mean = mean(Value),
              min = min(Value),
              max = max(Value))
}
because it returns the error
Error in map_lgl(.x, .p, ...) : object 'Gender' not found
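Here is one possible solution, in which ... is passed directly to group_by, while gather collects only the numeric columns (because I think it should never gather non-numeric columns, regardless of what is passed in ...).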
library(tidyverse)
set.seed(1)
## data
df <- tibble(
  Gender = c("m", "f", "f", "m", "m",
             "f", "f", "f", "m", "f"),
  IQ = rnorm(10, 100, 15),
  Other = runif(10),
  Test = rnorm(10),
  group2 = c("A", "A", "A", "A", "A",
             "B", "B", "B", "B", "B")
)
## function
descriptive_by <- function(data, ...) {
  data %>%
    gather(Variable, Value, names(select_if(., is.numeric))) %>%
    group_by(..., Variable) %>%
    summarise(mean = mean(Value),
              min = min(Value),
              max = max(Value))
}
descriptive_by(df, Gender, group2)
#> # A tibble: 12 x 6
#> # Groups: Gender, group2 [4]
#> Gender group2 Variable mean min max
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 f A IQ 95.1 87.5 103.
#> 2 f A Other 0.432 0.212 0.652
#> 3 f A Test 0.464 -0.0162 0.944
#> 4 f B IQ 100. 87.7 111.
#> 5 f B Other 0.281 0.0134 0.386
#> 6 f B Test 0.599 0.0746 0.919
#> 7 m A IQ 106. 90.6 124.
#> 8 m A Other 0.442 0.126 0.935
#> 9 m A Test 0.457 -0.0449 0.821
#> 10 m B IQ 109. 109. 109.
#> 11 m B Other 0.870 0.870 0.870
#> 12 m B Test -1.99 -1.99 -1.99
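For readers on newer package versions, the same "gather only the numeric columns" idea can be sketched with pivot_longer(). This is only a sketch under assumptions: it presumes tidyr >= 1.0, dplyr >= 1.0 and a tidyselect version that provides where(); the function name descriptive_by_numeric and the small toy tibble are made up for illustration and are not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant of descriptive_by(); the name is illustrative only.
descriptive_by_numeric <- function(data, ...) {
  data %>%
    # reshape every numeric column into Variable/Value pairs
    pivot_longer(cols = where(is.numeric),
                 names_to = "Variable", values_to = "Value") %>%
    # the grouping columns are forwarded untouched through ...
    group_by(..., Variable) %>%
    summarise(mean = mean(Value),
              min = min(Value),
              max = max(Value),
              .groups = "drop")
}

# Small deterministic toy data (assumed, not taken from the question)
toy <- tibble(
  Gender = c("m", "f", "f", "m", "f", "m"),
  group2 = c("A", "A", "A", "B", "B", "B"),
  IQ     = c(100, 90, 110, 120, 95, 105),
  Test   = c(1, 2, 3, 4, 5, 6)
)

res <- descriptive_by_numeric(toy, Gender, group2)
```

As with the select_if version, this breaks down if a grouping column is itself numeric, because that column would be reshaped away before group_by() sees it.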
The tricky part is figuring out how to negate NSE variables (xxx vs -xxx). Here is an example of how I would approach it:
desc_by <- function(dat, ...) {
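  # wrap each captured group in a unary minus so gather() excludes it
  drops <- lapply(enquos(...), function(d) call("-", d))
  dat %>%
    gather(var, val, !!!drops) %>%
    group_by(...) %>%
    # funs() is deprecated since dplyr 0.8.0; a named list yields the same column names
    summarise_at(vars(val), list(min = min, mean = mean, max = max))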
}
desc_by(head(iris), Species, Petal.Width)
# A tibble: 2 x 5
# Groups: Species [1]
Species Petal.Width min mean max
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 0.2 1.3 3.18 5.1
2 setosa 0.4 1.7 3.67 5.4
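You still have to use enquos and !!! to apply - to each variable; otherwise ... can be passed through unchanged for grouping and the like. So you do not need the new "mustache"/curly-curly operator at all.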
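The same negation idea can be sketched without building the calls by hand: capture the groups once with enquos() and splice them into c() with !!!, so a single minus sign excludes all of them. This is a sketch under assumptions, not the answerer's code: it presumes tidyr >= 1.0 (for pivot_longer()) and dplyr >= 1.0 (for .groups); the name desc_by_long and the toy tibble are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper; the name is illustrative only.
desc_by_long <- function(dat, ...) {
  groups <- rlang::enquos(...)   # capture the grouping columns once
  dat %>%
    # -c(!!!groups) selects every column except the grouping ones
    pivot_longer(cols = -c(!!!groups),
                 names_to = "var", values_to = "val") %>%
    group_by(!!!groups, var) %>%
    summarise(min = min(val),
              mean = mean(val),
              max = max(val),
              .groups = "drop")
}

# Small deterministic toy data (assumed, not taken from the question)
toy <- tibble(
  Gender = c("m", "f", "f", "m", "f", "m"),
  group2 = c("A", "A", "A", "B", "B", "B"),
  IQ     = c(100, 90, 110, 120, 95, 105),
  Test   = c(1, 2, 3, 4, 5, 6)
)

out <- desc_by_long(toy, Gender, group2)
```

Unlike desc_by above, this sketch also groups by the reshaped variable name, so each measured column gets its own row of statistics. All non-grouping columns must share a compatible type for the reshape to succeed.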