`mutate()` 问题...使用的第一个元素
Problem with `mutate()`...first element used
我有一个这样定义的矢量化函数:
hai_hyperbolic_vec <- function(.x, .scale_type = c("sin","cos","tan","sincos")){
scale_type = base::as.character(.scale_type)
term = .x
if (scale_type == "sin"){
ret <- base::sin(term)
} else if (scale_type == "cos") {
ret <- base::cos(term)
} else if (scale_type == "tan") {
ret <- base::tan(term)
} else if (scale_type == "sincos") {
ret <- base::sin(term) * base::cos(term)
}
return(ret)
}
这很好用。
library(tidyverse)
len_out = 10
by_unit = "month"
start_date = as.Date("2021-01-01")
data_tbl <- tibble(
date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
a = rnorm(len_out),
b = runif(len_out)
)
hai_hyperbolic_vec(data_tbl$b, .scale_type = "sin")
> hai_hyperbolic_vec(data_tbl$b, .scale_type = "sin")
[1] 0.02405150 0.40920185 0.39953987 0.16234068 0.04183186 0.57301045 0.74441929 0.60728533
[9] 0.69755824 0.46611496
我有另一个函数可以扩充 data.frame/tibble
。
hai_hyperbolic_augment <- function(.data
, .value
, .names = "auto"
, .scale_type = c("sin","cos","tan","sincos")
){
column_expr <- rlang::enquo(.value)
if(rlang::quo_is_missing(column_expr)) stop(call. = FALSE, "hyperbolic_augment(.value) is missing.")
col_nms <- names(tidyselect::eval_select(rlang::enquo(.value), .data))
make_call <- function(col, scale_type){
rlang::call2(
"hai_hyperbolic_vec",
.x = rlang::sym(col)
, .scale_type = .scale_type
, .ns = "healthyR.ai"
)
}
grid <- expand.grid(
col = col_nms
, scale_type = .scale_type
, stringsAsFactors = FALSE
)
calls <- purrr::pmap(.l = list(grid$col, grid$scale_type), make_call)
if(any(.names == "auto")) {
newname <- paste0(grid$col, "_", grid$scale_type)
} else {
newname <- as.list(.names)
}
calls <- purrr::set_names(calls, newname)
ret <- tibble::as_tibble(dplyr::mutate(.data, !!!calls))
return(ret)
}
该函数有效,但如果我选择多个 .scale_type
,即使执行了计算,也会发出警告消息。我不明白为什么会发生这种情况,因为向量函数是通过 purrr 应用于列表的。我可以消除这个警告,还是有更好的方法来编写 function/or 使用函数,这样就不会发生这种情况?严格执行一个 scale_type 被调用?我更喜欢能够像我一样调用,因为在增强函数内部制作了一个网格。
> hai_hyperbolic_augment(.data = data_tbl, .value = c(a,b), .scale_type = c("sin","tan"))
# A tibble: 10 x 7
date_col a b a_sin b_sin a_tan b_tan
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-01-01 -1.96 0.0241 -0.925 0.0241 -0.925 0.0241
2 2021-02-01 -1.03 0.422 -0.856 0.409 -0.856 0.409
3 2021-03-01 1.55 0.411 1.00 0.400 1.00 0.400
4 2021-04-01 0.108 0.163 0.108 0.162 0.108 0.162
5 2021-05-01 -0.627 0.0418 -0.587 0.0418 -0.587 0.0418
6 2021-06-01 -0.556 0.610 -0.528 0.573 -0.528 0.573
7 2021-07-01 -0.0544 0.840 -0.0544 0.744 -0.0544 0.744
8 2021-08-01 -0.714 0.653 -0.655 0.607 -0.655 0.607
9 2021-09-01 -0.646 0.772 -0.602 0.698 -0.602 0.698
10 2021-10-01 -1.06 0.485 -0.873 0.466 -0.873 0.466
Warning messages:
1: Problem with `mutate()` column `a_sin`.
i `a_sin = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used
2: Problem with `mutate()` column `b_sin`.
i `b_sin = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used
3: Problem with `mutate()` column `a_tan`.
i `a_tan = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used
4: Problem with `mutate()` column `b_tan`.
i `b_tan = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used
问题来了,因为你将两个参数传递给 hai_hyperbolic_vec
的 .scale_type
参数。
如果您进入调试器并查看由行 calls <- purrr::set_names(calls, newname)
创建的 calls
对象,您将看到:
calls
#> $a_sin
healthyR.ai::hai_hyperbolic_vec(.x = a, .scale_type = c("sin", "tan"))
#>
#> $b_sin
#> healthyR.ai::hai_hyperbolic_vec(.x = b, .scale_type = c("sin", "tan"))
#>
#> $a_tan
#> healthyR.ai::hai_hyperbolic_vec(.x = a, .scale_type = c("sin", "tan"))
#>
#> $b_tan
#> healthyR.ai::hai_hyperbolic_vec(.x = b, .scale_type = c("sin", "tan"))
但是在 hai_hyperbolic_vec
函数内部,我们看到了行 if (scale_type == "sin")
。因此,在 call
对象中如上所示的每次调用,您都将一个长度为二的向量传递给此逻辑测试。它只会检查向量的第一个成员,并发出警告说它已经这样做了。
您会注意到您的输出实际上也是错误的 - a_tan
和 b_tan
列与 a_sin
和 b_sin
列相同,因为逻辑表示仅计算 sin
。
我认为这是由于函数 make_call
中的错字(单次添加句点)造成的,您在本应使用 scale_type
时不小心使用了 .scale_type
:
make_call <- function(col, scale_type){
rlang::call2(
"hai_hyperbolic_vec",
.x = rlang::sym(col)
, .scale_type = .scale_type # <- here is the problem
, .ns = "healthyR.ai"
)
}
应该是
make_call <- function(col, scale_type){
rlang::call2(
"hai_hyperbolic_vec",
.x = rlang::sym(col)
, .scale_type = scale_type
, .ns = "healthyR.ai"
)
}
如果您进行此更改,您将不会收到任何警告以及正确的结果:
hai_hyperbolic_augment(.data = data_tbl, .value = c(a,b), .scale_type = c("sin","tan"))
#> # A tibble: 10 x 7
#> date_col a b a_sin b_sin a_tan b_tan
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2021-01-01 -1.01 0.977 -0.849 0.829 -1.61 1.48
#> 2 2021-02-01 0.424 0.719 0.411 0.658 0.451 0.875
#> 3 2021-03-01 -0.133 0.338 -0.132 0.332 -0.134 0.352
#> 4 2021-04-01 0.259 0.238 0.256 0.235 0.265 0.242
#> 5 2021-05-01 0.631 0.110 0.590 0.109 0.731 0.110
#> 6 2021-06-01 -0.0500 0.995 -0.0500 0.839 -0.0500 1.54
#> 7 2021-07-01 0.302 0.569 0.298 0.539 0.312 0.639
#> 8 2021-08-01 -0.681 0.901 -0.629 0.784 -0.810 1.26
#> 9 2021-09-01 -0.296 0.374 -0.292 0.365 -0.305 0.393
#> 10 2021-10-01 -0.384 0.506 -0.374 0.484 -0.404 0.554
我有一个这样定义的矢量化函数:
hai_hyperbolic_vec <- function(.x, .scale_type = c("sin","cos","tan","sincos")){
scale_type = base::as.character(.scale_type)
term = .x
if (scale_type == "sin"){
ret <- base::sin(term)
} else if (scale_type == "cos") {
ret <- base::cos(term)
} else if (scale_type == "tan") {
ret <- base::tan(term)
} else if (scale_type == "sincos") {
ret <- base::sin(term) * base::cos(term)
}
return(ret)
}
这很好用。
library(tidyverse)
len_out = 10
by_unit = "month"
start_date = as.Date("2021-01-01")
data_tbl <- tibble(
date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
a = rnorm(len_out),
b = runif(len_out)
)
hai_hyperbolic_vec(data_tbl$b, .scale_type = "sin")
> hai_hyperbolic_vec(data_tbl$b, .scale_type = "sin")
[1] 0.02405150 0.40920185 0.39953987 0.16234068 0.04183186 0.57301045 0.74441929 0.60728533
[9] 0.69755824 0.46611496
我有另一个函数可以扩充 data.frame/tibble
。
hai_hyperbolic_augment <- function(.data
, .value
, .names = "auto"
, .scale_type = c("sin","cos","tan","sincos")
){
column_expr <- rlang::enquo(.value)
if(rlang::quo_is_missing(column_expr)) stop(call. = FALSE, "hyperbolic_augment(.value) is missing.")
col_nms <- names(tidyselect::eval_select(rlang::enquo(.value), .data))
make_call <- function(col, scale_type){
rlang::call2(
"hai_hyperbolic_vec",
.x = rlang::sym(col)
, .scale_type = .scale_type
, .ns = "healthyR.ai"
)
}
grid <- expand.grid(
col = col_nms
, scale_type = .scale_type
, stringsAsFactors = FALSE
)
calls <- purrr::pmap(.l = list(grid$col, grid$scale_type), make_call)
if(any(.names == "auto")) {
newname <- paste0(grid$col, "_", grid$scale_type)
} else {
newname <- as.list(.names)
}
calls <- purrr::set_names(calls, newname)
ret <- tibble::as_tibble(dplyr::mutate(.data, !!!calls))
return(ret)
}
该函数有效,但如果我选择多个 .scale_type
,即使执行了计算,也会发出警告消息。我不明白为什么会发生这种情况,因为向量函数是通过 purrr 应用于列表的。我可以消除这个警告,还是有更好的方法来编写 function/or 使用函数,这样就不会发生这种情况?严格执行一个 scale_type 被调用?我更喜欢能够像我一样调用,因为在增强函数内部制作了一个网格。
> hai_hyperbolic_augment(.data = data_tbl, .value = c(a,b), .scale_type = c("sin","tan"))
# A tibble: 10 x 7
date_col a b a_sin b_sin a_tan b_tan
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-01-01 -1.96 0.0241 -0.925 0.0241 -0.925 0.0241
2 2021-02-01 -1.03 0.422 -0.856 0.409 -0.856 0.409
3 2021-03-01 1.55 0.411 1.00 0.400 1.00 0.400
4 2021-04-01 0.108 0.163 0.108 0.162 0.108 0.162
5 2021-05-01 -0.627 0.0418 -0.587 0.0418 -0.587 0.0418
6 2021-06-01 -0.556 0.610 -0.528 0.573 -0.528 0.573
7 2021-07-01 -0.0544 0.840 -0.0544 0.744 -0.0544 0.744
8 2021-08-01 -0.714 0.653 -0.655 0.607 -0.655 0.607
9 2021-09-01 -0.646 0.772 -0.602 0.698 -0.602 0.698
10 2021-10-01 -1.06 0.485 -0.873 0.466 -0.873 0.466
Warning messages:
1: Problem with `mutate()` column `a_sin`.
i `a_sin = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used
2: Problem with `mutate()` column `b_sin`.
i `b_sin = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used
3: Problem with `mutate()` column `a_tan`.
i `a_tan = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used
4: Problem with `mutate()` column `b_tan`.
i `b_tan = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used
问题来了,因为你将两个参数传递给 hai_hyperbolic_vec
的 .scale_type
参数。
如果您进入调试器并查看由行 calls <- purrr::set_names(calls, newname)
创建的 calls
对象,您将看到:
calls
#> $a_sin
healthyR.ai::hai_hyperbolic_vec(.x = a, .scale_type = c("sin", "tan"))
#>
#> $b_sin
#> healthyR.ai::hai_hyperbolic_vec(.x = b, .scale_type = c("sin", "tan"))
#>
#> $a_tan
#> healthyR.ai::hai_hyperbolic_vec(.x = a, .scale_type = c("sin", "tan"))
#>
#> $b_tan
#> healthyR.ai::hai_hyperbolic_vec(.x = b, .scale_type = c("sin", "tan"))
但是在 hai_hyperbolic_vec
函数内部,我们看到了行 if (scale_type == "sin")
。因此,在 call
对象中如上所示的每次调用,您都将一个长度为二的向量传递给此逻辑测试。它只会检查向量的第一个成员,并发出警告说它已经这样做了。
您会注意到您的输出实际上也是错误的 - a_tan
和 b_tan
列与 a_sin
和 b_sin
列相同,因为逻辑表示仅计算 sin
。
我认为这是由于函数 make_call
中的错字(单次添加句点)造成的,您在本应使用 scale_type
时不小心使用了 .scale_type
:
make_call <- function(col, scale_type){
rlang::call2(
"hai_hyperbolic_vec",
.x = rlang::sym(col)
, .scale_type = .scale_type # <- here is the problem
, .ns = "healthyR.ai"
)
}
应该是
make_call <- function(col, scale_type){
rlang::call2(
"hai_hyperbolic_vec",
.x = rlang::sym(col)
, .scale_type = scale_type
, .ns = "healthyR.ai"
)
}
如果您进行此更改,您将不会收到任何警告以及正确的结果:
hai_hyperbolic_augment(.data = data_tbl, .value = c(a,b), .scale_type = c("sin","tan"))
#> # A tibble: 10 x 7
#> date_col a b a_sin b_sin a_tan b_tan
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2021-01-01 -1.01 0.977 -0.849 0.829 -1.61 1.48
#> 2 2021-02-01 0.424 0.719 0.411 0.658 0.451 0.875
#> 3 2021-03-01 -0.133 0.338 -0.132 0.332 -0.134 0.352
#> 4 2021-04-01 0.259 0.238 0.256 0.235 0.265 0.242
#> 5 2021-05-01 0.631 0.110 0.590 0.109 0.731 0.110
#> 6 2021-06-01 -0.0500 0.995 -0.0500 0.839 -0.0500 1.54
#> 7 2021-07-01 0.302 0.569 0.298 0.539 0.312 0.639
#> 8 2021-08-01 -0.681 0.901 -0.629 0.784 -0.810 1.26
#> 9 2021-09-01 -0.296 0.374 -0.292 0.365 -0.305 0.393
#> 10 2021-10-01 -0.384 0.506 -0.374 0.484 -0.404 0.554