Problem with `mutate()`...first element used

Question

I have a vectorized function defined like this:

hai_hyperbolic_vec <- function(.x, .scale_type = c("sin","cos","tan","sincos")){

    scale_type = base::as.character(.scale_type)
    term       = .x

    if (scale_type == "sin"){
        ret <- base::sin(term)
    } else if (scale_type == "cos") {
        ret <- base::cos(term)
    } else if (scale_type == "tan") {
        ret <- base::tan(term)
    } else if (scale_type == "sincos") {
        ret <- base::sin(term) * base::cos(term)
    }

    return(ret)

}

This works fine.

library(tidyverse)

len_out    = 10
by_unit    = "month"
start_date = as.Date("2021-01-01")

data_tbl <- tibble(
    date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
    a    = rnorm(len_out),
    b    = runif(len_out)
)

hai_hyperbolic_vec(data_tbl$b, .scale_type = "sin")
> hai_hyperbolic_vec(data_tbl$b, .scale_type = "sin")
 [1] 0.02405150 0.40920185 0.39953987 0.16234068 0.04183186 0.57301045 0.74441929 0.60728533
 [9] 0.69755824 0.46611496

I have another function that augments a data.frame/tibble.

hai_hyperbolic_augment <- function(.data
                                   , .value
                                   , .names = "auto"
                                   , .scale_type = c("sin","cos","tan","sincos")
){

    column_expr <- rlang::enquo(.value)

    if(rlang::quo_is_missing(column_expr)) stop(call. = FALSE, "hyperbolic_augment(.value) is missing.")

    col_nms <- names(tidyselect::eval_select(rlang::enquo(.value), .data))

    make_call <- function(col, scale_type){
        rlang::call2(
            "hai_hyperbolic_vec",
            .x            = rlang::sym(col)
            , .scale_type = .scale_type
            , .ns         = "healthyR.ai"
        )
    }

    grid <- expand.grid(
        col                = col_nms
        , scale_type       = .scale_type
        , stringsAsFactors = FALSE
    )

    calls <- purrr::pmap(.l = list(grid$col, grid$scale_type), make_call)

    if(any(.names == "auto")) {
        newname <- paste0(grid$col, "_", grid$scale_type)
    } else {
        newname <- as.list(.names)
    }

    calls <- purrr::set_names(calls, newname)

    ret <- tibble::as_tibble(dplyr::mutate(.data, !!!calls))

    return(ret)

}

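The function works, but if I pick more than one .scale_type, a warning message is issued even though the calculation is performed. I don't understand why this happens, since the vectorized function is applied to a list via purrr. Can I get rid of this warning, or is there a better way to write or use the function so that this doesn't happen, for example by strictly enforcing that only one scale_type is passed per call? I would prefer to keep calling it the way I do, since a grid is built inside the augment function.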

> hai_hyperbolic_augment(.data = data_tbl, .value = c(a,b), .scale_type = c("sin","tan"))
# A tibble: 10 x 7
   date_col         a      b   a_sin  b_sin   a_tan  b_tan
   <date>       <dbl>  <dbl>   <dbl>  <dbl>   <dbl>  <dbl>
 1 2021-01-01 -1.96   0.0241 -0.925  0.0241 -0.925  0.0241
 2 2021-02-01 -1.03   0.422  -0.856  0.409  -0.856  0.409 
 3 2021-03-01  1.55   0.411   1.00   0.400   1.00   0.400 
 4 2021-04-01  0.108  0.163   0.108  0.162   0.108  0.162 
 5 2021-05-01 -0.627  0.0418 -0.587  0.0418 -0.587  0.0418
 6 2021-06-01 -0.556  0.610  -0.528  0.573  -0.528  0.573 
 7 2021-07-01 -0.0544 0.840  -0.0544 0.744  -0.0544 0.744 
 8 2021-08-01 -0.714  0.653  -0.655  0.607  -0.655  0.607 
 9 2021-09-01 -0.646  0.772  -0.602  0.698  -0.602  0.698 
10 2021-10-01 -1.06   0.485  -0.873  0.466  -0.873  0.466 
Warning messages:
1: Problem with `mutate()` column `a_sin`.
i `a_sin = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used 
2: Problem with `mutate()` column `b_sin`.
i `b_sin = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used 
3: Problem with `mutate()` column `a_tan`.
i `a_tan = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used 
4: Problem with `mutate()` column `b_tan`.
i `b_tan = healthyR.ai::hai_hyperbolic_vec(...)`.
i the condition has length > 1 and only the first element will be used

Answer 1

The problem arises because you are passing a length-two vector to the .scale_type argument of hai_hyperbolic_vec.

If you step into the debugger and look at the calls object created by the line calls <- purrr::set_names(calls, newname), you will see:

calls
#> $a_sin
#> healthyR.ai::hai_hyperbolic_vec(.x = a, .scale_type = c("sin", "tan"))
#> 
#> $b_sin
#> healthyR.ai::hai_hyperbolic_vec(.x = b, .scale_type = c("sin", "tan"))
#> 
#> $a_tan
#> healthyR.ai::hai_hyperbolic_vec(.x = a, .scale_type = c("sin", "tan"))
#> 
#> $b_tan
#> healthyR.ai::hai_hyperbolic_vec(.x = b, .scale_type = c("sin", "tan"))

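However, inside the hai_hyperbolic_vec function we see the line if (scale_type == "sin"). So for each of the calls in the calls object shown above, you are passing a vector of length two to this logical test. It only checks the first element of the vector, and issues a warning saying that it has done so.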
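
The comparison behind that warning can be reproduced in isolation. This is a minimal sketch in base R only; the variable names are illustrative and not part of the original functions. Note that R 4.2.0 and later treat a condition of length greater than one as an error rather than a warning, so the `tryCatch()` below handles both outcomes.

```r
# What every generated call received as .scale_type
scale_type <- c("sin", "tan")

# `==` is vectorized, so the test yields one logical per element
cond <- scale_type == "sin"
cond
#> [1]  TRUE FALSE

# `if` needs exactly one TRUE/FALSE. Older R versions used cond[1] and warned;
# R >= 4.2.0 stops with an error instead.
outcome <- tryCatch(
  {
    if (cond) "took the sin branch"
  },
  warning = function(w) "warning",
  error   = function(e) "error"
)
outcome
```

Either way, the `tan` entry never takes part in choosing the branch.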

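You will notice that your output is actually wrong as well: the a_tan and b_tan columns are identical to the a_sin and b_sin columns, because the logic only ever computes sin.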

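I think this is due to a typo in the function make_call (a single added period): you accidentally used .scale_type where you should have used scale_type. Because make_call is defined inside hai_hyperbolic_augment, .scale_type resolves to the outer function's full argument vector instead of raising an error: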

    make_call <- function(col, scale_type){
        rlang::call2(
            "hai_hyperbolic_vec",
            .x            = rlang::sym(col)
            , .scale_type = .scale_type # <- here is the problem
            , .ns         = "healthyR.ai"
        )
    }

It should be:

    make_call <- function(col, scale_type){
        rlang::call2(
            "hai_hyperbolic_vec",
            .x            = rlang::sym(col)
            , .scale_type = scale_type
            , .ns         = "healthyR.ai"
        )
    }
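
To check what each generated call contains after the fix, the grid and the calls can be rebuilt outside the augment function. This is only a sketch under a few assumptions: it uses base `Map()` in place of `purrr::pmap()`, drops the `.ns` argument so that healthyR.ai does not need to be installed, and only builds the calls without evaluating them.

```r
library(rlang)

col_nms     <- c("a", "b")
scale_types <- c("sin", "tan")

# One row per (column, scale type) combination; col varies fastest
grid <- expand.grid(
  col              = col_nms,
  scale_type       = scale_types,
  stringsAsFactors = FALSE
)

# Uses the function's own scale_type argument: one string per call
make_call <- function(col, scale_type) {
  call2("hai_hyperbolic_vec", .x = sym(col), .scale_type = scale_type)
}

calls <- Map(make_call, grid$col, grid$scale_type)
names(calls) <- paste0(grid$col, "_", grid$scale_type)

calls$a_tan
#> hai_hyperbolic_vec(.x = a, .scale_type = "tan")
```

Each call now carries a single scale type, so the `if` chain in hai_hyperbolic_vec sees a length-one condition.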

If you make this change, you get no warnings and the correct results:

hai_hyperbolic_augment(.data = data_tbl, .value = c(a,b), .scale_type = c("sin","tan"))
#> # A tibble: 10 x 7
#>    date_col         a     b   a_sin b_sin   a_tan b_tan
#>    <date>       <dbl> <dbl>   <dbl> <dbl>   <dbl> <dbl>
#>  1 2021-01-01 -1.01   0.977 -0.849  0.829 -1.61   1.48 
#>  2 2021-02-01  0.424  0.719  0.411  0.658  0.451  0.875
#>  3 2021-03-01 -0.133  0.338 -0.132  0.332 -0.134  0.352
#>  4 2021-04-01  0.259  0.238  0.256  0.235  0.265  0.242
#>  5 2021-05-01  0.631  0.110  0.590  0.109  0.731  0.110
#>  6 2021-06-01 -0.0500 0.995 -0.0500 0.839 -0.0500 1.54 
#>  7 2021-07-01  0.302  0.569  0.298  0.539  0.312  0.639
#>  8 2021-08-01 -0.681  0.901 -0.629  0.784 -0.810  1.26 
#>  9 2021-09-01 -0.296  0.374 -0.292  0.365 -0.305  0.393
#> 10 2021-10-01 -0.384  0.506 -0.374  0.484 -0.404  0.554
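
On the question of strictly enforcing a single scale type per call: one possible approach, offered as a sketch and not as the package's actual implementation, is to validate the argument with `match.arg()` and dispatch with `switch()`. The name `hyperbolic_vec_strict` is hypothetical.

```r
# Hypothetical stricter variant of hai_hyperbolic_vec (illustrative name)
hyperbolic_vec_strict <- function(.x, .scale_type = c("sin", "cos", "tan", "sincos")) {
  # match.arg() returns the first choice when the default is left untouched
  # and raises an error for unknown values or vectors longer than one.
  scale_type <- match.arg(.scale_type)

  switch(
    scale_type,
    sin    = sin(.x),
    cos    = cos(.x),
    tan    = tan(.x),
    sincos = sin(.x) * cos(.x)
  )
}

x <- c(0, pi / 2)

# A single scale type works as before
ok <- hyperbolic_vec_strict(x, "sin")

# A length-two .scale_type now fails loudly instead of silently using "sin"
res <- tryCatch(
  hyperbolic_vec_strict(x, c("sin", "tan")),
  error = function(e) "error"
)
res
#> [1] "error"
```

With a check like this, the typo in make_call would have surfaced as an error on the first call rather than as duplicated columns.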
