Pass variable to tidyr's gather to rename key/value columns?

I want to call tidyr::gather() inside a custom function, to which I pass a pair of character variables that will be used to rename the key/value columns. For example:

myFunc <- function(mydata, key.col, val.col) {
    new.data <- tidyr::gather(data = mydata, key = key.col, value = val.col)
    return(new.data)    
}

However, this does not work as expected.

temp.data <- data.frame(day.1 = c(20, 22, 23), day.2 = c(32, 22, 45), day.3 = c(17, 9, 33))

# Call my custom function, renaming the key and value columns 
# "day" and "temp", respectively
long.data <- myFunc(mydata = temp.data, key.col = "day", val.col = "temp")

# Columns have *not* been renamed as desired
head(long.data)
  key.col val.col
1   day.1      20
2   day.1      22
3   day.1      23
4   day.2      32
5   day.2      22
6   day.2      45

Desired output:

head(long.data)
    day temp
1 day.1   20
2 day.1   22
3 day.1   23
4 day.2   32
5 day.2   22
6 day.2   45

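My understanding is that gather() uses bare variable names for most arguments (as in this example, where it uses "key.col" as the column name rather than the value stored in key.col). I tried several ways of passing the values in the gather() call, but most return errors. For example, these three variants of the gather() call inside myFunc all return Error: Invalid column specification (for illustration, I ignore the value parameter, which behaves the same way):

gather(data = mydata, key = as.character(key.col), value = val.col)

gather(data = mydata, key = as.name(key.col), value = val.col)

gather(data = mydata, key = as.name(as.character(key.col)), value = val.col)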

As a workaround, I simply rename the columns after calling gather():

colnames(long.data)[colnames(long.data) == "key"] <- "day"

But given gather()'s stated ability to rename the key/value columns, how can I do this in the gather() call inside a custom function?

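Most (if not all) of Hadley's functions that take bare variable names as arguments (such as dplyr's functions) have a function_ version that uses regular (standard) evaluation and is "suitable for programming with". So what you need should be: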

myFunc <- function(mydata, key.col, val.col) {
  tidyr::gather_(data = mydata, key_col = key.col,
                 value_col = val.col, gather_cols = colnames(mydata))         
}

The only "catch" here is that you have to specify gather_cols, which is not necessary when using gather, or could be done separately...

Then:

> myFunc(mydata = temp.data, key.col = "day", val.col = "temp")
    day temp
1 day.1   20
2 day.1   22
3 day.1   23
4 day.2   32
5 day.2   22
6 day.2   45
7 day.3   17
8 day.3    9
9 day.3   33

To put this in a function, you have to use gather_(), like so:

myFunc <- function(mydata, key.col, val.col, gather.cols) {
  new.data <- gather_(data = mydata,
                      key_col = key.col,
                      value_col = val.col,
                      gather_cols = colnames(mydata)[gather.cols])
  return(new.data)    
}

temp.data <- data.frame(day.1 = c(20, 22, 23), day.2 = c(32, 22, 45),
day.3 = c(17, 9, 33))
temp.data


     day.1 day.2 day.3
1    20    32    17
2    22    22     9
3    23    45    33

# Call my custom function, renaming the key and value columns 
# "day" and "temp", respectively

long.data <- myFunc(mydata = temp.data, key.col = "day", val.col = "temp",
                    gather.cols = 1:3)
# Columns *have* been renamed as desired
head(long.data)

    day temp
1 day.1   20
2 day.1   22
3 day.1   23
4 day.2   32
5 day.2   22
6 day.2   45

As mentioned earlier, the main difference is that with gather_ you have to specify the columns to gather using the gather_cols argument.

Note that the underscore versions of these functions are now deprecated (at least as of tidyr version 0.8.2). See, for example, ?gather_.
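Since the underscore variants are deprecated, one possible alternative (not mentioned in the answers above) is pivot_longer(), which takes the new column names as plain strings. This is only a sketch and assumes tidyr 1.0.0 or later; the name myFuncLonger is made up for illustration.

```r
library(tidyr)

# Hypothetical wrapper: names_to and values_to accept character strings,
# so the function arguments can be passed straight through.
myFuncLonger <- function(mydata, key.col, val.col) {
  pivot_longer(mydata,
               cols = everything(),   # gather every column, as gather() does by default
               names_to = key.col,
               values_to = val.col)
}

temp.data <- data.frame(day.1 = c(20, 22, 23),
                        day.2 = c(32, 22, 45),
                        day.3 = c(17, 9, 33))

long.data <- myFuncLonger(mydata = temp.data, key.col = "day", val.col = "temp")
colnames(long.data)
```

Unlike gather(), pivot_longer() returns a tibble and orders the rows by original row rather than by column, so the row order differs from the output shown above.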

...I had the same problem and have now found the answer here: https://dplyr.tidyverse.org/articles/programming.html

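You can have dplyr evaluate the symbols by prefixing them with two exclamation marks (!!). For the code in your original question, that would be:

gather(data = mydata, key = !!key.col, value = !!val.col)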
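For completeness, here is how that call might look inside the question's function. This is a sketch, assuming a tidyr version that supports unquoting in gather() (0.7.0 or later) and reusing the temp.data example from the question.

```r
library(tidyr)

# Unquote the string arguments with !! so that gather() uses the values
# stored in key.col and val.col rather than the literal argument names.
myFunc <- function(mydata, key.col, val.col) {
  gather(data = mydata, key = !!key.col, value = !!val.col)
}

temp.data <- data.frame(day.1 = c(20, 22, 23),
                        day.2 = c(32, 22, 45),
                        day.3 = c(17, 9, 33))

long.data <- myFunc(mydata = temp.data, key.col = "day", val.col = "temp")
colnames(long.data)  # "day" "temp"
```

No gather_cols equivalent is needed here: when no columns are selected, gather() gathers all of them.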