How should one pass columns as variables or indexes to tidyr::separate?

I get an error when I try to pass a column to tidyr::separate by index or through a variable holding its name.

Set up the libraries and data:

library(tidyr)
library(dplyr)
x <- data.frame(col1 = 1:4,
                col2 = c("a,b,c","d,e,f","g,h,i","j,k,l"))
sep <- ","
colnameVar <- "col2"

These work (in dplyr):

x %>% select(col2) %>% names
# [1] "col2"
x %>% select(colnameVar %>% as.name %>% eval) %>% names
# [1] "col2"
x %>% select(2) %>% names
# [1] "col2"

As does this (with separate, using the bare column name):

x %>%
 separate(col2,
 paste("col2",1:3,sep="."),
 sep = sep) %>% names
# [1] "col1"   "col2.1" "col2.2" "col2.3"

But this fails:

x %>%
 separate(colnameVar %>% as.name %>% eval,
 paste("col2",1:3,sep="."),
 sep = sep) %>% names

Error: Invalid column specification

So does this:

x %>%
 separate(2,
 paste("col2",1:3,sep="."),
 sep = sep) %>% names

Error: Invalid column specification

How should this be done?

Use the underscore version of separate, separate_, which accepts the column name as a string:

# colnames as a predefined string
x %>%
  separate_(colnameVar, paste("col2", 1:3, sep = "."), sep = sep) %>%
  names
# [1] "col1"   "col2.1" "col2.2" "col2.3"

# colnames as index (well almost, we are getting colname as string by index)
x %>%
  separate_(colnames(x)[2], paste("col2", 1:3, sep = "."), sep = sep) %>%
  names
# [1] "col1"   "col2.1" "col2.2" "col2.3"
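
Note that separate_ has since been deprecated in tidyr. In newer releases, separate() itself should accept a column position or an unquoted symbol, so something like the following is likely to work. This is a minimal sketch, assuming a tidyr version with tidy evaluation support (0.7.0 or later) and that the rlang package is available; the names into, by_name and by_index are introduced here for illustration only.

```r
library(tidyr)
library(dplyr)

x <- data.frame(col1 = 1:4,
                col2 = c("a,b,c", "d,e,f", "g,h,i", "j,k,l"))
sep <- ","
colnameVar <- "col2"
# illustrative helper: names for the new columns
into <- paste("col2", 1:3, sep = ".")

# column name held in a string: turn it into a symbol and unquote it with !!
by_name <- x %>%
  separate(!!rlang::sym(colnameVar), into, sep = sep) %>%
  names

# column given by position
by_index <- x %>%
  separate(2, into, sep = sep) %>%
  names

by_name
# [1] "col1"   "col2.1" "col2.2" "col2.3"
```

Both calls should give the same column names as the separate_ versions above. On older tidyr versions, where separate() rejects these forms with "Invalid column specification", separate_ remains the way to go.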