Programmatically deal with duplicated columns using name_repair

Question

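I'm importing a spreadsheet for which I have a known vector of the original column headers. When read_excel imports the data, it rightly complains about the duplicated columns and renames them to tell them apart. That is great behaviour. My question is how I can select the first occurrence of each duplicated column, drop all the other duplicates, and then rename that column back to its original name. I have a working script, but it seems clunky. I always have a hard time manipulating column headers programmatically within a pipe.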

library(readxl)
library(dplyr, warn.conflicts = FALSE)

cols_names <- c("Sepal.Length", "Sepal.Length", "Petal.Length", "Petal.Length", "Species")

datasets <- readxl_example("datasets.xlsx")

d <- read_excel(datasets, col_names = cols_names, skip = 1)
#> New names:
#> * Sepal.Length -> Sepal.Length...1
#> * Sepal.Length -> Sepal.Length...2
#> * Petal.Length -> Petal.Length...3
#> * Petal.Length -> Petal.Length...4


d_sub <- d %>% 
  select(!which(duplicated(cols_names)))

# Strip the "...N" suffix that read_excel appended (backslashes must be doubled in R strings)
new_col_names <- gsub("\\.\\.\\..*", "", colnames(d_sub))

colnames(d_sub) <- new_col_names

d_sub
#> # A tibble: 150 x 3
#>    Sepal.Length Petal.Length Species
#>           <dbl>        <dbl> <chr>  
#>  1          5.1          1.4 setosa 
#>  2          4.9          1.4 setosa 
#>  3          4.7          1.3 setosa 
#>  4          4.6          1.5 setosa 
#>  5          5            1.4 setosa 
#>  6          5.4          1.7 setosa 
#>  7          4.6          1.4 setosa 
#>  8          5            1.5 setosa 
#>  9          4.4          1.4 setosa 
#> 10          4.9          1.5 setosa 
#> # ... with 140 more rows

Created on 2020-04-08 by the reprex package (v0.3.0)

Any idea how to do this in a more streamlined way?

Answer 1

Based on @rawr's comment, here is the answer as I see it:

library(readxl)
library(dplyr, warn.conflicts = FALSE)

datasets <- readxl_example("datasets.xlsx")
cols_names <- c("Sepal.Length", "Sepal.Length", "Petal.Length", "Petal.Length", "Species")

d <- read_excel(datasets, col_names = cols_names, skip = 1, .name_repair = make.unique) %>% 
  select(all_of(cols_names))
#> New names:
#> * Sepal.Length -> Sepal.Length.1
#> * Petal.Length -> Petal.Length.1

Created on 2020-04-08 by the reprex package (v0.3.0)
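
To see why this works, here is a minimal base-R sketch of the same idea without the Excel file. The toy data frame `df` and the name `first_only` are made up for illustration and are not part of the original answer. make.unique() leaves the first occurrence of each name untouched and only suffixes the repeats, so selecting by the original names picks out exactly the first occurrences, and no renaming step is needed afterwards.

```r
# Header vector from the question, containing duplicated names.
cols_names <- c("Sepal.Length", "Sepal.Length", "Petal.Length", "Petal.Length", "Species")

# make.unique() keeps the first occurrence as-is and appends .1, .2, ...
# to later repeats (here: Sepal.Length.1 and Petal.Length.1).
repaired <- make.unique(cols_names)

# Hypothetical stand-in for the imported sheet: 5 integer columns, 3 rows.
df <- as.data.frame(matrix(1:15, nrow = 3))
names(df) <- repaired

# Selecting by the de-duplicated original names keeps only first occurrences,
# and they already carry the original names.
first_only <- df[unique(cols_names)]
names(first_only)
```

One caveat of this approach: if the original headers already contained a name such as "Sepal.Length.1", the repaired names could be confused with genuine columns, so it is worth checking the header vector first.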

Tags: r, dplyr, tibble