Create a new column based on an index column

Question

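I have a dataset with n observations and an ID column that holds, for each observation, the position of one of the value columns, e.g.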

col1 col2 col3 ID
12    0    4    1
6     5    3    1
5     21   42   2

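and I want to create a new column that takes, from each row, the value in the column whose position is given by ID, e.g.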

col1 col2 col3 ID col_new
12    0    4    1   12
6     5    3    1   6
5     21   42   2   21

without using a for loop. At the moment I'm doing:

col_new <- rep(NA, nrow(df))
for (i in seq_len(nrow(df)))
{
   # ID gives the column position to read from in row i
   col_new[i] <- df[i, df$ID[i]]
}

Is there a better (ideally tidyverse) way to do this?

Answer 1

A solution using data.table:

library(data.table)
# Using the OP's data
setDT(df)
df[, col_new := get(paste0("col", ID)), by = 1:nrow(df)]

# df
   col1 col2 col3 ID col_new
1:   12    0    4  1      12
2:    6    5    3  1       6
3:    5   21   42  2      21

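Explanation:

Group by row, so each row is handled on its own: 1:nrow(df)
Use that row's ID to build the matching column name and fetch its value: get(paste0("col", ID))
Write this value to the new column: col_new :=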
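Grouping by 1:nrow(df) evaluates j once per row, which can get slow on large tables. A minimal vectorized sketch of the same lookup, assuming the value columns are exactly col1, col2 and col3 (as in the OP's data) and that ID counts from the first of them, pairs each row number .I with its ID and does a single matrix-index lookup instead:

```r
library(data.table)

df <- data.table(col1 = c(12L, 6L, 5L),
                 col2 = c(0L, 5L, 21L),
                 col3 = c(4L, 3L, 42L),
                 ID   = c(1L, 1L, 2L))

# .SD holds only the columns listed in .SDcols; without `by`, .I is 1:nrow(df).
# cbind(.I, ID) builds one (row, column) pair per row, so a single
# matrix-index lookup replaces the per-row grouping.
df[, col_new := as.matrix(.SD)[cbind(.I, ID)],
   .SDcols = c("col1", "col2", "col3")]

df
```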

Answer 2

We can use row/column (matrix) indexing from base R, which should be very fast:

df1$col_new <- df1[1:3][cbind(seq_len(nrow(df1)), df1$ID)]
df1$col_new
#[1] 12  6 21
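To see what cbind(seq_len(nrow(df1)), df1$ID) does, here is a toy sketch on a plain matrix: indexing with a two-column matrix treats each of its rows as one (row, column) pair and returns one element per pair. When the object being indexed is a data frame, R converts it with as.matrix() first, so the selected columns should share a type.

```r
# A toy 3x3 matrix: column 1 is 1:3, column 2 is 4:6, column 3 is 7:9
m <- matrix(1:9, nrow = 3)

# Each row of idx is one (row, column) pair to extract
idx <- cbind(c(1, 2, 3), c(1, 1, 2))

m[idx]  # picks m[1, 1], m[2, 1], m[3, 2]
#[1] 1 2 6
```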

Answer 3

For a possible tidyverse approach, how about combining dplyr::mutate with purrr::map2_int?

library(dplyr)
library(purrr)

mutate(df, new_col = map2_int(row_number(), ID, ~ df[.x, .y]))
#>   col1 col2 col3 ID new_col
#> 1   12    0    4  1      12
#> 2    6    5    3  1       6
#> 3    5   21   42  2      21

Data

df <- read.table(text = "col1 col2 col3 ID
12    0    4    1
6     5    3    1
5     21   42   2", header = TRUE)
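The lambda passed to map2_int() reaches back into the outer df by name, so it gives wrong results if the pipeline has already modified the data. A sketch of an alternative that stays inside the pipe, assuming dplyr 1.0.0 or later (which provides rowwise() together with c_across()) and value columns col1 to col3:

```r
library(dplyr)

df <- data.frame(col1 = c(12L, 6L, 5L),
                 col2 = c(0L, 5L, 21L),
                 col3 = c(4L, 3L, 42L),
                 ID   = c(1L, 1L, 2L))

df <- df %>%
    rowwise() %>%                                  # evaluate mutate() one row at a time
    mutate(col_new = c_across(col1:col3)[ID]) %>%  # this row's values, pick position ID
    ungroup()

df
```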

Answer 4

Another tidyverse approach, this time using only tidyr and dplyr:

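library(dplyr)
library(tidyr)

df %>%
    mutate(row = row_number()) %>%            # remember the original row order
    gather(column, col_new, -ID, -row) %>%    # long format, stacked column by column
    filter(paste0('col', ID) == column) %>%   # keep the cell named by ID
    arrange(row) %>%                          # restore the original row order
    select(col_new) %>%
    cbind(df, .)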

It's longer than @markdly's elegant one-liner, but if you're like me and find purrr confusing most of the time, this may be easier to read.
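In current tidyr, gather() is superseded by pivot_longer(). A sketch of the same reshape-and-filter idea with pivot_longer(), assuming tidyr 1.0.0 or later and value columns whose names start with "col", keeping an explicit row number so the result lines up with the original rows:

```r
library(dplyr)
library(tidyr)

df <- data.frame(col1 = c(12L, 6L, 5L),
                 col2 = c(0L, 5L, 21L),
                 col3 = c(4L, 3L, 42L),
                 ID   = c(1L, 1L, 2L))

picked <- df %>%
    mutate(row = row_number()) %>%                 # remember the original row order
    pivot_longer(starts_with("col"),
                 names_to = "column", values_to = "col_new") %>%
    filter(column == paste0("col", ID)) %>%        # keep the cell named by ID
    arrange(row)                                   # one value per original row, in order

df$col_new <- picked$col_new
df
```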
