Replacing integers in a dataframe column that's a list of integer vectors (not just single integers) with character strings in R

Question

I have a dataframe in which one column is actually a list of integer vectors (not just single integers).

# make example dataframe
starting_dataframe <- 
  data.frame(first_names = c("Megan", 
                             "Abby", 
                             "Alyssa", 
                             "Alex", 
                             "Heather"))

starting_dataframe$player_indices <- 
  list(as.integer(1), 
       as.integer(c(2, 5)), 
       as.integer(3), 
       as.integer(4), 
       as.integer(c(6, 7)))

I want to replace the integers with character strings, based on a second dataframe that serves as a concordance.

# make concordance dataframe
example_concord <- 
  data.frame(last_names = c("Rapinoe", 
                            "Wambach", 
                            "Naeher", 
                            "Morgan", 
                            "Dahlkemper", 
                            "Mitts", 
                            "O'Reilly"), 
              player_ids = as.integer(c(1,2,3,4,5,6,7)))

The desired result looks like this:

# make dataframe of desired result
desired_result <- 
  data.frame(first_names = c("Megan", 
                             "Abby", 
                             "Alyssa", 
                             "Alex", 
                             "Heather"))

desired_result$player_indices <- 
  list(c("Rapinoe"), 
       c("Wambach", "Dahlkemper"), 
       c("Naeher"), 
       c("Morgan"), 
       c("Mitts", "O'Reilly"))

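I can't for the life of me figure out how to do this, and I haven't found a similar case on Stack Overflow. How can I do this? I'm not opposed to a dplyr-specific solution.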

Answer 1

I suggest creating a "lookup dictionary" of sorts and using lapply across each of the IDs:

example_concord_idx <- setNames(as.character(example_concord$last_names),
                                example_concord$player_ids)
example_concord_idx
#            1            2            3            4            5            6 
#    "Rapinoe"    "Wambach"     "Naeher"     "Morgan" "Dahlkemper"      "Mitts" 
#            7 
#   "O'Reilly" 

starting_dataframe$result <- 
  lapply(starting_dataframe$player_indices,
         function(a) example_concord_idx[a])
starting_dataframe
#   first_names player_indices              result
# 1       Megan              1             Rapinoe
# 2        Abby           2, 5 Wambach, Dahlkemper
# 3      Alyssa              3              Naeher
# 4        Alex              4              Morgan
# 5     Heather           6, 7     Mitts, O'Reilly

(Code golf?)

Map(`[`, list(example_concord_idx), starting_dataframe$player_indices)
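
One thing worth noting about the lookup above: `example_concord_idx[a]` receives integers, so R indexes the named vector by position rather than by name, and each result element keeps its names. That gives the desired names here only because `player_ids` happens to run from 1 to 7 in row order. Below is a minimal sketch of a value-based lookup with `match()`. The helper name `lookup_last_names` is my own (it appears in neither answer), and the sketch assumes each ID occurs at most once in the concordance.

```r
# Example data copied from the question
starting_dataframe <- data.frame(
  first_names = c("Megan", "Abby", "Alyssa", "Alex", "Heather")
)
starting_dataframe$player_indices <- list(1L, c(2L, 5L), 3L, 4L, c(6L, 7L))

example_concord <- data.frame(
  last_names = c("Rapinoe", "Wambach", "Naeher", "Morgan",
                 "Dahlkemper", "Mitts", "O'Reilly"),
  player_ids = 1:7,
  stringsAsFactors = FALSE
)

# Hypothetical helper: find each ID's row in the concordance by value,
# then return the matching last names as an unnamed character vector.
lookup_last_names <- function(ids, concord) {
  as.character(concord$last_names[match(ids, concord$player_ids)])
}

starting_dataframe$result <- lapply(
  starting_dataframe$player_indices,
  lookup_last_names,
  concord = example_concord
)

# Same lookup against a row-shuffled concordance, to check that the
# result does not depend on the order of the concordance rows.
shuffled_concord <- example_concord[c(7, 3, 1, 5, 2, 6, 4), ]
shuffled_result <- lapply(
  starting_dataframe$player_indices,
  lookup_last_names,
  concord = shuffled_concord
)
```

With this version, an ID that is missing from the concordance comes back as `NA` instead of silently picking up whichever name sits at that position.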

Answer 2

For tidyverse fans, I adapted the second half of the answer by r2evans to use map() and %>%:

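library(tidyverse)  # attaches dplyr (mutate, %>%) and purrr (map)

# example_concord_idx is the named lookup vector built in Answer 1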

starting_dataframe <- 
  starting_dataframe %>% 
  mutate(
    result = map(.x = player_indices, .f = function(a) example_concord_idx[a])
  )

But it definitely won't win at code golf!
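
Since the question allows a dplyr-specific solution, here is a sketch of a join-based variant that does not need the named vector from Answer 1. This is not what either answer proposes. It assumes dplyr (>= 1.0) and tidyr (>= 1.0) are installed, and the `row_id` column is a hypothetical helper used only to keep the original row order.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt as tibbles
starting_dataframe <- tibble(
  first_names = c("Megan", "Abby", "Alyssa", "Alex", "Heather"),
  player_indices = list(1L, c(2L, 5L), 3L, 4L, c(6L, 7L))
)
example_concord <- tibble(
  last_names = c("Rapinoe", "Wambach", "Naeher", "Morgan",
                 "Dahlkemper", "Mitts", "O'Reilly"),
  player_ids = 1:7
)

joined <- starting_dataframe %>%
  mutate(row_id = row_number()) %>%   # remember the original row order
  unnest(player_indices) %>%          # one row per (person, ID) pair
  left_join(example_concord,
            by = c("player_indices" = "player_ids")) %>%
  group_by(row_id, first_names) %>%
  # collapse each person's names back into a list-column
  summarise(player_indices = list(last_names), .groups = "drop") %>%
  select(-row_id)
```

The join matches on the ID values themselves, so the concordance rows can be in any order. The trade-off is more code than the `map()` version.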

Answer 3

Another approach is to unlist the list-column, then relist it after modifying its contents:

df1$player_indices <- relist(df2$last_names[unlist(df1$player_indices)], df1$player_indices)
df1
#>   first_names      player_indices
#> 1       Megan             Rapinoe
#> 2        Abby Wambach, Dahlkemper
#> 3      Alyssa              Naeher
#> 4        Alex              Morgan
#> 5     Heather     Mitts, O'Reilly

Data

## initial data.frame w/ list-column
df1 <- data.frame(first_names = c("Megan", "Abby", "Alyssa", "Alex", "Heather"), stringsAsFactors = FALSE)
df1$player_indices <- list(1, c(2,5), 3, 4, c(6,7))

## lookup data.frame
df2 <- data.frame(last_names = c("Rapinoe", "Wambach", "Naeher", "Morgan", "Dahlkemper", 
        "Mitts", "O'Reilly"), stringsAsFactors = FALSE)

Note: I set stringsAsFactors = FALSE to create character columns in the data.frames, but it works just as well with factor columns.
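
To make the unlist/relist one-liner easier to follow, here is the same computation split into its three steps, using the `df1` and `df2` data from this answer. The intermediate names (`flat_ids`, `flat_names`, `nested`) are mine, added only for illustration.

```r
# Data copied from this answer
df1 <- data.frame(first_names = c("Megan", "Abby", "Alyssa", "Alex", "Heather"),
                  stringsAsFactors = FALSE)
df1$player_indices <- list(1, c(2, 5), 3, 4, c(6, 7))
df2 <- data.frame(last_names = c("Rapinoe", "Wambach", "Naeher", "Morgan",
                                 "Dahlkemper", "Mitts", "O'Reilly"),
                  stringsAsFactors = FALSE)

# Step 1: flatten the list-column into a single vector of IDs
flat_ids <- unlist(df1$player_indices)
flat_ids
# 1 2 5 3 4 6 7

# Step 2: look up the names in one vectorised call. This is a positional
# lookup, so it relies on row i of df2 holding the name for ID i.
flat_names <- df2$last_names[flat_ids]
flat_names
# "Rapinoe" "Wambach" "Dahlkemper" "Naeher" "Morgan" "Mitts" "O'Reilly"

# Step 3: cut the flat vector back into pieces with the same lengths
# as the original list-column, which serves as the skeleton
nested <- relist(flat_names, skeleton = df1$player_indices)
lengths(nested)
# 1 2 1 1 2
```

Because the lookup in step 2 runs once over the flattened vector rather than once per row, this pattern can be convenient when the list-column has many rows.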
