In nested data (a tibble nesting a named list), how can I mutate a new column at a certain depth level based on the existence of nested values?

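When my data contains nested lists, I want to mutate a new column at a particular level, conditional on values that exist at a deeper nesting level. Specifically, the nested object is a named list, and I want to check that list's names to determine whether the information is present in the data.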


Edit: I replaced the data in the example


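Although @Ronak's answer did answer my question for the data I originally gave, I realized I had made a mistake: the toy data did not correctly reflect the structure of my data. Here is data that does reflect my situation.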

library(tibble)
library(tidyr)  # for unnest_wider() and %>% used below

df_correct <-
  structure(
  list(
    var_name = c("age", "classes"),
    title = c("What is your age?",
              "what classes have you taken?"),
    class_descriptions = list(
      NULL,
      list(
        History = "History of Art",
        Chemistry = "Organic Chemistry",
        other = "Other Classes"
      )
    )
  ),
  row.names = c(NA,-2L),
  class = c("tbl_df",
            "tbl", "data.frame")
)
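For reproducing the example, the same tibble can probably be built more readably with tibble() instead of structure(). This is a sketch that should give an equivalent object; the key point is that a list-column may hold NULL for rows that have no nested data.

```r
library(tibble)

# Same data as df_correct above, built with tibble() instead of structure().
# The list-column holds NULL for "age" and a named list for "classes".
df_correct <- tibble(
  var_name = c("age", "classes"),
  title = c("What is your age?", "what classes have you taken?"),
  class_descriptions = list(
    NULL,
    list(
      History = "History of Art",
      Chemistry = "Organic Chemistry",
      other = "Other Classes"
    )
  )
)
```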

## # A tibble: 2 x 3
##   var_name title                        class_descriptions
##   <chr>    <chr>                        <list>            
## 1 age      What is your age?            <NULL>            
## 2 classes  what classes have you taken? <named list [3]>  <--- this list is what I need to check against

df_correct %>%
  unnest_wider(class_descriptions)

## # A tibble: 2 x 5
##                                    based on whether
##                                    "History" exists
##                                           ↓
##   var_name title                        History        Chemistry         other        
##   <chr>    <chr>                        <chr>          <chr>             <chr>        
## 1 age      What is your age?            NA             NA                NA           
## 2 classes  what classes have you taken? History of Art Organic Chemistry Other Classes

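So, given df_correct (it doesn't have to use unnest_wider; that was only to show the structure of the nested data), how can I mutate a new column in df_correct that reflects whether "History" appears in class_descriptions?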

Desired output (updated)

# A tibble: 2 x 4
  var_name title                        class_descriptions has_taken_history
  <chr>    <chr>                        <list>             <lgl>            
1 age      What is your age?            <NULL>             NA               
2 classes  what classes have you taken? <named list [3]>   TRUE  

More about the desired solution (rather than the output)

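My hope in asking this question is to find a way to add another column to df_correct that indicates whether a string is present among the names of the named list class_descriptions. In other words, I'm looking for a solution that takes only 2 inputs:

  1. What to search for (in this example, the string "History").
  2. Where to search (in this example, the names of the named list class_descriptions nested in df_correct).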

If the string is found, the new column in df_correct is filled with TRUE; otherwise, it is filled with FALSE.
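The two-input requirement (what to search for, where to search) maps naturally onto a small function. The sketch below wraps purrr::map_lgl() in a helper; the name has_name_in() is hypothetical and not part of any package. NULL elements have no names, so they yield FALSE, matching the TRUE/FALSE rule stated above.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical helper (not from any package): for each element of a
# list-column `col`, report whether `what` is one of that element's names.
# NULL elements have no names, so they give FALSE.
has_name_in <- function(col, what) {
  map_lgl(col, ~ what %in% names(.x))
}

df_correct <- tibble(
  var_name = c("age", "classes"),
  title = c("What is your age?", "what classes have you taken?"),
  class_descriptions = list(
    NULL,
    list(History = "History of Art", Chemistry = "Organic Chemistry", other = "Other Classes")
  )
)

# Input 1 is the string "History"; input 2 is the list-column to search.
df_correct <- df_correct %>%
  mutate(has_taken_history = has_name_in(class_descriptions, "History"))
```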

Edit 2

library(tidyverse)  # for %>%, mutate() and map_lgl()

df_correct %>%
  mutate(has_taken_history = map_lgl(class_descriptions, 
                                     ~'History' %in% names(.x)))

# var_name title                        class_descriptions has_taken_history
#  <chr>    <chr>                        <list>             <lgl>            
#1 age      What is your age?            <NULL>             FALSE            
#2 classes  what classes have you taken? <named list [3]>   TRUE        
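The "age" row gets FALSE here because names(NULL) is NULL and "History" %in% NULL is FALSE. The desired output in the question shows NA for that row instead; if that is what you want, one possible variant treats NULL elements separately. A sketch:

```r
library(tibble)
library(dplyr)
library(purrr)

df_correct <- tibble(
  var_name = c("age", "classes"),
  title = c("What is your age?", "what classes have you taken?"),
  class_descriptions = list(
    NULL,
    list(History = "History of Art", Chemistry = "Organic Chemistry", other = "Other Classes")
  )
)

# NULL means "no nested data", so return NA rather than FALSE;
# otherwise check whether "History" is among the list's names.
df_correct <- df_correct %>%
  mutate(has_taken_history = map_lgl(
    class_descriptions,
    ~ if (is.null(.x)) NA else "History" %in% names(.x)
  ))
```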

Edit 1

For the edited data you can do:

library(tidyverse)

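df_correct %>%
  # keep a copy of the list-column, since unnest_wider() consumes it
  mutate(class_descriptions1 = class_descriptions) %>%
  # spread the named list into one column per name (NA where absent)
  unnest_wider(class_descriptions) %>%
  # turn each subject column into TRUE (present) or NA (absent)
  mutate(across(History:other, ~ifelse(is.na(.), NA, TRUE))) %>%
  # restore the original list-column and keep the History flag
  dplyr::select(var_name, title, class_descriptions = class_descriptions1, has_taken_history = History)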

# var_name title                        class_descriptions has_taken_history
#  <chr>    <chr>                        <list>             <lgl>            
#1 age      What is your age?            <NULL>             NA               
#2 classes  what classes have you taken? <named list [3]>   TRUE       

You can keep only the subjects you need in the output.
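The names() check itself does not require unnesting or purrr. For readers who prefer base R, a sketch using vapply(), which applies a function to each list element and requires exactly one logical per element:

```r
library(tibble)

df_correct <- tibble(
  var_name = c("age", "classes"),
  title = c("What is your age?", "what classes have you taken?"),
  class_descriptions = list(
    NULL,
    list(History = "History of Art", Chemistry = "Organic Chemistry", other = "Other Classes")
  )
)

# One TRUE/FALSE per row: is "History" among the names of that row's list?
# For the NULL element, names(NULL) is NULL, so the result is FALSE.
df_correct$has_taken_history <- vapply(
  df_correct$class_descriptions,
  function(x) "History" %in% names(x),
  logical(1)
)
```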


Original answer

You can use map_lgl to return a logical vector:

df %>% 
  unnest_wider(info) %>%
  mutate(has_taken_history = map_lgl(classes_taken, ~"History" %in% .x), 
         has_taken_chemistry = map_lgl(classes_taken, ~"Chemistry" %in% .x))

#  student_name location      year_born classes_taken has_taken_history has_taken_chemistry
#  <chr>        <chr>             <dbl> <list>        <lgl>             <lgl>              
#1 John         San Francisco      2000 <chr [4]>     TRUE              FALSE              
#2 Sarah        Miami              2002 <chr [4]>     TRUE              TRUE         
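The original answer tests the *values* of a character vector with %in%, whereas df_correct stores the subjects as the *names* of a list, which is why the edits switch to names(.x). Base R's utils::hasName() performs the same names() lookup. A small comparison; the vector below mirrors John's subjects from the output above, and the list mirrors part of df_correct:

```r
# Original toy data: subjects were the *values* of a character vector.
classes_taken <- c("Astronomy", "Cosmology", "History", "Robotics")
in_values <- "History" %in% classes_taken

# df_correct: subjects are the *names* of a list.
class_descriptions <- list(History = "History of Art", Chemistry = "Organic Chemistry")
in_names <- "History" %in% names(class_descriptions)

# utils::hasName() checks names directly and returns FALSE for NULL input.
via_hasname <- utils::hasName(class_descriptions, "History")
null_case <- utils::hasName(NULL, "History")
```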

A more general solution covering all subjects is to unnest the subjects and get the data into wide format.

df %>% 
  unnest_wider(info) %>%
  unnest(classes_taken) %>%
  mutate(value = TRUE) %>%
  pivot_wider(names_from = classes_taken, values_from = value, values_fill = FALSE)
  
#  student_name location      year_born Astronomy Cosmology History Robotics Chemistry Biology Zoology
#  <chr>        <chr>             <dbl> <lgl>     <lgl>     <lgl>   <lgl>    <lgl>     <lgl>   <lgl>  
#1 John         San Francisco      2000 TRUE      TRUE      TRUE    TRUE     FALSE     FALSE   FALSE  
#2 Sarah        Miami              2002 FALSE     FALSE     TRUE    FALSE    TRUE      TRUE    TRUE
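The same "one column per subject" idea can be adapted to df_correct, where the subjects are list names rather than values. A sketch that collects every name appearing in class_descriptions and adds one logical column per name; the has_ prefix is an arbitrary naming choice, and this is not the answer's method, just a variant of it:

```r
library(tibble)
library(dplyr)
library(purrr)

df_correct <- tibble(
  var_name = c("age", "classes"),
  title = c("What is your age?", "what classes have you taken?"),
  class_descriptions = list(
    NULL,
    list(History = "History of Art", Chemistry = "Organic Chemistry", other = "Other Classes")
  )
)

# Every name that appears in any row's class_descriptions.
subjects <- unique(unlist(map(df_correct$class_descriptions, names)))

# One logical vector per subject, named has_History, has_Chemistry, has_other.
flags <- map(
  set_names(subjects, paste0("has_", subjects)),
  function(s) map_lgl(df_correct$class_descriptions, ~ s %in% names(.x))
)

# Attach the flags while keeping the original list-column intact.
df_flags <- bind_cols(df_correct, as_tibble(flags))
```

Rows with NULL get FALSE in every flag column, consistent with Edit 2.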