Convert nested list to dataframe: extract only specific elements of interest

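I've seen many similar questions, but I can never adapt them to my situation. I have data in the form of a nested list and want to convert it to a data frame in a specific way.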

my_data_object <-
  list(my_variables = list(
    age = list(
      type = "numeric",
      originType = "slider",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 5L,
      title = "what is your age?",
      valueDescriptions = NULL
    ),
    med_field = list(
      type = "string",
      originType = "choice",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 6L,
      title = "what medical branch are you at?",
      valueDescriptions = list(card = "Cardiology", ophth = "Ophthalmology",
                               derm = "Dermatology")
    ),
    covid_vaccine = list(
      type = "string",
      originType = "choice",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 8L,
      title = "when do you plan to get vaccinated?",
      valueDescriptions = list(
        next_mo = "No later than next month",
        within_six_mo = "No later than six months from now",
        never = "I will not get vaccinated"
      )
    )
  ))

Desired output

  var_name      type    originType title                              
  <chr>         <chr>   <chr>      <chr>                              
1 age           numeric slider     what is your age?                  
2 med_field     string  choice     what medical branch are you at?    
3 covid_vaccine string  choice     when do you plan to get vaccinated?

My failed attempt

library(tibble)
library(tidyr)

my_data_object %>% 
  enframe() %>% 
  unnest_longer(value) %>% 
  unnest(value)

## # A tibble: 18 x 3
##    name         value            value_id     
##    <chr>        <named list>     <chr>        
##  1 my_variables <chr [1]>        age          
##  2 my_variables <chr [1]>        age          
##  3 my_variables <named list [0]> age          
##  4 my_variables <int [1]>        age          
##  5 my_variables <chr [1]>        age          
##  6 my_variables <NULL>           age          
##  7 my_variables <chr [1]>        med_field    
##  8 my_variables <chr [1]>        med_field    
##  9 my_variables <named list [0]> med_field    
## 10 my_variables <int [1]>        med_field    
## 11 my_variables <chr [1]>        med_field    
## 12 my_variables <named list [3]> med_field    
## 13 my_variables <chr [1]>        covid_vaccine
## 14 my_variables <chr [1]>        covid_vaccine
## 15 my_variables <named list [0]> covid_vaccine
## 16 my_variables <int [1]>        covid_vaccine
## 17 my_variables <chr [1]>        covid_vaccine
## 18 my_variables <named list [3]> covid_vaccine

I'm trying to do this with tidyverse functions, but so far I don't seem to be heading in the right direction. Thanks for any guidance.

Edit

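Unlike the sample data I first provided, my actual data has a slightly different hierarchy. I thought the approach would generalize easily once I had it, but it doesn't. So consider the data below, where I only care about the my_variables sub-list.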

my_data_object_2 <-
  list(
  other_variables = list(
    whatever_var_1 = list(
      type = "numeric",
      originType = "slider",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 5L,
      title = "blah question",
      valueDescriptions = NULL
    )
  ),
  my_variables = list(
    age = list(
      type = "numeric",
      originType = "slider",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 5L,
      title = "what is your age?",
      valueDescriptions = NULL
    ),
    med_field = list(
      type = "string",
      originType = "choice",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 6L,
      title = "what medical branch are you at?",
      valueDescriptions = list(card = "Cardiology", ophth = "Ophthalmology",
                               derm = "Dermatology")
    ),
    covid_vaccine = list(
      type = "string",
      originType = "choice",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 8L,
      title = "when do you plan to get vaccinated?",
      valueDescriptions = list(
        next_mo = "No later than next month",
        within_six_mo = "No later than six months from now",
        never = "I will not get vaccinated"
      )
    )
  )
)

So how can I "zoom in on" / "extract" only my_variables to get the table shown under "Desired output" above?

You can flatten() the object, then use enframe() and unnest_wider() to create the new columns.

library(tidyverse)

my_data_object %>% 
  flatten() %>%
  tibble::enframe() %>%
  unnest_wider(value)
  
#  name          type    originType originIndex title                               valueDescriptions
#  <chr>         <chr>   <chr>            <int> <chr>                               <list>           
#1 age           numeric slider               5 what is your age?                   <NULL>           
#2 med_field     string  choice               6 what medical branch are you at?     <named list [3]> 
#3 covid_vaccine string  choice               8 when do you plan to get vaccinated? <named list [3]> 

You can then drop the columns you don't need.
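A minimal sketch of that last step, assuming the tibble, tidyr and dplyr packages are installed: enframe(name = "var_name") names the first column directly, and select() keeps only the three fields from the desired output. Because my_data_object has a single top-level element, my_data_object[["my_variables"]] removes the same level that flatten() removes here. The data is a condensed copy of my_data_object from the question.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Condensed copy of my_data_object from the question
my_data_object <- list(my_variables = list(
  age = list(type = "numeric", originType = "slider",
             originSettings = structure(list(), .Names = character(0)),
             originIndex = 5L, title = "what is your age?",
             valueDescriptions = NULL),
  med_field = list(type = "string", originType = "choice",
                   originSettings = structure(list(), .Names = character(0)),
                   originIndex = 6L, title = "what medical branch are you at?",
                   valueDescriptions = list(card = "Cardiology", ophth = "Ophthalmology",
                                            derm = "Dermatology")),
  covid_vaccine = list(type = "string", originType = "choice",
                       originSettings = structure(list(), .Names = character(0)),
                       originIndex = 8L, title = "when do you plan to get vaccinated?",
                       valueDescriptions = list(next_mo = "No later than next month",
                                                within_six_mo = "No later than six months from now",
                                                never = "I will not get vaccinated"))
))

res <- my_data_object[["my_variables"]] %>%
  enframe(name = "var_name") %>%             # one row per variable
  unnest_wider(value) %>%                     # one column per field
  select(var_name, type, originType, title)   # keep only the fields of interest
res
```

Using my_data_object_2[["my_variables"]] as the input should give the same table for the second data set, since that sub-list holds the same three variables.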


For the second object, use only my_data_object_2$my_variables:

my_data_object_2$my_variables %>%
  tibble::enframe() %>%
  unnest_wider(value)
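Why subset instead of flatten here: flattening removes one level from every top-level element, so on my_data_object_2 it would also pull in whatever_var_1 from other_variables. The base R sketch below imitates that one-level flattening with do.call(c, unname(x)) (a stand-in for purrr::flatten(), not the same function) and compares it with [["my_variables"]]. The data is a condensed copy of my_data_object_2, reduced to the type and title fields for brevity.

```r
# Condensed copy of my_data_object_2: only `type` and `title` are kept
my_data_object_2 <- list(
  other_variables = list(
    whatever_var_1 = list(type = "numeric", title = "blah question")
  ),
  my_variables = list(
    age = list(type = "numeric", title = "what is your age?"),
    med_field = list(type = "string", title = "what medical branch are you at?"),
    covid_vaccine = list(type = "string", title = "when do you plan to get vaccinated?")
  )
)

# Drop the outer names and concatenate the inner lists: this removes one
# level everywhere, so variables from other_variables end up mixed in
flat <- do.call(c, unname(my_data_object_2))
names(flat)

# Subsetting by name keeps only the sub-list of interest
only_mine <- my_data_object_2[["my_variables"]]
names(only_mine)
```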

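Iterate over my_data_object, tibblifying the specified fields, and combine the results with map_dfr() (depending on the general case, fun(my_data_object$my_variables) alone may be enough). The sample data has no missing fields, but if any of the 3 fields in the spec can be missing, add .default = NA as an argument to that field's lcol_chr() call.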

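library(purrr)
library(tibblify)

# Early tibblify interface (lcols()/lcol_chr()); newer releases use tspec_df()/tib_chr() instead.
# Each lcol_chr() pulls one character field out of every variable.
spec <-  lcols(
  lcol_chr("type"),
  lcol_chr("originType"),
  lcol_chr("title")
)
# Prepend the variable names (the names of the sub-list) as a var_name column
fun <- function(x) cbind(var_name = names(x), tibblify(x, spec))

map_dfr(my_data_object, fun)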

This gives:

       var_name    type originType                               title
1           age numeric     slider                   what is your age?
2     med_field  string     choice     what medical branch are you at?
3 covid_vaccine  string     choice when do you plan to get vaccinated?
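If tibblify is not available, the same idea (a fixed list of fields, with NA for any field that is missing) can be sketched in base R with vapply(). This is a swapped-in technique, not tibblify's API. The helper get_field() and the field_names vector are hypothetical names, and the data is a condensed copy of my_variables in which title has been deliberately removed from med_field to show the NA default.

```r
# Condensed copy of my_data_object$my_variables; `title` was removed from
# med_field on purpose to show the NA default
my_variables <- list(
  age = list(type = "numeric", originType = "slider",
             title = "what is your age?", valueDescriptions = NULL),
  med_field = list(type = "string", originType = "choice"),
  covid_vaccine = list(type = "string", originType = "choice",
                       title = "when do you plan to get vaccinated?")
)

field_names <- c("type", "originType", "title")  # hypothetical: fields of interest

# Hypothetical helper: return a field as a string, or NA if it is absent
get_field <- function(x, field) {
  value <- x[[field]]
  if (is.null(value)) NA_character_ else as.character(value)
}

# One column per field; vapply() checks that each result is a single string
cols <- lapply(field_names, function(f) {
  vapply(my_variables, get_field, character(1), field = f, USE.NAMES = FALSE)
})
names(cols) <- field_names

res <- data.frame(var_name = names(my_variables), cols, stringsAsFactors = FALSE)
res
```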

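Depending on the general case, this simplification by @mgirlich (similar to the alternative mentioned at the start of this answer) may work. spec is the one defined above.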

library(tibblify)

cbind(
  var_name = names(my_data_object[[1]]),
  tibblify(my_data_object[[1]], spec)
)

Use lapply() to pick out the specific elements as usual, then just rbind them.

# Pick the three fields from each variable, then row-bind the results
res <- do.call(rbind.data.frame,
               lapply(my_data_object[[1]], `[`, c("type", "originType", "title")))
res
#                  type originType                               title
# age           numeric     slider                   what is your age?
# med_field      string     choice     what medical branch are you at?
# covid_vaccine  string     choice when do you plan to get vaccinated?

If you want the row names in the first column, do:

`rownames<-`(cbind(var=rownames(res), res), NULL)
#             var    type originType                               title
# 1           age numeric     slider                   what is your age?
# 2     med_field  string     choice     what medical branch are you at?
# 3 covid_vaccine  string     choice when do you plan to get vaccinated?
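The same result can also be written with data.frame(), which puts the row names into a new first column and, with row.names = NULL, resets the row names to 1, 2, 3. In this sketch, res is rebuilt by hand from the printed output above so the snippet runs on its own.

```r
# `res` rebuilt from the printed output above (row names = variable names)
res <- data.frame(
  type = c("numeric", "string", "string"),
  originType = c("slider", "choice", "choice"),
  title = c("what is your age?", "what medical branch are you at?",
            "when do you plan to get vaccinated?"),
  row.names = c("age", "med_field", "covid_vaccine"),
  stringsAsFactors = FALSE
)

# Move the row names into a `var` column and reset the row names
out <- data.frame(var = rownames(res), res, row.names = NULL,
                  stringsAsFactors = FALSE)
out
```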