Transform, flatten, unlist a data frame with multiple column types in R

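I have a data.frame with several columns. Each column is a different "class". For example:

Column 1: "id" is a list with 7803 elements.

Column 2: "location" is character (7803 rows, each one a single string).

Column 3: "alleles" is a list with 7803 elements.

Column 4: "clinical_significance" is a list of lists with 7803 elements, each of which may contain 1 to 3 elements.

Here is a small subset from dput() that shows what it looks like: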

structure(list(id = list("rs1585931494", "rs1253996056", "rs368528867", 
    "rs397507487", "rs1291775716", "rs1205853831", "rs555976452", 
    "rs727502904", "rs1481562268"), location = c("1:140734725-140734725", 
"1:140734735-140734735", "1:140734742-140734742", "1:140734743-140734743", 
"1:140734752-140734752", "1:140734755-140734755", "1:140734758-140734758", 
"1:140734763-140734763", "1:140734764-140734764"), alleles = list(
    structure(c("G", "A"), .Dim = 2:1), structure(c("C", "A"), .Dim = 2:1), 
    structure(c("C", "A", "T"), .Dim = c(3L, 1L)), structure(c("G", 
    "A"), .Dim = 2:1), structure(c("G", "C"), .Dim = 2:1), structure(c("C", 
    "A"), .Dim = 2:1), structure(c("T", "A", "C"), .Dim = c(3L, 
    1L)), structure(c("G", "A", "T"), .Dim = c(3L, 1L)), structure(c("C", 
    "A", "T"), .Dim = c(3L, 1L))), clinical_significance = list(
    list(), list(), structure("uncertain significance", .Dim = c(1L, 
    1L)), list(), list(), list(), list(), structure(c("uncertain significance", 
    "likely pathogenic"), .Dim = 2:1), structure("likely pathogenic", .Dim = c(1L, 
    1L))), consequence_type = list("missense_variant", "missense_variant", 
    "missense_variant", "missense_variant", "missense_variant", 
    "stop_gained", "missense_variant", "missense_variant", "missense_variant"), 
    gene_symbol = c("ENSG00000139618", "ENSG00000139618", "ENSG00000139618", 
    "ENSG00000139618", "ENSG00000139618", "ENSG00000139618", 
    "ENSG00000139618", "ENSG00000139618", "ENSG00000139618")), row.names = c(3544L, 
3545L, 3547L, 3548L, 3550L, 3552L, 3554L, 3556L, 3557L), class = "data.frame")

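I want a simple data.frame with a single character value in each [row, column] cell. I'm having particular trouble unlisting the clinical_significance list of lists. Because a cell may contain several elements, I just want to collapse them into a single element, separated by commas. But I can't get anywhere close to that.

I have tried the following solutions: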

do.call(rbind.data.frame, my_df)

Error in (function (..., deparse.level = 1, make.row.names = TRUE, stringsAsFactors = default.stringsAsFactors(),  : 
  invalid list argument: all variables should have the same length


# This "apparently" works, but writing the result out as a table fails
df <- dplyr::bind_rows(my_df)  # or df <- purrr::map_df(my_df, dplyr::bind_rows)
write.table(df)
Error in write.table(df) : unimplemented type 'list' in 'EncodeElement'

Thanks for any feedback or suggestions.

Apologies if I've misunderstood what you need, but try this tidyverse solution:

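library(tidyverse)  # as_tibble(), rownames_to_column(), dplyr verbs, map_chr()

# Keep the original row names, then collapse every cell to one comma-separated string
dat <- df |> 
  as_tibble(rownames=NA) |> 
  rownames_to_column() |> 
  group_by(rowname) |> 
  summarise(across(id:gene_symbol, ~map_chr(., ~paste(., collapse=","))))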

Output for the sample data you provided:

> dat
# A tibble: 10 x 7
   rowname id           location              alleles clinical_significance consequence_type        gene_symbol    
   <chr>   <chr>        <chr>                 <chr>   <chr>                 <chr>                   <chr>          
 1 478     rs866323699  1:140721551-140721551 G,A,C   ""                    splice_acceptor_variant ENSG00000139618
 2 479     rs1365858617 1:140721572-140721572 G,A     ""                    missense_variant        ENSG00000139618
 3 481     rs955654903  1:140721574-140721574 T,C     ""                    missense_variant        ENSG00000139618
 4 482     rs1291598718 1:140721575-140721575 A,AA    ""                    stop_gained             ENSG00000139618
 5 484     rs35895841   1:140721578-140721578 C,A     ""                    missense_variant        ENSG00000139618
 6 485     rs1389663088 1:140721586-140721586 T,C     ""                    missense_variant        ENSG00000139618
 7 487     rs772872980  1:140721589-140721589 G,A     ""                    missense_variant        ENSG00000139618
 8 489     rs1239580966 1:140721598-140721598 T,C     ""                    missense_variant        ENSG00000139618
 9 490     rs1315761595 1:140721599-140721599 G,A     ""                    stop_gained             ENSG00000139618
10 491     rs1470673381 1:140721606-140721606 C,G     ""                    missense_variant        ENSG00000139618
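
If you would rather not depend on the tidyverse, the same per-cell collapse can probably be done in base R. The following is only a minimal sketch under some assumptions: the data is a three-row, four-column subset of the dput() from the question, and `collapse_column`, `flat_df`, and `out_file` are names made up for illustration. It flattens each cell with `unlist()`, joins the pieces with commas, and then checks that `write.table()` accepts the result.

```r
# Three rows and four columns taken from the question's dput(), for illustration.
my_df <- structure(
  list(
    id = list("rs1585931494", "rs368528867", "rs727502904"),
    location = c("1:140734725-140734725", "1:140734742-140734742",
                 "1:140734763-140734763"),
    alleles = list(
      structure(c("G", "A"), .Dim = 2:1),
      structure(c("C", "A", "T"), .Dim = c(3L, 1L)),
      structure(c("G", "A", "T"), .Dim = c(3L, 1L))
    ),
    clinical_significance = list(
      list(),
      structure("uncertain significance", .Dim = c(1L, 1L)),
      structure(c("uncertain significance", "likely pathogenic"), .Dim = 2:1)
    )
  ),
  row.names = c(3544L, 3547L, 3556L),
  class = "data.frame"
)

# Hypothetical helper: turn one column (list or atomic) into a character
# vector, joining multi-element cells with commas. Empty cells become "".
collapse_column <- function(col) {
  vapply(col,
         function(cell) paste(unlist(cell), collapse = ","),
         character(1),
         USE.NAMES = FALSE)
}

# Apply the helper to every column and rebuild a plain data.frame.
flat_df <- as.data.frame(lapply(my_df, collapse_column),
                         stringsAsFactors = FALSE)
rownames(flat_df) <- rownames(my_df)

# Every column is now character, so write.table() no longer sees list cells.
out_file <- tempfile(fileext = ".tsv")
write.table(flat_df, out_file, sep = "\t", quote = FALSE, col.names = NA)
```

Empty `clinical_significance` cells come out as empty strings here, matching the tidyverse output above. If you would prefer `NA` for those, you could replace the empty strings afterwards.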