Use dplyr to summarize the first element of a nested list (2-D array?)

I'm trying to understand the appropriate way to use dplyr when summarizing a nested list in a tibble.

The structure is as follows:

> glimpse(mydata)
Rows: 1,000
Columns: 3
$ meta                 <df[,6]> <data.frame[40 x 6]>
$ independent_variable <list> [<"A", "B", "B", "B", "A", "A", "B", "A…
$ dependent_variables  <df[,4]> <data.frame[40 x 4]>


> head(mydata$independent_variable)
[[1]]
      [,1]        [,2]    [,3] [,4]
 [1,] "A" "FALSE" "5" NA  
 [2,] "B"  "FALSE" "5" "NA"
 [3,] "B"  "FALSE" "5" "NA"
 [4,] "B"  "FALSE" "5" "NA"
 [5,] "A"  "FALSE" "13" "NA"
 [6,] "A"  "FALSE" "5" "NA"
 [7,] "B"  "FALSE" "12" "NA" 
 [8,] "A"  "FALSE" "133" "NA"
 [9,] "A"  "FALSE" "131" "NA"
[10,] "A"  "TRUE"  "0"  "NA" 

[[2]]
     [,1]        [,2]    [,3] [,4] 
[1,] "A" "FALSE" "77" NA   
[2,] "B"  "FALSE" NA   "NA"
[3,] "B"  "FALSE" NA   "NA" 
[4,] "B"  "FALSE" NA   "NA" 
[5,] "B"  "FALSE" NA   "NA"
[6,] "A"  "TRUE"  "1"  "NA"

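independent_variable is a list of 1,000 entries, each an N x 4 matrix (that is, all 1,000 entries have 4 columns and a varying number of rows. The first column is the only one I'm currently interested in, and each of its elements can only be "A" or "B"). I want to count the number of "A"s in each of the 1,000 entries and get that value back for each one.

It looks like I should use purrr, but I'm not sure how to structure it within dplyr.

Here is an approach using purrr: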

library(purrr)
library(dplyr)

# my example data
tmp = list(cbind(c("A","A","B"),1),cbind(c("B","A","B"),2))

# define a summary function
count_A = function(x){
  x %>%
    as.data.frame() %>% # needed as the input data is of type 'matrix'
    select(V1) %>%      # the default column name for column 1
    filter(V1 == "A") %>%
    ungroup() %>%       # unnecessary, but clear you are summarising the whole df
    summarise(num_A = n())
}

# test summary function
count_A(tmp[[1]])

# apply function to every element of list
map(tmp, count_A)

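In this pattern, your summary function can be any function that takes a single argument and returns the desired result. If the function works correctly when applied to the first element of the list (see the code, where I test my summary function), then you can expect map to apply it to every element of the list. Note that map returns a list, so here you get one single-row data frame per element.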
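
If you want the counts as a regular column rather than a list of data frames, one option is to call a typed map inside mutate. The following is only a sketch under some assumptions: `toy` is a hypothetical stand-in for `mydata`, its `id` column is made up for illustration, and the counting is done with a plain `sum()` on the first matrix column instead of the `count_A` function above.

```r
library(dplyr)
library(purrr)
library(tibble)

# hypothetical stand-in for mydata: a tibble with a list column of character matrices
toy <- tibble(
  id = 1:2,
  independent_variable = list(
    cbind(c("A", "A", "B"), 1),
    cbind(c("B", "A", "B"), 2)
  )
)

# for each matrix, count the "A"s in column 1;
# map_int returns an integer vector, so num_A is an ordinary integer column
result <- toy %>%
  mutate(num_A = map_int(independent_variable, ~ sum(.x[, 1] == "A", na.rm = TRUE)))

result$num_A
# [1] 2 1
```

Using map_int rather than map should give one number per row, which is usually easier to work with in later dplyr steps than a list of one-row data frames.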