How do I extract the maximum value from a list in a list column and add it as a new column in a tibble?

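I have a dataset of questionnaire responses, and I want to identify respondents who gave the same answer across a series of items. Using base::rle I get the run lengths in a new list column; I want to extract the maximum run length for each case and add those values as a new column.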

library(tidyverse)
x <- tribble(
    ~x1, ~x2, ~x3, ~x4, ~x5, ~x6,
    1,  1, 1, 1, 1, 1, 
    3, 3, 3, 2, 5, 3,
    3, 3, 3, 3, 3, 3,
    4, 4, 5, 5, 5, 5 )
# Add list col of runs
x <- x %>% 
    rowwise() %>% 
    mutate(runs = list(base::rle(c(x1, x2, x3, x4, x5, x6))))
# The list col is a list with 2 elements, 'lengths' and 'values'
str(x$runs[1])
#> List of 1
#>  $ :List of 2
#>   ..$ lengths: int 6
#>   ..$ values : num 1
#>   ..- attr(*, "class")= chr "rle"
# I can obtain max values of "lengths" for each row
map_int(map(x$runs, "lengths"), max)
#> [1] 6 3 6 4
# But I can't work out how to use 'mutate' to create a new variable containing 
# the maximum for each case. I tried the following but it doesn't work.
x <- x %>% 
    rowwise() %>% 
    mutate(run_max = map_int(map(x$runs, "lengths"), max))
#> Error: Problem with `mutate()` column `run_max`.
#> i `run_max = map_int(map(x$runs, "lengths"), max)`.
#> i `run_max` must be size 1, not 4.
#> i Did you mean: `run_max = list(map_int(map(x$runs, "lengths"), max))` ?
#> i The error occurred in row 1.

Created on 2021-09-17 by the reprex package (v2.0.1)

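We need to ungroup first. After rowwise(), mutate() evaluates the expression once per row and expects a result of size 1, but map_int() over the whole runs column returns one value per row of the table, hence the "must be size 1, not 4" error.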

library(dplyr)
library(purrr)
x %>% 
    rowwise() %>% ungroup %>%
    mutate(run_max = map_int(map(runs, "lengths"), max))
# A tibble: 4 x 8
     x1    x2    x3    x4    x5    x6 runs   run_max
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list>   <int>
1     1     1     1     1     1     1 <rle>        6
2     3     3     3     2     5     3 <rle>        3
3     3     3     3     3     3     3 <rle>        6
4     4     4     5     5     5     5 <rle>        4
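
To make the difference concrete, here is a minimal sketch of what the bare name runs refers to in each context. It rebuilds the example data so that it runs on its own; the helper column cls and the object names rowwise_cls and ungrouped_cls are made up for illustration and are not part of the original question.

```r
library(dplyr)

# Rebuild the example data with the rle list column (x is rowwise here)
x <- tribble(
    ~x1, ~x2, ~x3, ~x4, ~x5, ~x6,
    1, 1, 1, 1, 1, 1,
    3, 3, 3, 2, 5, 3,
    3, 3, 3, 3, 3, 3,
    4, 4, 5, 5, 5, 5) %>%
    rowwise() %>%
    mutate(runs = list(rle(c(x1, x2, x3, x4, x5, x6))))

# Under rowwise(), `runs` is the single rle object of the current row
rowwise_cls <- x %>% mutate(cls = class(runs)[1]) %>% ungroup()

# After ungroup(), `runs` is the whole list column (one element per row)
ungrouped_cls <- x %>% ungroup() %>% mutate(cls = class(runs)[1])

rowwise_cls$cls
ungrouped_cls$cls
```

Note also that x$runs inside mutate() always reaches back to the full column of the object x, regardless of grouping, which is why the original attempt produced four values for every row.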

Or, if the intention is to loop with map, the rowwise grouping is not needed at all:

x %>% 
    ungroup %>%
    mutate(run_max = map_int(map(runs, "lengths"), max))

When we use rowwise, map is not needed for the extraction, because each row's rle object can be accessed directly:

x %>%
    rowwise %>% 
    mutate(run_max = max(runs$lengths)) %>%
    ungroup
# A tibble: 4 x 8
     x1    x2    x3    x4    x5    x6 runs   run_max
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list>   <int>
1     1     1     1     1     1     1 <rle>        6
2     3     3     3     2     5     3 <rle>        3
3     3     3     3     3     3     3 <rle>        6
4     4     4     5     5     5     5 <rle>        4
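
If the intermediate runs column is not needed for anything else, a possible variation is to compute the maximum run length in one step with c_across(). This is only a sketch, assuming dplyr 1.0.0 or later and that the items are the contiguous columns x1 to x6; the object name res is made up for illustration.

```r
library(dplyr)

# Same example data as in the question, without the runs list column
x <- tribble(
    ~x1, ~x2, ~x3, ~x4, ~x5, ~x6,
    1, 1, 1, 1, 1, 1,
    3, 3, 3, 2, 5, 3,
    3, 3, 3, 3, 3, 3,
    4, 4, 5, 5, 5, 5)

res <- x %>%
    rowwise() %>%
    # c_across() collects the row's item values into one vector for rle()
    mutate(run_max = max(rle(c_across(x1:x6))$lengths)) %>%
    ungroup()

res$run_max
#> [1] 6 3 6 4
```

The trade-off is that the rle objects themselves are not kept, so this only fits when the maximum run length is the only thing required.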