dplyr::mutate(): in a tibble with nested list columns, how to ignore NULL nested lists?

Occasionally, a nested list in my higher-level tibble is NULL. I want to ignore such lists when using dplyr::mutate().

Example

Recoding values to lowercase with underscores (snake case)

Data

library(tibble)

df <-
  tibble(movies = c("The Shawshank Redemption", "The Godfather", "The Godfather: Part II", "The Dark Knight", "12 Angry Men"),
         continents = c("Asia", "Australia", "America", "Africa", "Europe"),
         michaels = c("Michael Jackson", "Michael Jordan", "Mike Tyson", "Michael Phelps", "Michael Schumacher"))

df <- add_column(df, ignore_me = list(NULL))

df

## # A tibble: 5 x 4
##   movies                   continents michaels           ignore_me
##   <chr>                    <chr>      <chr>              <list>   
## 1 The Shawshank Redemption Asia       Michael Jackson    <NULL>   
## 2 The Godfather            Australia  Michael Jordan     <NULL>   
## 3 The Godfather: Part II   America    Mike Tyson         <NULL>   
## 4 The Dark Knight          Africa     Michael Phelps     <NULL>   
## 5 12 Angry Men             Europe     Michael Schumacher <NULL> 

Attempting to recode the values

library(dplyr) # version 1.0.2
library(snakecase)

df %>%
  mutate(across(everything(), snakecase::to_any_case))

Error: Problem with mutate() input ..1.
x argument is not a character vector
i Input ..1 is across(everything(), snakecase::to_any_case).


Obviously, either of the following works:

df %>% mutate(across(c(movies, continents, michaels), snakecase::to_any_case))
# or
df %>% mutate(across(-ignore_me, snakecase::to_any_case))

##   movies                   continents michaels           ignore_me
##   <chr>                    <chr>      <chr>              <list>   
## 1 the_shawshank_redemption asia       michael_jackson    <NULL>   
## 2 the_godfather            australia  michael_jordan     <NULL>   
## 3 the_godfather_part_ii    america    mike_tyson         <NULL>   
## 4 the_dark_knight          africa     michael_phelps     <NULL>   
## 5 12_angry_men             europe     michael_schumacher <NULL>  

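In practice, however, I cannot anticipate which column or nested list will be NULL, so I need my code to simply ignore such NULLs while still applying to the non-NULL columns.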


Edit

For the original df above, ignoring list columns altogether solves the problem easily. But the data can often also look like this:

df_2 <-
  tibble(movies = c("The Shawshank Redemption", "The Godfather", "The Godfather: Part II", "The Dark Knight", "12 Angry Men"),
         continents = c("Asia", "Australia", "America", "Africa", "Europe"),
         michaels = c("Michael Jackson", "Michael Jordan", "Mike Tyson", "Michael Phelps", "Michael Schumacher"))

df_2 <- add_column(df_2, ignore_me = list(NULL))

# turn one randomly chosen column into a list column (michaels with this seed)
set.seed(2021)
df_2 <- mutate(df_2, across(sample(colnames(df_2), 1), as.list))

df_2

##   movies                   continents michaels  ignore_me
##   <chr>                    <chr>      <list>    <list>   
## 1 The Shawshank Redemption Asia       <chr [1]> <NULL>   
## 2 The Godfather            Australia  <chr [1]> <NULL>   
## 3 The Godfather: Part II   America    <chr [1]> <NULL>   
## 4 The Dark Knight          Africa     <chr [1]> <NULL>   
## 5 12 Angry Men             Europe     <chr [1]> <NULL>   

You can ignore all list columns:

library(dplyr)
df %>% mutate(across(where(Negate(is.list)), snakecase::to_any_case))
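To see which columns that predicate actually keeps, here is a minimal sketch. It assumes a shortened two-row version of df (the shortened data is mine, only to keep the example small) and uses select() purely for inspection.

```r
library(tibble)
library(dplyr)

# Shortened stand-in for df from the question (assumption: two rows are enough to illustrate)
df_small <- tibble(
  movies = c("The Godfather", "12 Angry Men"),
  continents = c("Asia", "Europe")
)
df_small <- add_column(df_small, ignore_me = list(NULL))

# Names of the columns that are not lists; only these would reach the recoding function
kept <- names(select(df_small, where(Negate(is.list))))
kept
```

Because the selection is by type rather than by name, it should not matter which column ends up holding the NULLs.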

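Or, if not every list column is going to be NULL, you can pick out the NULL-valued columns specifically by checking the lengths of their elements, and skip the columns in which every element has length 0.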

df %>% mutate(across(where(~!all(lengths(.) == 0)), snakecase::to_any_case))


#  movies                   continents michaels           ignore_me
#  <chr>                    <chr>      <chr>              <list>   
#1 the_shawshank_redemption asia       michael_jackson    <NULL>   
#2 the_godfather            australia  michael_jordan     <NULL>   
#3 the_godfather_part_ii    america    mike_tyson         <NULL>   
#4 the_dark_knight          africa     michael_phelps     <NULL>   
#5 12_angry_men             europe     michael_schumacher <NULL>   
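The length check can be tried on its own in base R. This is only an illustrative sketch; the helper name all_null is hypothetical and not part of the answer.

```r
# Hypothetical helper: TRUE when every element of x has length 0 (e.g. is NULL)
all_null <- function(x) all(lengths(x) == 0)

res_null  <- all_null(list(NULL, NULL))   # list column holding only NULLs
res_chr   <- all_null(c("a", "b"))        # ordinary character column
res_mixed <- all_null(list("a", NULL))    # list column with some real values

c(res_null, res_chr, res_mixed)
```

Only the first case is TRUE, so the negated predicate in across() keeps character columns and any list column that has at least one non-empty element.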

For df_2, modified so that one element of michaels holds two strings, we can use:

df_2$michaels[[3]] <- c(df_2$michaels[[3]], df_2$michaels[[4]]) 

df_2 %>% 
  mutate(across(where(~all(lengths(.) > 0)), 
                ~relist(to_any_case(unlist(.)), .)))


#  movies                   continents michaels  ignore_me
#  <chr>                    <chr>      <list>    <list>   
#1 the_shawshank_redemption asia       <chr [1]> <NULL>   
#2 the_godfather            australia  <chr [1]> <NULL>   
#3 the_godfather_part_ii    america    <chr [2]> <NULL>   
#4 the_dark_knight          africa     <chr [1]> <NULL>   
#5 12_angry_men             europe     <chr [1]> <NULL>   
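The unlist()/relist() round trip used above can be looked at in isolation. In this sketch toupper() is only a stand-in for to_any_case(), so that nothing beyond base R is needed, and the small list x is made up for illustration.

```r
# A list whose second element holds two strings, like the modified michaels column
x <- list("Mike Tyson", c("Michael Phelps", "Michael Jordan"))

flat <- unlist(x)               # flatten to a plain character vector of length 3
y <- relist(toupper(flat), x)   # transform, then restore the shape of x

lengths(y)
```

The transformed values are put back into elements of the same sizes as in x, which is why the list column keeps its <chr [2]> entry after recoding.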

Another option, adding purrr, could be:

library(purrr)

df %>%
 mutate(across(where(~ !all(map_lgl(., is.null))), to_any_case))

  movies                   continents michaels           ignore_me
  <chr>                    <chr>      <chr>              <list>   
1 the_shawshank_redemption asia       michael_jackson    <NULL>   
2 the_godfather            australia  michael_jordan     <NULL>   
3 the_godfather_part_ii    america    mike_tyson         <NULL>   
4 the_dark_knight          africa     michael_phelps     <NULL>   
5 12_angry_men             europe     michael_schumacher <NULL>  

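For the second dataset (note that unlist() flattens the list column into a character vector, so this assumes one string per row):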

df_2 %>%
 mutate(across(where(~ !all(map_lgl(., is.null))), ~ to_any_case(unlist(.)))) 

  movies                   continents michaels           ignore_me
  <chr>                    <chr>      <chr>              <list>   
1 the_shawshank_redemption asia       michael_jackson    <NULL>   
2 the_godfather            australia  michael_jordan     <NULL>   
3 the_godfather_part_ii    america    mike_tyson         <NULL>   
4 the_dark_knight          africa     michael_phelps     <NULL>   
5 12_angry_men             europe     michael_schumacher <NULL>
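If a list column can be only partly NULL, one possible variation is to recode element by element and leave NULL elements alone. This is a sketch under assumptions, not something from the answers above: recode_col and df_3 are hypothetical names, and to_snake is a rough gsub-based stand-in for snakecase::to_any_case().

```r
library(tibble)
library(dplyr)

# Rough stand-in for snakecase::to_any_case(), so the sketch needs no extra package
to_snake <- function(x) gsub("[^a-z0-9]+", "_", tolower(x))

# Hypothetical helper: recode atomic columns directly; for list columns,
# recode each non-NULL element and keep NULL elements as they are
recode_col <- function(x, f = to_snake) {
  if (!is.list(x)) return(f(x))
  lapply(x, function(el) if (is.null(el)) el else f(el))
}

# Made-up data: one partly-NULL list column and one all-NULL list column
df_3 <- tibble(
  movies = c("The Godfather", "12 Angry Men"),
  michaels = list(c("Mike Tyson", "Michael Phelps"), NULL),
  ignore_me = list(NULL, NULL)
)

out <- mutate(df_3, across(everything(), recode_col))
out$michaels
```

With this approach no column has to be excluded up front, since the NULL handling happens inside the function rather than in the where() predicate.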