How to recode <NULL> cells to nested NA (<lgl [1]>) in a tibble's list-column?
In a tibble with list-columns, how can I replace <NULL> entries with a nested NA (which would take the nested form <lgl [1]>)?
library(tibble)
tbl_with_null <-
tibble(letter = letters[1:10],
value_1 = list(1, 2, 4, data.frame(a = 1, 2, 3), NULL, 6, 7, c(8, 11, 25), NULL, 10),
value_2 = list("A", "B", "C", "D", NULL, NULL, NULL, list("H", "B", list(data.frame(id = 1:3))), "I", "J"))
> tbl_with_null
## # A tibble: 10 x 3
## letter value_1 value_2
## <chr> <list> <list>
## 1 a <dbl [1]> <chr [1]>
## 2 b <dbl [1]> <chr [1]>
## 3 c <dbl [1]> <chr [1]>
## 4 d <df[,3] [1 x 3]> <chr [1]>
## 5 e <NULL> <NULL>
## 6 f <dbl [1]> <NULL>
## 7 g <dbl [1]> <NULL>
## 8 h <dbl [3]> <list [3]>
## 9 i <NULL> <chr [1]>
## 10 j <dbl [1]> <chr [1]>
Is there a way to operate on the whole of tbl_with_null and replace <NULL> with NA, to get:
## # A tibble: 10 x 3
## letter value_1 value_2
## <chr> <list> <list>
## 1 a <dbl [1]> <chr [1]>
## 2 b <dbl [1]> <chr [1]>
## 3 c <dbl [1]> <chr [1]>
## 4 d <df[,3] [1 x 3]> <chr [1]>
## 5 e <lgl [1]> <- NA <lgl [1]> # <- NA
## 6 f <dbl [1]> <lgl [1]> # <- NA
## 7 g <dbl [1]> <lgl [1]> # <- NA
## 8 h <dbl [3]> <list [3]>
## 9 i <lgl [1]> <- NA <chr [1]>
## 10 j <dbl [1]> <chr [1]>
Update
I've made some progress based on this solution:
tbl_with_null %>%
mutate(across(c(value_1, value_2), ~replace(., !lengths(.), list(NA))))
## # A tibble: 10 x 3
## letter value_1 value_2
## <chr> <list> <list>
## 1 a <dbl [1]> <chr [1]>
## 2 b <dbl [1]> <chr [1]>
## 3 c <dbl [1]> <chr [1]>
## 4 d <df[,3] [1 x 3]> <chr [1]>
## 5 e <lgl [1]> <lgl [1]>
## 6 f <dbl [1]> <lgl [1]>
## 7 g <dbl [1]> <lgl [1]>
## 8 h <dbl [3]> <list [3]>
## 9 i <lgl [1]> <chr [1]>
## 10 j <dbl [1]> <chr [1]>
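A side note on the `!lengths(.)` test used above: it flags every zero-length element, not only NULL. The base R sketch below (the list `x` is made up for illustration) contrasts it with an explicit `is.null` check, which matters if a list-column may also hold empty vectors such as `character(0)`.

```r
# Hypothetical list mixing a NULL with an empty character vector
x <- list(1, NULL, character(0), c(8, 11, 25))

zero_length <- !lengths(x)                     # TRUE for NULL and for character(0)
is_null     <- vapply(x, is.null, logical(1))  # TRUE for NULL only

# Replacing by is_null leaves the empty vector untouched
x_fixed <- replace(x, is_null, list(NA))
```

Which test is appropriate depends on whether empty vectors should also count as missing in your data.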
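However, this is not sufficient, because I'm looking for a solution that blindly replaces NULL with NA across the entire data frame. If we use mutate(across(everything(), ~replace(., !lengths(.), list(NA)))), the letter column becomes a list-column too, which is unintended.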
## # A tibble: 10 x 3
## letter value_1 value_2
## <list> <list> <list>
## 1 <chr [1]> <dbl [1]> <chr [1]>
## 2 <chr [1]> <dbl [1]> <chr [1]>
## 3 <chr [1]> <dbl [1]> <chr [1]>
## 4 <chr [1]> <df[,3] [1 x 3]> <chr [1]>
## 5 <chr [1]> <lgl [1]> <lgl [1]>
## 6 <chr [1]> <dbl [1]> <lgl [1]>
## 7 <chr [1]> <dbl [1]> <lgl [1]>
## 8 <chr [1]> <dbl [3]> <list [3]>
## 9 <chr [1]> <lgl [1]> <chr [1]>
## 10 <chr [1]> <dbl [1]> <chr [1]>
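The coercion seen in the output above can be reproduced in base R. This minimal sketch (the vector `v` is illustrative) suggests that assigning a list value into an atomic vector promotes the whole vector to a list, even when the logical index selects nothing, which is why restricting the operation to list-columns avoids the problem.

```r
v <- letters[1:3]                 # plain character vector, cannot contain NULL
idx <- !lengths(v)                # all FALSE: nothing to replace

out <- replace(v, idx, list(NA))  # equivalent to v[idx] <- list(NA)

is.character(v)   # the input is atomic
is.list(out)      # the result has been promoted to a list
```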
Update 2
I thought I had it solved with
mutate(across(everything(), ~simplify(replace(., !lengths(.), list(NA)))))
but unfortunately this fails in some cases, for example with this data:
tbl_with_no_null <-
tbl_with_null %>%
slice(8) %>%
select(letter, value_1)
## # A tibble: 1 x 2
## letter value_1
## <chr> <list>
## 1 h <dbl [3]>
Whereas I expected
tbl_with_no_null %>%
mutate(across(everything(), ~simplify(replace(., !lengths(.), list(NA)))))
to return tbl_with_no_null unchanged (because there is no <NULL> to replace):
## # A tibble: 1 x 2
## letter value_1
## <chr> <list>
## 1 h <dbl [3]>
I get an error instead:
Error: Problem with `mutate()` input `..1`.
x Input `..1` can't be recycled to size 1.
i Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
i Input `..1` must be size 1, not 3.
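The sizes in the error message are consistent with the simplification step flattening the only list element into its three values. The base R sketch below uses `unlist()` as a stand-in for `simplify()`; that equivalence is an assumption made for illustration, not something checked against purrr.

```r
col <- list(c(8, 11, 25))   # the value_1 column of the one-row tibble

length(col)                 # one element, matching the single row
flat <- unlist(col)         # stand-in for simplify(): flattens the element
length(flat)                # three values, which cannot fill a one-row column
```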
Bottom line
I'm looking for a way to replace <NULL> with NA in list-columns and, of course, to return the input as-is if there is no <NULL> to replace.
base::rapply does not recurse through NULL, but you can use rrapply for this, and it is very efficient:
library(rrapply)
rrapply::rrapply(tbl_with_null, function(x) NA, how = "replace", condition = is.null)
# A tibble: 10 x 3
letter value_1 value_2
<chr> <list> <list>
1 a <dbl [1]> <chr [1]>
2 b <dbl [1]> <chr [1]>
3 c <dbl [1]> <chr [1]>
4 d <df[,3] [1 x 3]> <chr [1]>
5 e <lgl [1]> <lgl [1]>
6 f <dbl [1]> <lgl [1]>
7 g <dbl [1]> <lgl [1]>
8 h <dbl [3]> <list [3]>
9 i <lgl [1]> <chr [1]>
10 j <dbl [1]> <chr [1]>
Or, as suggested by @JorisC. in the comments, use the classes argument, which seems to be about 25% faster on large lists:
rrapply(tbl_with_null, classes = "NULL", how = "replace", f = function(x) NA)
Just for fun:
eval(parse(text=gsub("NULL","NA",capture.output(dput(tbl_with_null)))))
# A tibble: 10 x 3
letter value_1 value_2
<chr> <list> <list>
1 a <dbl [1]> <chr [1]>
2 b <dbl [1]> <chr [1]>
3 c <dbl [1]> <chr [1]>
4 d <df[,3] [1 x 3]> <chr [1]>
5 e <lgl [1]> <lgl [1]>
6 f <dbl [1]> <lgl [1]>
7 g <dbl [1]> <lgl [1]>
8 h <dbl [3]> <list [3]>
9 i <lgl [1]> <chr [1]>
10 j <dbl [1]> <chr [1]>
fortunes::fortune(106)
# If the answer is parse() you should usually rethink the question.
# -- Thomas Lumley
# R-help (February 2005)
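Beyond the quote above, the text-substitution trick has a concrete weakness: `gsub` rewrites every occurrence of the string NULL, including ones inside character data. A minimal base R sketch, using a made-up list `x`:

```r
x <- list(a = "NULL value", b = NULL)

# Same round trip as above: deparse, substitute text, parse, evaluate
out <- eval(parse(text = gsub("NULL", "NA", capture.output(dput(x)))))

out$b   # the NULL entry has become NA, as intended
out$a   # but the string content has been altered as well
```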
The speed comparison is surprising; I expected parse to be the slowest solution:
microbenchmark::microbenchmark(
rrapply = rrapply::rrapply(tbl_with_null, function(x) NA, how = "replace", condition = is.null),
parse = eval(parse(text=gsub("NULL","NA",capture.output(dput(tbl_with_null))))),
dplyr = mutate(tbl_with_null,across(where(is.list), .fns = map_if, .p = is.null, .f = function(x) NA)))
Unit: microseconds
expr min lq mean median uq max neval cld
rrapply 25.401 31.801 60.92102 51.2510 58.3010 1053.502 100 a
parse 225.001 269.701 327.31600 329.1005 362.4505 687.800 100 b
dplyr 2942.501 3207.301 3604.63105 3500.0005 3766.1510 6541.402 100 c
I suggest the following approach.
# packages
library(tibble)
library(purrr)
library(dplyr)
# data
tbl_with_null <-
tibble(
letter = letters[1:10],
value_1 = list(1, 2, 4, data.frame(a = 1, 2, 3), NULL, 6, 7, c(8, 11, 25), NULL, 10),
value_2 = list("A", "B", "C", "D", NULL, NULL, NULL, list("H", "B", list(data.frame(id = 1:3))), "I", "J")
)
# replace all NULL in list format with NA
tbl_with_null %>%
mutate(across(where(is.list), .fns = map_if, .p = is.null, .f = function(x) NA))
#> # A tibble: 10 x 3
#> letter value_1 value_2
#> <chr> <list> <list>
#> 1 a <dbl [1]> <chr [1]>
#> 2 b <dbl [1]> <chr [1]>
#> 3 c <dbl [1]> <chr [1]>
#> 4 d <df[,3] [1 x 3]> <chr [1]>
#> 5 e <lgl [1]> <lgl [1]>
#> 6 f <dbl [1]> <lgl [1]>
#> 7 g <dbl [1]> <lgl [1]>
#> 8 h <dbl [3]> <list [3]>
#> 9 i <lgl [1]> <chr [1]>
#> 10 j <dbl [1]> <chr [1]>
# slice
tbl_with_null %>%
slice(8) %>%
mutate(across(where(is.list), .fns = map_if, .p = is.null, .f = function(x) NA))
#> # A tibble: 1 x 3
#> letter value_1 value_2
#> <chr> <list> <list>
#> 1 h <dbl [3]> <list [3]>
Created on 2021-03-14 by the reprex package (v1.0.0)
See the help pages of the corresponding functions for more details (or add a comment here!)
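You were very close to solving it! If you only want to replace NULL in the nested columns, then rather than applying mutate to everything, apply it only to the columns typed as lists by using where(is.list) instead of everything(), as agila shows above. While you can keep the simplify, it doesn't seem necessary in my testing.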
library(tidyverse)
tbl_with_null <-
tibble(letter = letters[1:10],
value_1 = list(1, 2, 4, data.frame(a = 1, 2, 3), NULL, 6, 7, c(8, 11, 25), NULL, 10),
value_2 = list("A", "B", "C", "D", NULL, NULL, NULL, list("H", "B", list(data.frame(id = 1:3))), "I", "J"))
tbl_with_null %>%
mutate(across(where(is.list), ~replace(., !lengths(.), list(NA))))
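The same idea, touching only list-columns and leaving everything else alone, can also be written without any packages. The helper `null_to_na()` below is a hypothetical name introduced for illustration; it handles top-level NULL entries of each list-column only and does not recurse into nested lists.

```r
# Hypothetical helper: replace top-level NULL entries in list-columns with NA
null_to_na <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.list(col)) {
      # Only NULL entries are selected; other elements are left as they are
      col[vapply(col, is.null, logical(1))] <- list(NA)
      df[[nm]] <- col
    }
  }
  df
}

# Small illustrative data frame with one atomic column and one list-column
df <- data.frame(letter = c("a", "b", "c"), stringsAsFactors = FALSE)
df$value <- list(1, NULL, c(8, 11, 25))

out <- null_to_na(df)
```

Atomic columns such as letter are skipped entirely, so they keep their type, and a list-column without any NULL comes back unchanged.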
Sticking with the tidyverse, this solution is a bit faster than agila's on my machine, but if you are willing to use an extra package, rrapply is clearly the faster solution.
> microbenchmark::microbenchmark(
+ rrapply = rrapply::rrapply(tbl_with_null, function(x) NA, how = "replace", condition = is.null),
+ parse = eval(parse(text=gsub("NULL","NA",capture.output(dput(tbl_with_null))))),
+ dplyr1 = mutate(tbl_with_null,across(where(is.list), .fns = map_if, .p = is.null, .f = function(x) NA)),
+ dplyr2 = mutate(tbl_with_null, across(where(is.list), ~simplify(replace(., !lengths(.), list(NA))))),
+ dplyr3 = mutate(tbl_with_null, across(where(is.list), ~replace(., !lengths(.), list(NA))))
+ )
Unit: microseconds
expr min lq mean median uq max neval
rrapply 27.795 42.4015 49.85706 45.9475 49.935 508.133 100
parse 354.237 371.6450 400.97961 391.9885 425.434 598.792 100
dplyr1 2472.218 2526.7575 2625.90951 2578.0390 2667.312 3086.635 100
dplyr2 2270.130 2338.4955 2529.54983 2380.3345 2491.390 7513.478 100
dplyr3 2243.784 2291.5100 2525.00431 2346.0720 2439.517 7318.504 100
Here are some data.table-based solutions; the rrapply one is slightly faster, and the more traditional lapply approach is slower:
library(data.table)
dt <- as.data.table( tbl_with_null )
dt.worker <- function(x) {
if( identical( x, list(NULL) ) )
return(list(NA))
return(x)
}
dt[, lapply( .SD, dt.worker ), by = letter ]
rrapply( dt, function(x) NA, how = "replace", condition = is.null)
microbenchmark::microbenchmark(
rrapply = rrapply::rrapply(tbl_with_null, function(x) NA, how = "replace", condition = is.null),
parse = eval(parse(text=gsub("NULL","NA",capture.output(dput(tbl_with_null))))),
dplyr1 = mutate(tbl_with_null,across(where(is.list), .fns = map_if, .p = is.null, .f = function(x) NA)),
dplyr2 = mutate(tbl_with_null, across(where(is.list), ~simplify(replace(., !lengths(.), list(NA))))),
dplyr3 = mutate(tbl_with_null, across(where(is.list), ~replace(., !lengths(.), list(NA)))),
dt.lapply = dt[, lapply( .SD, dt.worker ), by = letter ],
dt.rrapply = rrapply( dt, function(x) NA, how = "replace", condition = is.null)
)
Unit: microseconds
expr min lq mean median uq max neval cld
rrapply 22.592 28.2730 37.91673 35.0210 36.3885 460.414 100 a
parse 213.831 242.7650 255.37595 254.2365 267.8920 308.278 100 b
dplyr1 1986.615 2028.5695 2197.87663 2061.2655 2082.5410 8258.728 100 d
dplyr2 1803.212 1836.4240 1934.95871 1861.9965 1895.8655 8053.553 100 c
dplyr3 1779.537 1814.3925 1848.84501 1835.6575 1866.9810 2203.042 100 c
dt.lapply 287.349 321.2775 349.15118 338.7005 377.2070 446.948 100 b
dt.rrapply 16.962 26.1245 32.82651 29.5205 32.3605 425.738 100 a
Running the dplyr::mutate solutions on a data.table seems slightly faster than their tibble equivalents, but they are still a lot slower, as expected.