More efficient way to purrr::map2 for a large dataframe
Is there a faster way to do the following? In the real application, df has many rows (and list_of_colnames therefore has the same number of elements):
list_of_colnames <- list(c("A", "B"), c("A"))
some_vector <- c("fish", "cat")
map2(split(df, seq(nrow(df))), list_of_colnames, function(row, colnames) {
  row$indicator <- ifelse(any(row[, colnames] %in% some_vector), 1, 0)
  return(row)
})
The current implementation works, but it takes ages on a large df. In fact, I think split() is a major bottleneck.
Thanks!
One option could be to make use of row/column indexing:
rowind <- rep(seq_len(nrow(df)), lengths(list_of_colnames) * nrow(df))
df$indicator <- +(tapply(c(t(df[unlist(list_of_colnames)])) %in% some_vector,
rowind, FUN = any))
Output
> df
      A   B indicator
1  fish   A         1
2 hello cat         1
Data
df <- data.frame(A = c('fish', 'hello'), B = c('A', 'cat'))
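If the intent is that row i is checked only against the columns named in list_of_colnames[[i]], as in the question's map2 call, one way to avoid both split() and a per-row loop is to index the data with (row, column) pairs. The following is only a sketch under that assumption; row_id, col_id, and hits are illustrative names and not part of the answer above.

```r
df <- data.frame(A = c("fish", "hello"), B = c("A", "cat"))
list_of_colnames <- list(c("A", "B"), c("A"))
some_vector <- c("fish", "cat")

# One entry per (row, column) pair that has to be checked
row_id <- rep(seq_len(nrow(df)), lengths(list_of_colnames))
col_id <- match(unlist(list_of_colnames), names(df))

# Pull all requested cells at once with a two-column index matrix.
# as.matrix() coerces mixed column types to character, which is fine here
# because the lookup values are strings.
hits <- as.matrix(df)[cbind(row_id, col_id)] %in% some_vector

# Count hits per row; a row is flagged when it has at least one
df$indicator <- as.integer(tabulate(row_id[hits], nbins = nrow(df)) > 0)
```

For the sample data this flags row 1 (fish is in column A) but not row 2, because row 2 only looks at column A, where the value is hello. Rows whose column set is empty stay at 0.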
You can avoid splitting the data frame into a list altogether and instead use rowwise and c_across from dplyr to apply your condition across rows:
library(dplyr)
library(purrr)
list_of_colnames <- list(c("A", "B"), c("A"))
some_vector <- c("fish", "cat")
map(list_of_colnames, ~
  df %>%
    rowwise() %>%
    mutate(indicator = as.numeric(any(c_across(all_of(.x)) %in% some_vector))) %>%
    ungroup()
)
Output
Mapping over list_of_colnames still returns a list:
[[1]]
# A tibble: 3 x 4
  A     B     C     indicator
  <chr> <chr> <chr> <lgl>
1 fish  dog   bird  TRUE
2 dog   cat   bird  TRUE
3 bird  lion  cat   FALSE

[[2]]
# A tibble: 3 x 4
  A     B     C     indicator
  <chr> <chr> <chr> <lgl>
1 fish  dog   bird  TRUE
2 dog   cat   bird  FALSE
3 bird  lion  cat   FALSE
Data
df <- structure(list(A = c("fish", "dog", "bird"), B = c("dog", "cat",
"lion"), C = c("bird", "bird", "cat")), class = "data.frame", row.names = c(NA,
-3L))
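When one set of columns is applied to every row, as in the map() call above, the row-wise grouping may not be needed at all: each column can be tested with %in% as a whole vector and the results combined with an element-wise OR. This is a base R sketch of that idea, not the answer's method; flag_rows is a hypothetical helper name, and it assumes every column set contains at least one column.

```r
df <- data.frame(A = c("fish", "dog", "bird"),
                 B = c("dog", "cat", "lion"),
                 C = c("bird", "bird", "cat"))
list_of_colnames <- list(c("A", "B"), c("A"))
some_vector <- c("fish", "cat")

# Hypothetical helper: 1 where any of `cols` holds a value found in `values`
flag_rows <- function(data, cols, values) {
  # One logical vector per column, then OR them together element-wise
  per_col <- lapply(data[cols], function(x) x %in% values)
  as.numeric(Reduce(`|`, per_col))
}

# One copy of df per column set, each with its own indicator column
result <- lapply(list_of_colnames, function(cols) {
  df$indicator <- flag_rows(df, cols, some_vector)
  df
})
```

Here result[[1]]$indicator is 1, 1, 0 and result[[2]]$indicator is 1, 0, 0, which lines up with the TRUE/FALSE pattern in the output shown above.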