Search for string across entire row of a tibble?

Question

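I'm trying to clean up sample information sheets that come from many different groups, so the treatment information I care about may be located in any number of different columns. Here's an abstracted example: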

sample_info = tribble(
  ~id, ~could_be_here, ~or_here,    ~or_even_in_this_one,
  1,   NA,             "not_me",    "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

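I want to find "find_me" where 1) the match is case-insensitive, 2) it can be in any column, and 3) it can be part of a larger string. I want to create a TRUE/FALSE column that says whether "find_me" was found in any column. How can I do that? (I've thought of using unite on all the columns and then running str_detect on that mess, but there must be a less hacky way, right?)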

To be clear, I want a final tibble that is equivalent to sample_info %>% mutate(find_me = c(TRUE, TRUE, TRUE, FALSE)).

I expect I'd want something like stringr::str_detect(., regex('find_me', ignore_case = T)) and pmap_lgl(any(c(...) <insert logic check>)), as in the similar cases linked below, but I'm not sure how to combine them into a mutate-compatible statement.

Things I've looked at:

Answer 1

One option using dplyr and purrr could be:

sample_info %>%
 mutate(find_me = pmap_lgl(across(-id), ~ any(str_detect(c(...), regex("find_me", ignore_case = TRUE)), na.rm = TRUE)))

     id could_be_here or_here  or_even_in_this_one find_me
  <dbl> <chr>         <chr>    <chr>               <lgl>  
1     1 <NA>          not_me   find_me_other_stuff TRUE   
2     2 Extra_Find_Me <NA>     diff_stuff          TRUE   
3     3 <NA>          Find_me  <NA>                TRUE   
4     4 <NA>          not_here not_here_either     FALSE

Or using only dplyr:

sample_info %>%
 rowwise() %>%
 mutate(find_me = any(str_detect(c_across(-id), regex("find_me", ignore_case = TRUE)), na.rm = TRUE))
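
One thing worth keeping in mind with the rowwise() version: the result stays grouped by row, so later verbs keep operating one row at a time. A minimal sketch, assuming the same sample_info as in the question, that drops the row-wise grouping again with ungroup() (the name result is only used here for illustration):

```r
library(dplyr)
library(stringr)
library(tibble)

# Same example data as in the question
sample_info <- tribble(
  ~id, ~could_be_here,  ~or_here,   ~or_even_in_this_one,
  1,   NA,              "not_me",   "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

result <- sample_info %>%
  rowwise() %>%
  # c_across(-id) gathers the non-id values of the current row into one vector;
  # na.rm = TRUE makes any() ignore the NA that str_detect() returns for NA input
  mutate(find_me = any(str_detect(c_across(-id), regex("find_me", ignore_case = TRUE)), na.rm = TRUE)) %>%
  # Remove the row-wise grouping so later steps behave like on a plain tibble
  ungroup()
```

Without the ungroup(), something like a later summarise() would still be evaluated per row, which is easy to overlook.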

Answer 2

I hope I've understood what you have in mind. This is how I would look for find_me across multiple columns:

library(dplyr)
library(purrr)
library(stringr)

sample_info = tribble(
  ~id, ~could_be_here, ~or_here,    ~or_even_in_this_one,
  1,   NA,             "not_me",    "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

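sample_info %>%
  # Check every column except id; coalesce() turns the NA that str_detect()
  # returns for missing values into FALSE, so the combined result is never NA
  mutate(find_me_exist = if_any(-id, ~ coalesce(str_detect(.x, regex("find_me", ignore_case = TRUE)), FALSE)))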

# A tibble: 4 x 5
     id could_be_here or_here  or_even_in_this_one find_me_exist
  <dbl> <chr>         <chr>    <chr>               <lgl>        
1     1 <NA>          not_me   find_me_other_stuff TRUE         
2     2 Extra_Find_Me <NA>     diff_stuff          TRUE         
3     3 <NA>          Find_me  <NA>                TRUE         
4     4 <NA>          not_here not_here_either     FALSE

Sorry, I had to edit my code to make it case-insensitive.

Answer 3

Using apply() can be dangerous in a case like this because it coerces the data.frame to a matrix, but this worked for me:

sample_info$find_me<-apply(sapply(sample_info, function(x) grepl('find_me', x, ignore.case = TRUE)), 1, any)

But I get the feeling that whenever I use nested apply/sapply/lapply calls, there must be a better way...
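
For what it's worth, the matrix step can be avoided in base R by OR-ing the per-column results together. This is only a sketch of that idea, not a replacement for the line above: it rebuilds the example as a plain data.frame, and text_cols and hits are illustrative names.

```r
# Example data from the question as a base data.frame
sample_info <- data.frame(
  id = 1:4,
  could_be_here = c(NA, "Extra_Find_Me", NA, NA),
  or_here = c("not_me", NA, "Find_me", "not_here"),
  or_even_in_this_one = c("find_me_other_stuff", "diff_stuff", NA, "not_here_either"),
  stringsAsFactors = FALSE
)

# Search every column except id
text_cols <- setdiff(names(sample_info), "id")

# One logical vector per column; grepl() returns FALSE (not NA) for NA input
hits <- lapply(sample_info[text_cols], function(x) grepl("find_me", x, ignore.case = TRUE))

# Element-wise OR across the columns gives one TRUE/FALSE per row
sample_info$find_me <- Reduce(`|`, hits)
```

Because each column is handled as its own vector, nothing gets coerced to a common matrix type, and the id column is left out of the search explicitly.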

Answer 4

If you do want to try the hacky way, your idea of using unite really does work:

library(tidyr)  # for unite()
sample_info %>%
  # With no columns named, unite() pastes every column into `new`
  unite(new, remove = FALSE) %>%
  mutate(found = str_detect(new, regex("find_me", ignore_case = TRUE))) %>%
  select(-new)
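
One caveat with this approach: unite() joins values with "_" by default, so a row holding "find" in one column and "me" in the next would be pasted into "find_me" and match by accident. A hedged variant, assuming the same sample_info, that uses a separator that cannot be part of the pattern and drops missing values (the column name combined and the separator " | " are arbitrary choices for illustration):

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Same example data as in the question
sample_info <- tribble(
  ~id, ~could_be_here,  ~or_here,   ~or_even_in_this_one,
  1,   NA,              "not_me",   "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

result <- sample_info %>%
  # Paste all non-id columns together; na.rm = TRUE skips missing values
  # instead of pasting them in as the string "NA"
  unite("combined", -id, sep = " | ", remove = FALSE, na.rm = TRUE) %>%
  mutate(found = str_detect(combined, regex("find_me", ignore_case = TRUE))) %>%
  # The helper column is no longer needed once the flag exists
  select(-combined)
```

It is still the "paste everything and search" idea, just with fewer ways to produce a false positive.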


r

stringr

dplyr

purrr

tibble