How can I use str_detect() in a function to create new columns?

I have a dataset containing a column of free-form text, for example:

dat <- tribble(
  ~id, ~freeform_text,
  1, "some words to detect from",
  2, "some more words to detect"
)

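I want to create a function that takes a specified data frame, searches a column for a given string, and returns a new column indicating whether that string was detected.

I would like to do this using tidyverse syntax.

Here is what I have tried so far: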

create_text_feature <- function(data, column, string) {
  data %>% 
    mutate("{{string}}_detected" := ifelse(str_detect({{column}}, string), 1, 0))
}

Ideally, I would run create_text_feature(dat, freeform_text, more) and end up with the following dataset:

dat <- tribble(
  ~id, ~freeform_text, ~more_detected,
  1, "some words to detect from", 0,
  2, "some more words to detect", 1
)

I would be even more grateful if the function could take a list of strings and create multiple new columns in the same way.

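You can get this result by passing the pattern as a quoted string and using single curly braces in the column name on the left-hand side of the assignment. With a quoted string, "{string}" inserts the string's value into the new column name, while {{column}} is still needed for the unquoted column name: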

library(tidyverse)

dat <- tribble(
  ~id, ~freeform_text,
  1, "some words to detect from",
  2, "some more words to detect"
)

create_text_feature <- function(data, column, string) {
  data %>% 
    mutate("{string}_detected" := ifelse(str_detect({{column}}, string), 1, 0))
}


create_text_feature(dat, freeform_text, "more")
#> # A tibble: 2 x 3
#>      id freeform_text             more_detected
#>   <dbl> <chr>                             <dbl>
#> 1     1 some words to detect from             0
#> 2     2 some more words to detect             1
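
For the follow-up about several strings, one possible approach is to loop over a character vector and add one column per pattern. This is only a minimal sketch: the name create_text_features and its strings argument are hypothetical and not part of any package, and each element is treated as a regular expression, as in the single-string version above.

```r
library(dplyr)
library(stringr)
library(tibble)

dat <- tribble(
  ~id, ~freeform_text,
  1, "some words to detect from",
  2, "some more words to detect"
)

# Hypothetical helper: adds one "<string>_detected" column for each
# element of the character vector `strings`.
create_text_features <- function(data, column, strings) {
  for (s in strings) {
    # "{s}" interpolates the current pattern into the new column name;
    # {{ column }} forwards the unquoted column name to str_detect().
    data <- data %>%
      mutate("{s}_detected" := ifelse(str_detect({{ column }}, s), 1, 0))
  }
  data
}

result <- create_text_features(dat, freeform_text, c("more", "from"))
print(result)
```

Here result should contain more_detected (0, 1) and from_detected (1, 0) next to the original columns. If the patterns should be matched literally rather than as regular expressions, wrapping s in stringr's fixed() is one option.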