Apply regmatches function to a list of chr in R

I have this list of character strings stored in a variable named x:

x <- 
  c(
    "images/logos/france2.png", 
    "images/logos/cnews.png",
    "images/logos/lcp.png", 
    "images/logos/europe1.png",
    "images/logos/rmc-bfmtv.png",
    "images/logos/sudradio.png",
    "images/logos/franceinfo.png"
  )
  
pattern <- "images/logos/\\s*(.*?)\\s*.png"

regmatches(x, regexec(pattern, x))[[1]][2]

I want to extract part of each chr string according to a pattern, as the call below does. It works, but only for the first item in the list.

pattern <- "images/logos/\\s*(.*?)\\s*.png"

y <- regmatches(x, regexec(pattern, x))[[1]][2]

returns only:

"france2"

How can I apply the regmatches function to every item in the list to get a result like this?

[1] "france2"    "europe1"    "sudradio"  
[4] "cnews"      "rmc-bfmtv"  "franceinfo"
[7] "lcp"        "rmc"        "lcp"

FYI, this is a list of src tags that comes from a scraper.

A possible solution:

library(tidyverse)

df <- data.frame(
  stringsAsFactors = FALSE,
  strings = c("images/logos/france2.png","images/logos/cnews.png",
              "images/logos/lcp.png","images/logos/europe1.png",
              "images/logos/rmc-bfmtv.png","images/logos/sudradio.png",
              "images/logos/franceinfo.png")
)

df %>% 
  mutate(strings = str_remove(strings, "images/logos/") %>% 
           str_remove("\\.png"))

#>      strings
#> 1    france2
#> 2      cnews
#> 3        lcp
#> 4    europe1
#> 5  rmc-bfmtv
#> 6   sudradio
#> 7 franceinfo

Or, even simpler:

library(tidyverse)

df %>% 
  mutate(strings = str_extract(strings, "(?<=images/logos/)(.*)(?=\\.png)"))

#>      strings
#> 1    france2
#> 2      cnews
#> 3        lcp
#> 4    europe1
#> 5  rmc-bfmtv
#> 6   sudradio
#> 7 franceinfo

Try gsub:

gsub(
  ".*/(.*)\\.png", "\\1",
  c(
    "images/logos/france2.png", "images/logos/cnews.png",
    "images/logos/lcp.png", "images/logos/europe1.png",
    "images/logos/rmc-bfmtv.png", "images/logos/sudradio.png",
    "images/logos/franceinfo.png"
  )
)

which gives

[1] "france2"    "cnews"      "lcp"        "europe1"    "rmc-bfmtv"
[6] "sudradio"   "franceinfo"
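
One thing worth knowing about this approach: sub and gsub return any string that does not match the pattern unchanged, so unexpected inputs pass through silently. The sketch below illustrates that with an anchored pattern; the third input string is a made-up example, not part of the original data.

```r
# Sample input; "not-a-logo.txt" is a hypothetical non-matching entry
x <- c("images/logos/france2.png", "images/logos/cnews.png", "not-a-logo.txt")

# ".*/" greedily consumes everything up to the last slash,
# "(.*)" captures the file stem, and "\\1" keeps only that capture.
# Anchoring with ^ and $ makes the whole string take part in the match.
out <- sub("^.*/(.*)\\.png$", "\\1", x)

# Non-matching strings come back as they were
print(out)
```

If silent pass-through is a problem, checking grepl with the same pattern first is one way to flag such entries.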

The output of regmatches(..., regexec(...)) is a list. You can use sapply to extract the second element from each element of that list.

sapply(regmatches(x, regexec(pattern, x)), `[[`, 2)

#[1] "france2"    "europe1"    "sudradio"   "cnews"   "rmc-bfmtv"  "franceinfo"
#[7] "lcp"        "rmc"        "lcp"   
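
To make the list structure concrete, here is a minimal sketch of the same idea using vapply, which checks that every element yields exactly one string. It assumes a shortened sample vector plus one hypothetical non-matching entry, and it escapes the dot before png, which is a small tightening of the pattern in the question.

```r
# Shortened sample data; the last entry is a hypothetical non-match
x <- c("images/logos/france2.png", "images/logos/cnews.png", "images/other/readme.txt")
pattern <- "images/logos/\\s*(.*?)\\s*\\.png"

# One list element per input string: the full match first, then the
# capture groups. A string with no match gives character(0).
m <- regmatches(x, regexec(pattern, x))

# Take the first capture group, or NA when there was no match
logos <- vapply(
  m,
  function(el) if (length(el) >= 2) el[2] else NA_character_,
  character(1)
)
print(logos)
```

With sapply and `[[`, an input that does not match would raise a subscript error, which is why the sketch guards on the length of each element.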

You can also combine basename with file_path_sans_ext from the tools package, which gives the desired output directly.

tools::file_path_sans_ext(basename(x))
#[1] "france2"    "europe1"    "sudradio"   "cnews"   "rmc-bfmtv"  "franceinfo"
#[7] "lcp"        "rmc"        "lcp"
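
Broken into its two steps, the call above works roughly like this. The sketch assumes a two-element sample of the data and uses intermediate variable names (files, stems) chosen only for illustration.

```r
# Two sample paths taken from the question
x <- c("images/logos/france2.png", "images/logos/rmc-bfmtv.png")

# Step 1: basename (base R) drops the directory part of each path
files <- basename(x)

# Step 2: file_path_sans_ext (tools package) drops the file extension
stems <- tools::file_path_sans_ext(files)
print(stems)
```

No regular expression is needed here, so there is no escaping to get wrong, but this only fits when the part you want is exactly the file name without its extension.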