Finding strings that are a certain length and contain specific characters

Sample data

a<-c("hour","four","ruoh", "six", "high", "our")

I want to find all strings that contain o & u & h and are 4 characters long, but the order doesn't matter.

我要return"hour","four","ruoh" 这是我的尝试

grepl("o+u+r", a) nchar(a)==4

You could use strsplit and setdiff. I added an extra edge case to your sample data:

a<-c("hour","four","ruoh", "six", "high", "our","oouh")
a[nchar(a) == 4 &
  lengths(lapply(strsplit(a,""),function(x) setdiff(x, c("o","u","h")))) == 1]
# [1] "hour" "ruoh"
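The setdiff check above counts only the letters other than o, u and h, so it never confirms that all three are present: a 4-character word such as "aaaa" also leaves exactly one distinct letter and would pass. A minimal sketch of a stricter variant, which swaps the arguments of setdiff so that it returns the required letters missing from each word (the names required and missing_letters are illustrative, not from the original answer):

```r
a <- c("hour", "four", "ruoh", "six", "high", "our", "oouh", "aaaa")
required <- c("o", "u", "h")

# For each word, the required letters that do NOT occur in it
missing_letters <- lapply(strsplit(a, ""), function(x) setdiff(required, x))

# Keep 4-character words with no missing required letters
a[nchar(a) == 4 & lengths(missing_letters) == 0]
# [1] "hour" "ruoh" "oouh"
```

Unlike the version above, this keeps "oouh", since it only asks that each of o, u and h appears at least once.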

Or with grepl:

a[nchar(a) == 4 & !rowSums(sapply(c("o","u","h"), Negate(grepl), a))]
# [1] "hour" "ruoh" "oouh"

sapply(c("o","u","h"), Negate(grepl), a) 给你一个矩阵,其中的单词不包含每个字母,然后 rowSums 就像按行应用的 any 一样,因为它将被强制转换为逻辑。
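To make the intermediate step visible, here is a sketch that splits the one-liner into named pieces (absent and all_present are illustrative names) so the matrix and the coerced row sums can be inspected separately:

```r
a <- c("hour", "four", "ruoh", "six", "high", "our", "oouh")

# 7 x 3 logical matrix: column j is TRUE where the word does NOT contain letter j
absent <- sapply(c("o", "u", "h"), Negate(grepl), a)
absent

# rowSums counts missing letters per word; `!` maps 0 -> TRUE, anything else -> FALSE
all_present <- !rowSums(absent)

a[nchar(a) == 4 & all_present]
# [1] "hour" "ruoh" "oouh"
```

For example, the row for "four" is FALSE FALSE TRUE, since only h is missing, so its row sum is 1 and it is dropped.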

Using grepl with your edited approach (r instead of h):

a<-c("hour","four","ruoh", "six", "high", "our")

a[grepl(pattern="o", x=a) & grepl(pattern="u", x=a) & grepl(pattern="r", x=a) & nchar(a)==4]

Returns:

[1] "hour" "four" "ruoh"
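Chaining one grepl call per letter gets verbose as the list of letters grows. A sketch that builds the same test from a vector of letters with lapply and Reduce (letters_needed is an illustrative name; base R already defines letters, so a different name avoids shadowing it):

```r
a <- c("hour", "four", "ruoh", "six", "high", "our")
letters_needed <- c("o", "u", "r")

# One logical vector per letter. Because x = a is named, each letter is
# passed as grepl's pattern; fixed = TRUE matches it literally, not as a regex
hits <- lapply(letters_needed, grepl, x = a, fixed = TRUE)

# AND the vectors together, then add the length condition
a[Reduce(`&`, hits) & nchar(a) == 4]
# [1] "hour" "four" "ruoh"
```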

To match strings of length 4 that contain the characters h, o and u, use:

grepl("(?=^.{4}$)(?=.*h)(?=.*o)(?=.*u)",
      c("hour","four","ruoh", "six", "high", "our"),
      perl = TRUE)
[1]  TRUE FALSE  TRUE FALSE FALSE FALSE
  • (?=^.{4}$): the string is exactly 4 characters long.
  • (?=.*x): x appears anywhere in the string.
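Because each condition is its own lookahead, the pattern can be assembled from a vector of characters. A minimal sketch with a hypothetical helper all_chars_pattern, assuming the characters are plain letters (regex metacharacters such as . or + would need escaping first):

```r
# Hypothetical helper: build a PCRE pattern that requires exact length n
# and at least one occurrence of each element of chars
all_chars_pattern <- function(chars, n) {
  paste0("(?=^.{", n, "}$)", paste0("(?=.*", chars, ")", collapse = ""))
}

pattern <- all_chars_pattern(c("h", "o", "u"), 4)
pattern
# [1] "(?=^.{4}$)(?=.*h)(?=.*o)(?=.*u)"

grepl(pattern, c("hour", "four", "ruoh", "six", "high", "our"), perl = TRUE)
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE
```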