How to replace all the numbers without am or pm with a standard string?

Question

I have a vector like this:

x <- c('3b  and to 10 am ','1c  and to 12 pm','#01-93  and to 10 am ')

I need to replace every number that is not followed by the string am or pm with "none". So I tried

sub('.*-([0-9]+).*' ,'none',x)

But this does not work. My expected output is:

x <- c('none  and to 10 am ','none and to 12 pm','none  and to 10 am ')

Any help is appreciated.

Answer 1

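We can use a regular expression with a negative lookahead to check for am or pm. I used the stringr package, but it should also work with the base functions.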

library(stringr)

str_replace(x, "(\\S*[0-9]+\\S*)(?!\\S*\\s(am|pm))", "none")

# > "none  and to 10 am " "none  and to 12 pm"  "none  and to 10 am "
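For anyone who wants to avoid the stringr dependency, here is a minimal base-R sketch of the same idea. It assumes the pattern above is used unchanged. The variable names pattern and result are only for illustration.

```r
x <- c('3b  and to 10 am ', '1c  and to 12 pm', '#01-93  and to 10 am ')

# Same pattern as above; each backslash is doubled inside the R string.
pattern <- "(\\S*[0-9]+\\S*)(?!\\S*\\s(am|pm))"

# sub() replaces only the first match in each element, like str_replace().
# perl = TRUE is required because lookahead is a PCRE feature.
result <- sub(pattern, "none", x, perl = TRUE)
print(result)
# [1] "none  and to 10 am " "none  and to 12 pm"  "none  and to 10 am "
```

Without perl = TRUE, base R's default regex engine does not accept the (?!...) syntax.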

If more than one number may need replacing in each element, use str_replace_all() instead of str_replace().

If you want to get rid of the extra spaces, I suggest stringr::str_squish().
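To illustrate both points, here is a rough sketch using base-R stand-ins: gsub() in place of str_replace_all(), and trimws() plus a whitespace-collapsing gsub() in place of str_squish(). The input y is a made-up example with two numbers to replace; it is not from the question.

```r
# Hypothetical input: two numbers that should be replaced, one that should stay.
y <- '3b and 4c  to 10 am '

pattern <- "(\\S*[0-9]+\\S*)(?!\\S*\\s(am|pm))"

# gsub() replaces every match, not just the first one.
replaced <- gsub(pattern, "none", y, perl = TRUE)
print(replaced)
# [1] "none and none  to 10 am "

# Trim the ends and collapse runs of whitespace into a single space.
squished <- gsub("\\s+", " ", trimws(replaced))
print(squished)
# [1] "none and none to 10 am"
```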

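Regex breakdown

\S* matches zero or more non-whitespace characters.
[0-9]+ matches one or more of the digits 0 - 9.
So (\S*[0-9]+\S*) looks for digits with zero or more non-whitespace characters on either side. This matches all the cases in your example, but you may have to be more specific if that assumption does not hold.
\s matches a whitespace character.
(am|pm) matches am or pm.
(?!x) looks ahead and rejects a match that is followed by x.
So (?!\S*\s(am|pm)) looks ahead and rejects any match that has am or pm after the next space. This is essential for rejecting the second number.
Note that inside an R string literal each backslash has to be written twice, e.g. "\\S".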
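One way to see what the lookahead contributes is to list every match with and without it. This is just a sketch using base R's gregexpr() and regmatches(); the variable names are illustrative.

```r
x <- c('3b  and to 10 am ', '1c  and to 12 pm', '#01-93  and to 10 am ')

# Without the lookahead, every token containing a digit is matched.
without_lookahead <- regmatches(x, gregexpr("\\S*[0-9]+\\S*", x, perl = TRUE))
print(unlist(without_lookahead))
# [1] "3b"     "10"     "1c"     "12"     "#01-93" "10"

# With the lookahead, tokens followed by " am" or " pm" are rejected.
with_lookahead <- regmatches(
  x, gregexpr("(\\S*[0-9]+\\S*)(?!\\S*\\s(am|pm))", x, perl = TRUE)
)
print(unlist(with_lookahead))
# [1] "3b"     "1c"     "#01-93"
```

The \S* inside the lookahead matters: without it, the engine could backtrack and match just the "1" of "10", because "1" is followed by "0" rather than by a space and am.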

Answer 2

You can use the negative lookahead operator ?! to avoid matching am and pm. Setting perl to TRUE is important, otherwise the expression is invalid.

sub('#?[0-9]+(\\-[0-9]+)?[a-z]*(?!am|pm)', 'none', x, perl = TRUE)
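A small check of this approach, with one caveat that seems worth knowing. As far as I can tell, the lookahead here only inspects the characters immediately after the match, and in the sample data a space sits between the number and am/pm. The call therefore gives the expected result with sub(), which stops after the first match, but not with gsub(). The variable names below are illustrative.

```r
x <- c('3b  and to 10 am ', '1c  and to 12 pm', '#01-93  and to 10 am ')

pattern <- '#?[0-9]+(\\-[0-9]+)?[a-z]*(?!am|pm)'

# sub() replaces only the first match, which is the leading token.
first_only <- sub(pattern, 'none', x, perl = TRUE)
print(first_only)
# [1] "none  and to 10 am " "none  and to 12 pm"  "none  and to 10 am "

# gsub() also replaces "10" and "12", because the character right
# after them is a space, not "am" or "pm".
all_matches <- gsub(pattern, 'none', x, perl = TRUE)
print(all_matches)
# [1] "none  and to none am " "none  and to none pm"  "none  and to none am "
```

If several numbers per element need replacing, a lookahead that allows for the space, such as (?!\S*\s(am|pm)), is probably the safer choice.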
