gsub and returning the correct number in a string

Question

I have a text string in a data frame that looks like this:

2 Sector. District 1, Area 1

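My goal is to extract the number that comes before Sector, and to return an empty string otherwise.

I thought the following regular expression would work: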

gsub("^(?:([0-9]+).*Sector.*|.*)$","\\1",TEXTSTRINGCOLUMN)

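When the word Sector is absent, this correctly returns nothing, but when it is present it returns 1 instead of 2. I would really appreciate help finding where I went wrong. Thanks!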

Answer 1

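We can use a regex lookahead for "Sector", capture the digits as a group, and refer to that capture group (\\1) in the replacement. The lookahead requires perl=TRUE.

sub('.*?(\\d+)\\s*(?=Sector).*', '\\1', v1, perl=TRUE)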
#[1] "2"

Edit: modified based on @Avinash Raj's comment.

Without lookarounds (credit to @Avinash Raj):

sub('.*?(\\d+)\\s*Sector.*', '\\1', v1)

# Data used above
v1 <- "2 Sector. District 1, Area 1"
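
Note that sub() returns its input unchanged when the pattern does not match, so on their own the calls above would not give an empty string for rows without "Sector". Below is a minimal sketch of one way to cover that case. The vector `txt` and its values are hypothetical stand-ins for the data frame column, and perl = TRUE is used only to keep the lazy quantifier's behaviour predictable.

```r
# Hypothetical sample values standing in for the data frame column
txt <- c("2 Sector. District 1, Area 1",
         "District 1, Area 1",
         "15 Sector. District 3")

# Same idea as the answer above: digits directly before "Sector"
pat <- ".*?(\\d+)\\s*Sector.*"

# Keep the captured number where the pattern matches, "" elsewhere
res <- ifelse(grepl(pat, txt, perl = TRUE),
              sub(pat, "\\1", txt, perl = TRUE),
              "")
res
```

The result is a character vector, so it would need as.integer() if numeric values are wanted.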

Answer 2

Try:

x <- "2 Sector. District 1, Area 1"
substring(x, 0, as.integer(grepl("Sector", x)))
#[1] "2"
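
To make clear what this one-liner is doing, here is a step-by-step sketch on a few hypothetical inputs (the vector `x_demo` is an assumption, not part of the question). It relies on the number being a single digit at the very start of the string, because at most one character is ever returned.

```r
# Hypothetical inputs: one-digit, no "Sector", and two-digit cases
x_demo <- c("2 Sector. District 1, Area 1",
            "District 1, Area 1",
            "15 Sector. District 3")

# TRUE where the word "Sector" occurs
has_sector <- grepl("Sector", x_demo)

# TRUE/FALSE become 1/0, used as the end position of the substring
n_chars <- as.integer(has_sector)

# Takes the first character when "Sector" is present, "" otherwise
first_char <- substring(x_demo, 0, n_chars)
first_char
```

For the two-digit case only the first digit comes back, so a regex-based approach may be a better fit when numbers can have more than one digit.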
