R sub with perl - starts search backwards?

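I have a string a like the one shown below. I need to extract the part of the string between the first // and the first subsequent /. I was using sub with perl = F, but it is about 4 times slower than perl = T. So I tried perl = T and found that the search starts from the END of the string??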

    a = "https://moo.com/meh/woof//A.ds.serving/hgtht//ghhg/tjtke"
    print(gsub(".*//(.*?)/.*","\\1",a))

    "moo.com"

    print(gsub(".*//(.*?)/.*","\\1",a,perl=T))

    "ghhg"

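moo.com is what I need. I was surprised to see this - is it documented somewhere? How can I rewrite it with perl - I have 20 million rows to process, so speed matters. Thanks!

Edit: not every string starts with http.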

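You can try .*?//(.*?)/.* so that the first .* is lazy as well. The greedy .* first consumes the whole string and then backtracks, so it settles on the last // it can use; the lazy .*? expands from the start, so // matches the first // instance:

    gsub(".*?//(.*?)/.*","\\1",a,perl=T)
    # [1] "moo.com"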

Also, ?gsub says:

The standard regular-expression code has been reported to be very slow when applied to extremely long character strings (tens of thousands of characters or more): the code used when perl = TRUE seems much faster and more reliable for such usages.

The standard version of gsub does not substitute correctly repeated word-boundaries (e.g. pattern = "\b"). Use perl = TRUE for such matches.
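To make the greedy versus lazy behaviour above concrete, here is a minimal sketch that runs both patterns with perl = TRUE over a small vector. The second URL is a hypothetical input added only to cover the "not every string starts with http" case, and the negated-class pattern is an assumed alternative, not something from the original answer. No timings are claimed; measure on your own data.

```r
# Sample inputs: the first is from the question, the second is a
# hypothetical scheme-less URL used only for illustration.
urls <- c(
  "https://moo.com/meh/woof//A.ds.serving/hgtht//ghhg/tjtke",
  "//cdn.example.org/assets/app.js"
)

# Greedy leading .* grabs the whole string and backtracks from the end,
# so // ends up matching the LAST usable occurrence.
greedy <- sub(".*//(.*?)/.*", "\\1", urls, perl = TRUE)

# Lazy leading .*? grows from the start, so // matches the FIRST occurrence.
lazy <- sub("^.*?//(.*?)/.*", "\\1", urls, perl = TRUE)

# Assumed alternative: capture everything up to the next slash with a
# negated character class instead of a lazy quantifier.
negated <- sub("^.*?//([^/]*).*", "\\1", urls, perl = TRUE)

print(greedy)   # [1] "ghhg"    "cdn.example.org"
print(lazy)     # [1] "moo.com" "cdn.example.org"
print(negated)  # [1] "moo.com" "cdn.example.org"
```

Note one behavioural difference: the negated-class pattern also matches when nothing follows the host (for example "//foo"), whereas the lazy pattern requires a trailing / and otherwise returns the input unchanged. Since only one substitution per string is needed, sub is used here instead of gsub; both are vectorized over the input.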