R sub with perl: does the search start backwards?
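I have a string a as shown below. I need to extract the part of the string between the first // and the first / that follows it. I was using sub with perl = FALSE, but it is about 4 times slower than perl = TRUE. So I tried perl = TRUE and found that the search seems to start from the END of the string??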
a = "https://moo.com/meh/woof//A.ds.serving/hgtht//ghhg/tjtke"
print(gsub(".*//(.*?)/.*", "\\1", a))
# [1] "moo.com"
print(gsub(".*//(.*?)/.*", "\\1", a, perl = TRUE))
# [1] "ghhg"
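moo.com is what I need. I was surprised to see this. Is it documented somewhere? And how can I rewrite the pattern for perl = TRUE? I have 20 million rows to process, so speed matters. Thanks!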
Edit: not every string starts with http.
You can try .*?//(.*?)/.*, which makes the first .* lazy as well, so that // matches the first instance of //:
gsub(".*?//(.*?)/.*", "\\1", a, perl = TRUE)
# [1] "moo.com"
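The perl = TRUE result is not a backward search. A greedy .* first consumes the whole string and then gives characters back one at a time until // can match, so it settles on the last // that still lets the rest of the pattern match; a lazy .*? grows from zero characters and stops at the first //. The sketch below makes this visible with base R's regexpr() and regmatches() (not used in the original thread), printing only the prefix each version consumes.

```r
a <- "https://moo.com/meh/woof//A.ds.serving/hgtht//ghhg/tjtke"

# Greedy prefix: .* grabs everything, then backtracks until "//" fits,
# which happens at the LAST "//" in the string.
greedy <- regmatches(a, regexpr(".*//", a, perl = TRUE))
print(greedy)
# [1] "https://moo.com/meh/woof//A.ds.serving/hgtht//"

# Lazy prefix: .*? expands one character at a time and stops at the
# FIRST "//" in the string.
lazy <- regmatches(a, regexpr(".*?//", a, perl = TRUE))
print(lazy)
# [1] "https://"

# The same difference decides what the capture group sees.
g_res <- sub(".*//(.*?)/.*", "\\1", a, perl = TRUE)   # "ghhg"
l_res <- sub(".*?//(.*?)/.*", "\\1", a, perl = TRUE)  # "moo.com"
```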
And ?gsub says:
The standard regular-expression code has been reported to be very slow when applied to extremely long character strings (tens of thousands of characters or more): the code used when perl = TRUE seems much faster and more reliable for such usages.
The standard version of gsub does not substitute correctly repeated word-boundaries (e.g. pattern = "\b"). Use perl = TRUE for such matches.
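Since the question mentions 20 million rows and inputs that do not all start with http, the sketch below checks the answer's lazy pattern on a few hypothetical inputs and compares it with an assumed variant that captures with the negated class [^/]* instead of (.*?)/. The variant is not from the original answer and has not been benchmarked here; it should capture the same text, because [^/]* cannot run past the next /. The timing lines only show how one might compare the two on a sample of the real data.

```r
# Hypothetical inputs: the question's string plus two edge cases
# (a non-http scheme, and a string with no "//" at all).
x <- c("https://moo.com/meh/woof//A.ds.serving/hgtht//ghhg/tjtke",
       "ftp://files.example/path",
       "no-double-slash/here")

# Pattern from the answer: lazy prefix, lazy capture up to the next "/".
lazy_pat  <- ".*?//(.*?)/.*"
# Assumed variant: [^/]* stops at the next "/" without lazy expansion.
class_pat <- ".*?//([^/]*)/.*"

r1 <- sub(lazy_pat,  "\\1", x, perl = TRUE)
r2 <- sub(class_pat, "\\1", x, perl = TRUE)
# Both give "moo.com", "files.example", "no-double-slash/here":
# sub() returns strings without a match unchanged.

# Optionally mark non-matching strings as NA instead.
host <- ifelse(grepl(class_pat, x, perl = TRUE), r2, NA_character_)

# Rough timing on a replicated vector; results depend on the machine and
# the data, so benchmark on a sample of the real rows before choosing.
big <- rep(x, length.out = 3e5)
print(system.time(sub(lazy_pat,  "\\1", big, perl = TRUE)))
print(system.time(sub(class_pat, "\\1", big, perl = TRUE)))
```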