In regex capture group, exclude one word

Question

I have URLs like these:

https://example.com/en/app/893245
https://example.com/ru/app/wq23245
https://example.com/app/8984245

I only want to extract the word between com and app:

https://example.com/en/app/893245 -> en
https://example.com/ru/app/wq23245 -> ru
https://example.com/app/8984245 ->

I tried to exclude app from the capture group, but I don't know how to do it other than like this:

.*com\/((?!app).*)\/app

Is it possible to do something like this, but without capturing the word app? example\.com\/(\w+|?!app)\/

Rubular link: https://rubular.com/r/NnojSgQK7EuelE

Answer 1

If you need a plain regex, you can use lookarounds:

/(?<=example\.com\/)\w+(?=\/app)/

Or, probably better in the context of URLs, match any run of non-slash characters instead of \w+:

/(?<=example\.com\/)[^\/]+(?=\/app)/

See the Rubular demo.
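
As a rough illustration of the lookaround variant in Ruby: the sketch below assumes a hypothetical helper name, segment_between, and relies on String#[] returning the whole match (or nil), so no capture group is needed.

```ruby
# Lookbehind + lookahead: the match itself is the path segment,
# so there is no capture group to index into.
SEGMENT_RE = /(?<=example\.com\/)[^\/]+(?=\/app)/

# Hypothetical helper (not from the original answer).
# String#[] with a Regexp returns the matched text, or nil if nothing matches.
def segment_between(url)
  url[SEGMENT_RE]
end

urls = [
  'https://example.com/en/app/893245',
  'https://example.com/ru/app/wq23245',
  'https://example.com/app/8984245'
]

urls.each { |u| puts "#{u} -> #{segment_between(u).inspect}" }
```

The third URL yields nil because nothing sits between example.com/ and /app.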

In Ruby, you can use a capture group and ask for group 1 directly:

strs = ['https://example.com/en/app/893245', 'https://example.com/ru/app/wq23245', 'https://example.com/app/8984245']
strs.each { |s|
    # s[regex, 1] returns capture group 1, or nil when the regex does not match
    p s[/example\.com\/(\w+)\/app/, 1]
}
# prints "en", "ru" and nil, one per line
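
Closer to what the question attempted, a negative lookahead can sit in front of the capture group so that the group can never capture app itself. This is only a sketch: the helper name language_segment and the extra app/app test URL are assumptions made for illustration.

```ruby
# (?!app\/) rejects the segment "app" before the group tries to capture it.
# [^\/]+ keeps the capture inside a single path segment.
LANG_RE = /example\.com\/(?!app\/)([^\/]+)\/app\//

# Hypothetical helper (not from the original answer).
# Returns capture group 1, or nil when the URL has no segment before /app/.
def language_segment(url)
  url[LANG_RE, 1]
end

p language_segment('https://example.com/en/app/893245')   # "en"
p language_segment('https://example.com/app/8984245')     # nil
p language_segment('https://example.com/app/app/1')       # nil, "app" is excluded
```

Without the lookahead, the last URL would return "app", which is the case the guard is meant to rule out.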

Answer 2

You can use sed:

sed -n -f script.sed yourinput.txt

and inside script.sed:

s/.*com\/\(.*\)\/app.*/\1/p

Sample input:

https://example.com/en/app/893245
https://example.com/ru/app/wq23245
https://example.com/app/8984245

Sample output:

$ sed -n -f comapp.sed comapp.txt
en
ru
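
For comparison, the same extraction can be sketched in Ruby. The method name extract_like_sed is hypothetical; it mirrors the sed behaviour of printing only the lines where the substitution matched.

```ruby
# Mirrors the sed script: keep capture group 1 for matching lines,
# drop lines that do not match (as sed -n with the p flag does).
def extract_like_sed(lines)
  lines.map { |line| line[/.*com\/(.*)\/app.*/, 1] }.compact
end

input = [
  'https://example.com/en/app/893245',
  'https://example.com/ru/app/wq23245',
  'https://example.com/app/8984245'
]

puts extract_like_sed(input)
```

Note that (.*) is greedy, so a URL containing /app more than once would capture everything up to the last /app; using [^\/]* in place of .* inside the group avoids that.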

regex

regex-negation