Regex to match decimal point but not .html

Question

I have URLs in this format:

/scan/anything/se=hello-world/se=word.html
/scan/anything/se=hello-world/se=1.5/
/scan/anything/se=temp-2.5/se=1.5.html

I am trying to match and capture the word characters, plus dashes and decimal points, that follow each se=.

The regex I have come up with is this:

  ^/scan/.*?se=([\w-.]*)/?(?:se=)([\w-.]*)/?(?:.html)?

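Because I added a dot (.) to the character class to match decimal points, it also matches .html, so it captures "word.html" and "1.5.html" rather than just "word" and "1.5" from URLs 1 and 3. How can I stop it from matching .html? I have tried various negations, but none seem to work.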
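The problem can be reproduced with a minimal Python sketch. One assumption: the hyphen is moved to the end of each character class ([\w.-] instead of [\w-.]), because Python's re module is stricter about a hyphen that follows \w and may reject the original form. The helper name capture is hypothetical.

```python
import re

# The attempted pattern, with the hyphen moved to the end of each class
# so that it is an unambiguous literal (an adaptation for Python's re).
ATTEMPT = re.compile(r"^/scan/.*?se=([\w.-]*)/?(?:se=)([\w.-]*)/?(?:.html)?")

urls = [
    "/scan/anything/se=hello-world/se=word.html",
    "/scan/anything/se=hello-world/se=1.5/",
    "/scan/anything/se=temp-2.5/se=1.5.html",
]

def capture(url):
    """Return both captured groups, or None if the URL does not match."""
    m = ATTEMPT.search(url)
    return m.groups() if m else None

for url in urls:
    # URLs 1 and 3 show the unwanted ".html" inside the second group.
    print(capture(url))
```

The second group is greedy and the dot is in its class, so it swallows ".html" before the optional (?:.html)? ever gets a chance to match.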

Desired output:

hello-world and word
hello-world and 1.5
temp-2.5 and 1.5

Answer 1

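You want to use a negated character class like this, combined with a positive lookahead, which does not count as part of the capture group: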

se=([^/]+)/se=((?:[^/]+)(?=\.html)|[^/]+)

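This way you capture everything that is not a / up to the next /. In the second group, the first alternative backs off until it is followed by .html, and the second alternative handles values with no .html suffix.

Here is a small example in Python: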

import re

thelist = [
"/scan/anything/se=hello-world/se=word.html",
"/scan/anything/se=hello-world/se=1.5/",
"/scan/anything/se=temp-2.5/se=1.5.html",
]

# Raw string, so that \. reaches the regex engine as an escaped dot.
regex = r"se=([^/]+)/se=((?:[^/]+)(?=\.html)|[^/]+)"

for item in thelist:
    thematch = re.search(regex, item)
    print(thematch.group(1))
    print(thematch.group(2))
    print("------------")

Result:

hello-world
word
------------
hello-world
1.5
------------
temp-2.5
1.5
------------
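The loop above assumes every URL matches; re.search returns None otherwise, and calling .group on it would raise an error. A slightly more defensive sketch might wrap the same regex in a helper (the names SE_PAIR and extract_se are hypothetical, not part of the answer):

```python
import re

# Same pattern as in the example above, compiled once for reuse.
SE_PAIR = re.compile(r"se=([^/]+)/se=((?:[^/]+)(?=\.html)|[^/]+)")

def extract_se(url):
    """Return the two se= values as a tuple, or None if the URL does not match."""
    match = SE_PAIR.search(url)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(extract_se("/scan/anything/se=temp-2.5/se=1.5.html"))
print(extract_se("/scan/anything/"))
```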

http://regex101.com is a handy little site if you need to tweak the regex.

Answer 2

I suggest this regex:

se=((?:[\w-.]+)(?=\.html)|[\w-.]+)

See this demo.

This matches any word that may contain - or ., up to a possible .html (it stops just before .html, if there is one).
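As a rough Python sketch of this pattern, re.findall collects every se= value in a URL. One assumption: the class is written [\w.-] rather than [\w-.], since Python's re may reject a hyphen placed directly after \w; the set of characters matched is the same. The names SE_VALUE and urls are illustrative.

```python
import re

# Answer's pattern with the hyphen moved to the end of each class.
SE_VALUE = re.compile(r"se=((?:[\w.-]+)(?=\.html)|[\w.-]+)")

urls = [
    "/scan/anything/se=hello-world/se=word.html",
    "/scan/anything/se=hello-world/se=1.5/",
    "/scan/anything/se=temp-2.5/se=1.5.html",
]

for url in urls:
    # findall returns the contents of the single capture group per match.
    print(SE_VALUE.findall(url))
```

Unlike a pattern that names both se= parameters explicitly, this one works for any number of them.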

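Edit:

The regex above leaves out .html even when it appears in the middle of the URL, at the end of an earlier parameter. For example, in this case the following is captured: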

/scan/anything/se=hello-world.html/se=word.html
                  ^^^^^^^^^^^         ^^^^

So if you want to capture everything except the final .html, you have to add the end-of-string anchor $:

se=((?:[\w-.]+)(?=\.html$)|[\w-.]+)

See this second demo.
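The difference between the two lookaheads can be checked with a short Python sketch. It assumes the classes are rewritten as [\w.-] for Python's re, and the names ANY_HTML and FINAL_HTML are made up for the comparison.

```python
import re

# Lookahead without and with the end-of-string anchor.
ANY_HTML = re.compile(r"se=((?:[\w.-]+)(?=\.html)|[\w.-]+)")
FINAL_HTML = re.compile(r"se=((?:[\w.-]+)(?=\.html$)|[\w.-]+)")

url = "/scan/anything/se=hello-world.html/se=word.html"

# Strips .html from every parameter.
print(ANY_HTML.findall(url))    # ['hello-world', 'word']

# Strips .html only at the very end of the URL.
print(FINAL_HTML.findall(url))  # ['hello-world.html', 'word']
```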

Edit 2:

Based on the information the OP provided in the comments here, this regex is better suited to URL redirection:

^\/scan\/anything\/se=([\w-.]+)\/se=((?:[\w-.]+)(?=\.html)|[\w-.]+)

See this demo.

This captures the two se parameters of each URL in $1 and $2 respectively, while still matching the same input as the regexes above.
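For illustration only, here is how the anchored pattern behaves in Python, where $1 and $2 correspond to group(1) and group(2). The sketch assumes the [\w.-] form of the class and drops the backslashes before /, which Python does not need; the actual redirect rule would live in the web server's configuration, which is not shown here. The names REDIRECT and se_params are hypothetical.

```python
import re

# Anchored at the start, so only paths beginning with /scan/anything/ match.
REDIRECT = re.compile(
    r"^/scan/anything/se=([\w.-]+)/se=((?:[\w.-]+)(?=\.html)|[\w.-]+)"
)

def se_params(path):
    """Return (first, second) se values, or None if the path does not match."""
    m = REDIRECT.match(path)
    return m.groups() if m else None

print(se_params("/scan/anything/se=temp-2.5/se=1.5.html"))
print(se_params("/prefix/scan/anything/se=a/se=b"))
```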
