Parsing URL with regex

Question

I'm trying to combine an if/else in my regex: basically, if a certain pattern exists in the string, capture one pattern; if it doesn't, capture another.

The string is: 'https://www.searchpage.com/searchcompany.aspx?companyId=41490234&page=0&leftlink=true' and I want to extract the part around the '?'.

So if a '?' is detected in the string, the regex should capture everything after the '?' mark; if not, it should just capture from the beginning.

I used: '(.*\?.*)?(\?.*&.*)|(^&.*)' but it didn't work...

Any suggestions?

Thanks!

Answer 1

A regex may not be the best solution for this problem... why not

my_url.split("?",1)

if that is really all you want to do?
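Taking the last element of that split covers both branches of the question: when there is a '?', it is everything after the first one; when there is none, the list has a single element, the whole string. A minimal sketch (the helper name `after_question_mark` is hypothetical):

```python
def after_question_mark(url):
    # maxsplit=1 splits only at the first '?', so any later '?' stays in the tail.
    # With no '?' the result is [url], so [-1] returns the whole string.
    return url.split("?", 1)[-1]

print(after_question_mark("https://www.searchpage.com/searchcompany.aspx?companyId=41490234&page=0&leftlink=true"))
print(after_question_mark("companyId=41490234&page=0"))
```

Because it uses a single-argument `print(...)`, this runs unchanged on Python 2 and Python 3.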

or, as others have suggested:

from urlparse import urlparse
print urlparse(my_url)

Answer 2

This regex:

(^[^?]*$|(?<=\?).*)

captures:

^[^?]*$ : everything, if there is no ?, or
(?<=\?).* : everything after the ?, if there is one
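A quick check of that pattern with Python's `re` module. It relies on `re.search` rather than `re.match`, because the lookbehind alternative only succeeds partway into the string; `re.search` returns the leftmost match, i.e. the text after the first '?'. The helper name `extract` is hypothetical:

```python
import re

# Alternative 1: the whole string, when it contains no '?'.
# Alternative 2: everything after a '?' (the leftmost one, via re.search).
PATTERN = re.compile(r"(^[^?]*$|(?<=\?).*)")

def extract(s):
    m = PATTERN.search(s)
    # The whole pattern is wrapped in group 1, so group(1) equals group(0) here.
    return m.group(1) if m else None

print(extract("https://www.searchpage.com/searchcompany.aspx?companyId=41490234&page=0&leftlink=true"))
print(extract("companyId=41490234&page=0"))
```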

However, if you are working with URLs, you should look at urllib.parse (Python 3) or urlparse (Python 2).
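In Python 3 the parsing function is `urllib.parse.urlparse`. It does not reproduce the question's fallback on its own: `.query` is an empty string when the URL has no query, so returning the whole input in that case has to be done explicitly. Note also that `.query` stops before any `#fragment`, unlike the plain split. A sketch (the helper name `query_or_whole` is hypothetical):

```python
from urllib.parse import urlparse

def query_or_whole(url):
    # .query holds the text between '?' and any '#fragment'.
    # It is '' when there is no query, so fall back to the whole input
    # (a URL ending in a bare '?' also falls back).
    return urlparse(url).query or url

print(query_or_whole("https://www.searchpage.com/searchcompany.aspx?companyId=41490234&page=0&leftlink=true"))
print(query_or_whole("https://www.searchpage.com/searchcompany.aspx"))
```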

Answer 3

Using urlparse:

>>> import urlparse
>>> parse_result = urlparse.urlparse('https://www.searchpage.com/searchcompany.aspx?companyId=41490234&page=0&leftlink=true')

>>> parse_result
ParseResult(scheme='https', netloc='www.searchpage.com', 
path='/searchcompany.aspx', params='', 
query='companyId=41490234&page=0&leftlink=true', fragment='')

>>> urlparse.parse_qs(parse_result.query)
{'leftlink': ['true'], 'page': ['0'], 'companyId': ['41490234']}

The last line is a dictionary mapping each key to a list of its values.
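The session above is Python 2. In Python 3 both `urlparse` and `parse_qs` live in `urllib.parse`. `parse_qs` maps each key to a list because a key may repeat in a query string; `parse_qsl` returns ordered (key, value) pairs instead. A Python 3 sketch of the same steps:

```python
from urllib.parse import urlparse, parse_qs, parse_qsl

url = "https://www.searchpage.com/searchcompany.aspx?companyId=41490234&page=0&leftlink=true"
query = urlparse(url).query

# key -> list of values (a key can appear more than once in a query string)
params = parse_qs(query)
print(params)

# Flatten to single values when each key is known to appear only once.
single = {key: values[0] for key, values in params.items()}
print(single["companyId"])  # 41490234

# Ordered (key, value) pairs, keeping duplicates.
print(parse_qsl(query))
```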

