Regex [Python] Extract from URL path parameters

Question

I have URLs from an access log. Examples:

/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w

/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen

I can't make any assumptions about the service name or the function name.

I'm trying to find a regex that matches only the following in the first log line:

67814
alloy%20nudge%20w

And in the second:

asdNmasdf423-asd342e
FS443GH
front%20parking%20sen

Using some heuristics, I tried to match only long strings with [a-zA-Z0-9_%-]{15,}|[A-Z0-9]{5,}, but the function names (getPersonFromAllAccessoriesByDescription, getDealerFromSomethingSomething) are captured as well.
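Running the length-only heuristic with re.findall against both example paths makes the problem visible. This is a minimal sketch; the names URL1, URL2 and HEURISTIC are only for illustration.

```python
import re

# The two example paths from the question.
URL1 = "/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w"
URL2 = "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen"

# Length-only heuristic: 15+ chars from the class, or 5+ uppercase letters/digits.
HEURISTIC = re.compile(r"[a-zA-Z0-9_%-]{15,}|[A-Z0-9]{5,}")

for url in (URL1, URL2):
    print(HEURISTIC.findall(url))
```

The first path yields ['getPersonFromAllAccessoriesByDescription', '67814', 'alloy%20nudge%20w'] and the second yields ['asdNmasdf423-asd342e', 'getDealerFromSomethingSomething', 'FS443GH', 'front%20parking%20sen']: each function name is long enough to satisfy the {15,} branch.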

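I'm looking for a regex that does the same as [a-zA-Z0-9_%-]{15,}, but with the extra condition that the match must contain at least one digit, so that the function names are skipped.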
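The digit condition can also be applied outside the regex, as a post-filter: keep the original heuristic and drop every match that contains no digit. This is an alternative sketch, not the approach taken in the answer; extract_ids is a hypothetical helper name.

```python
import re
from typing import List

URL1 = "/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w"
URL2 = "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen"

# The question's original length-only heuristic.
HEURISTIC = re.compile(r"[a-zA-Z0-9_%-]{15,}|[A-Z0-9]{5,}")


def extract_ids(path: str) -> List[str]:
    """Hypothetical helper: heuristic matches that contain at least one digit."""
    return [m for m in HEURISTIC.findall(path) if any(c.isdigit() for c in m)]


print(extract_ids(URL1))
print(extract_ids(URL2))
```

This prints ['67814', 'alloy%20nudge%20w'] and ['asdNmasdf423-asd342e', 'FS443GH', 'front%20parking%20sen']. The function names are dropped because they contain only letters.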

Thanks

Answer 1

Your heuristic is fine; use

\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}

See proof.
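A quick check of the suggested pattern with re.findall on both example paths (a minimal sketch; URL1, URL2 and ID_PATTERN are illustrative names):

```python
import re

URL1 = "/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w"
URL2 = "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen"

# Word boundary, then a lookahead requiring a digit after 0+ letters/_/%/-,
# then 5 or more characters from [a-zA-Z0-9_%-].
ID_PATTERN = re.compile(r"\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}")

for url in (URL1, URL2):
    print(ID_PATTERN.findall(url))
```

This prints ['67814', 'alloy%20nudge%20w'] and ['asdNmasdf423-asd342e', 'FS443GH', 'front%20parking%20sen']. Note that alloy%20nudge%20w and front%20parking%20sen pass the digit lookahead only because of the percent-encoded spaces (%20); a purely alphabetic description segment would be skipped.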

Explanation

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [a-zA-Z_%-]*             any character of: 'a' to 'z', 'A' to
                             'Z', '_', '%', '-' (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    [0-9]                    any character of: '0' to '9'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [a-zA-Z0-9_%-]{5,}       any character of: 'a' to 'z', 'A' to 'Z',
                           '0' to '9', '_', '%', '-' (at least 5
                           times (matching the most amount possible))
