Regex [Python]: Extract parameters from a URL path
I have URLs from an access log. Examples:
/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w
/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen
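I can't make any assumptions about the service name or the function name.
I'm trying to find a regex that matches only the following. For the first log line: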
67814
alloy%20nudge%20w
For the second:
asdNmasdf423-asd342e
FS443GH
front%20parking%20sen
Using some heuristics, I tried [a-zA-Z0-9_%-]{15,}|[A-Z0-9]{5,}
to match only long strings, but the function names (getPersonFromAllAccessoriesByDescription, getDealerFromSomethingSomething) get caught as well.
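A quick way to reproduce the problem is to run the first attempt through Python's re.findall on the two sample paths. This is a minimal sketch; the names LINES and FIRST_ATTEMPT are illustrative only. The function names are 15+ characters of allowed characters, so the first alternative accepts them.

```python
import re

# The two sample paths from the question.
LINES = [
    "/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w",
    "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen",
]

# First attempt: any run of 15+ allowed characters, OR 5+ upper-case letters/digits.
FIRST_ATTEMPT = re.compile(r"[a-zA-Z0-9_%-]{15,}|[A-Z0-9]{5,}")

for line in LINES:
    # The long camelCase function names satisfy the {15,} alternative.
    print(FIRST_ATTEMPT.findall(line))
```

This prints ['getPersonFromAllAccessoriesByDescription', '67814', 'alloy%20nudge%20w'] for the first path and ['asdNmasdf423-asd342e', 'getDealerFromSomethingSomething', 'FS443GH', 'front%20parking%20sen'] for the second.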
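I'm looking for a regex that does the same thing as [a-zA-Z0-9_%-]{15,},
but with the condition that the match must contain at least one digit, so that the function names are skipped.
Thanks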
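Your heuristic is good. Add a lookahead that requires at least one digit, and lower the minimum length to 5 so that short values such as 67814 and FS443GH still match. Use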
\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}
See proof.
Explanation
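A minimal sketch of using the answer's pattern from Python with re.findall. The helper name extract_params is hypothetical; note the raw string so that \b reaches the regex engine as a word boundary rather than a backspace character.

```python
import re

# Word boundary, then a lookahead requiring a digit somewhere in the run of
# allowed characters, then at least 5 allowed characters.
PARAM_RE = re.compile(r"\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}")


def extract_params(path: str) -> list[str]:
    """Return the digit-containing segments of a log URL path (hypothetical helper)."""
    return PARAM_RE.findall(path)


print(extract_params("/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w"))
# ['67814', 'alloy%20nudge%20w']
print(extract_params("/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen"))
# ['asdNmasdf423-asd342e', 'FS443GH', 'front%20parking%20sen']
```

Because the lookahead's character class excludes '/', it cannot look past the current path segment, so a digit in a later segment never lets a function name through.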
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [a-zA-Z_%-]*             any character of: 'a' to 'z', 'A' to
                             'Z', '_', '%', '-' (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    [0-9]                    any character of: '0' to '9'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [a-zA-Z0-9_%-]{5,}       any character of: 'a' to 'z', 'A' to 'Z',
                           '0' to '9', '_', '%', '-' (at least 5
                           times (matching the most amount possible))
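The breakdown above can also live inside the code itself. This sketch writes the same pattern with Python's re.VERBOSE flag, which ignores whitespace and #-comments outside character classes (the classes here contain no spaces or '#', so they are unaffected). The names PARAM_RE and PARAM_RE_VERBOSE are illustrative.

```python
import re

# Compact form from the answer.
PARAM_RE = re.compile(r"\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}")

# Same pattern, annotated piece by piece with re.VERBOSE.
PARAM_RE_VERBOSE = re.compile(
    r"""
    \b                  # boundary between a word char (\w) and a non-word char
    (?=                 # look ahead to see if there is:
        [a-zA-Z_%-]*    #   letters, '_', '%', '-' (0 or more times)
        [0-9]           #   followed by a digit
    )                   # end of look-ahead
    [a-zA-Z0-9_%-]{5,}  # letters, digits, '_', '%', '-' (at least 5 times)
    """,
    re.VERBOSE,
)

path = "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen"
# Both forms should find the same tokens.
print(PARAM_RE_VERBOSE.findall(path) == PARAM_RE.findall(path))
```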