Python urllib.parse.urljoin on a path starting with numbers and a colon
What is going on here?
>>> import urllib.parse
>>> base = 'http://example.com'
>>> urllib.parse.urljoin(base, 'abc:123')
'http://example.com/abc:123'
>>> urllib.parse.urljoin(base, '123:abc')
'123:abc'
>>> urllib.parse.urljoin(base + '/', './123:abc')
'http://example.com/123:abc'
The Python 3.7 documentation says:
Changed in version 3.5: Behaviour updated to match the semantics defined in RFC 3986.
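Which part of that RFC mandates this madness, and should it be considered a bug?
"Which part of that RFC mandates this madness?"
This behavior is correct and consistent with other implementations, as RFC 3986 (Section 4.2) states: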
A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative-path reference.
This has already been discussed in another post:
Colons are allowed in the URI path. But you need to be careful when writing relative URI paths with a colon since it is not allowed when used like this:
<a href="tag:sample">
In this case tag would be interpreted as the URI’s scheme. Instead you need to write it like this:
<a href="./tag:sample">
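The dot-segment rule quoted above can be applied mechanically before calling urljoin. The following is a minimal sketch, not part of urllib: the helper name as_relative_path is hypothetical, and it assumes the caller already knows the string is meant as a relative path rather than an absolute URL.

```python
from urllib.parse import urljoin


def as_relative_path(ref):
    """Hypothetical helper: prefix './' when the first path segment
    contains a colon, so it cannot be mistaken for a scheme name."""
    first_segment = ref.split('/', 1)[0]
    if ':' in first_segment:
        return './' + ref
    return ref


base = 'http://example.com/'
print(urljoin(base, as_relative_path('123:abc')))
# http://example.com/123:abc
```

Strings whose first segment has no colon pass through unchanged, so the helper can be applied to every relative path without inspecting it first.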
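Usage of urljoin
The function urljoin simply treats both arguments as URLs, without presuming anything about them. It requires that their schemes be identical, or that the second argument be a relative URI path. Otherwise, it just returns the second argument (although, IMHO, it should raise an error). The logic is easier to follow in the source of urljoin: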
def urljoin(base, url, allow_fragments=True):
    """Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter."""
    ...
    bscheme, bnetloc, bpath, bparams, bquery, bfragment = \
            urlparse(base, '', allow_fragments)
    scheme, netloc, path, params, query, fragment = \
            urlparse(url, bscheme, allow_fragments)
    if scheme != bscheme or scheme not in uses_relative:
        return _coerce_result(url)
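To see which branch a given input takes, the check in the excerpt can be reproduced outside urljoin. This is only a sketch: takes_early_return is a hypothetical name, and uses_relative is a module-level list in urllib.parse that does not appear to be part of the documented API, so relying on it is an assumption.

```python
from urllib.parse import urlparse, uses_relative


def takes_early_return(base, url):
    """Hypothetical helper mirroring the scheme check in the excerpt:
    True means that check would return `url` unchanged."""
    bscheme = urlparse(base).scheme
    # As in urljoin, the base scheme is the default for a scheme-less url.
    scheme = urlparse(url, bscheme).scheme
    return scheme != bscheme or scheme not in uses_relative


print(takes_early_return('http://example.com/', 'mailto:someone@example.com'))  # True
print(takes_early_return('http://example.com/', 'docs/page'))  # False
```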
The parsing routine urlparse gives the following results:
>>> from urllib.parse import urlparse
>>> urlparse('123:abc')
ParseResult(scheme='123', netloc='', path='abc', params='', query='', fragment='')
>>> urlparse('abc:123')
ParseResult(scheme='', netloc='', path='abc:123', params='', query='', fragment='')
>>> urlparse('abc:a123')
ParseResult(scheme='abc', netloc='', path='a123', params='', query='', fragment='')
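Because the outcome depends on how urlparse splits off the scheme, it can help to check the inputs on the interpreter actually in use. The results shown above are the ones reported in this answer and may differ on other Python versions. A small sketch, with split_scheme as a hypothetical helper name:

```python
from urllib.parse import urlparse


def split_scheme(ref):
    """Hypothetical helper: return the (scheme, path) pair that urlparse
    sees, which is what decides urljoin's behaviour."""
    parts = urlparse(ref)
    return parts.scheme, parts.path


# Print how the running interpreter interprets each reference.
for ref in ('123:abc', 'abc:123', 'abc:a123', './123:abc'):
    print(repr(ref), '->', split_scheme(ref))
```

An empty scheme means the string is treated as a relative path and will be joined onto the base. A non-empty scheme that differs from the base's means urljoin returns the string unchanged.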