python urllib.parse.urljoin on path starting with numbers and colon

Question

Can anyone explain what is going on here?

>>> import urllib.parse
>>> base = 'http://example.com'
>>> urllib.parse.urljoin(base, 'abc:123')
'http://example.com/abc:123'
>>> urllib.parse.urljoin(base, '123:abc')
'123:abc'
>>> urllib.parse.urljoin(base + '/', './123:abc')
'http://example.com/123:abc'

The Python 3.7 documentation says:

Changed in version 3.5: Behaviour updated to match the semantics defined in RFC 3986.

Which part of that RFC mandates this crazy behavior, and should it be considered a bug?

Answer 1

Which part of that RFC mandates this crazy behavior?

This behavior is correct and consistent with other implementations, as stated in RFC 3986:

A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative-path reference.

This has already been discussed in another post:

Colons are allowed in the URI path. But you need to be careful when writing relative URI paths with a colon since it is not allowed when used like this:
<a href="tag:sample">
In this case tag would be interpreted as the URI’s scheme. Instead you need to write it like this:
<a href="./tag:sample">
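To make the dot-segment rule concrete, here is a minimal sketch. The helper name `as_relative_ref` is my own invention (it is not part of `urllib`), and it only looks at the first path segment, ignoring queries and fragments.

```python
from urllib.parse import urljoin


def as_relative_ref(path):
    """Hypothetical helper: prefix "./" when the first segment contains a colon.

    Without the prefix, everything before the colon could be read as a scheme.
    """
    first_segment = path.split("/", 1)[0]
    if ":" in first_segment:
        return "./" + path
    return path


base = "http://example.com/"
for path in ("123:abc", "abc:123", "docs/a:b"):
    # Each reference should now resolve relative to the base URL.
    print(urljoin(base, as_relative_ref(path)))
```

Only a colon in the first segment needs the prefix; in `docs/a:b` the slash before the colon already rules out a scheme, so the helper leaves it alone.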

Usage of `urljoin`

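The function urljoin simply treats both of its arguments as URLs (without any presumption). It requires that their schemes be identical, or that the second one represent a relative URI path. Otherwise, it just returns the second argument (although, IMHO, it should raise an error in that case). You can better understand the logic by looking at the source of urljoin.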

def urljoin(base, url, allow_fragments=True):
    """Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter."""
    ...
    bscheme, bnetloc, bpath, bparams, bquery, bfragment = \
            urlparse(base, '', allow_fragments)
    scheme, netloc, path, params, query, fragment = \
            urlparse(url, bscheme, allow_fragments)

    if scheme != bscheme or scheme not in uses_relative:
        return _coerce_result(url)
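The early return above can be observed from the outside with a small sketch. The URLs and variable names below are made up for illustration; the point is only that a reference carrying a different scheme comes back unchanged, while a scheme-less one is merged with the base.

```python
from urllib.parse import urljoin

base = "http://example.com/a/b"

# No scheme in the reference: it inherits "http" and is merged with the base path.
relative = urljoin(base, "c")

# Different scheme than the base: urljoin hands the reference back untouched.
other_scheme = urljoin(base, "mailto:user@example.com")
absolute = urljoin(base, "https://other.example/x")

print(relative)
print(other_scheme)
print(absolute)
```

This is the same branch that returns '123:abc' as-is whenever the parser takes '123' for a scheme.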

The parsing routine urlparse gives the following results:

>>> from urllib.parse import urlparse
>>> urlparse('123:abc')
ParseResult(scheme='123', netloc='', path='abc', params='', query='', fragment='')
>>> urlparse('abc:123')
ParseResult(scheme='', netloc='', path='abc:123', params='', query='', fragment='')
>>> urlparse('abc:a123')
ParseResult(scheme='abc', netloc='', path='a123', params='', query='', fragment='')
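If you want to check what your own interpreter does, a quick sketch like the one below may help. The function name `detected_scheme` is hypothetical, and I deliberately show no expected output for the colon cases: the session above was run on the Python version from the question, and scheme detection in `urllib.parse` has not been identical across releases, so your results may differ.

```python
import sys
from urllib.parse import urlparse


def detected_scheme(ref):
    """Return whatever scheme urlparse finds in ref ('' if none)."""
    return urlparse(ref).scheme


print("Python", sys.version_info[:2])
for ref in ("123:abc", "abc:123", "abc:a123", "./123:abc"):
    # An empty scheme means urljoin will treat ref as relative to the base.
    print(repr(ref), "->", repr(detected_scheme(ref)))
```

Whatever the version, the "./" form is the one that should always come back with an empty scheme, which is why it is the safe way to write such a reference.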
