Concatenate base url and path using urllib

Question

I am trying to concatenate a base URL url1 and a relative path url2 using Python 3's urllib.parse, but I do not get the desired result. I also tried os.path.join (which is not meant to be used for this purpose) and simple string concatenation using .format():

import os.path
import urllib.parse

url1 = "www.sampleurl.tld"
url2 = "/some/path/here"


print(urllib.parse.urljoin(url1, url2))
# --> "/some/path/here"

print(os.path.join(url1, url2))
# --> "/some/path/here"

print("{}{}".format(url1, url2))
# --> "www.sampleurl.tld/some/path/here" (desired output)

The simple string concatenation returns the desired absolute URL. However, this approach seems naive and not very elegant, because it assumes that url2 starts with /, which may not be the case. Of course, I could check this by calling url2.startswith('/') and change the string concatenation to "{}/{}".format(url1, url2) when needed, but I am still wondering how to do this properly by means of urllib.parse.

Answer 1

urljoin expects the first argument, the base URL, to include the scheme.
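
To see why the scheme matters, here is a small check of how urlparse splits the base with and without it. The variable names no_scheme and with_scheme are only for illustration. Without a scheme, the host name ends up in the path component and netloc stays empty, so urljoin has no host to keep when url2 is an absolute path.

```python
import urllib.parse

# Without a scheme, the whole string is parsed as a path, not as a host.
no_scheme = urllib.parse.urlparse("www.sampleurl.tld")
print(repr(no_scheme.netloc), repr(no_scheme.path))

# With a scheme, the host is recognised as the network location.
with_scheme = urllib.parse.urlparse("https://www.sampleurl.tld")
print(repr(with_scheme.netloc), repr(with_scheme.path))
```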

So adding https:// or http:// to your url1 string should do the job.

import urllib.parse

url1 = "https://www.sampleurl.tld"
url2 = "/some/path/here"


print(urllib.parse.urljoin(url1, url2))
# --> "https://www.sampleurl.tld/some/path/here"
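
If url2 may or may not start with a slash, as mentioned in the question, a small wrapper around urljoin can cover both cases. This is only a sketch: join_url and default_scheme are made-up names, and defaulting to https is an assumption about what you want.

```python
import urllib.parse


def join_url(base, path, default_scheme="https"):
    """Join a base URL and a path, adding a scheme to the base if it has none.

    join_url and default_scheme are hypothetical names for illustration.
    """
    # Without "://" urljoin cannot recognise the host part of the base.
    if "://" not in base:
        base = "{}://{}".format(default_scheme, base)
    # A trailing slash makes urljoin treat the base as a directory, so the
    # path is appended instead of replacing the last segment of the base.
    if not base.endswith("/"):
        base += "/"
    # Leading slashes are stripped so the path is always resolved
    # relative to the base.
    return urllib.parse.urljoin(base, path.lstrip("/"))


print(join_url("www.sampleurl.tld", "/some/path/here"))
print(join_url("www.sampleurl.tld", "some/path/here"))
```

Both calls should print the same URL. Note that stripping the leading slash is a design choice: with a base such as https://host/api, a path of /v1 is appended to /api rather than resolved from the root, which differs from what plain urljoin does.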

Answer 2

import urllib.parse

url1 = 'www.sampleurl.tld'
url2 = '/some/path/here'

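# Build the URL from its six components: url1 is the host, url2 the path.
urlString = urllib.parse.ParseResult(scheme='https', netloc=url1, path=url2, params='', query='', fragment='')
print(urllib.parse.urlunparse(urlString))
# --> "https://www.sampleurl.tld/some/path/here"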

You can try this. The URL is built from the ParseResult class rather than from a list.
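
The same idea can be wrapped in a function to check the case from the question where url2 has no leading slash. This is a sketch under assumptions: build_url and its scheme default are hypothetical names. As far as I can tell, urlunparse inserts the missing slash between host and path whenever a netloc is given, so no startswith('/') check is needed here.

```python
import urllib.parse


def build_url(host, path, scheme="https"):
    """Assemble a URL from a bare host and a path.

    build_url and its scheme default are hypothetical names for this sketch.
    """
    parts = urllib.parse.ParseResult(
        scheme=scheme, netloc=host, path=path,
        params="", query="", fragment="",
    )
    return urllib.parse.urlunparse(parts)


with_slash = build_url("www.sampleurl.tld", "/some/path/here")
without_slash = build_url("www.sampleurl.tld", "some/path/here")
print(with_slash)
print(without_slash)
```

Both calls should give the same result. Unlike urljoin, this approach does no relative resolution at all: the path is placed directly after the host as given.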

