How to convert "short links" from an "href" to actual URLs?
Suppose I'm scraping a web page and have collected all of its links. In Python, how do I convert links like these:
Catalog.php
Products.aspx
Contact.html
into actual links like these:
https://example.com/Catalog.php
https://example.com/Products.aspx
https://example.com/Contact.html
I've searched all over Stack Overflow using the power of DuckDuckGo. This question may well be a duplicate, but I didn't know how to phrase it.
import urllib.parse
urllib.parse.urljoin("https://example.com", "/Catalog.php")
This assumes https://example.com is your base URL.
You can use the urljoin function from urllib.parse. From the Python documentation:
Construct a full (“absolute”) URL by combining a “base URL” (base) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL.
import urllib.parse
base_path = "https://example.com/"
relative_path = "/Catalog.php"
new_url = urllib.parse.urljoin(base_path, relative_path)
print(new_url)
You get:
https://example.com/Catalog.php
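For a whole list of scraped hrefs, the same call can be applied in a loop. The following is only a minimal sketch: the helper name absolutize and the page URLs are assumptions made for illustration, and the hrefs are the ones from the question. Note that urljoin resolves a relative href against the URL of the page it came from, so an href without a leading slash lands next to that page, while one with a leading slash lands at the site root.

```python
from urllib.parse import urljoin


def absolutize(page_url, hrefs):
    """Resolve each href against the URL of the page it was scraped from."""
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical page URL; the hrefs are the ones from the question.
page_url = "https://example.com/"
hrefs = ["Catalog.php", "Products.aspx", "Contact.html"]

for url in absolutize(page_url, hrefs):
    print(url)

# If the page lives in a subdirectory, relative and root-relative hrefs differ:
# "Catalog.php" resolves next to the page, "/Catalog.php" at the site root.
print(absolutize("https://example.com/shop/index.html",
                 ["Catalog.php", "/Catalog.php"]))
```

Passing the URL of the page that was actually fetched as the base, rather than just the domain, is usually the safer choice when scraping, since relative links are meant to be read relative to that page.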