Display Japanese characters in Visual Studio Code using Python

Question

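According to this older answer, Python 3 strings are Unicode by default and handle UTF-8 text without extra work. But in my web scraper, which uses BeautifulSoup, when I try to print or display a URL, the Japanese characters show up as "%E3%81%82" or "%E3%81%B3" instead of the actual characters.

This Japanese website is the one I am collecting information from, more specifically the URLs behind the links in the clickable letter buttons. When you hover over, for example, あa, your browser shows that the link you are about to click is https://kokugo.jitenon.jp/cat/gojuon.php?word=あ. However, when I extract the ["href"] attribute of the link with BeautifulSoup, I get https://kokugo.jitenon.jp/cat/gojuon.php?word=%E3%81%82.

Both versions link to the same web page, but for debugging purposes I would like to know whether I can make sure the displayed string contains the actual Japanese characters. If not, how can I convert the string so that it does?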

Answer 1

This is called percent-encoding:

Percent-encoding, also known as URL encoding, is a method to encode arbitrary data in a Uniform Resource Identifier (URI) using only the limited US-ASCII characters legal within a URI.
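To see where "%E3%81%82" comes from, here is a minimal sketch that builds the escape by hand from the UTF-8 bytes of あ and compares it with urllib.parse.quote. The variable names are only illustrative.

```python
from urllib.parse import quote

ch = "あ"

# あ is three bytes in UTF-8: 0xE3 0x81 0x82
utf8_bytes = ch.encode("utf-8")

# Percent-encoding writes each byte as "%" followed by two hex digits
manual = "".join(f"%{b:02X}" for b in utf8_bytes)

print(manual)     # %E3%81%82
print(quote(ch))  # %E3%81%82
```

So the string BeautifulSoup returns is the same URL, just with each non-ASCII byte spelled out in ASCII.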

Apply the unquote function from the urllib.parse module:

urllib.parse.unquote(string, encoding='utf-8', errors='replace')
Replace %xx escapes by their single-character equivalent. The optional encoding and errors parameters specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method.

string may be either a str or a bytes object. Changed in version 3.9: string parameter supports bytes and str objects (previously only str).

encoding defaults to 'utf-8'. errors defaults to 'replace', meaning invalid sequences are replaced by a placeholder character.
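As a rough illustration of the encoding and errors parameters described above, the sketch below decodes a valid sequence, a deliberately truncated one, and a Shift_JIS one. The sample strings are made up for this example and do not come from the website in the question.

```python
from urllib.parse import unquote

# Valid UTF-8 escape sequence: decoded with the default encoding='utf-8'
valid = unquote("%E3%81%82")
print(valid)  # あ

# Truncated sequence (last byte missing): the default errors='replace'
# substitutes the placeholder character U+FFFD instead of raising
truncated = unquote("%E3%81")
print(repr(truncated))

# With errors='strict' the same input raises UnicodeDecodeError
try:
    unquote("%E3%81", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)  # True

# Escapes produced from another encoding need a matching encoding argument;
# 0x82 0xA0 is あ in Shift_JIS
sjis = unquote("%82%A0", encoding="shift_jis")
print(sjis)  # あ
```

For URLs like the one in the question, which are UTF-8 encoded, the defaults should be enough.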

Example:

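from urllib.parse import unquote

# A string containing the two percent-encoded sequences from the question
encodedUrl = 'JapaneseChars%E3%81%82or%E3%81%B3'
# Replace each %xx escape with the character it encodes
decodedUrl = unquote(encodedUrl)
print(decodedUrl)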

JapaneseCharsあorび
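Applied to the URL from the question, a sketch could look like the following. The href string is hard-coded here so the snippet runs on its own; in the scraper it would be whatever the ["href"] attribute returns. The use of urlsplit and parse_qs is an addition for illustration, in case only the query value is of interest.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Hard-coded stand-in for the value extracted with BeautifulSoup
href = "https://kokugo.jitenon.jp/cat/gojuon.php?word=%E3%81%82"

# Human-readable form, suitable for printing while debugging
readable = unquote(href)
print(readable)  # https://kokugo.jitenon.jp/cat/gojuon.php?word=あ

# parse_qs decodes percent-escapes in query values on its own
query = parse_qs(urlsplit(href).query)
print(query["word"])  # ['あ']
```

It is probably safest to keep the original percent-encoded href for any further requests and use the decoded form only for display.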

The unquote function can be applied to almost any string, even one that has already been decoded:

print( unquote(decodedUrl) )

JapaneseCharsあorび
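The "almost" matters when the decoded text itself contains a percent sign followed by two hex digits. The sketch below uses a made-up string to show a case where decoding twice changes the result, next to the harmless case from above.

```python
from urllib.parse import unquote

# Already-decoded Japanese text has no %xx escapes, so it passes through unchanged
same = unquote("JapaneseCharsあorび")
print(same)  # JapaneseCharsあorび

# "%25" is an encoded percent sign: the first pass yields a literal "%41"
once = unquote("rate%2541")
print(once)  # rate%41

# A second pass treats that literal "%41" as an escape and turns it into "A"
twice = unquote(once)
print(twice)  # rateA
```

For the URLs in the question this should not come up, but it is a reason to decode exactly once when the input may contain literal percent signs.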
