"'charmap' codec can't encode character" (Http Request)

Question

I am using urllib.request.urlopen to fetch *.srt files from a web API. The (relevant) code (Python 3.x):

with urllib.request.urlopen(req) as response:
    result = response.read().decode('utf-8')
    print(result)

    with open(subpath, 'w') as file:
        file.write(result)

This works fine, except for some files. For those files, I get the following error: UnicodeEncodeError: 'charmap' codec can't encode character '\u266a' in position 37983: character maps to <undefined>

(\u266a is the eighth note character, ♪.)
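
The failure can be reproduced without the HTTP request. This is a minimal sketch that assumes the platform's default text encoding is cp1252, a common Windows default; the question does not say which encoding was actually in use, and the variable names are illustrative only.

```python
import unicodedata

note = '\u266a'

# Unicode name of the offending character
print(unicodedata.name(note))

# UTF-8 can represent every Unicode code point, so this succeeds
utf8_bytes = note.encode('utf-8')
print(utf8_bytes)

# cp1252 is a charmap-based codec with no mapping for U+266A.
# Assumption: this is the default that open(subpath, 'w') picked up.
try:
    note.encode('cp1252')
    cp1252_failed = False
except UnicodeEncodeError as exc:
    cp1252_failed = True
    print(exc)
```

If the assumption holds, the decode('utf-8') call is not the culprit: the exception is raised when the resulting string is encoded again, either by print() for the console or by the text-mode file for writing.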

How can I solve this? Can I filter this character out of the bytes object returned by .read()? Or can I ignore encoding errors? Thanks in advance.

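Also, note that I did find many threads about "...can't encode character..." errors, but in most of them the solution was to use .decode('utf-8'), which my code already does.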

Answer 1

I was not able to fix the encoding error itself, but I found a workaround.

When the file is opened in binary mode, the bytes object can be written directly, so no decoding is needed:

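import urllib.request

# req and subpath are defined as in the question
with urllib.request.urlopen(req) as response:
    result = response.read()  # raw bytes, left undecoded

    # Binary mode writes the bytes as-is, so no codec is involved
    with open(subpath, 'wb') as file:
        file.write(result)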
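
If the text is still needed as a str (for example to inspect or edit it), an alternative is to keep text mode but pass an explicit encoding to open(). The sketch below is not from the original answer. It assumes the API returns UTF-8 and that the error came from the platform default encoding; save_text is a hypothetical helper, and the sample bytes stand in for response.read() so the example runs without network access.

```python
import os
import tempfile


def save_text(raw: bytes, path: str) -> str:
    """Decode UTF-8 bytes and write them out again as UTF-8 text."""
    text = raw.decode('utf-8')
    # An explicit encoding avoids the platform default (assumed here to be
    # a charmap codec such as cp1252). newline='' disables newline translation.
    with open(path, 'w', encoding='utf-8', newline='') as file:
        file.write(text)
    return text


# Stand-in for response.read(); real code would use urllib.request.urlopen(req)
sample = '1\n00:00:01,000 --> 00:00:02,000\n\u266a music \u266a\n'.encode('utf-8')

with tempfile.TemporaryDirectory() as tmpdir:
    subpath = os.path.join(tmpdir, 'example.srt')
    text = save_text(sample, subpath)
    with open(subpath, 'rb') as file:
        written = file.read()

# If a legacy encoding is required and losing characters is acceptable,
# errors='replace' substitutes '?' for anything the codec cannot encode.
lossy = text.encode('cp1252', errors='replace')
```

The same errors argument is accepted by open(), so open(subpath, 'w', errors='replace') or errors='ignore' would also suppress the exception, at the cost of silently altering the subtitles. Note that print(result) can raise the same error on a console whose encoding cannot represent the character.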


python

encoding

request