How to properly encode string inside urlopen?

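Question: I have a text file containing names written in Russian. I read each name from the file and make a request to Wikipedia, using that line of the file as the page title. Then I want to get information about all the images on that page.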

Program:

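    import re
    from urllib.request import urlopen

    from bs4 import BeautifulSoup

    with open('names-video.txt', "r", encoding='Windows-1251') as file:
        for line in file.readlines():
            print(line)
            # Join the words of the title with underscores
            name = "_".join(line.split())
            print(name)
            # The traceback below points at this call
            html = urlopen(f'https://ru.wikipedia.org/wiki/{name}')
            bs = BeautifulSoup(html, 'html.parser')
            images = bs.findAll('img', {'src': re.compile('.jpg')})

            print(images[0])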

names-video.txt:

Алимпиев, Виктор Гелиевич 
Андреев, Алексей Викторович (художник)
Баевер, Антонина
Булдаков, Алексей Александрович
Жестков, Максим Евгеньевич
Канис, Полина Владимировна
Мустафин, Денис Рафаилович
Преображенский, Кирилл Александрович
Селезнёв, Владимир Викторович
Сяйлев, Андрей Фёдорович
Шерстюк, Татьяна Александровна

Error message:

error from callback <bound method SocketHandler.handle_message of <amino.socket.SocketHandler object at 0x0000018B92600FA0>>: 'ascii' codec can't encode characters in position 10-17: ordinal not in range(128)
  File "C:\Users\Desktop\ИНФА\pycharm\venv\lib\site-packages\websocket\_app.py", line 344, in _callback
    callback(*args)
  File "C:\Users\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 80, in handle_message
    self.client.handle_socket_message(data)
  File "C:\Users\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\client.py", line 345, in handle_socket_message
    return self.callbacks.resolve(data)
  File "C:\Users\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 204, in resolve
    return self.methods.get(data["t"], self.default)(data)
  File "C:\Users\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 192, in _resolve_chat_message
    return self.chat_methods.get(key, self.default)(data)
  File "C:\Users\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 221, in on_text_message
    def on_text_message(self, data): self.call(getframe(0).f_code.co_name, objects.Event(data["o"]).Event)
  File "C:\Users\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 209, in call
    handler(data)
  File "C:\Users\Desktop\python-bots\music_bot\bot.py", line 56, in on_text_message
    html = urlopen(f'https://ru.wikipedia.org/wiki/{name}')
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 517, in open
    response = self._open(req, data)
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1385, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1342, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1266, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1104, in putrequest
    self._output(self._encode_request(request))
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1184, in _encode_request
    return request.encode('ascii')

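Question: For some reason, the code breaks at urlopen(). print(line) and print(name) work fine. What could be the problem? I have been struggling with this for a long time and would be grateful for any solution. Thanks in advance.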

You need to percent-encode the non-ASCII characters so that the result is a valid URI:

    from urllib.parse import quote
    ...
            name = "_".join(line.split())
            # Percent-encode the non-ASCII characters (as UTF-8 bytes)
            name = quote(name)
            print(name)
    ...
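
To see the effect in isolation, here is a minimal sketch that builds the URL without touching the network. The helper name `build_wiki_url` and the constant `BASE_URL` are made up for illustration; only `quote` and `unquote` come from the standard library. The last line repeats the `encode('ascii')` step that `http.client` performs at the bottom of the traceback, which is where the original code raised the error.

```python
from urllib.parse import quote, unquote

# Same host and path prefix as in the question.
BASE_URL = "https://ru.wikipedia.org/wiki/"


def build_wiki_url(title: str) -> str:
    """Return an ASCII-only URL for a page title (hypothetical helper)."""
    # Collapse whitespace, including the trailing newline that
    # readlines() keeps, into single underscores.
    name = "_".join(title.split())
    # quote() encodes the text as UTF-8 and percent-encodes every byte
    # that is not in its safe set.
    return BASE_URL + quote(name)


url = build_wiki_url("Баевер, Антонина\n")
print(url)            # the percent-encoded form that goes on the wire
print(unquote(url))   # decodes back to the readable title
url.encode("ascii")   # the step that failed before now succeeds
```

Only the title is passed through `quote`, not the whole URL, because by default `quote` would also escape the `:` in `https://`.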