How to encode a webscraped image link in UTF-8 to ASCII but still have a functional link?
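I'm trying to scrape a link to an image so I can use it in my Kivy application. The problem is that the image address contains Polish characters (ę, ł, ó, ą), and I get this error: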
UnicodeEncodeError: 'ascii' codec can't encode characters in position 36-37: ordinal not in range(128)
Full error traceback:
Traceback (most recent call last):
  File "F:\Kivy\lib\site-packages\kivy\loader.py", line 342, in _load_urllib
    fd = opener.open(request)
  File "c:\users\user\appdata\local\programs\python\python36\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "c:\users\user\appdata\local\programs\python\python36\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "c:\users\user\appdata\local\programs\python\python36\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "c:\users\user\appdata\local\programs\python\python36\lib\urllib\request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "c:\users\user\appdata\local\programs\python\python36\lib\urllib\request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "c:\users\user\appdata\local\programs\python\python36\lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "c:\users\user\appdata\local\programs\python\python36\lib\http\client.py", line 1250, in _send_request
    self.putrequest(method, url, **skips)
  File "c:\users\user\appdata\local\programs\python\python36\lib\http\client.py", line 1117, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u0142' in position 36: ordinal not in range(128)
[INFO ] [GL ] NPOT texture support is available
[INFO ] [WindowSDL ] exiting mainloop and closing.
[INFO ] [Base ] Leaving application in progress...
Process finished with exit code 0
Here is an example so you can see what I mean. One image loads normally with no error; the other raises UnicodeEncodeError and is displayed as black.
from kivy.app import App
from kivy.lang import Builder

build_structure = """
Screen:
    BoxLayout:
        AsyncImage:
            # This doesn't load because it's in UTF-8 and outputs the error above
            # but it doesn't break the app.
            source: app.link_to_image_bad
        AsyncImage:
            # This one does load
            source: app.link_to_image_good
"""


class ImageApp(App):
    # This link has Polish characters in it so it will give the UnicodeEncodeError
    link_to_image_bad = "https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szkoła-do-hymnu.png"
    link_to_image_good = "https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Olimpiada-statystyczna.png"

    def build(self):
        return Builder.load_string(build_structure)


if __name__ == '__main__':
    ImageApp().run()
Output of the code above: [image]
Is there a way to avoid this error and still have a working link?
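The URL should already be ASCII-compatible. That is how traffic on the Internet (that is, HTTP) works: URLs are ASCII only (with additional restrictions). Browsers nowadays tend to display URLs unescaped, which is why we only sometimes see %20 and other %xx sequences in URLs. Note: we now have UTF-8 encoding and, on top of that, URL escaping. So keep in mind that you have two layers of encoding.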
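To make the two layers concrete, here is a small sketch for the single character ł from the question. The request line is my reconstruction of what http.client builds (method, path, HTTP version), so treat it as an illustration rather than a dump of the real request.

```python
from urllib.parse import quote, unquote

ch = "ł"  # U+0142, the character named in the traceback

# Layer 1: the character is encoded as UTF-8 bytes.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes)  # b'\xc5\x82'

# Layer 2: every non-ASCII byte is written as %XX (URL escaping).
escaped = quote(ch)
print(escaped)  # %C5%82

# unquote() reverses both layers.
print(unquote(escaped) == ch)  # True

# Without escaping, the ASCII encoding step fails, as in the traceback.
# Assumed request line: "<method> <path> <HTTP version>".
request = "GET /wp-content/uploads/2020/11/Szkoła-do-hymnu.png HTTP/1.1"
error = None
try:
    request.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
```

The reported position (36) lines up with the index of ł in that request line, which suggests the unescaped path is what reaches http.client.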
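You should escape the URL; see URL quoting in the urllib.parse documentation. I would use quote() and unquote(). In the comments someone mentioned quote_plus(), but that one also turns spaces into +. There are cases where that is what you want (query strings), but in a path it changes the meaning of the original data.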
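A short comparison of the two functions, as a sketch. The file name with spaces is made up for illustration; it is not one of the links from the question.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Hypothetical file name containing spaces and a non-ASCII character.
name = "Szkoła do hymnu.png"

quoted = quote(name)
print(quoted)  # Szko%C5%82a%20do%20hymnu.png

plus_quoted = quote_plus(name)
print(plus_quoted)  # Szko%C5%82a+do+hymnu.png

# unquote() does not turn "+" back into a space, so the round trip
# only works when the matching function is used on both sides.
print(unquote(plus_quoted))       # Szkoła+do+hymnu.png
print(unquote_plus(plus_quoted))  # Szkoła do hymnu.png
```

In a URL path a + is a literal plus sign, so for image paths quote() looks like the safer choice.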
Edit:
OK, I see the problem. The way Kivy handles URLs seems a bit odd. quote() is meant only for the path part, not for the first part of the URL (scheme and host).
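To see why quote() cannot simply be applied to the whole address, a quick sketch with the link from the question:

```python
from urllib.parse import quote

url = "https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szkoła-do-hymnu.png"

# quote() keeps "/" by default but escapes ":", so the "https:" prefix is damaged.
whole = quote(url)
print(whole)
# https%3A//nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szko%C5%82a-do-hymnu.png

# Quoting only the path leaves nothing else to break.
path = quote("/wp-content/uploads/2020/11/Szkoła-do-hymnu.png")
print(path)
# /wp-content/uploads/2020/11/Szko%C5%82a-do-hymnu.png
```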
As a hack (it does not work if the URL has an explicit port: the : in front of the port would be quoted too):
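import urllib.parse
url = 'https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szkoła-do-hymnu.png'
# Split off the scheme at the first '//' and quote only what follows it
url_split = url.split('//', 1)
quoted_url = '//'.join([url_split[0], urllib.parse.quote(url_split[1])])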
So you get the desired 'https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szko%C5%82a-do-hymnu.png', which is what browsers use.
You may want to wrap this in a function of your own (and perhaps check for a port number, to keep it out of the quoting).
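One possible shape for such a function, as a sketch only. The name quote_url and the example.com address are mine, not from the question. It assumes the host name itself is ASCII (non-ASCII host names would need IDNA handling, which is not covered here) and it treats % as safe so that an already escaped URL is not escaped twice.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url):
    """Percent-escape the path and query of a URL (hypothetical helper).

    Scheme, host and port are left untouched, so "https://" and an
    explicit ":port" survive. "%" is treated as safe, which keeps the
    function from escaping an already escaped URL a second time.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


bad = "https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szkoła-do-hymnu.png"
good = quote_url(bad)
print(good)
# https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szko%C5%82a-do-hymnu.png

# Hypothetical URL with an explicit port: the ":" before the port is kept.
with_port = quote_url("https://example.com:8443/Szkoła.png")
print(with_port)
# https://example.com:8443/Szko%C5%82a.png
```

In the app from the question, the idea would be to assign quote_url(...) of the scraped address to link_to_image_bad before the AsyncImage reads it. I have not tried that in Kivy itself.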
Let's wait; maybe someone has a real solution for Kivy. I never use fully qualified URLs (that is, with protocol and domain), so for me a plain quote() is enough.