Python ValueError: unknown url type: space (?)

Question

I am using the urllib2 module in Python 2.7, with Spyder 3.0, to batch-download text files by reading a text file that contains a list of the text files to download:

    import sys
    import urllib2

    reload(sys)
    sys.setdefaultencoding('utf-8')
    with open('ocean_not_templated_url.txt', 'r') as text:
        lines = text.readlines()
        for line in lines:
            # Strip the UTF-8 BOM bytes, non-breaking space, and whitespace
            url = urllib2.urlopen(line.strip('\xef\xbb\xbf \xa0\t\n\r\v'))
            with open(line.strip('\n\r\t ').replace('/', '!').replace(':', '~'), 'wb') as out:
                for d in url:
                    out.write(d)

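I have already found a bunch of odd characters in the URLs, which I strip out. However, the script fails when it is nearly 90% done, with the error shown in the title (`ValueError: unknown url type:` followed by what looks like a space).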

I think it is a non-breaking space (written as \xa0 in the code), but it still fails. Any ideas?

Answer 1

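That is a really strange URL!

Specify the network protocol. If the file lives on the web, try prefixing the URL with http:// and the domain name.

A file always resides somewhere, either in a directory on some server or locally on your system. So there must be a network path to such a file, for example: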

http://127.0.0.1/folder1/samuel/file1.txt

The same example, where localhost is (usually) an alias for 127.0.0.1:

http://localhost/folder1/samuel/file1.txt

This may solve the problem. Just think about where your file lives and how it should be addressed...
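To illustrate the point about the protocol, here is a minimal sketch of a helper that prefixes a scheme when a line has none. The name `ensure_scheme` and the default of `http` are my own assumptions for illustration, not something from the question; the check for `://` is deliberately simple.

```python
def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it already names a protocol, else prefix one."""
    url = url.strip()
    if "://" in url:
        # Already looks like scheme://host/path, leave it alone.
        return url
    # Hypothetical default: assume the file is served over plain HTTP.
    return default_scheme + "://" + url


print(ensure_scheme("localhost/folder1/samuel/file1.txt"))
print(ensure_scheme("http://127.0.0.1/folder1/samuel/file1.txt"))
```

Whether `http` is the right default depends on where the files actually live, so treat this as a starting point rather than a fix.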

Update:

I experimented with this a lot. I think I know why that error occurs! :D

I suspect that the file that stores your URLs actually has a sneaky empty line near the end. I say near the end because you said the script gets through about 90% of the file and then fails. The urllib2 function get_type cannot process that empty URL, so it raises `unknown url type`.
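A quick way to check this guess, as a minimal sketch: an empty string is what `line.strip(...)` returns for a blank line, and passing it to `urlopen` fails before any network access happens. The try/except import is only my assumption so that the snippet runs on both Python 2.7 (urllib2) and Python 3 (urllib.request); `blank_line_error` is a hypothetical helper name.

```python
try:
    from urllib2 import urlopen          # Python 2.7, as in the question
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def blank_line_error():
    """Return the message urlopen raises for an empty URL, or None."""
    try:
        urlopen("")  # what a blank line becomes after stripping
    except ValueError as exc:
        return str(exc)
    return None


message = blank_line_error()
print(message)
```

The exact wording after `unknown url type:` differs between Python versions, which may be why the message looked like it ended in a space.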

I think that's the problem! Remove the empty lines from the file ocean_not_templated_url.txt and try again!
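If editing the file by hand is inconvenient, the script could skip blank lines itself. This is only a sketch under my own assumptions: `clean_urls` is a hypothetical helper, and the sample list stands in for the lines read from ocean_not_templated_url.txt.

```python
def clean_urls(lines):
    """Yield stripped, non-empty URLs from an iterable of lines."""
    for line in lines:
        url = line.strip()
        if url:  # skip blank or whitespace-only lines
            yield url


# Stand-in for the contents of the URL list file, including blank lines.
sample = [
    "http://localhost/folder1/samuel/file1.txt\n",
    "\n",
    "   \n",
    "http://127.0.0.1/folder1/samuel/file1.txt\n",
]
print(list(clean_urls(sample)))
```

In the loop from the question, iterating over `clean_urls(text)` instead of `lines` should keep empty strings away from `urllib2.urlopen`.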

Take a look and let me know! :P

python

download

urllib2