Getting HTTP error 400 with urllib2 when trying to download a file

Question

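I'm writing a script that downloads files from different sites. I don't understand why it throws this error for one URL, when typing the same URL into a browser lets me download the file. Other URLs work fine. Here is the code: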

import os
import time
import urllib2

from bs4 import BeautifulSoup

# url is defined earlier in the script (not shown)
f = urllib2.Request(url)
f.add_header('User-Agent', 'Mozilla/5.0 Windows NT 6.3; WOW64; rv:34.0')
request = urllib2.urlopen(f)
data = request.read()
soup = BeautifulSoup(data, 'html.parser')
p_name = soup.find('h2', id="searchResults").contents[0]
if not os.path.exists(p_name):
    os.makedirs(p_name)
for a in soup.find_all('a', href="#register"):
    f = a["data-durl"]
    # The following two lines just prepare the file name
    n = len(f.split("/"))
    n_file = f.split("/")[n-1]
    path_file = p_name + "\\" + n_file
    if os.path.isfile(path_file):
        print "Firmware already downloaded. Skipping it"
    else:
        print "Downloading " + path_file
        link = urllib2.urlopen(f)
        datos = link.read()
        # print "[+] Downloading firmware %s" % n_file
        # n_archivo = "Archivo" + str(b) + ".zip"
        with open(path_file, "wb") as code:
            code.write(datos)
    time.sleep(2)

This URL does not work with the script: Non working url. But this one works fine: working url

I hope you can help me.

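Edit: I added the libraries I'm using and the stack trace. I found the error! The problem is that the name of the file it tries to download contains spaces. Using f.replace(" ","%20") should work fine :)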

Answer 1

You need to convert the spaces in the file name to URL-encoded spaces: %20. You can use str.replace() for that:

Add one line between these two lines:

print "Downloading "+ path_file
f = f.replace(' ', '%20')
link = urllib2.urlopen(f)

This will download from the URL:

http://www.downloads.netgear.com/files/GDC/ME101/ME101%20Software%20Utility%20Version%202.0.zip

instead of from

http://www.downloads.netgear.com/files/GDC/ME101/ME101 Software Utility Version 2.0.zip

which is invalid because it contains spaces.

The URL still works in your browser because, when you enter a URL containing spaces, the browser automatically converts them to %20.
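
As a more general alternative to replacing only spaces, the path part of the URL can be percent-encoded with the standard library. This is just a sketch, not something from the original answer: the helper name encode_url is made up for illustration, and the import fallback is only there so the snippet runs on both Python 2 (as used in the question) and Python 3.

```python
try:
    # Python 3
    from urllib.parse import quote, urlsplit, urlunsplit
except ImportError:
    # Python 2, the version used in the question
    from urllib import quote
    from urlparse import urlsplit, urlunsplit


def encode_url(url):
    """Percent-encode unsafe characters (such as spaces) in a URL's path.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Keep '/' literal so the path structure survives, and keep '%' literal
    # so a URL that is already encoded is not encoded a second time.
    path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )
```

In the question's loop this would be used as f = encode_url(f) just before the urllib2.urlopen(f) call. Unlike a plain str.replace(), it also handles other characters that are not allowed unescaped in a URL path.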

python

urllib2

beautifulsoup