Python: How do I solve ConnectionError: HTTPSConnectionPool(host='data.humdata.orghttps', port=443)

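I am trying to download country-level data from this site. I managed to scrape the page content and collect the download links for each country, in particular the links to .zip files.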

A sample of my code for collecting the file download links is below:

data = soup.find_all('div', {"class": 'hdx-btn-group hdx-btn-group-fixed'})

for item in data:
    if '.geojson' in item.get_text() or '.topojson' in item.get_text():
        continue
    elif 'ADM0' in item.get_text() or 'ADM1' in item.get_text() or 'ADM2' in item.get_text() or 'ADM3' in item.get_text():
        a_links = item.find('a', href=True)
        split_link = str(a_links).split('"')
        index_link = 0
        for val, every in enumerate(split_link):
            if every == ' href=':
                index_link = val + 1

        our_link = split_link[index_link]
        full_link = new_base_url + our_link
        all_new_links[m]['link'].append(full_link)
        all_new_links[m]['country'] = country_name

Here is a sample of the resulting list of file download links:

file_download = ['https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/ba7b3ab481359205226ac8712f07326d6a7edb3d/releaseData/gbOpen/ZMB/ADM2/geoBoundaries-ZMB-ADM2-all.zip', 'https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/4e9bd60a361c465be0bca5487f4e3d496a27d0be/releaseData/gbOpen/YEM/ADM2/geoBoundaries-YEM-ADM2-all.zip', 'https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/eff4a77b014f07e326e04eeb24bc81de2a45f64f/releaseData/gbOpen/WSM/ADM2/geoBoundaries-WSM-ADM2-all.zip', 'https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/3a56abbc5fdde21f7dee6ddb9062171da5689b6b/releaseData/gbOpen/VUT/ADM3/geoBoundaries-VUT-ADM3-all.zip','https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/f591db4ef2ab61b972fce008c7addd287d46c846/releaseData/gbOpen/TZA/ADM3/geoBoundaries-TZA-ADM3-all.zip', 'https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/b62f4ea59d20c9b258718a3c845b8bb12c7458f0/releaseData/gbOpen/UGA/ADM2/geoBoundaries-UGA-ADM2-all.zip']

This is my code for downloading the files:

z = 1
for i in file_download:
    with open(basename('country'+str(z)+'.zip'), "wb") as f:
        f.write(requests.get(i).content)
        print(f"Item {z} Finished")
    z += 1

However, when I run the code above to download the data files (zipped, in this case), I get the error:

ConnectionError: HTTPSConnectionPool(host='data.humdata.orghttps', port=443): Max retries exceeded with url: //github.com/wmgeolab/geoBoundaries/raw/ba7b3ab481359205226ac8712f07326d6a7edb3d/releaseData/gbOpen/ZMB/ADM2/geoBoundaries-ZMB-ADM2-all.zip (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000000F5E604BEE0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

I have tried several things to resolve the error, none of which worked. Among them was adding headers and verify=False:

z = 1
for i in file_download:
    with open(basename('country'+str(z)+'.zip'), "wb") as f:
        f.write(requests.get(i, headers=headers, verify=False).content)
        print(f"Item {z} Finished")
    z += 1

How can I resolve this error? I would appreciate any help.

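Simple mistake: look at the URLs in file_download. You have mistakenly prepended the original site's address to links that were already absolute. For example, 'https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/ba7b3ab481359205226ac8712f07326d6a7edb3d/releaseData/gbOpen/ZMB/ADM2/geoBoundaries-ZMB-ADM2-all.zip' should be 'https://github.com/wmgeolab/geoBoundaries/raw/ba7b3ab481359205226ac8712f07326d6a7edb3d/releaseData/gbOpen/ZMB/ADM2/geoBoundaries-ZMB-ADM2-all.zip'. With the prefix in place, the host is read as 'data.humdata.orghttps', which does not exist, so the DNS lookup fails (getaddrinfo failed). That is also why adding headers or verify=False makes no difference: the request never reaches a server.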
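To make that concrete, here is a minimal sketch of how the malformed URL gets parsed and two ways you might avoid or undo it. It assumes that new_base_url in your scraper is 'https://data.humdata.org' (called BASE_URL here); the helper name strip_duplicate_prefix and the '/dataset/example.zip' path are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: this is the value of new_base_url in the question's scraper.
BASE_URL = "https://data.humdata.org"

# An href as scraped from the page: it is already an absolute URL.
href = ("https://github.com/wmgeolab/geoBoundaries/raw/"
        "ba7b3ab481359205226ac8712f07326d6a7edb3d/releaseData/gbOpen/"
        "ZMB/ADM2/geoBoundaries-ZMB-ADM2-all.zip")

# What plain string concatenation produces.
bad = BASE_URL + href

# The host ends at the first "/" after "//", so it swallows "https".
print(urlsplit(bad).hostname)  # data.humdata.orghttps

# Option 1: build links with urljoin, which leaves absolute hrefs alone
# and only resolves relative ones against the base.
print(urljoin(BASE_URL, href))
print(urljoin(BASE_URL, "/dataset/example.zip"))  # hypothetical relative href


def strip_duplicate_prefix(url, base=BASE_URL):
    """Hypothetical helper: drop `base` if a full URL follows it directly."""
    rest = url[len(base):] if url.startswith(base) else url
    return rest if rest.startswith(("http://", "https://")) else url


# Option 2: repair a list that has already been collected.
print(strip_duplicate_prefix(bad))
```

In the scraper itself, replacing new_base_url + our_link with urljoin(new_base_url, our_link) should be enough; the second option is only useful if you want to fix the existing file_download list without scraping again.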

Note that you could simply clone this repository, which is where all of these datasets live: https://github.com/wmgeolab/geoBoundaries

I maintain geoBoundaries, and would also recommend our own API, which (in my opinion :) ) is easier to use than HDX: https://www.geoboundaries.org/api.html