Python: How do I solve ConnectionError: HTTPSConnectionPool(host='data.humdata.orghttps', port=443)
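I am trying to download country-level data from this site. I managed to scrape the page content and get the download links for each country, in particular the links to .zip files.
Here is a sample of the code I use to collect the file download links: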
# soup, new_base_url, all_new_links, m and country_name come from surrounding code not shown here
data = soup.find_all('div', {"class": 'hdx-btn-group hdx-btn-group-fixed'})
for item in data:  # iterate over the tags directly; enumerate() would yield (index, tag) tuples
    if '.geojson' in item.get_text() or '.topojson' in item.get_text():
        continue
    elif 'ADM0' in item.get_text() or 'ADM1' in item.get_text() or 'ADM2' in item.get_text() or 'ADM3' in item.get_text():
        a_links = item.find('a', href=True)
        # pull the href value out of the tag's string form
        split_link = str(a_links).split('"')
        index_link = 0
        for val, every in enumerate(split_link):
            if every == ' href=':
                index_link = val + 1
        our_link = split_link[index_link]
        full_link = new_base_url + our_link
        all_new_links[m]['link'].append(full_link)
        all_new_links[m]['country'] = country_name
Here is a sample of the resulting list of file download links:
file_download = ['https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/ba7b3ab481359205226ac8712f07326d6a7edb3d/releaseData/gbOpen/ZMB/ADM2/geoBoundaries-ZMB-ADM2-all.zip', 'https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/4e9bd60a361c465be0bca5487f4e3d496a27d0be/releaseData/gbOpen/YEM/ADM2/geoBoundaries-YEM-ADM2-all.zip', 'https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/eff4a77b014f07e326e04eeb24bc81de2a45f64f/releaseData/gbOpen/WSM/ADM2/geoBoundaries-WSM-ADM2-all.zip', 'https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/3a56abbc5fdde21f7dee6ddb9062171da5689b6b/releaseData/gbOpen/VUT/ADM3/geoBoundaries-VUT-ADM3-all.zip','https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/f591db4ef2ab61b972fce008c7addd287d46c846/releaseData/gbOpen/TZA/ADM3/geoBoundaries-TZA-ADM3-all.zip', 'https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/b62f4ea59d20c9b258718a3c845b8bb12c7458f0/releaseData/gbOpen/UGA/ADM2/geoBoundaries-UGA-ADM2-all.zip']
This is my code for downloading the files:
import requests
from os.path import basename

z = 1
for i in file_download:
    with open(basename('country' + str(z) + '.zip'), "wb") as f:
        f.write(requests.get(i).content)
    print(f"Item {z} Finished")
    z += 1
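Separately from the URL problem, a download loop like this one can be made a little more defensive. The sketch below is an editorial illustration, not the poster's code: download_all is a hypothetical helper, and it assumes a requests.Session (or any object with a compatible get()) is passed in. It streams each archive to disk and calls raise_for_status() so that an HTTP error such as a 404 is not silently saved as a .zip file. It does not repair the malformed URLs themselves.

```python
from pathlib import Path


def download_all(urls, session, dest_dir=".", chunk_size=65536):
    """Hypothetical helper: save each URL as dest_dir/country<N>.zip.

    `session` is assumed to be a requests.Session (or anything with a
    compatible get()); the written paths are returned in order.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for z, url in enumerate(urls, start=1):
        path = dest / f"country{z}.zip"
        # stream=True avoids holding the whole archive in memory
        with session.get(url, stream=True, timeout=60) as resp:
            resp.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
            with open(path, "wb") as f:
                for chunk in resp.iter_content(chunk_size=chunk_size):
                    f.write(chunk)
        print(f"Item {z} Finished")
        written.append(path)
    return written


# Example usage (assumes requests is installed and the URLs are valid):
#     import requests
#     with requests.Session() as s:
#         download_all(file_download, s)
```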
However, when I run the code above to download the data files (zip archives in this case), I get this error:
ConnectionError: HTTPSConnectionPool(host='data.humdata.orghttps', port=443): Max retries exceeded with url: //github.com/wmgeolab/geoBoundaries/raw/ba7b3ab481359205226ac8712f07326d6a7edb3d/releaseData/gbOpen/ZMB/ADM2/geoBoundaries-ZMB-ADM2-all.zip (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000000F5E604BEE0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
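To see where the odd host name in this error comes from, the first URL from file_download can be split with the standard library's urllib.parse.urlsplit. This is only an illustration: requests/urllib3 use their own URL parser, but the host and path in the error message above show that they split this URL the same way.

```python
from urllib.parse import urlsplit

# First entry of file_download, copied from the question
bad_url = (
    "https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/"
    "ba7b3ab481359205226ac8712f07326d6a7edb3d/releaseData/gbOpen/ZMB/ADM2/"
    "geoBoundaries-ZMB-ADM2-all.zip"
)

parts = urlsplit(bad_url)
# The network location runs from "https://" up to the next "/", so the stray
# "https:" ends up in the host (the trailing ":" is read as an empty port).
print(parts.hostname)  # data.humdata.orghttps
print(parts.path)      # //github.com/wmgeolab/geoBoundaries/raw/...
```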
I have tried several things to fix the error, without success, including adding headers and verify=False:
z = 1
for i in file_download:
    with open(basename('country' + str(z) + '.zip'), "wb") as f:
        f.write(requests.get(i, headers=headers, verify=False).content)
    print(f"Item {z} Finished")
    z += 1
How can I fix this error? Any help would be greatly appreciated.
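Simple mistake: look at the URLs in file_download. You accidentally prepended the original site's address to them. For example, 'https://data.humdata.orghttps://github.com/wmgeolab/geoBoundaries/raw/ba7b3ab481359205226ac8712f07326d6a7edb3d/releaseData/gbOpen/ZMB/ADM2/geoBoundaries-ZMB-ADM2-all.zip' should be 'https://github.com/wmgeolab/geoBoundaries/raw/ba7b3ab481359205226ac8712f07326d6a7edb3d/releaseData/gbOpen/ZMB/ADM2/geoBoundaries-ZMB-ADM2-all.zip'. Because of the extra prefix, data.humdata.orghttps is treated as the host name. No such host exists, so the DNS lookup (getaddrinfo) fails before any HTTP request is sent, which is why adding headers or verify=False makes no difference.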
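A minimal sketch of two ways to apply this fix, assuming new_base_url in the scraper is "https://data.humdata.org" (inferred from the URLs above); build_link and strip_bad_prefix are hypothetical helper names. Building each link with urllib.parse.urljoin instead of plain string concatenation avoids the problem at the source, because urljoin leaves an already absolute href untouched. The second helper repairs a list that has already been collected.

```python
from urllib.parse import urljoin

# Assumption: the value of new_base_url in the question's scraper
BASE = "https://data.humdata.org"


def build_link(base_url, href):
    """Hypothetical helper: resolve an href against the page's base URL.

    urljoin returns href unchanged when it is already absolute
    (e.g. a github.com link) and prefixes base_url when it is relative.
    """
    return urljoin(base_url, href)


def strip_bad_prefix(link, prefix=BASE):
    """Hypothetical helper: repair a link that already has the prefix glued on."""
    rest = link[len(prefix):] if link.startswith(prefix) else link
    # Only drop the prefix when what remains is itself a full URL
    return rest if rest.startswith(("http://", "https://")) else link
```

In the scraper, full_link = build_link(new_base_url, our_link) would replace the concatenation; for the existing list, file_download = [strip_bad_prefix(u) for u in file_download] should give URLs that requests can download.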
Note that you could also just clone this repository, which contains all of these datasets:
https://github.com/wmgeolab/geoBoundaries
I maintain geoBoundaries, and I would also recommend our own API, which (in my opinion :) ) is easier to use than HDX:
https://www.geoboundaries.org/api.html