How to address urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url

Question

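I am trying to scrape a few pages of a website with Selenium and use the results, but when I run the function twice, the second call fails with this error:

[WinError 10061] No connection could be made because the target machine actively refused it

Here is my approach: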

```
import os
import re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup as soup

opts = webdriver.ChromeOptions()
opts.binary_location = os.environ.get('GOOGLE_CHROME_BIN', None)
opts.add_argument("--headless")
opts.add_argument("--disable-dev-shm-usage")
opts.add_argument("--no-sandbox")
browser = webdriver.Chrome(executable_path="CHROME_DRIVER PATH", options=opts)

lst =[]
def search(st):
    for i in range(1,3):
        url = "https://gogoanime.so/anime-list.html?page=" + str(i)
        browser.get(url)
        req = browser.page_source
        sou = soup(req, "html.parser")
        title = sou.find('ul', class_ = "listing")
        title = title.find_all("li")
        for j in range(len(title)):
            lst.append(title[j].getText().lower()[1:])
    browser.quit()
    print(len(lst))

search("a")
search("a")
```

Output

```
272
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url: /session/4b3cb270d1b5b867257dcb1cee49b368/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001D5B378FA60>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
```

Answer 1

This error message...

```
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url: /session/4b3cb270d1b5b867257dcb1cee49b368/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001D5B378FA60>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
```

...implies that a new connection could not be established, so a MaxRetryError was raised.

A couple of things:

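First, as per the discussion max-retries-exceeded exceptions are confusing, the traceback is somewhat misleading. The Requests library wraps the exception for the users' convenience, and the original exception is part of the message displayed.
Requests never retries (it sets retries=0 for urllib3's HTTPConnectionPool), so the error would have been much more canonical without the MaxRetryError and HTTPConnectionPool keywords. An ideal traceback would have been: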
```
  ConnectionError(<class 'socket.error'>: [Errno 1111] Connection refused)
```
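To see the wrapping described above without Selenium, here is a minimal sketch that uses urllib3 directly, assuming urllib3 is installed (Selenium's Python bindings depend on it). It sends one request to a local port that nothing listens on, picked at runtime as a stand-in for the chromedriver port after `quit()`. The helpers `unused_local_port` and `try_request` and the request path are hypothetical. With `retries=0` the refusal comes back wrapped in `MaxRetryError`; with `retries=False` urllib3 re-raises the underlying `NewConnectionError` directly.

```python
import socket

import urllib3
from urllib3.exceptions import MaxRetryError, NewConnectionError


def unused_local_port() -> int:
    # Bind to port 0 so the OS picks a free port, then close the socket so
    # nothing listens there (like a chromedriver that has been shut down).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def try_request(retries):
    # Hypothetical helper: send one GET and return the exception it raised.
    pool = urllib3.HTTPConnectionPool("127.0.0.1", port=unused_local_port())
    try:
        pool.request("GET", "/session/example/url", retries=retries)
    except Exception as exc:
        return exc
    finally:
        pool.close()
    return None


# retries=0: the connection refusal is wrapped in MaxRetryError
wrapped = try_request(0)
print(type(wrapped).__name__, "caused by", type(wrapped.reason).__name__)

# retries=False: the underlying connection error is raised unwrapped
raw = try_request(False)
print(type(raw).__name__)
```

The `reason` attribute of `MaxRetryError` holds the original exception, which is the "Caused by NewConnectionError(...)" part of the traceback in the question.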

Root cause and solution

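Once you have initiated the WebDriver and the web-client session, within def search(st) you invoke get() to access a URL, and in the subsequent lines you also invoke browser.quit(). quit() calls the /shutdown endpoint, after which the WebDriver and web-client instances are destroyed completely and all pages/tabs/windows are closed. Hence no connection exists any more.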

You can find a relevant detailed discussion in:

Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?

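In this case, browser.quit() runs after the for loop, at the end of the first search() call. When the second call to search() invokes browser.get(), there is no active session to connect to. Hence you see the error.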

So a simple solution is to remove the line browser.quit() and invoke browser.get(url) within the same browsing context.
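A minimal sketch of that lifecycle, assuming the scraping function receives the already-running browser instead of using a global: one browser is created, reused for every `search()` call, and quit exactly once in a `finally` block. `run`, `make_browser` and `FakeBrowser` are hypothetical names; `FakeBrowser` is only an offline stand-in that refuses `get()` after `quit()`, mimicking the dead session. With Selenium you would pass something like `lambda: webdriver.Chrome(options=opts)` as `make_browser` instead.

```python
# Hypothetical helpers illustrating "one browser, reused, quit once".
PAGES = ["https://gogoanime.so/anime-list.html?page=" + str(i) for i in range(1, 3)]


def search(browser, pages):
    """Load each page with an already-running browser and return the HTML.

    Deliberately does NOT call browser.quit(): the caller owns the session.
    """
    sources = []
    for url in pages:
        browser.get(url)
        sources.append(browser.page_source)
    return sources


def run(make_browser, pages, calls=2):
    """Create one browser, reuse it for every search() call, then quit it."""
    browser = make_browser()
    try:
        return [search(browser, pages) for _ in range(calls)]
    finally:
        # Runs exactly once, even if one of the search() calls raises
        browser.quit()


class FakeBrowser:
    """Offline stand-in for webdriver.Chrome; get() fails after quit()."""

    def __init__(self):
        self.alive = True
        self.quit_calls = 0
        self.page_source = ""

    def get(self, url):
        if not self.alive:
            raise ConnectionRefusedError("session has been shut down")
        self.page_source = "<html>" + url + "</html>"

    def quit(self):
        self.alive = False
        self.quit_calls += 1


created = []


def make_fake_browser():
    # Record every browser the factory creates so it can be inspected later
    browser = FakeBrowser()
    created.append(browser)
    return browser


results = run(make_fake_browser, PAGES)
print(len(results), "calls,", created[0].quit_calls, "quit")
```

Keeping `quit()` in the caller rather than in `search()` means the function can be called any number of times, and the `finally` block still shuts the driver down if scraping fails halfway.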

Conclusion

Once you upgrade to Selenium 3.14.1, you will be able to set the timeout, see canonical tracebacks, and take the required action.


Answer 2

I faced the same issue in Robot Framework:

```
MaxRetryError: HTTPConnectionPool(host='options=add_argument("--ignore-certificate-errors")', port=80): Max retries exceeded with url: /session (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001ABA3190F10>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
```

The issue was resolved after I updated all the libraries in PyCharm to their latest versions and selected Intellibot@SeleniumLibrary.patched.

Answer 3

Problem

The driver is being asked to fetch a URL after it has quit. Make sure you do not quit the driver before you have finished fetching content.

Solution

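In your code, when search("a") runs, the driver retrieves each URL, returns the content, and then quits.

When search() runs again, the driver no longer exists, so it cannot load the URL.

You need to remove browser.quit() from the function and call it once at the end of the script instead.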

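```
# `browser` and `soup` are created/imported once, exactly as in the question
lst = []

def search(st):
    for i in range(1, 3):
        url = "https://gogoanime.so/anime-list.html?page=" + str(i)
        browser.get(url)
        sou = soup(browser.page_source, "html.parser")
        title = sou.find("ul", class_="listing")
        # Lowercase each entry and drop its first character
        for li in title.find_all("li"):
            lst.append(li.getText().lower()[1:])
    # No browser.quit() here: the session must stay alive for the next call
    print(len(lst))

search("a")
search("a")
# Quit once, after all scraping is done
browser.quit()
```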
