Opening a page with urllib2 handshake failure

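I'm just trying to open this web page: https://close5.com/home/

but I keep getting different SSL errors. Here are some of my attempts and the errors they produce. I'm happy to accept a fix that works with either framework. My end goal is to turn this page into a BeautifulSoup4 soup.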

Error:

Traceback (most recent call last):
  File "test.py", line 54, in <module>
    print soup_maker_two(url)
  File "test.py", line 45, in soup_maker_two
    response = br.open(url)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 230, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_opener.py", line 193, in open
    response = urlopen(self, req, data)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 344, in _open
    '_open', req)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 332, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1170, in https_open
    return self.do_open(conn_factory, req)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1118, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:510: error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure>

Code:

import mechanize
import ssl
from functools import wraps

def sslwrap(func):
    @wraps(func)
    def bar(*args, **kw):
        kw['ssl_version'] = ssl.PROTOCOL_TLSv1
        return func(*args, **kw)
    return bar

def soup_maker_two(url):
    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.set_handle_equiv(False)
    br.set_handle_refresh(False)
    br.addheaders = [('User-agent', 'Firefox')]
    ssl.wrap_socket = sslwrap(ssl.wrap_socket)
    response = br.open(url)
    for f in br.forms():
        print f
    return 'hi'



if __name__ == "__main__":
    url = 'https://close5.com/'
    print soup_maker_two(url)
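Pinning the protocol by monkey-patching `ssl.wrap_socket` is fragile. On Python 2.7.9+ and Python 3, an `SSLContext` can be passed straight to `urlopen` instead. The sketch below assumes such an interpreter (it will not run on 2.7.6, which has no `ssl.create_default_context`); `make_tls_context` and `fetch` are hypothetical names, and requiring TLS 1.2 is an illustrative choice, not something the site is known to demand.

```python
import ssl

try:  # Python 3
    from urllib.request import urlopen
except ImportError:  # Python 2.7.9+
    from urllib2 import urlopen


def make_tls_context():
    """Hypothetical helper: a verifying context instead of a patched wrap_socket."""
    # create_default_context() turns on certificate and hostname checks
    # and disables SSLv2/SSLv3.
    ctx = ssl.create_default_context()
    if hasattr(ctx, "minimum_version"):
        # Python 3.7+ only: require at least TLS 1.2 rather than pinning TLSv1.
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def fetch(url):
    # The hostname is sent via SNI automatically when ssl.HAS_SNI is True.
    return urlopen(url, context=make_tls_context()).read()
```

For example, `fetch('https://close5.com/home/')` would return the raw HTML bytes, which can then be handed to BeautifulSoup.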

I have also tried the following code, which gives this error:

Second attempt

Error:

Traceback (most recent call last):
  File "test.py", line 29, in <module>
    print str(soup_maker(url))[0:1000]
  File "test.py", line 22, in soup_maker
    webpage = opener.open(req)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1222, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>

Code:

from bs4 import BeautifulSoup
import urllib2

def soup_maker(url):
    class RedirectHandler(urllib2.HTTPRedirectHandler):
        def http_error_302(self, req, fp, code, msg, headers):
            result = urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
            result.status = code
            return result

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}

    req = urllib2.Request(url, headers=hdr)
    opener = urllib2.build_opener(RedirectHandler())
    webpage = opener.open(req)
    soup = BeautifulSoup(webpage, "html5lib")
    return soup


if __name__ == "__main__":
    url = 'https://close5.com/home/'
    print str(soup_maker(url))[0:1000]
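The `build_opener` chain from this second attempt can also take an `SSLContext`, through `HTTPSHandler(context=...)`. That keyword likewise exists only on Python 2.7.9+ and 3.x, so this is a sketch under that assumption. `build_tls_opener` is a hypothetical name; the redirect handler mirrors the one above, minus the `result.status = code` line, since the code is already available as `result.code` (and `status` is a read-only property on Python 3.9+).

```python
import ssl

try:  # Python 3
    from urllib.request import HTTPRedirectHandler, HTTPSHandler, build_opener
    from urllib.error import HTTPError
except ImportError:  # Python 2.7.9+
    from urllib2 import HTTPRedirectHandler, HTTPSHandler, build_opener, HTTPError


class RedirectHandler(HTTPRedirectHandler):
    # Return 302 responses as HTTPError objects instead of following them,
    # like the RedirectHandler in the question; the code is in result.code.
    def http_error_302(self, req, fp, code, msg, headers):
        return HTTPError(req.get_full_url(), code, msg, headers, fp)


def build_tls_opener():
    """Hypothetical helper: opener whose HTTPS connections use an explicit context."""
    ctx = ssl.create_default_context()  # verifies certificates and hostnames
    # Passing our own HTTPSHandler makes build_opener skip its default one.
    return build_opener(RedirectHandler(), HTTPSHandler(context=ctx))
```

Usage would then be `build_tls_opener().open(urllib2.Request(url, headers=hdr))` (or `urllib.request.Request` on Python 3).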

Edit 1

It was suggested that I use:

from bs4 import BeautifulSoup

def soup_maker(url):
    soup = BeautifulSoup(requests.get(url).content, "html5lib")
    return soup

if __name__ == "__main__":
    import requests
    url = 'https://close5.com/home/'
    print str(soup_maker(url))[:1000]

This code works for Padraic, but not for me. I get the error:

Traceback (most recent call last):
  File "test_3.py", line 10, in <module>
    print str(soup_maker(url))[:1000]
  File "test_3.py", line 4, in soup_maker
    soup = BeautifulSoup(requests.get(url).content, "html5lib")
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 455, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 558, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 385, in send
    raise SSLError(e)
requests.exceptions.SSLError: [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure

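Same error as before. I suspect it may have something to do with my using Python 2.7.6, but I'm not sure. I'm also not sure how to use that information to fix the problem.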
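One way to check whether the interpreter itself is the limiting factor is to list what its `ssl` module supports. The helper below is hypothetical. On Python 2.7.6 one would expect `has_sni` and `has_sslcontext` to come back False, since `ssl.HAS_SNI` and `ssl.SSLContext` only arrived in 2.7.9; that would match the SNIMissingWarning and InsecurePlatformWarning shown in Edit 4.

```python
import ssl
import sys


def ssl_capabilities():
    """Hypothetical helper: summarize what this interpreter's ssl module supports."""
    return {
        "python": sys.version.split()[0],
        "openssl_version": ssl.OPENSSL_VERSION,
        # getattr()/hasattr(): older Pythons lack these attributes entirely.
        "has_sni": bool(getattr(ssl, "HAS_SNI", False)),
        "has_sslcontext": hasattr(ssl, "SSLContext"),
        "has_tlsv1_2": hasattr(ssl, "PROTOCOL_TLSv1_2"),
    }


if __name__ == "__main__":
    for key, value in sorted(ssl_capabilities().items()):
        print("%-16s %s" % (key, value))
```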

Edit 2

The problem may be an outdated version of requests. My pip freeze currently shows

requests==2.2.1

Running

sudo pip install -U requests

returns

Downloading/unpacking requests from https://pypi.python.org/packages/2.7/r/requests/requests-2.9.1-py2.py3-none-any.whl#md5=58a444aaa02780ad01983f5f540e67b2
  Downloading requests-2.9.1-py2.py3-none-any.whl (501kB): 501kB downloaded
Installing collected packages: requests
  Found existing installation: requests 2.2.1
    Not uninstalling requests at /usr/lib/python2.7/dist-packages, owned by OS
Successfully installed requests
Cleaning up..

sudo pip2 install -U requests returns the same.

sudo pip uninstall requests returns

Not uninstalling requests at /usr/lib/python2.7/dist-packages, owned by OS

I'm running Ubuntu 14.04 with Python 2.7.6 and requests 2.2.1.

Edit 3

sudo pip install --ignore-installed requests

gives

Downloading/unpacking requests
  Downloading requests-2.9.1-py2.py3-none-any.whl (501kB): 501kB downloaded
Installing collected packages: requests
Successfully installed requests
Cleaning up...

sudo pip freeze still shows requests==2.2.1.

Edit 4

After following many suggestions, I now have:

$python
Python 2.7.6 (default, Jun 22 2015, 18:00:18) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests;requests.__version__
'2.9.1'
>>> url = 'https://close5.com/home/'
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(requests.get(url).content, "html5lib")
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 447, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure
>>> 

Answer

I would suggest using requests:

from bs4 import BeautifulSoup
import requests

def soup_maker(url):
    soup = BeautifulSoup(requests.get(url).content)
    return soup

if __name__ == "__main__":
    url = 'https://close5.com/home/'
    print str(soup_maker(url))[:1000]

This gives you what you need:

<html><head><title>Buy &amp; Sell Locally with Close5</title><meta content="Close5 provides a safe and easy environment to list your items and sell them fast. Shop cars, home goods and Children's items locally with Close5" name="description"/><meta content="index, follow" name="robots"/><!--link(rel="canonical" href="https://www.close5.com")-->
<link href="https://www.close5.com/images/favicons/favicon-160x160.png" rel="image_src"/><meta content="index, follow" name="robots"/><!-- Facebook Item Tags--><meta content="Buy &amp; Sell Locally with Close5" property="og:title"/><meta content="Close5" property="og:site_name"/><!-- meta(property="og:url" content='https://www.close5.com/images/app-icon.png')--><meta content="Close5 provides a safe and easy environment to list your items and sell them fast. Shop cars, home goods and Children's items locally with Close5" property="og:description"/><meta content="1470902013158927" property="fb:app_id"/><meta content="100000228184034" property="fb:

Edit 1:

Your version of requests is too old; upgrade it with pip install -U requests

Edit 2:

You installed requests with apt-get, so you need to:

 apt-get remove python-requests
 pip install --ignore-installed requests # pip install -U requests should also work

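I would remove pip entirely, download get-pip.py, run python get-pip.py, and stick to installing packages with pip. pip most likely did install requests successfully; the newer version is probably just further along your path.

Edit 3:

You installed requests with apt-get, so you cannot remove it with pip. Use apt-get remove python-requests as suggested in Edit 2.

Edit 4:

The link in the output explains what is happening and suggests: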
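To see which copy of requests wins, you can list every `requests` package directory on `sys.path`; for a regular package, `import requests` uses the first match in `sys.path` order. A minimal sketch: `find_package_dirs` is a hypothetical helper, and it only detects plain package directories containing `__init__.py` (not eggs, zip files or single-file modules). Comparing its output with `python -c "import requests; print(requests.__file__)"` shows which copy is actually loaded.

```python
import os
import sys


def find_package_dirs(name, search_path=None):
    """Hypothetical helper: every directory on the path holding a regular
    package called `name`, in the order the import system checks them."""
    found = []
    for entry in (sys.path if search_path is None else search_path):
        # An empty entry in sys.path means the current directory.
        candidate = os.path.join(entry or os.curdir, name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            found.append(os.path.abspath(candidate))
    return found


if __name__ == "__main__":
    for path in find_package_dirs("requests"):
        print(path)
```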

pip install pyopenssl ndg-httpsclient pyasn1

You can also:

pip install requests[security]
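Given the earlier confusion between pip and the interpreter, it may help to confirm from inside the same Python that those extras import. A sketch, assuming the usual import names for the three packages (`OpenSSL` for pyopenssl, `ndg.httpsclient` for ndg-httpsclient, `pyasn1`); `unimportable_modules` is a hypothetical helper.

```python
import importlib


def unimportable_modules(names):
    """Hypothetical helper: return the module names that fail to import."""
    failed = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # Usually ImportError, but a broken install (for example a
            # pyOpenSSL/cryptography version mismatch) can raise other errors.
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Import names for "pip install pyopenssl ndg-httpsclient pyasn1".
    extras = ["OpenSSL", "ndg.httpsclient", "pyasn1"]
    print("missing: %s" % (unimportable_modules(extras) or "none"))
```

If nothing is reported missing, requests 2.9 should try to switch urllib3 to pyOpenSSL on import, which is what provides SNI on Pythons older than 2.7.9.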