Mechanize Python page download does not work with HTTPS

Question

I'm on Linux Mint 13 Xfce 32-bit, kernel 3.2.0-7, with Python 2.7.3. I just want to read the source code of a web page served over HTTPS. Here is my small program:

#!/usr/bin/env python
import mechanize

browser = mechanize.Browser()
browser.set_handle_robots(False)
browser.set_handle_equiv(False)
browser.addheaders = [
    ('User-Agent',
     'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'),
    ('Accept-Encoding', 'gzip, deflate, sdch'),
    ('Accept-Language', 'en-US,en;q=0.8,ru;q=0.6'),
    ('Cache-Control', 'max-age=0'),
    ('Connection', 'keep-alive'),
]

html = browser.open('https://scholar.google.com/citations?view_op=search_authors')
print html.read()

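But what gets printed is not the page's source code; the output is unreadable, garbled data.

What is wrong, and how do I fix it? I need to use mechanize because I will have to interact with this page later.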

Answer 1

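Your code works for me, but I would remove the line

('Accept-Encoding', 'gzip, deflate, sdch'),

so that you don't have to undo that encoding afterwards. To clarify: you are getting the content, but you want it in "clear text". You get clear text by not asking the server for gzip-compressed content in the first place.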
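To make both options concrete, here is a minimal sketch that uses only the standard library. The helper names `without_accept_encoding` and `maybe_gunzip` are made up for illustration and are not part of mechanize, and the sample bytes are a local stand-in for what `html.read()` would return. It only handles gzip; a `deflate` or `sdch` response would need separate handling.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def without_accept_encoding(headers):
    """Return a copy of a (name, value) header list minus Accept-Encoding."""
    return [(name, value) for name, value in headers
            if name.lower() != "accept-encoding"]


def maybe_gunzip(body):
    """Decompress body if it looks gzip-compressed, else return it unchanged."""
    if body[:2] != GZIP_MAGIC:
        return body
    return gzip.GzipFile(fileobj=io.BytesIO(body)).read()


# Stand-in for html.read(): compress a sample page locally (no network needed).
sample_page = b"<html><body>Hello</body></html>"
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(sample_page)
compressed = buf.getvalue()

# The compressed bytes are what looks like garbage when printed directly.
restored = maybe_gunzip(compressed)
```

In the program from the question, the first option would be `browser.addheaders = without_accept_encoding(browser.addheaders)` before calling `browser.open(...)`. The second would be `maybe_gunzip(html.read())` after it, assuming the server actually answered with gzip.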

python

mechanize

mechanize-python