Python parsing library does not correctly return the web page source code

Question

import requests
from bs4 import BeautifulSoup

def parser(self):
    r = requests.get(self.url)
    self.soup = BeautifulSoup(r.content, "lxml")

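But when I print soup, I find that it differs from the web page source I actually want.

For example, this is the web page source as shown in the browser: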

<div class="zh-question-followers-sidebar">
<div class="zg-gray-normal">

<a href="/question/24269892/followers"><strong>109141</strong></a>
people focus on the questions

</div>

But when I fetch the page and parse it with BeautifulSoup, the output does not contain that markup. Instead, it shows the following:

<div class="zm-side-section">
<div class="zm-side-section-inner zg-gray-normal" id="zh-question-side-header-wrap">
<button class="follow-button zg-follow zg-btn-green" data-follow="q:m:button" data-id="1889792">focus question</button>

109143
people focus on the questions

</div>
</div>

Can anyone tell me why this happens and how to get the correct source code?

Answer 1

Not all clients are served the same page. You should set the User-Agent of your request to that of a popular desktop browser:

# Build the value from adjacent string literals so it contains no line breaks.
headers = {'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
                          'AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/39.0.2171.95 Safari/537.36')}

response = requests.get(url, headers=headers)
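
To tie this back to the question, here is a rough sketch of how the header could be combined with the parsing step. By default requests identifies itself as python-requests/<version>, which a site may answer with a different page than it sends to a browser. The names DESKTOP_UA, make_session, fetch_soup and follower_count are made up for illustration and do not appear in the question. The CSS selector is an assumption based only on the markup quoted above, and "html.parser" is used just to avoid the lxml dependency ("lxml" should work the same way here).

```python
import requests
from bs4 import BeautifulSoup

# Desktop Chrome user agent from the answer, joined into a single line.
DESKTOP_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/39.0.2171.95 Safari/537.36"
)


def make_session(user_agent=DESKTOP_UA):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_soup(session, url):
    """Download url and parse it (hypothetical helper, not from the question)."""
    response = session.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx rather than parse an error page
    return BeautifulSoup(response.content, "html.parser")


def follower_count(soup):
    """Return the number in the sidebar's <strong> tag, or None if it is absent.

    The selector is an assumption derived from the markup quoted in the question.
    """
    strong = soup.select_one("div.zh-question-followers-sidebar strong")
    return int(strong.get_text(strip=True)) if strong else None
```

Something like follower_count(fetch_soup(make_session(), url)) would then run the whole chain. If it still returns None, the server is probably sending different markup for other reasons (login state, JavaScript rendering), and printing the soup is the quickest way to see what actually came back.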


Tags: html, python, lxml, beautifulsoup, web-crawler