Page with anti-scraping protection in the code?

Question

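I am trying to extract information from a web page. When I test my expression with XPath Helper (a Chrome extension), it shows the content perfectly, but when I use it in Scrapy it returns "None" or an empty result. Website: https://cutt.ly/bjj3ohW The number (--NN) is the value being tested.

I have tried XPath (//*[@id="da_price"], //*[@id="da_price"]/text()), .get(''), .extract(), .get('').strip(), and CSS (#da_price, #da_price::text). I also tried BeautifulSoup and scrapy_splash, and the result is still None or empty. I would rather not use Selenium yet, because the number of links is large.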

Answer 1

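The element you are targeting is probably rendered dynamically. I tried the spider below and it worked; I targeted the discounted price on the page.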

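import scrapy


class TestSpider(scrapy.Spider):
    name = 'testspider'

    def start_requests(self):
        # Request the shortened URL; Scrapy follows redirects by default.
        return [scrapy.Request(
            url='https://cutt.ly/bjj3ohW',
        )]

    def parse(self, response):
        # Select the price text from the HTML of the initial response,
        # rather than from #da_price, which came back empty in Scrapy.
        price = response.css('.price-final > strong::text').get()
        print(price)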

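A good way to test whether content is rendered dynamically is to open the inspector panel in Chrome (F12) and look at the Network tab. Reload the page; the first response should be the .html document. Click that file, then click Response. There you can see the HTML that Scrapy can actually parse. Press Ctrl+F and search for the CSS selector you want to parse. If the value is not in that response, it is filled in later by JavaScript and Scrapy alone will not see it.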
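The same Ctrl+F check can be approximated in code. The following is only a minimal sketch using the standard library: the helper names fetch_raw_html and search_raw_html are hypothetical, and SAMPLE_HTML is made-up markup for illustration, not the real page's source. It searches the raw, un-rendered HTML for a marker and prints the surrounding text, so you can see whether the value is already there before any JavaScript runs.

```python
import urllib.request


def fetch_raw_html(url, user_agent="Mozilla/5.0"):
    """Download a page without executing JavaScript (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def search_raw_html(html, needle, context=40):
    """Return a snippet of `html` around every occurrence of `needle`."""
    snippets = []
    start = html.find(needle)
    while start != -1:
        lo = max(0, start - context)
        hi = min(len(html), start + len(needle) + context)
        snippets.append(html[lo:hi])
        start = html.find(needle, start + 1)
    return snippets


# Made-up markup (an assumption, not the real page): one price is in the
# static HTML, while the #da_price element is empty until a script fills it.
SAMPLE_HTML = (
    '<div class="price-final"><strong>19.99</strong></div>'
    '<span id="da_price"></span>'
    '<script>/* the span above is filled in at runtime */</script>'
)

if __name__ == "__main__":
    # To check a live page, replace SAMPLE_HTML with fetch_raw_html(url).
    for marker in ("price-final", "da_price"):
        print(marker, "->", search_raw_html(SAMPLE_HTML, marker))
```

If a marker is found but the element around it is empty, as with da_price in the sample, the value is most likely injected by a script, and a different selector or the underlying request seen in the Network tab is worth trying.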

xpath

scrapy

python-3.x