Scrapy: response.xpath prints None, but the XPath is correct when I open the page in a browser

Question

I'm trying to print the h1 title of each item I'm scraping. I tried printing print(response.xpath('/html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get()) for a product page such as this one: https://www.steinersports.com/football/tampa-bay-buccaneers/tom-brady-tampa-bay-buccaneers-super-bowl-lv-champions-autographed-white-nike-game-jersey-with-lv-mvp-inscription/o-8094+t-92602789+p-2679909745+z-8-2492872768?_ref=p-FALP:m-GRID:i-r20c0:po-60

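I'm not sure how to debug this, because when I open one of the links that returned None and check the XPath in the browser, it is correct. Any help is appreciated. The full code is below: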

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy.http import Request


class SteinerSportsCrawlSpiderSpider(CrawlSpider):
    name = 'steinersports_crawl_spider'
    allowed_domains = ['steinersports.com']
    start_urls = [
        'https://www.steinersports.com/football/signed/o-1383+fa-56+z-95296299-3058648695?_ref=m-TOPNAV',
        ]
    base_url = 'https://www.steinersports.com/football/signed/o-1383+fa-56+z-95296299-3058648695?_ref=m-TOPNAV'
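
    rules = (
        # Follow listing pages whose URLs contain /signed (no callback)
        Rule(LinkExtractor(allow=r'/signed'), follow=True),
        # Send other football/ links (intended to be product pages) to parse_item
        Rule(LinkExtractor(allow=r'football/', deny=r'/signed'), callback='parse_item', follow=True),
    )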

    def parse_item(self, response):
        item = {}
        description_flag = True
        price_flag = True
        item_description = response.xpath('/html/body/div[2]/div/div[5]/div[2]/div[17]/div/div[2]/div').get()
        print(item)
        #item_price = response.xpath('//span[@class="product__price"]/text()').get()
        
        print(response.xpath('html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get())
        item['item_name'] = response.xpath('html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get()
        
        
        return item
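A likely explanation (an assumption, since the live page was not re-checked here): an absolute, position-based XPath such as /html/body/div[2]/div/div[5]/... is usually copied from the browser's DevTools, which show the DOM after the browser has parsed the page and run its JavaScript. Scrapy only sees the HTML the server returns, so if the two trees differ by even one sibling element, the indices point somewhere else and .get() returns None. The sketch below illustrates this with two hypothetical snippets and the standard library's xml.etree.ElementTree, used only to keep the example dependency-free (it supports a limited XPath subset, not Scrapy's selectors). The names RAW_HTML, BROWSER_DOM, POSITIONAL, BY_ATTRIBUTE and title are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML as served by the site (roughly what Scrapy downloads).
RAW_HTML = """
<html><body>
  <div id="header"></div>
  <div id="page">
    <h1 data-talos="labelPdpProductTitle">Signed Jersey</h1>
  </div>
</body></html>
"""

# Hypothetical DOM after browser-side JavaScript has inserted a banner <div>
# (roughly what DevTools shows, and what "Copy XPath" is computed from).
BROWSER_DOM = """
<html><body>
  <div id="banner"></div>
  <div id="header"></div>
  <div id="page">
    <h1 data-talos="labelPdpProductTitle">Signed Jersey</h1>
  </div>
</body></html>
"""

# Position-based path as copied from the browser: <h1> inside the 3rd <div> under <body>.
POSITIONAL = "./body/div[3]/h1"
# Attribute-based path that does not depend on how many siblings come before the <h1>.
BY_ATTRIBUTE = ".//h1[@data-talos='labelPdpProductTitle']"


def title(markup, path):
    """Return the text of the first element matching path, or None if nothing matches."""
    root = ET.fromstring(markup.strip())  # root is the <html> element
    node = root.find(path)
    return None if node is None else node.text


for name, markup in (("raw HTML", RAW_HTML), ("browser DOM", BROWSER_DOM)):
    print(f"{name}: positional={title(markup, POSITIONAL)!r}, "
          f"attribute={title(markup, BY_ATTRIBUTE)!r}")
```

Running it prints None for the positional path on the raw HTML, while the attribute-based path finds the title in both snippets. To check what Scrapy actually downloaded for a real product URL, you can run scrapy shell with that URL and try the XPath against response there, or search response.text for the <h1>.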

Answer 1

You can select the h1 tag directly by its data-talos attribute. This XPath should get the title:

response.xpath("//h1[@data-talos='labelPdpProductTitle']/text()").extract_first()

python

xpath

scrapy