Scrapy: response.xpath prints None, but the XPath is correct when checked on the page in the browser
I am trying to print the h1 title of the items I am scraping. I tried printing the result of
print(response.xpath('/html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get())
on a product page such as https://www.steinersports.com/football/tampa-bay-buccaneers/tom-brady-tampa-bay-buccaneers-super-bowl-lv-champions-autographed-white-nike-game-jersey-with-lv-mvp-inscription/o-8094+t-92602789+p-2679909745+z-8-2492872768?_ref=p-FALP:m-GRID:i-r20c0:po-60
I am not sure how to debug this, because when I open a link that returned None and check the XPath in the browser, it is correct. Any help is appreciated. The full code is below:
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy.http import Request


class SteinerSportsCrawlSpiderSpider(CrawlSpider):
    name = 'steinersports_crawl_spider'
    allowed_domains = ['steinersports.com']
    start_urls = [
        'https://www.steinersports.com/football/signed/o-1383+fa-56+z-95296299-3058648695?_ref=m-TOPNAV',
    ]
    base_url = 'https://www.steinersports.com/football/signed/o-1383+fa-56+z-95296299-3058648695?_ref=m-TOPNAV'

    rules = (
        Rule(LinkExtractor(allow=r'/signed'), follow=True),
        Rule(LinkExtractor(allow=r'football/', deny=r'/signed'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        item = {}
        description_flag = True
        price_flag = True
        item_description = response.xpath('/html/body/div[2]/div/div[5]/div[2]/div[17]/div/div[2]/div').get()
        print(item)
        #item_price = response.xpath('//span[@class="product__price"]/text()').get()
        print(response.xpath('html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get())
        item['item_name'] = response.xpath('html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get()
        return item
You can target the h1 tag directly through its data-talos attribute. This XPath should return the title:
response.xpath("//h1[@data-talos='labelPdpProductTitle']/text()").extract_first()
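A possible explanation, which I have not verified against this site, is that an XPath copied from the browser describes the page after scripts have changed it, while Scrapy only sees the HTML the server sent. The sketch below illustrates the idea with two made-up snippets (the markup, the `title` helper, and the `div[3]` path are all hypothetical). It uses the standard library's ElementTree as a stand-in for Scrapy's selectors so it runs without Scrapy installed:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as the browser might show it after scripts have run:
# an extra banner <div> sits before the product block.
BROWSER_DOM = """
<html><body>
  <div>header</div>
  <div>banner injected by JavaScript</div>
  <div><h1 data-talos="labelPdpProductTitle">Signed Jersey</h1></div>
</body></html>
"""

# Hypothetical raw HTML as the server might send it: no injected banner.
RAW_HTML = """
<html><body>
  <div>header</div>
  <div><h1 data-talos="labelPdpProductTitle">Signed Jersey</h1></div>
</body></html>
"""

# A positional path (like one copied from dev tools) and an attribute-based one.
POSITIONAL = "./body/div[3]/h1"
BY_ATTRIBUTE = ".//h1[@data-talos='labelPdpProductTitle']"


def title(markup, path):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = ET.fromstring(markup.strip()).find(path)
    return node.text.strip() if node is not None and node.text else None


for name, markup in [("browser DOM", BROWSER_DOM), ("raw HTML", RAW_HTML)]:
    print(name, "| positional:", title(markup, POSITIONAL),
          "| by attribute:", title(markup, BY_ATTRIBUTE))
```

The positional path only matches the browser version, while the attribute lookup matches both. To check what Scrapy actually received, you can run `scrapy shell` with the product URL and evaluate the XPath against `response` there, or call `view(response)` in that shell to open the downloaded HTML in a browser.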