Scrapy xpath not working - only in combination with css-selector?

I'm trying to scrape the following site with Scrapy and am testing it in the scrapy shell:

This is the basic spider:

import scrapy

class ZoosSpider(scrapy.Spider):
    name = 'zoos'
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ['tripadvisor.co.uk']
    start_urls = ['https://www.tripadvisor.co.uk/Attractions-g186216-Activities-c48-a_allAttractions.true-United_Kingdom.html/']

    def parse(self, response):
        # Select every attraction card section on the page
        tmpSEC = response.xpath("//section[@data-automation='AppPresentation_SingleFlexCardSection']")
        for elem in tmpSEC:
            pass

With this XPath I get all the relevant sections (len(tmpSEC) returns 30, which looks right to me):

tmpSEC = response.xpath("//section[@data-automation='AppPresentation_SingleFlexCardSection']")

Now I want to extract the first href attribute, so I tried this XPath, but the only result I get is "/":

>>> tmpSEC[0].xpath("//a/@href").get()  
'/'

And also:

>>> tmpSEC[0].xpath("(//a)[1]/@href").get()  
'/'

Only the CSS selector returns the expected link:

>>> tmpSEC[0].css("a::attr(href)").get() 
'/Attraction_Review-g186332-d216481-Reviews-Blackpool_Zoo-Blackpool_Lancashire_England.html'

Why does this work with the CSS selector but not with the XPath selector?

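Here is a working solution using XPath. An expression that starts with // is absolute: even when it is called on tmpSEC[0], it is evaluated from the document root, so it returns the first <a> on the whole page, whose href is "/". The CSS version works because Scrapy translates CSS selectors into XPath that is relative to the current element. To get the same behavior with XPath, prefix the expression with a dot (.//a) so the search starts at the selected <section>, as below: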

import scrapy


class ZoosSpider(scrapy.Spider):
    name = 'zoos'
    start_urls = ['https://www.tripadvisor.co.uk/Attractions-g186216-Activities-c48-a_allAttractions.true-United_Kingdom.html/']

    def parse(self, response):
        tmpSEC = response.xpath(
            "//section[@data-automation='AppPresentation_SingleFlexCardSection']")
        # The leading '.' makes the XPath relative to tmpSEC[0] instead of the whole page
        yield {
            'link': tmpSEC[0].xpath(".//a/@href").get()
        }

Output:

{'link': '/Attraction_Review-g186332-d216481-Reviews-Blackpool_Zoo-Blackpool_Lancashire_England.html'}