Why is scrapy returning None for the "Title" item?

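I'm trying to scrape https://www.jobs.ch/de/stellenangebote/administration-hr-consulting-ceo/ and I'm currently stuck because Scrapy returns None for the "Title" item, i.e. the job title. The CSS selector works fine in the shell, and the other items work as well. I've tried changing the selector and adding a delay, but nothing seems to help. Does anyone have an idea? Code below.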

import scrapy
from jobscraping.items import JobscrapingItem


class GetdataSpider(scrapy.Spider):
    name = 'getdata2'
    start_urls = ['https://www.jobs.ch/de/stellenangebote/administration-hr-consulting-ceo/']

    def parse(self, response):
        for add in response.css('div.sc-AxiKw.VacancySerpItem__ShadowBox-qr45cp-0.hqhfbd'):
            item = JobscrapingItem()
            addpage = response.urljoin(add.css('div.sc-AxiKw.VacancySerpItem__ShadowBox-qr45cp-0.hqhfbd a::attr(href)').get())
            item['link'] = addpage


            request = scrapy.Request(addpage, callback=self.get_addinfos)
            request.meta['item'] = item
            yield request

    def get_addinfos(self, response):
        item = response.meta['item']
        item['Title'] = response.css('.sc-AxhUy.Text__h2-jiiyzm-1.eBKnmN.sc-fzqNJr.Text__span-jiiyzm-8.Text-jiiyzm-9.iNTZsv::text').get()
        item['Company'] = response.css('span.sc-fzqNJr.Text__span-jiiyzm-8.kGLBca.sc-fzqNJr.Text__span-jiiyzm-8.Text-jiiyzm-9.kjfvVS::text').get()
        item['Location'] = response.css('span.sc-fzqNJr.Text__span-jiiyzm-8.kGLBca.sc-fzqNJr.Text__span-jiiyzm-8.Text-jiiyzm-9.WBPTt::text').getall()
        yield item

Here is the items.py file:

import scrapy


class JobscrapingItem(scrapy.Item):
    # define the fields for your item here like:
    link = scrapy.Field()
    Title = scrapy.Field()
    Company = scrapy.Field()
    Location = scrapy.Field()

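You are using an overly complex CSS selector. Remember that you don't always have to select by class or id; you can use other attributes too. In this case, data-cy="vacancy-title" looks perfect.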

item['Title'] = response.css('h1[data-cy="vacancy-title"]::text').get()

That should work. Keep it simple, and if it breaks, debug and adjust from there.