What are the correct tags and properties to select?

Question

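I want to scrape this page (http://theschoolofkyiv.org/participants/220/dan-acostioaei) and extract only the artist's name and biography. With the tags and attributes I have defined, the result contains none of the text I want to see.

I am using scrapy to scrape the site. The same approach works fine for other sites. I have tested my code, but I can't seem to define the correct tags or attributes. Could you take a look at my code?

This is the code I am using to scrape the site. (I don't understand why Stack Overflow always forces me to enter irrelevant text. I have already explained what I want to say.)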

import scrapy
from scrapy.selector import Selector
from artistlist.items import ArtistlistItem

class ArtistlistSpider(scrapy.Spider):
    name = "artistlist"
    allowed_domains = ["theschoolofkyiv.org"]
    start_urls = ['http://theschoolofkyiv.org/participants/220/dan-acostioaei']
    def parse(self, response):
        titles = response.xpath("//div[@id='participants']")
        for titles in titles:
            item = ArtistlistItem()
            item['artist'] = response.css('.ng-binding::text').extract()
            item['biography'] = response.css('p::text').extract()
            yield item

This is the output I get:

{'artist': [],
 'biography': ['\n                ',
               '\n                ',
               '\n            ',
               '\n                ',
               '\n                ',
               '\n            ']}

Answer 1

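A simple illustration (assuming you already know about the AJAX request that Tony Montana mentioned): take the participant ID from the page URL and request the JSON endpoint directly.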

import json
import re

import scrapy
from artistlist.items import ArtistlistItem


class ArtistlistSpider(scrapy.Spider):
    name = "artistlist"
    allowed_domains = ["theschoolofkyiv.org"]
    start_urls = ['http://theschoolofkyiv.org/participants/220/dan-acostioaei']

    def parse(self, response):
        # The participant ID is the number in the page URL (220 here).
        # Check the match first: calling .group() on a failed search raises.
        match = re.search(r'participants/(\d+)', response.url)
        if match:
            # Request the JSON endpoint rather than parsing the HTML page.
            yield scrapy.Request(
                url="http://theschoolofkyiv.org/wordpress/wp-json/posts/{participant_id}".format(participant_id=match.group(1)),
                callback=self.parse_participant,
            )

    def parse_participant(self, response):
        # The response body is JSON, so decode it instead of using selectors.
        data = json.loads(response.body)
        item = ArtistlistItem()
        item['artist'] = data["title"]
        item['biography'] = data["acf"]["en_participant_bio"]
        yield item
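
The two steps the spider performs (building the endpoint URL from the page URL, then reading two keys from the JSON) can be checked without running a crawl. The following is only a rough sketch: the helper names `api_url_for` and `extract_item` are hypothetical, and the sample payload is made up to mirror the keys used above, so it is not real data from the site.

```python
import json
import re

# Endpoint pattern taken from the spider above.
API_TEMPLATE = "http://theschoolofkyiv.org/wordpress/wp-json/posts/{participant_id}"


def api_url_for(page_url):
    """Return the JSON endpoint for a participant page URL, or None if no ID is found."""
    match = re.search(r"participants/(\d+)", page_url)
    if match is None:
        return None
    return API_TEMPLATE.format(participant_id=match.group(1))


def extract_item(body):
    """Read the artist name and biography from a JSON response body (str or bytes)."""
    data = json.loads(body)
    return {
        "artist": data["title"],
        "biography": data["acf"]["en_participant_bio"],
    }


# Made-up payload with the same keys the spider reads; not real site data.
sample = '{"title": "Example Artist", "acf": {"en_participant_bio": "Example bio."}}'

print(api_url_for("http://theschoolofkyiv.org/participants/220/dan-acostioaei"))
print(extract_item(sample))
```

If the real response is missing one of these keys, `extract_item` raises `KeyError`, which is usually a quicker signal that the endpoint has changed than an empty item would be.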

web-crawler

scrapy