Python: How can I extract the rank column data using XPath or CSS selectors?

Question

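I created a Scrapy spider to extract data from the following URL, "https://olympics.com/tokyo-2020/olympic-games/en/results/cycling-road/athlete-profile-n1346266-aalerud-katrine.htm", but I cannot extract the data in the "Rank" column. I used a for loop to try to get the values, but I always get None and I don't understand why. Here is part of my code: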

import scrapy


class OlympicSpider(scrapy.Spider):
    name = 'ath_spider'
    start_urls = [
        "https://olympics.com/tokyo-2020/olympic-games/en/results/cycling-road/athlete-profile-n1346266-aalerud-katrine.htm"
    ]

    # Write the scraped items to a JSON file.
    custom_settings = {
        'FEED_FORMAT': 'json',
        'FEED_URI': 'athletes_tokyo.json'
    }

    def parse(self, response):
        # One entry per event row in the results table.
        event = response.css('td > a.eventTagLink::text').getall()

        # Attempt to read the rank cell of each row; this is the part that yields None.
        rank = []
        for x in range(1, len(event) + 1):
            rank.append(response.xpath(
                '//main/div/div[1]/div[1]/div[2]/a[1]/div/table/tbody/tr[%s]/td[3]/text()' % x).get())

        yield {
            'name': response.css('h1::text').get().strip(),
            'noc': response.css('div.playerTag::attr(country)').get(),
            'team': response.css('a.country::text').get(),
            'sport': response.css('div.container-fluid > div.row > a::text').get(),
            'sex': response.xpath('//div/div[1]/div[1]/div[2]/div[1]/div/div[2]/div/div[3]/div[1]/div[3]/text()').extract()[-1].strip(),
            'age': response.xpath('//div/div[1]/div[1]/div[2]/div[1]/div/div[2]/div/div[3]/div[1]/div[2]/text()').extract()[-1].strip(),
            'event': event,
            'rank': rank
        }

Thank you very much.

Answer 1

An XPath that selects the rank values is:

//table[@class='table table-schedule']//td[3]/text()

Adapted to your code, it could look something like this:

for x in range(1,len(event)+1):
    rank.append(response.xpath("(//table[@class='table table-schedule']//td[3])[" + str(x) + "]/text()").get())
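
As a rough illustration of why anchoring on the table's class is more robust than a long absolute path, here is a minimal sketch that runs offline. It is not the page's real markup: SAMPLE_HTML, the event names, and the rank values are made-up placeholders, and extract_results is a hypothetical helper. It uses the standard library's xml.etree.ElementTree rather than Scrapy, so the XPath subset differs slightly (there is no text() step, so .text is read instead). One frequent cause of None with copied absolute paths is a tbody step that the browser shows but the downloaded HTML lacks; whether that applies to this page is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the results table.
# The real page's markup and values may differ.
SAMPLE_HTML = """
<html><body><main>
<table class="table table-schedule">
  <tr><td>Sport</td><td><a class="eventTagLink">Event A</a></td><td> 12 </td></tr>
  <tr><td>Sport</td><td><a class="eventTagLink">Event B</a></td><td> 7 </td></tr>
</table>
</main></body></html>
"""


def extract_results(html):
    """Return (event, rank) pairs from a table identified by its class."""
    root = ET.fromstring(html.strip())
    # Anchor on the table's class attribute instead of an absolute path.
    table_path = ".//table[@class='table table-schedule']"
    events = [
        (a.text or "").strip()
        for a in root.findall(table_path + "//td/a[@class='eventTagLink']")
    ]
    # td[3] matches the third cell of every row, so no per-row loop is needed.
    ranks = [
        (td.text or "").strip()
        for td in root.findall(table_path + "//td[3]")
    ]
    return list(zip(events, ranks))


# In a Scrapy callback, the rough equivalent would be a single call:
#   response.xpath("//table[@class='table table-schedule']//td[3]/text()").getall()
print(extract_results(SAMPLE_HTML))
```

Collecting all cells with one relative expression avoids building an index into the XPath string for each row, and stripping the text guards against whitespace around the cell values.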

python

xpath

css-selectors

scrapy