Scrapy Spider scraping wrong data

Question

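I am building my first spider with Scrapy. It is meant to scrape a betting site and return the team names and odds for each match. I use a for loop to iterate over the elements with the class that contains all the required data, but the code returns the data for the first match 9 times (there are 9 matches). What am I doing wrong?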

import scrapy


class SportsBetSpider(scrapy.Spider):
    name = "odds"

    def start_requests(self):
        urls = [
            "https://www.sportsbet.com.au/betting/australian-rules/afl/round-3"
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for post in response.css('li.cardOuterItem_fn8ai8t'):
            yield {
                'Team 1': post.xpath('//span[@class="size12_fq5j3k2 normal_fgzdi7m caption_f4zed5e"]/text()')[0].get(),
                'Odds 1': post.xpath('//span[@class="size14_f7opyze medium_f1wf24vo priceTextSize_frw9zm9"]/text()')[0].get(),
                'Team 2': post.xpath('//span[@class="size12_fq5j3k2 normal_fgzdi7m caption_f4zed5e"]/text()')[1].get(),
                'Odds 2': post.xpath('//span[@class="size14_f7opyze medium_f1wf24vo priceTextSize_frw9zm9"]/text()')[1].get()
            }

The output is:

2020-06-14 20:41:38 [scrapy.core.engine] DEBUG: Crawled (200) https://www.sportsbet.com.au/betting/australian-rules/afl/round-3> (referer: None)
2020-06-14 20:41:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sportsbet.com.au/betting/australian-rules/afl/round-3> {'Team 1': 'Richmond', 'Odds 1': '1.36', 'Team 2': 'Hawthorn', 'Odds 2': '3.08'}
2020-06-14 20:41:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sportsbet.com.au/betting/australian-rules/afl/round-3> {'Team 1': 'Richmond', 'Odds 1': '1.36', 'Team 2': 'Hawthorn', 'Odds 2': '3.08'}
2020-06-14 20:41:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sportsbet.com.au/betting/australian-rules/afl/round-3> {'Team 1': 'Richmond', 'Odds 1': '1.36', 'Team 2': 'Hawthorn', 'Odds 2': '3.08'}
2020-06-14 20:41:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sportsbet.com.au/betting/australian-rules/afl/round-3> {'Team 1': 'Richmond', 'Odds 1': '1.36', 'Team 2': 'Hawthorn', 'Odds 2': '3.08'}
2020-06-14 20:41:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sportsbet.com.au/betting/australian-rules/afl/round-3> {'Team 1': 'Richmond', 'Odds 1': '1.36', 'Team 2': 'Hawthorn', 'Odds 2': '3.08'}
2020-06-14 20:41:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sportsbet.com.au/betting/australian-rules/afl/round-3> {'Team 1': 'Richmond', 'Odds 1': '1.36', 'Team 2': 'Hawthorn', 'Odds 2': '3.08'}
2020-06-14 20:41:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sportsbet.com.au/betting/australian-rules/afl/round-3> {'Team 1': 'Richmond', 'Odds 1': '1.36', 'Team 2': 'Hawthorn', 'Odds 2': '3.08'}
2020-06-14 20:41:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sportsbet.com.au/betting/australian-rules/afl/round-3> {'Team 1': 'Richmond', 'Odds 1': '1.36', 'Team 2': 'Hawthorn', 'Odds 2': '3.08'}
2020-06-14 20:41:38 [scrapy.core.scraper] DEBUG: Scraped from <200 #Only 8 links allowed but same link as before> {'Team 1': 'Richmond', 'Odds 1': '1.36', 'Team 2': 'Hawthorn', 'Odds 2': '3.08'}
2020-06-14 20:41:38 [scrapy.core.engine] INFO: Closing spider (finished)

There are 9 fixtures, but the spider only shows the scraped data for the first one. How can I fix this?

Answer 1

I found the problem. I was using absolute XPath expressions instead of relative ones. I fixed it like this:

yield {
    'Team 1': post.xpath('.//span[@class="size12_fq5j3k2 normal_fgzdi7m caption_f4zed5e"]/text()')[0].get(),
    'Odds 1': post.xpath('.//span[@class="size14_f7opyze medium_f1wf24vo priceTextSize_frw9zm9"]/text()')[0].get(),
    'Team 2': post.xpath('.//span[@class="size12_fq5j3k2 normal_fgzdi7m caption_f4zed5e"]/text()')[1].get(),
    'Odds 2': post.xpath('.//span[@class="size14_f7opyze medium_f1wf24vo priceTextSize_frw9zm9"]/text()')[1].get()
}

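Using .// instead of // solved the problem. An expression that starts with // searches the whole document no matter which selector it is called on, so [0] and [1] picked up the first fixture's values on every iteration. The leading dot makes the search relative to the current post.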
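The difference can be reproduced offline with a small sketch. It makes several assumptions: the markup, the simplified class names ("card", "team") and the second fixture's team names are made up, and the standard library's xml.etree.ElementTree stands in for Scrapy's selectors so that nothing needs to be installed or downloaded. ElementTree does not accept an absolute path on an element, so the document-wide search is emulated by searching from the root, which is what a leading // means in XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two fixtures, each in its own <li>.
HTML = """
<ul>
  <li class="card"><span class="team">Richmond</span><span class="team">Hawthorn</span></li>
  <li class="card"><span class="team">Team C</span><span class="team">Team D</span></li>
</ul>
"""

root = ET.fromstring(HTML)
cards = root.findall(".//li[@class='card']")

# Document-wide search, repeated once per card. The current card is ignored,
# which mirrors calling '//span' on a sub-selector in the question's loop.
absolute = [
    [span.text for span in root.findall(".//span[@class='team']")][:2]
    for _card in cards
]

# Relative search: './/span' only looks inside the current <li>.
relative = [
    [span.text for span in card.findall(".//span[@class='team']")][:2]
    for card in cards
]

print(absolute)
print(relative)
```

The first list repeats the first fixture for every card, and the second list holds one distinct fixture per card, which is the behaviour the fix restores in the spider.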
