Minimizing code for Scrapy - looping over URLs

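I'm using Scrapy to scrape MMA fighter statistics and need some help shortening this part of my code. The links run from A to Z, and there is no "next page" button to get from one letter to the next.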

I'm sure there is a better way, but I can't find it.

start_urls = [
    'http://www.ufcstats.com/statistics/fighters?char=a&page=all',
    'http://www.ufcstats.com/statistics/fighters?char=b&page=all',
    'http://www.ufcstats.com/statistics/fighters?char=c&page=all',
    'http://www.ufcstats.com/statistics/fighters?char=d&page=all',
    'http://www.ufcstats.com/statistics/fighters?char=e&page=all',
    '....'

]

def start_requests(self):
    urls = [
        'http://www.ufcstats.com/statistics/fighters?char=a&page=all',
        'http://www.ufcstats.com/statistics/fighters?char=b&page=all',
        'http://www.ufcstats.com/statistics/fighters?char=c&page=all',
        'http://www.ufcstats.com/statistics/fighters?char=d&page=all',
        'http://www.ufcstats.com/statistics/fighters?char=e&page=all',
        '....'
    ]
    for url in urls:
        yield scrapy.Request(url=url, callback=self.parse)

You could do something like this:

def start_requests(self):
    links = []
    alphabet = "abcdefghijklmnopqrstuvwxyz"

    for letter in alphabet:
        link = "http://www.ufcstats.com/statistics/fighters?char=" + letter + "&page=all"
        links.append(link)

    for url in links:
        yield scrapy.Request(url=url, callback=self.parse)

Using Scrapy to Scrape Directory Websites | Generate 26 start urls

You can use string.ascii_lowercase to generate the alphabet.
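Here is a minimal sketch of that idea, building all 26 URLs in a single list comprehension. The names `URL_TEMPLATE` and `build_start_urls` are made up for illustration and are not part of Scrapy; the URL pattern is the one from the question.

```python
import string

# URL pattern taken from the question; only the `char` value changes.
URL_TEMPLATE = "http://www.ufcstats.com/statistics/fighters?char={letter}&page=all"


def build_start_urls():
    """Return one listing URL per lowercase ASCII letter, a through z."""
    return [URL_TEMPLATE.format(letter=letter) for letter in string.ascii_lowercase]


start_urls = build_start_urls()
```

Inside a spider, the same comprehension can be assigned directly to the `start_urls` class attribute. As far as I know, Scrapy's default request generation then sends each URL to `parse`, so a hand-written `start_requests` should not be needed unless you want a different callback or extra request options.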