Minimizing code for scrapy - looping over URLs
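I'm using Scrapy to scrape MMA fighter stats and need some help minimizing this part of my code. The listing pages go from A to Z, and there is no "next page" button to follow from one letter to the next.
I'm sure there's a better way, but I can't find it.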
start_urls = [
    'http://www.ufcstats.com/statistics/fighters?char=a&page=all',
    'http://www.ufcstats.com/statistics/fighters?char=b&page=all',
    'http://www.ufcstats.com/statistics/fighters?char=c&page=all',
    'http://www.ufcstats.com/statistics/fighters?char=d&page=all',
    'http://www.ufcstats.com/statistics/fighters?char=e&page=all',
    '....'
]

def start_requests(self):
    urls = [
        'http://www.ufcstats.com/statistics/fighters?char=a&page=all',
        'http://www.ufcstats.com/statistics/fighters?char=b&page=all',
        'http://www.ufcstats.com/statistics/fighters?char=c&page=all',
        'http://www.ufcstats.com/statistics/fighters?char=d&page=all',
        'http://www.ufcstats.com/statistics/fighters?char=e&page=all',
        '....'
    ]
    for url in urls:
        yield scrapy.Request(url=url, callback=self.parse)
You could do something like this:
def start_requests(self):
    links = []
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # Build one listing URL per letter, a through z
    for letter in alphabet:
        link = "http://www.ufcstats.com/statistics/fighters?char=" + letter + "&page=all"
        links.append(link)
    # Request each listing page and hand the response to parse()
    for url in links:
        yield scrapy.Request(url=url, callback=self.parse)
See: Using Scrapy to Scrape Directory Websites | Generate 26 start urls
You can use string.ascii_lowercase to create the alphabet instead of typing it out.
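To illustrate the string.ascii_lowercase suggestion, here is a small stdlib-only sketch that builds the same 26 URLs with a list comprehension. The helper name `build_fighter_urls` and the constant `FIGHTERS_URL` are hypothetical; the result could replace the hand-built `links` list inside the answer's start_requests().

```python
import string

# Hypothetical constant: the URL pattern from the question, with {} for the letter
FIGHTERS_URL = "http://www.ufcstats.com/statistics/fighters?char={}&page=all"


def build_fighter_urls():
    """Return the 26 listing URLs, one per letter a..z (hypothetical helper)."""
    # string.ascii_lowercase is the string "abcdefghijklmnopqrstuvwxyz"
    return [FIGHTERS_URL.format(letter) for letter in string.ascii_lowercase]


# Inside a spider, the loop would then shrink to:
#     for url in build_fighter_urls():
#         yield scrapy.Request(url=url, callback=self.parse)

urls = build_fighter_urls()
print(len(urls))
print(urls[0])
```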