Unable to run Scrapy project
I'm very new to Scrapy. I set up my project from the terminal with "scrapy startproject tutorials", and I'm using Visual Studio Code.
I have checked that:
- my spider's name is exactly the one I am calling with "scrapy crawl".
- my scrapy.cfg is in the same path as my script.
- SPIDER_MODULES and NEWSPIDER_MODULE are written correctly in [spiders > settings.py].
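For reference, a "Spider not found" error usually means the spider file is not inside the package listed in SPIDER_MODULES. Assuming a project created with "scrapy startproject tutorials", the expected layout is sketched below (quotes_spider.py is a placeholder name for the file holding the spider class):

```
tutorials/                 <- run "scrapy crawl quotes" from here (contains scrapy.cfg)
├── scrapy.cfg
└── tutorials/
    ├── __init__.py
    ├── settings.py
    └── spiders/
        ├── __init__.py
        └── quotes_spider.py   <- the spider class must be saved in this folder
```

If the spider file sits outside tutorials/spiders/, Scrapy never imports it, and "scrapy crawl quotes" fails with exactly this KeyError.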
Here is my code:
import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'http://quotes.toscrape.com/'
    ]

    def parse(self, response):
        title = response.css('title').extract()
        yield {'titleText': title}
My settings.py:
BOT_NAME = 'quotes'
SPIDER_MODULES = ['tutorials.spiders']
NEWSPIDER_MODULE = 'tutorials.spiders'
And this is how I run it:
scrapy crawl quotes
I still can't run the spider. What is the problem? Thanks.
Edit:
The error message I get:
C:\Users\Mohamed\Desktop\python 1\test python\Solution Test - ALIOUA WALID\tutorials>scrapy crawl quotes
2020-02-26 09:48:35 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: quotes)
2020-02-26 09:48:35 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.2, w3lib 1.20.0, Twisted 19.10.0, Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-7-6.1.7601-SP1
Traceback (most recent call last):
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\spiderloader.py", line 69, in load
    return self._spiders[spider_name]
KeyError: 'quotes'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Mohamed\AppData\Local\Programs\Python\Python36\Scripts\scrapy.exe\__main__.py", line 7, in <module>
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\cmdline.py", line 146, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\cmdline.py", line 100, in _run_print_help
    func(*a, **kw)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\cmdline.py", line 154, in _run_command
    cmd.run(args, opts)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\commands\crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\crawler.py", line 183, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\crawler.py", line 216, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\crawler.py", line 220, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\spiderloader.py", line 71, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: quotes'
I'm not a Windows user or a Python expert, so I won't try to debug your paths in detail, but with the code you posted, even if you do fix your paths and get the spider to load, it still won't "crawl" a website, because you have no mechanism for it to find and follow links to other URLs.
When you write "crawl", I assume you mean multiple pages; if you only want a single page, I would expect terms like "fetch" or "parse" (or fetch, then parse).
As others have pointed out, try genspider, but also add the argument for the crawl template. If I remember correctly, it is something like: scrapy genspider -t crawl quotes quotes.toscrape.com
That will give you a spider template with built-in callbacks for finding and crawling other URLs.