Scrapy ImportError: No module named project.settings when using subprocess.Popen
I have Scrapy spiders that crawl through websites. In some cases Scrapy kills itself because of RAM issues, so I rewrote the spider so the work can be split up and run one site at a time.
After the initial run, I submit the Scrapy crawler again with new start items using subprocess.Popen.
But I am getting this error:
Traceback (most recent call last):
  File "/home/kumar/envs/ishop/bin/scrapy", line 4, in <module>
    execute()
  File "/home/kumar/envs/ishop/lib/python2.7/site-packages/scrapy/cmdline.py", line 109, in execute
    settings = get_project_settings()
  File "/home/kumar/envs/ishop/lib/python2.7/site-packages/scrapy/utils/project.py", line 60, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "/home/kumar/envs/ishop/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 109, in setmodule
    module = import_module(module)
  File "/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named shop.settings
The subprocess call is:
newp = Popen(comm, stderr=filename, stdout=filename, cwd=fp, shell=True)
and comm is:
source /home/kumar/envs/ishop/bin/activate && cd /home/kumar/projects/usg/shop/spiders/../.. && /home/kumar/envs/ishop/bin/scrapy crawl -a category=laptop -a site=newsite -a start=2 -a numpages=10 -a split=1 'allsitespider'
with cwd = /home/kumar/projects/usg.
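To make the moving parts explicit, here is a minimal sketch of how those pieces fit together; fp and filename from the question are stood in by project_root and an opened log file:

from subprocess import Popen

env_bin = '/home/kumar/envs/ishop/bin'
project_root = '/home/kumar/projects/usg'

comm = ('source {0}/activate && cd {1} && '
        '{0}/scrapy crawl -a category=laptop -a site=newsite '
        '-a start=2 -a numpages=10 -a split=1 allsitespider').format(env_bin, project_root)

logfile = open('crawl.log', 'a')
# With shell=True the command runs under /bin/sh, where the bash-only
# builtin "source" may be unavailable; "." is the portable spelling.
newp = Popen(comm, stderr=logfile, stdout=logfile, cwd=project_root, shell=True)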
I checked that sys.path is correct:
['/home/kumar/envs/ishop/bin', '/home/kumar/envs/ishop/lib64/python27.zip', '/home/kumar/envs/ishop/lib64/python2.7', '/home/kumar/envs/ishop/lib64/python2.7/plat-linux2', '/home/kumar/envs/ishop/lib64/python2.7/lib-tk', '/home/kumar/envs/ishop/lib64/python2.7/lib-old', '/home/kumar/envs/ishop/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7', '/usr/lib/python2.7', '/home/kumar/envs/ishop/lib/python2.7/site-packages']
But it looks like the import statement is using "/usr/lib64/python2.7/importlib/__init__.py" instead of my virtual environment.
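One detail worth noting, not stated in the question: the get_project_settings() call in the traceback uses the SCRAPY_SETTINGS_MODULE environment variable directly when it is set, and only falls back to searching for scrapy.cfg otherwise, and Popen forwards the parent's entire environment to the child unless env= is given. A quick sanity check before spawning is to print what the child will inherit:

import os

# What the spawned scrapy process inherits; Popen forwards the parent's
# environment unchanged unless an explicit env= argument is supplied.
print(os.environ.get('SCRAPY_SETTINGS_MODULE'))  # if set, used as-is by get_project_settings()
print(os.environ.get('SCRAPY_PROJECT'))          # selects the [settings] entry in scrapy.cfg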
Where am I going wrong? Any help is appreciated.
I would suggest letting Python focus on the scraping work and using something else for process control. If it were me, I would write a small bash script for running your program.
Test that the launcher script works by running it with env -i yourscript.sh,
since that ensures it runs without any inherited environment settings.
Once the bash script works correctly, including setting up the virtualenv and so on, you can have Python run the bash script instead of invoking Scrapy directly. At that point you have sidestepped any weird environment issues and given yourself a very reliable launcher script.
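As a sketch of that idea, assuming a hypothetical self-contained launcher run_spider.sh that activates the virtualenv and execs scrapy itself, the Python side shrinks to a plain argument-list Popen with no shell=True and no activate step:

from subprocess import Popen

# run_spider.sh is hypothetical: a launcher that sources the virtualenv's
# activate script and then execs "scrapy crawl" with these arguments.
args = ['/home/kumar/projects/usg/run_spider.sh',
        'laptop', 'newsite', '2', '10', '1']  # forwarded to the -a flags

logfile = open('crawl.log', 'a')
newp = Popen(args, stdout=logfile, stderr=logfile)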
Better yet, given that you have a bash script at that point, use a "proper" process controller (daemontools, supervisor, ...) to start the process, restart it when it crashes, and so on.
It looks like the settings are not being loaded correctly. One solution would be to build an egg and deploy it in the env before starting the crawler.
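A minimal sketch of that approach, assuming a conventional setuptools layout; the package name and version here are placeholders:

# setup.py -- build the egg with:  python setup.py bdist_egg
# then install it into the virtualenv before starting the crawler.
from setuptools import setup, find_packages

setup(
    name='shop',
    version='0.1',
    packages=find_packages(),  # picks up shop/ and shop.spiders
)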