ImportError: No module named counselor.settings when using scrapy
My crawler project structure is as follows:
├── README.md
├── counselor
│ ├── filter_words.py
│ ├── items.py
│ ├── langconv.py
│ ├── main.py
│ ├── pipelines.py
│ ├── queue.py
│ ├── settings.py
│ ├── spiders
│ │ ├── __init__.py
│ │ └── wiki.py
│ └── zh_wiki.py
└── scrapy.cfg
My main.py is as follows:
from scrapy import cmdline
cmdline.execute('scrapy crawl wikipieda_spider'.split())
My counselor/spiders/wiki.py is as follows:
import scrapy

class WiKiSpider(scrapy.Spider):
    urlQueue = Queue()
    name = 'wikipieda_spider'
    allowed_domains = ['zh.wikipedia.org']
    start_urls = ['https://zh.wikipedia.org/wiki/Category:%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BC%96%E7%A8%8B']
    custom_settings = {
        'ITEM_PIPELINES': {'counselor.pipelines.WikiPipeline': 800}
    }
    ......
My counselor/settings.py:
BOT_NAME = 'counselor'
SPIDER_MODULES = ['counselor.spiders']
NEWSPIDER_MODULE = 'counselor.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'counselor.pipelines.WikiPipeline': 800,
}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
AUTOTHROTTLE_ENABLED = True
In the project root directory I have scrapy.cfg:
[settings]
default = counselor.settings
[deploy]
#url = http://localhost:6800/
project = counselor
Now I cd into my project root (the same directory as scrapy.cfg) and run:
python counselor/main.py
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/OpenSSL/crypto.py:14: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
from cryptography import utils, x509
Traceback (most recent call last):
  File "counselor/main.py", line 2, in <module>
    cmdline.execute('scrapy crawl wikipieda_spider'.split())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/cmdline.py", line 114, in execute
    settings = get_project_settings()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/utils/project.py", line 69, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 294, in setmodule
    module = import_module(module)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named counselor.settings
My code never imports counselor.settings directly. Why does this error occur?
Because Scrapy does import it, based on the project name in your configuration. All you need to do is turn your counselor folder into a package by adding an __init__.py. The file does not need any content; for convenience you can just put a single # in it.
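A minimal sketch of the fix, assuming the layout shown above stays otherwise unchanged: create counselor/__init__.py next to settings.py. It can be empty, or hold nothing but a comment:

# counselor/__init__.py
# Deliberately (almost) empty. Its presence makes Python 2 treat the
# "counselor" directory as an importable package, so "counselor.settings"
# can be resolved when Scrapy loads the project settings.

After adding it, running python counselor/main.py from the directory that contains scrapy.cfg should succeed, since Scrapy appends that directory to sys.path before importing counselor.settings. (Under Python 3.3+ the import would likely already work without the file thanks to namespace packages, but this project runs on Python 2.7.)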