ImportError: No module named counselor.settings when using scrapy
My crawler project structure is as follows:
├── README.md
├── counselor
│ ├── filter_words.py
│ ├── items.py
│ ├── langconv.py
│ ├── main.py
│ ├── pipelines.py
│ ├── queue.py
│ ├── settings.py
│ ├── spiders
│ │ ├── __init__.py
│ │ └── wiki.py
│ └── zh_wiki.py
└── scrapy.cfg
My main.py is as follows:
from scrapy import cmdline
cmdline.execute('scrapy crawl wikipieda_spider'.split())
My counselor/spiders/wiki.py is as follows:
import scrapy

class WiKiSpider(scrapy.Spider):
    urlQueue = Queue()
    name = 'wikipieda_spider'
    allowed_domains = ['zh.wikipedia.org']
    start_urls = ['https://zh.wikipedia.org/wiki/Category:%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BC%96%E7%A8%8B']
    custom_settings = {
        'ITEM_PIPELINES': {'counselor.pipelines.WikiPipeline': 800}
    }
    ......
My counselor/settings.py:
BOT_NAME = 'counselor'
SPIDER_MODULES = ['counselor.spiders']
NEWSPIDER_MODULE = 'counselor.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'counselor.pipelines.WikiPipeline': 800,
}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
AUTOTHROTTLE_ENABLED = True
In the project root directory I have scrapy.cfg:
[settings]
default = counselor.settings
[deploy]
#url = http://localhost:6800/
project = counselor
Now I cd into my project root (the same directory as scrapy.cfg) and run:
python counselor/main.py
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/OpenSSL/crypto.py:14: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
from cryptography import utils, x509
Traceback (most recent call last):
  File "counselor/main.py", line 2, in <module>
    cmdline.execute('scrapy crawl wikipieda_spider'.split())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/cmdline.py", line 114, in execute
    settings = get_project_settings()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/utils/project.py", line 69, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 294, in setmodule
    module = import_module(module)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named counselor.settings
My code never imports counselor.settings directly. Why does this error occur?
Because Scrapy does import it, based on the project name in your configuration. All you need to do is turn your counselor folder into a package by adding an __init__.py. The file does not need any content; for convenience you can just put a single # in it.
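A minimal sketch of the fix, assuming the layout shown above stays otherwise unchanged: create counselor/__init__.py next to settings.py. It can be empty, or hold nothing but a comment:

# counselor/__init__.py
# Deliberately (almost) empty. Its presence makes Python 2 treat the
# "counselor" directory as an importable package, so "counselor.settings"
# can be resolved when Scrapy loads the project settings.

After adding it, running python counselor/main.py from the directory that contains scrapy.cfg should succeed, since Scrapy appends that directory to sys.path before importing counselor.settings. (Under Python 3.3+ the import would likely already work without the file thanks to namespace packages, but this project runs on Python 2.7.)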