Scrapy -- Cannot import the items into my spider (No module named behance.items)
I am new to Scrapy. This is my spider for crawling behance:
import scrapy
from scrapy.selector import Selector
from behance.items import BehanceItem
from selenium import webdriver
from scrapy.http import TextResponse
from scrapy.crawler import CrawlerProcess

class DmozSpider(scrapy.Spider):
    name = "behance"
    #allowed_domains = ["behance.com"]
    start_urls = [
        "https://www.behance.net/gallery/29535305/Mind-Your-Monsters",
    ]

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')
        item = BehanceItem()
        hxs = Selector(response)
        item['link'] = response.xpath("//div[@class='js-project-module-image-hd project-module module image project-module-image']/@data-hd-src").extract()
        yield item

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(DmozSpider)
process.start()
When I run my crawler, the command line shows the following error:

Traceback (most recent call last):
  File "/home/davy/behance/behance/spiders/behance_spider.py", line 3, in <module>
    from behance.items import BehanceItem
ImportError: No module named behance.items
My directory structure:
behance/
├── behance
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── behance_spider.py
└── scrapy.cfg
Try controlling your spider with this command, run from the project root (the directory containing scrapy.cfg):

scrapy crawl behance

Or change your spider file:
import scrapy
from scrapy.selector import Selector
from behance.items import BehanceItem
from selenium import webdriver
from scrapy.http import TextResponse
from scrapy.crawler import CrawlerProcess

class BehanceSpider(scrapy.Spider):
    name = "behance"
    allowed_domains = ["behance.com"]
    start_urls = [
        "https://www.behance.net/gallery/29535305/Mind-Your-Monsters",
    ]

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')
        item = BehanceItem()
        hxs = Selector(response)
        item['link'] = response.xpath("//div[@class='js-project-module-image-hd project-module module image project-module-image']/@data-hd-src").extract()
        yield item
And create another Python file, run.py, in the directory where your settings.py file lives:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
process.crawl("behance")
process.start()
Now run this file the way you would run a normal Python script: python run.py
Alternatively, you can add the project directory to your Python path:
export PYTHONPATH=$PYTHONPATH:/home/davy/behance/
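That export works because Python can only resolve from behance.items import ... when the outer behance/ directory is on sys.path. The mechanism can be sketched with a throwaway package built in a temporary directory (not the real project):

```python
import os
import sys
import tempfile

# Build a throwaway package shaped like the project: <root>/behance/items.py
root = tempfile.mkdtemp()
pkg = os.path.join(root, "behance")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "items.py"), "w") as f:
    f.write("class BehanceItem:\n    pass\n")

# Equivalent of `export PYTHONPATH=...`: put the project root on sys.path.
# Without this line, the import below raises ImportError.
sys.path.insert(0, root)

from behance.items import BehanceItem  # now resolves
print(BehanceItem.__name__)  # BehanceItem
```

Running the spider file directly (python behance_spider.py) fails for exactly this reason: the interpreter puts the spiders/ directory on sys.path, not the project root, so the behance package is never found.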