Scrapy + Python, returning multiple items, issue reading page

Question

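I am trying to use Scrapy with Python to extract multiple items into a database. To build my code, I first use the Scrapy shell to read the page and test the lines that do the data extraction.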

scrapy shell "http://www.goodmans.net/d/1706/brands.htm"

I tried the call below and got the result I wanted (all of the brands were extracted):

response.css('.SubDepartments a::text').extract()

I then wrote the spider below and ran it with scrapy crawl goodmans, which gave me an error:

import scrapy
import pandas as pd
class GoodmanSpider(scrapy.Spider):
    name = "goodmans" 
    start_urls = ['http://www.goodmans.net/d/1706/brands.htm']

    def parse(self, response):
        category = response.css('.SubDepartments a::text').extract() 
        category_url = response.css('.SubDepartments a::attr(href)').extract()
        yield {'Category': category, 'url': categoy_url}

Answer 1

The interesting part of the error is not visible in your screenshot. The last line says:

... line 10, in parse
       yield {'Category': category, 'url': categoy_url}
    NameError: name 'categoy_url' is not defined

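So, a simple typo :) The variable is assigned as category_url, but the yield line refers to categoy_url (missing the "r").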
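Since the title asks about returning multiple items: once the name is fixed, the spider as written still yields a single dict holding two lists. If you want one item per brand instead, a minimal sketch could look like the following. The helper name build_items and the sample data are made up for illustration and are not part of your code or of Scrapy. It also assumes the two lists line up one-to-one, which may not hold if some link has no text.

```python
from urllib.parse import urljoin


def build_items(names, hrefs, base_url):
    """Pair each brand name with its link and yield one dict per brand.

    Hypothetical helper: assumes names and hrefs are in the same order
    and have the same length (zip stops at the shorter list).
    """
    for name, href in zip(names, hrefs):
        yield {
            'Category': name.strip(),
            # Resolve relative links against the page URL.
            'url': urljoin(base_url, href),
        }


# Inside the spider, parse() could delegate to the helper:
#
#     def parse(self, response):
#         category = response.css('.SubDepartments a::text').extract()
#         category_url = response.css('.SubDepartments a::attr(href)').extract()
#         yield from build_items(category, category_url, response.url)

if __name__ == '__main__':
    # Made-up sample data standing in for the extracted lists.
    sample_names = ['Brand A', 'Brand B']
    sample_hrefs = ['/d/1/brand-a.htm', '/d/2/brand-b.htm']
    page = 'http://www.goodmans.net/d/1706/brands.htm'
    for item in build_items(sample_names, sample_hrefs, page):
        print(item)
```

Each yielded dict then goes through the item pipeline separately, which is usually easier to write to a database than one dict of lists.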

python

web-crawler

scrapy