Scrapy: Item Loader and KeyError even when Key is defined
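Intent / expected behavior
Return the link text from the page https://www.bezrealitky.cz/vypis/nabidka-prodej/byt/praha, in both CSV output and the shell.
Error
I get KeyError: 'title', even though I have defined the key in the item loader in item.py.
Full traceback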
Traceback (most recent call last):
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\phili\Documents\Python Scripts\Scrapy Spiders\bezrealitky\bezrealitky\spiders\bezrealitky_spider.py", line 33, in parse
    yield loader.load_item()
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 115, in load_item
    value = self.get_output_value(field_name)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 122, in get_output_value
    proc = self.get_output_processor(field_name)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 144, in get_output_processor
    self.default_output_processor)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 154, in _get_item_field_attr
    value = self.item.fields[field_name].get(key, default)
KeyError: 'title'
Spider.py
def parse(self, response):
    for records in response.xpath('//*[starts-with(@class,"record")]'):
        loader = BaseItemLoader(selector=records)
        loader.add_xpath('title', './/div[@class="details"]/h2/a[@href]/text()')
        yield loader.load_item()
Item.py - item loader
class BaseItemLoader(ItemLoader):
    title_in = MapCompose(unidecode)
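Conclusion
I'm a bit at a loss, because I believe I followed the Scrapy manual and defined both the item loader and the key via "title_in", yet I get a KeyError when I assign a value to it. I checked the XPath in the shell and it returns the text I want, so at least that part works. Any help would be appreciated!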
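Even when you use an ItemLoader, you still need to define an Item class with the fields you want to populate and make it known to the item loader. A "title_in" attribute on the loader only declares an input processor; it does not declare a "title" field. Either set the Item class as an attribute of the loader:
class CustomItemLoader(ItemLoader):
    default_item_class = MyItem
or pass an instance of it to the loader's constructor:
l = CustomItemLoader(item=MyItem())
Otherwise the item loader knows nothing about the item or its fields.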
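To make the failure mode concrete, here is a minimal pure-Python sketch of the lookup shown in the last traceback frame (`self.item.fields[field_name]`). It is a simplified stand-in, not Scrapy's actual implementation: `SketchItem`, `ListingItem`, `SketchLoader`, `ListingLoader` and `try_load` are hypothetical names invented for illustration, and the sample title string is made up. It only assumes what the traceback shows, namely that the loader looks each field name up in the item's `fields` mapping.

```python
# Simplified stand-in for the field lookup seen in the traceback.
# NOT Scrapy's real code; every class name here is hypothetical.


class SketchItem:
    """Plays the role of a bare item: it declares no fields."""
    fields = {}


class ListingItem(SketchItem):
    """Plays the role of an item class that declares a 'title' field."""
    fields = {"title": {}}


class SketchLoader:
    # No item class configured, so the loader falls back to the bare item.
    default_item_class = SketchItem

    # An input processor on the loader does not declare a field on the item.
    title_in = staticmethod(str.strip)

    def __init__(self, item=None):
        self.item = item if item is not None else self.default_item_class()
        self._values = {}

    def add_value(self, field_name, value):
        # Apply "<field>_in" if the loader defines it, else pass through.
        proc = getattr(self, field_name + "_in", lambda v: v)
        self._values.setdefault(field_name, []).append(proc(value))

    def load_item(self):
        loaded = {}
        for field_name, values in self._values.items():
            # Mirrors `self.item.fields[field_name].get(key, default)`:
            # an undeclared field raises KeyError at this point.
            self.item.fields[field_name].get("output_processor")
            loaded[field_name] = values
        return loaded


class ListingLoader(SketchLoader):
    # Option 1 from the answer: set the item class on the loader.
    default_item_class = ListingItem


def try_load(loader):
    """Add one value and report either the loaded data or the KeyError."""
    loader.add_value("title", "  Prodej bytu  ")
    try:
        return loader.load_item()
    except KeyError as exc:
        return "KeyError: {!r}".format(exc.args[0])


result_bare = try_load(SketchLoader())
result_class_attr = try_load(ListingLoader())
# Option 2 from the answer: pass an item instance to the constructor.
result_instance = try_load(SketchLoader(item=ListingItem()))

print(result_bare)
print(result_class_attr)
print(result_instance)
```

In this sketch `add_value` succeeds in all three cases and only `load_item` fails for the bare loader, which matches where the traceback in the question ends. In real Scrapy code the counterpart of `ListingItem` would be a `scrapy.Item` subclass that declares `title = scrapy.Field()`.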