How to deal with empty fields in Scrapy when using keys

Question

I built a spider with Scrapy that successfully scrapes data from a website.

    def parse(self, response):
        for text in response.css('div.row'):
            yield {
                'product': text.css('div.item a.item::text').get(),
                'test1': text.css('div.item span::text')[0].get(),
                'test2': text.css('div.item span::text')[1].get(),
            }

This is not the complete code, but it should be enough to illustrate the problem.

The problem occurs when the 'test2' line, text.css('div.item span::text')[1].get(), has no matching element.

It raises IndexError: list index out of range, which makes sense. But how can I check whether the value is empty so that I can replace it with a default value?

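I know get() has a default parameter, get(default=''), but unfortunately it is not available here because I am indexing with [0].
I have been looking at ternary expressions, but I can't find a way to use one inside what I believe is a dictionary.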

Answer 1

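First get items = text.css(...).

Then check if len(items) > 0 before using items[0],
and if len(items) > 1 before using items[1]. A conditional (ternary) expression works fine as a value inside a dictionary literal: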

    def parse(self, response):
        for text in response.css('div.row'):
            # Run the selector once and reuse the resulting list
            items = text.css('div.item span::text')
            yield {
                'product': text.css('div.item a.item::text').get(),
                # Fall back to "" when the list is too short to index
                'test1': items[0].get() if len(items) > 0 else "",
                'test2': items[1].get() if len(items) > 1 else "",
            }
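
If there are more than two fields, the repeated length checks can be moved into a small helper. The following is only a sketch: item_or_default is a hypothetical name, not part of Scrapy, and the list spans stands in for what text.css('div.item span::text').getall() would return (getall() gives a plain list of strings).

```python
def item_or_default(values, index, default=""):
    """Return values[index], or default when the index is out of range."""
    try:
        return values[index]
    except IndexError:
        return default


# Stand-in for text.css('div.item span::text').getall() on a row
# that contains only one <span>.
spans = ["first"]

row = {
    "test1": item_or_default(spans, 0),
    "test2": item_or_default(spans, 1),
}
print(row)  # {'test1': 'first', 'test2': ''}
```

Catching IndexError keeps the dictionary readable and leaves the choice of default to the caller.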

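EDIT:

You can also use the CSS pseudo-class :nth-of-type(1) instead of [0]:

'div.item a.item:nth-of-type(1)::text'

or XPath with [1] (XPath positions start at 1, not 0):

'(.//div[@class="item"]/a[@class="item"])[1]/text()'

Because the position is part of the selector, the result is a (possibly empty) list, so get() with a default value works on it.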

Scrapy uses the parsel module, so I used parsel to create a minimal working example:

import parsel

text = '''
<div class="item">
<a class="item" href="a">a</a>
<a class="item" href="b">b</a>
</div>
'''

s = parsel.Selector(text)

# CSS: :nth-of-type() counts from 1
print(s.css('div.item a.item:nth-of-type(1)::text').get('empty')) # a
print(s.css('div.item a.item:nth-of-type(2)::text').get('empty')) # b
print(s.css('div.item a.item:nth-of-type(3)::text').get('empty')) # empty

# XPath: positions also count from 1
print(s.xpath('(.//div[@class="item"]/a[@class="item"])[1]/text()').get('empty')) # a
print(s.xpath('(.//div[@class="item"]/a[@class="item"])[2]/text()').get('empty')) # b
print(s.xpath('(.//div[@class="item"]/a[@class="item"])[3]/text()').get('empty')) # empty


Tags: python, scrapy, web-scraping