Keep / Replace empty values in getall() with Scrapy

I want to scrape some elements from a website, and I have to keep the values in order. For example:

def parse(self, response):
    id_num = response.css('td:nth-child(1)::text').getall()
    issued_at = response.css(
        '.align-center.xcrud-current::text').getall()
    exchange = response.css(
        '.xcrud-current+ .align-center::text').getall()
    base_currency = response.css(
        '.align-center:nth-child(4)::text').getall()
    coin = response.css(
        '.align-center:nth-child(5)::text').getall()
    direction = response.css(
        '.align-center:nth-child(6)::text').getall()
    ask = response.css(
        '.align-right:nth-child(7)::text').getall()
    target = response.css(
        '.align-right:nth-child(8)::text').getall()
    highest = response.css(
        '.align-right:nth-child(9)::text').getall()
    lowest = response.css(
        '.align-right:nth-child(10)::text').getall()
    status = response.css(
        'td:nth-child(11)::text').getall()
    close_time = response.css(
        '.align-right~ .align-center::text').getall()
    dca_level = response.css(
        '.align-right:nth-child(13)::text').getall()

    for id_num, issued_at, exchange, base_currency, coin, direction, ask, target, highest, lowest, status, close_time, dca_level in\
            zip(id_num, issued_at, exchange, base_currency, coin, direction, ask, target, highest, lowest, status, close_time, dca_level):

        yield {
            'Id': id_num,
            'Issued At': issued_at,
            'Exchange': exchange,
            'Base Currency': base_currency,
            'Coin': coin,
            'Direction': direction,
            'Ask': ask,
            'Target': target,
            'Highest': highest,
            'Lowest': lowest,
            'Status': status,
            'Close Time': close_time,
            'DCA Level': dca_level
        }

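Basically, the Id values are correct because every row has one, but close_time is not always present, so the output CSV is truncated. With ::text, an empty cell matches nothing, so that column's list comes out shorter than the others, and zip() stops at the shortest list. If I don't use ::text, all the elements are picked up.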

For example:

Id,Issued At,Exchange,Base Currency,Coin,Direction,Ask,Target,Highest,Lowest,Status,Close Time,DCA Level
499762,01/12/2020 08:46:40,binance,USDT,CTK,LONG,1.208900000000,1.231802400000,9.975000000000,9.927000000000,open,01/12/2020 08:25:00,0
499837,01/12/2020 08:46:17,kraken,USD,AUD,LONG,0.737670000000,0.745784370000,0.000003860000,0.000003840000,open,01/12/2020 08:30:00,0
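The truncation comes from zip() itself, which stops at the shortest of its inputs. A minimal stdlib-only sketch with hypothetical values (the names ids and close_times are illustrative, not from the real page) shows why padding the lists afterwards with itertools.zip_longest() does not help either: once the empty cell has been dropped, nothing records which row it belonged to.

```python
from itertools import zip_longest

# Hypothetical column lists as getall() might return them for a 3-row table.
# Row 2's close-time cell is empty, so '::text' matched nothing there and the
# list holds only the values from rows 1 and 3.
ids = ['1', '2', '3']
close_times = ['t1', 't3']

# zip() stops at the shortest input: row 3 is dropped, and row 2
# is paired with row 3's close time.
zipped = list(zip(ids, close_times))

# zip_longest() keeps all three rows, but pads only at the end, so the
# blank lands on row 3 instead of row 2.
padded = list(zip_longest(ids, close_times, fillvalue=''))

print(zipped)  # [('1', 't1'), ('2', 't3')]
print(padded)  # [('1', 't1'), ('2', 't3'), ('3', '')]
```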

What I want is to keep the empty values, or replace them with a placeholder, so that every row stays complete.

You need to rewrite your parse callback so that it processes one item (for example, one table row) at a time:

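def parse(self, response):
    # Select each row first, then query its cells relative to that row.
    for item in response.css('your_expression_to_get_list_of_items'):
        # .get() returns None for an empty cell instead of dropping it,
        # so the other values in the row stay where they belong.
        id_num = item.css('td:nth-child(1)::text').get()
        issued_at = item.css(
            '.align-center.xcrud-current::text').get()
        ...
        yield {'Id': id_num, 'Issued At': issued_at, ...}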
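A self-contained sketch of the same row-by-row idea follows. To keep it runnable without Scrapy, it swaps in the standard library's xml.etree.ElementTree on a small hypothetical, well-formed table; the markup, FIELDS, parse_rows and its placeholder parameter are all illustrative assumptions. In a real Scrapy callback the equivalent step would be calling .get(default='') on each row-relative selector, which returns the given default instead of None when the cell has no text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified table markup standing in for the real page.
# The second row has an empty close-time cell (no text node).
HTML = """
<table>
  <tr><td>1</td><td>open</td><td>08:25</td></tr>
  <tr><td>2</td><td>open</td><td></td></tr>
  <tr><td>3</td><td>closed</td><td>08:40</td></tr>
</table>
"""

FIELDS = ['Id', 'Status', 'Close Time']  # hypothetical column subset


def parse_rows(markup, placeholder=''):
    """Yield one dict per <tr>, replacing empty cells with `placeholder`."""
    root = ET.fromstring(markup.strip())
    for row in root.iter('tr'):
        # Read the cells of this row only, so a missing value
        # cannot shift into a neighbouring row.
        values = [(td.text or '').strip() or placeholder
                  for td in row.findall('td')]
        yield dict(zip(FIELDS, values))


items = list(parse_rows(HTML, placeholder='N/A'))
for item in items:
    print(item)
```

Because each dict is built from one row's own cells, the empty close time of row 2 becomes 'N/A' on row 2 rather than being filled from row 3. Real pages are rarely well-formed XML, which is why Scrapy's own selectors remain the better fit inside a spider.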