Keep / Replace empty values in getall() with Scrapy
I want to scrape some elements from a website, and I need to keep the values in order.
For example:
def parse(self, response):
    id_num = response.css('td:nth-child(1)::text').getall()
    issued_at = response.css(
        '.align-center.xcrud-current::text').getall()
    exchange = response.css(
        '.xcrud-current+ .align-center::text').getall()
    base_currency = response.css(
        '.align-center:nth-child(4)::text').getall()
    coin = response.css(
        '.align-center:nth-child(5)::text').getall()
    direction = response.css(
        '.align-center:nth-child(6)::text').getall()
    ask = response.css(
        '.align-right:nth-child(7)::text').getall()
    target = response.css(
        '.align-right:nth-child(8)::text').getall()
    highest = response.css(
        '.align-right:nth-child(9)::text').getall()
    lowest = response.css(
        '.align-right:nth-child(10)::text').getall()
    status = response.css(
        'td:nth-child(11)::text').getall()
    close_time = response.css(
        '.align-right~ .align-center::text').getall()
    dca_level = response.css(
        '.align-right:nth-child(13)::text').getall()
    for id_num, issued_at, exchange, base_currency, coin, direction, ask, target, highest, lowest, status, close_time, dca_level in \
            zip(id_num, issued_at, exchange, base_currency, coin, direction, ask, target, highest, lowest, status, close_time, dca_level):
        yield {
            'Id': id_num,
            'Issued At': issued_at,
            'Exchange': exchange,
            'Base Currency': base_currency,
            'Coin': coin,
            'Direction': direction,
            'Ask': ask,
            'Target': target,
            'Highest': highest,
            'Lowest': lowest,
            'Status': status,
            'Close Time': close_time,
            'DCA Level': dca_level
        }
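Basically, the IDs are correct because they are always present, but close_time is not always present, so the output CSV is truncated. If I don't use ::text, all of the elements are picked up.
For example: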
Id,Issued At,Exchange,Base Currency,Coin,Direction,Ask,Target,Highest,Lowest,Status,Close Time,DCA Level
499762,01/12/2020 08:46:40,binance,USDT,CTK,LONG,1.208900000000,1.231802400000,9.975000000000,9.927000000000,open,01/12/2020 08:25:00,0
499837,01/12/2020 08:46:17,kraken,USD,AUD,LONG,0.737670000000,0.745784370000,0.000003860000,0.000003840000,open,01/12/2020 08:30:00,0
What I want is to keep/replace the empty values.
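The truncation described above can be reproduced without Scrapy. The sketch below uses made-up values (the third ID and the shortened times are purely illustrative) and assumes that an empty cell contributes no text node, so the getall() list for that column is one entry short:

```python
# Illustrative data only: three table rows, where the second row has an
# empty "Close Time" cell. An empty cell has no text node, so a page-wide
# '::text' query with getall() would return only two strings for that column.
ids = ["499762", "499837", "499901"]
close_times = ["08:25:00", "08:40:00"]  # belongs to rows 1 and 3

# zip() stops at the shortest input, so the last row is dropped, and the
# second ID is paired with the close time of the third row.
rows = list(zip(ids, close_times))

for id_num, close_time in rows:
    print(id_num, close_time)
```

The output has two rows instead of three, and the second row carries a close time that belongs to a different ID. This is why the column lists cannot be realigned after the fact: the position of the missing value is already lost.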
You need to rewrite your parse callback so that it processes one item (table row) at a time:
def parse(self, response):
    for item in response.css('your_expression to_get list_of_items'):
        id_num = item.css('td:nth-child(1)::text').get()
        issued_at = item.css(
            '.align-center.xcrud-current::text').get()
        ...
        yield {'Id': id_num, 'Issued At': issued_at, ...}
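A fuller sketch of the per-row approach might look like the following. Several things here are assumptions rather than facts about the target site: the row selector "tr", the helper name parse_rows, and the idea that the thirteen columns sit at td positions 1 to 13 in the order of the CSV header. Adjust them to the real markup.

```python
# Column names in table order. Assumption: column N is the N-th <td> of a row.
COLUMNS = [
    "Id", "Issued At", "Exchange", "Base Currency", "Coin", "Direction",
    "Ask", "Target", "Highest", "Lowest", "Status", "Close Time", "DCA Level",
]


def parse_rows(response, row_selector="tr", default=""):
    """Yield one dict per table row, filling empty cells with `default`.

    `response` is anything with a Scrapy-style .css() method. The default
    `row_selector` is a placeholder, not the site's real selector.
    """
    for row in response.css(row_selector):
        # Skip rows without data cells, such as a header row made of <th>.
        if not row.css("td"):
            continue
        item = {}
        for position, name in enumerate(COLUMNS, start=1):
            # The query is scoped to this row, so a missing value only
            # affects this row. get() returns `default` when nothing matches.
            text = row.css(f"td:nth-child({position})::text").get(default=default)
            item[name] = text.strip() if text else default
        yield item
```

Inside the spider, the callback could then be reduced to `yield from parse_rows(response)`. Because every field is looked up within its own row, an empty Close Time cell becomes an empty string in that row instead of shifting or truncating the other columns.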