How do I add a string prefix to each item of a list in Scrapy?
I converted the list 'title_url' to a string, but I am not getting the output I want:
    import scrapy

    class QuoteSpider(scrapy.Spider):
        name = 'quotes'
        base_url = 'https://www.yell.com'
        start_urls = ['https://www.yell.com/ucs/UcsSearchAction.do?scrambleSeed=770796459&keywords=hospitals&location=united+kingdom']

        def parse(self, response):
            all_data = response.css('div.row.businessCapsule--mainRow')
            for data in all_data:
                title_url = str(data.css('a.businessCapsule--title::attr(href)').extract())
                final_url = self.base_url + title_url
                items = {
                    'Title Url' : final_url,
                }
                yield items
The output looks like this:
"https://www.yell.com/['/biz/western-care-ltd-yeovil-8342726/']", "https://www.yell.com/['/biz/livingstonecare-service-corby-9019909/'],.....]
I want output like this:
['https://www.yell.com/biz/western-care-ltd-yeovil-8342726/', 'https://www.yell.com/biz/livingstonecare-service-corby-9019909/'......]
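Replace extract() with get(). Calling str() on the list returned by extract() keeps the brackets and quotes, which is why they end up inside the URL.
For example: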
    def parse(self, response):
        all_data = response.css('div.row.businessCapsule--mainRow')
        for data in all_data:
            title_url = data.css('a.businessCapsule--title::attr(href)').get()
            final_url = self.base_url + title_url
            items = {
                'Title Url' : final_url,
            }
            yield items
Note: .extract() returns a list of items, while .get() returns only the first match as a string (or None if nothing matched).
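To see why the brackets showed up, here is a minimal sketch in plain Python with no Scrapy involved. The list `extracted` is a hand-written stand-in for what .extract() would return for one result row, and using `urljoin` from the standard library to build the final URL is my own suggestion rather than part of the answer above.

```python
from urllib.parse import urljoin

base_url = 'https://www.yell.com'

# Stand-in for the value .extract() returns: a list, even for a single match.
extracted = ['/biz/western-care-ltd-yeovil-8342726/']

# str() on a list keeps the brackets and quotes, so they leak into the URL.
broken_url = base_url + str(extracted)

# Stand-in for what .get() returns: the first match as a plain string,
# or None when the selector matched nothing.
first_match = extracted[0] if extracted else None

# urljoin resolves the relative href against the base URL; the None check
# avoids a TypeError on rows that have no title link.
fixed_url = urljoin(base_url, first_match) if first_match is not None else None

print(broken_url)
print(fixed_url)
```

Inside a spider, the same None check before concatenating would keep a row without a title link from raising an error.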