How to correctly define a selector in response.css and yield in Scrapy
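I am new to Scrapy, and there is one thing I have been trying for two days without success.
I am practicing extracting information about the football players listed on https://sofifa.com/. I adapted the code sample from https://docs.scrapy.org/ and edited it as shown below. The field I am trying to extract is OVA.
Does anyone know how I should correctly define the "span.something..." element in the code below?
Many thanks,
James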
import scrapy


class ToScrapeCSSSpider(scrapy.Spider):
    name = "player-css"
    start_urls = [
        'https://sofifa.com/players?type=all&tm%5B0%5D=1&r=210024&set=true',
    ]

    def parse(self, response):
        for playerInfor in response.css("div.card"):
            yield {
                # This is the selector I cannot get right.
                'OVA': playerInfor.css("span.bp3-tag p::bp3-tag p").extract()
            }

        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            yield scrapy.Request(response.urljoin(next_page_url))
Use the CSS selector response.css("tbody.list") instead of response.css("div.card"). With response.css("tbody.list") the data is easy to extract, whereas with response.css("div.card") the result is the expected output mixed with some empty lists.
for playerInfor in response.css("tbody.list"):
    print(playerInfor.css('td.col.col-oa.col-sort span::text').getall())

Output:
['87', '84', '84', '82', '80', '80', '80', '80', '79', '79', '79', '79', '79', '78', '77', '77', '77', '76', '76', '76', '75', '75', '74', '74', '73', '72', '72', '70', '62', '62', '60', '58', '56']
Another approach:

def parse(self, response):
    mydata = response.css('tbody.list td.col.col-oa.col-sort span::text').extract()
    yield {
        "OVA": mydata
    }

# Output of mydata
['87', '84', '84', '82', '80', '80', '80', '80', '79', '79', '79', '79', '79', '78', '77', '77', '77', '76', '76', '76', '75', '75', '74', '74', '73', '72', '72', '70', '62', '62', '60', '58', '56']