Scrapy Not Finding div.title
import scrapy


class BookSpider(scrapy.Spider):
    name = "books"
    start_urls = [
        'http://books.toscrape.com/catalogue/page-1.html'
    ]

    def parse(self, response):
        # Note: split(".")[-1] is the extension "html", not the page number.
        page = response.url.split(".")[-1]
        filename = f'BooksHTML-{page}.html'
        # Save the raw response body to disk.
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log(f'Saved file {filename}')
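I'm using this spider to practice web scraping, and I'm trying to collect the titles of all the books on this page. When I go into the terminal and type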
scrapy shell 'http://books.toscrape.com/catalogue/page-1.html'
and then
response.css("div.title").getall()
it just returns an empty list:
[]
Any clarification would be appreciated.
As Tim Roberts pointed out in the comments, there is no div with the class title, so the selector div.title matches nothing.
The full title of each book on the page is in the title attribute of an <a> (anchor) tag, and that anchor links to the individual book's page.
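To make the point about the markup concrete, here is a minimal standard-library sketch. The HTML fragment is a simplified, hand-written assumption modeled on one book entry, not a copy of the real page, and the TitleAudit class is a hypothetical helper used only for illustration.

```python
from html.parser import HTMLParser

# Simplified, assumed fragment modeled on one book entry of the listing page.
SAMPLE_HTML = """
<article class="product_pod">
  <div class="image_container">
    <a href="a-light-in-the-attic_1000/index.html">
      <img src="cover.jpg" alt="A Light in the Attic">
    </a>
  </div>
  <h3>
    <a href="a-light-in-the-attic_1000/index.html"
       title="A Light in the Attic">A Light in the ...</a>
  </h3>
  <div class="product_price"></div>
</article>
"""


class TitleAudit(HTMLParser):
    """Hypothetical helper: records <div> classes and <a title="..."> values."""

    def __init__(self):
        super().__init__()
        self.div_classes = []    # every class name seen on a <div>
        self.anchor_titles = []  # every title attribute seen on an <a>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div":
            self.div_classes.extend((attributes.get("class") or "").split())
        elif tag == "a" and attributes.get("title"):
            self.anchor_titles.append(attributes["title"])


audit = TitleAudit()
audit.feed(SAMPLE_HTML)

print("title" in audit.div_classes)  # False: no <div class="title">
print(audit.anchor_titles)           # ['A Light in the Attic']
```

In this fragment the visible link text is truncated, which is why the title attribute, rather than the anchor's text, is the place to read the full title from.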
To get the value of the title attribute from every anchor tag that has one, use:
response.css("a::attr(title)").getall()
returns:
['A Light in the Attic', 'Tipping the Velvet', 'Soumission', 'Sharp Objects', 'Sapiens: A Brief History of Humankind', 'The Requiem Red', 'The Dirty Little Secrets of Getting Your Dream Job', 'The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull', 'The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics', 'The Black Maria', 'Starving Hearts (Triangular Trade Trilogy, #1)', "Shakespeare's Sonnets", 'Set Me Free', "Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)", 'Rip it Up and Start Again', 'Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991', 'Olio', 'Mesaerion: The Best Science Fiction Stories 1800-1849', 'Libertarianism for Beginners', "It's Only the Himalayas"]