Scrapy trouble : UnboundLocalError: local variable referenced before assignment
I am trying to improve my web scraping skills, but I am stuck on my script. I want to scrape some information from Amazon.
Here is my script so far:
import scrapy
from ..items import AmazontutorialItem


class AmazonSpiderSpider(scrapy.Spider):
    name = 'amazon'
    page_number = 2
    start_urls = ['https://www.amazon.com/s?bbn=1&rh=n%3A283155%2Cn%3A%211000%2Cn%3A1%2Cp_n_publication_date%3A1250226011&dc&fst=as%3Aoff&qid=1606224210&rnid=1250225011&ref=lp_1_nr_p_n_publication_date_0']

    def parse(self, response):
        items = AmazontutorialItem()
        product_name = response.css('.a-color-base.a-text-normal::text').extract()
        product_author = response.css('.sg-col-12-of-28 span.a-size-base+ .a-size-base::text').extract()
        product_price = response.css('.a-spacing-top-small .a-price-whole::text').extract()
        product_imagelink = response.css('.s-image::attr(src)').extract()
        items['product_name'] = product_name
        items['product_author'] = product_author
        items['product_price'] = product_price
        items['product_imagelink'] = product_imagelink
        yield items
        next_page = 'https://www.amazon.com/s?i=stripbooks&bbn=1&rh=n%3A283155%2Cn%3A1000%2Cn%3A1%2Cp_n_publication_date%3A1250226011&dc&page=' + str(AmazonSpiderSpider.page_number) + '&fst=as%3Aoff&qid=1606229780&rnid=1250225011&ref=sr_pg_2'
        if AmazonSpiderSpider.page_number <= 3:
            AmazonSpiderSpider += 1
            yield response.follow(next_page, callback = self.parse)
But I get this error:
UnboundLocalError: local variable 'AmazonSpiderSpider' referenced before assignment
I don't understand it; I have never run into this error before, even when doing web scraping.
Any ideas? Thanks.
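You are trying to access page_number from inside the AmazonSpiderSpider class itself, using AmazonSpiderSpider.page_number. Reading the attribute that way is legal on its own; what breaks is the line AmazonSpiderSpider += 1. That augmented assignment binds the name AmazonSpiderSpider inside parse, so Python treats it as a local variable for the whole function, and the earlier read of AmazonSpiderSpider.page_number raises UnboundLocalError because the local name has no value yet. What you most likely intended is to read and increment self.page_number.
The following should solve your problem: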
import scrapy
from ..items import AmazontutorialItem


class AmazonSpiderSpider(scrapy.Spider):
    name = 'amazon'
    page_number = 2
    start_urls = ['https://www.amazon.com/s?bbn=1&rh=n%3A283155%2Cn%3A%211000%2Cn%3A1%2Cp_n_publication_date%3A1250226011&dc&fst=as%3Aoff&qid=1606224210&rnid=1250225011&ref=lp_1_nr_p_n_publication_date_0']

    def parse(self, response):
        items = AmazontutorialItem()
        product_name = response.css('.a-color-base.a-text-normal::text').extract()
        product_author = response.css('.sg-col-12-of-28 span.a-size-base+ .a-size-base::text').extract()
        product_price = response.css('.a-spacing-top-small .a-price-whole::text').extract()
        product_imagelink = response.css('.s-image::attr(src)').extract()
        items['product_name'] = product_name
        items['product_author'] = product_author
        items['product_price'] = product_price
        items['product_imagelink'] = product_imagelink
        yield items
        # Build the URL of the next results page from the current counter.
        next_page = 'https://www.amazon.com/s?i=stripbooks&bbn=1&rh=n%3A283155%2Cn%3A1000%2Cn%3A1%2Cp_n_publication_date%3A1250226011&dc&page=' + str(self.page_number) + '&fst=as%3Aoff&qid=1606229780&rnid=1250225011&ref=sr_pg_2'
        if self.page_number <= 3:
            # Increment the counter on the spider, not the class name itself.
            self.page_number += 1
            yield response.follow(next_page, callback=self.parse)
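To see why the error appears at all, here is a minimal sketch that reproduces it without Scrapy. The names Counter, broken, fixed and demo are hypothetical and exist only for this illustration. It assumes nothing beyond standard Python scoping: any assignment to a name inside a function, including +=, makes that name local for the entire function body.

```python
class Counter:
    page_number = 2

    def broken(self):
        # The `Counter += 1` below makes `Counter` a local name in this
        # function, so this earlier read fails with UnboundLocalError.
        current = Counter.page_number
        Counter += 1
        return current

    def fixed(self):
        # Going through `self` never rebinds the class name.
        current = self.page_number
        self.page_number += 1
        return current


def demo():
    counter = Counter()
    try:
        counter.broken()
    except UnboundLocalError as exc:
        error = type(exc).__name__
    else:
        error = None
    return error, counter.fixed(), counter.page_number


if __name__ == "__main__":
    print(demo())
```

Note that self.page_number += 1 stores the new value on the instance and leaves the class attribute unchanged. That is fine here, on the assumption that a single spider instance handles every parse call during a crawl.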