How to iterate over a list of arguments in a Scrapy spider?
Hi, I'm trying to pass a list of arguments to a Scrapy spider on the command line. I can run it with a single argument, but I can't get it to work for a list of arguments. Please help. Here is what I tried:
# -*- coding: utf-8 -*-
import scrapy
import json


class AirbnbweatherSpider(scrapy.Spider):
    name = 'airbnbweather'
    allowed_domains = ['www.wunderground.com']

    def __init__(self, geocode):
        self.geocode = geocode.split(',')

    def start_requests(self):
        yield scrapy.Request(url="https://api.weather.com/v3/wx/forecast/daily/10day?apiKey=6532d6454b8aa370768e63d6ba5a832e&geocode={0}{1}{2}&units=e&language=en-US&format=json".format(self.geocode[0], "%2C", self.geocode[1]))

    def parse(self, response):
        resuturant = json.loads(response.body)
        yield {
            'temperatureMax': resuturant.get('temperatureMax'),
            'temperatureMin': resuturant.get('temperatureMin'),
            'validTimeLocal': resuturant.get('validTimeLocal'),
        }
I can run it using this command:
scrapy crawl airbnbweather -o BOSTON.json -a geocode="42.361","-71.057"
Everything works fine. But how can I iterate over a list of geocodes like this one?
list = [("42.361","-71.057"),("29.384","-94.903"),("30.384", "-84.903")]
You can only pass strings as spider arguments (https://docs.scrapy.org/en/latest/topics/spiders.html#spider-arguments), so you should pass the list as a single string and parse it back in your code.
The following seems to do the trick:
import scrapy
import json
import ast


class AirbnbweatherSpider(scrapy.Spider):
    name = 'airbnbweather'
    allowed_domains = ['www.wunderground.com']

    def __init__(self, geocodes, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The argument arrives as a string; parse it back into a list of tuples.
        self.geocodes = ast.literal_eval(geocodes)

    def start_requests(self):
        for geocode in self.geocodes:
            yield scrapy.Request(
                url="https://api.weather.com/v3/wx/forecast/daily/10day?apiKey=6532d6454b8aa370768e63d6ba5a832e&geocode={0}{1}{2}&units=e&language=en-US&format=json".format(geocode[0], "%2C", geocode[1]))

    # parse() stays the same as in the question.
Then you can run the spider like this:
scrapy crawl airbnbweather -o BOSTON.json -a geocodes='[("42.361","-71.057"),("29.384","-94.903"),("30.384", "-84.903")]'
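For reference, here is a minimal standalone sketch (illustration only, not part of the spider) of what ast.literal_eval does with that argument string: it safely evaluates the Python literal back into a list of (lat, lon) tuples that you can loop over.

import ast

# The same string passed on the command line via -a geocodes=...
arg = '[("42.361","-71.057"),("29.384","-94.903"),("30.384", "-84.903")]'

geocodes = ast.literal_eval(arg)  # -> [('42.361', '-71.057'), ('29.384', '-94.903'), ('30.384', '-84.903')]
for lat, lon in geocodes:
    print(lat, lon)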