Why isn't the Python Scrapy spider class executed?
I am trying to use Scrapy as a function on IBM Cloud. My __main__.py is as follows:
import scrapy
from scrapy.crawler import CrawlerProcess


class AutoscoutListSpider(scrapy.Spider):
    name = "vehicles list"

    def __init__(self, params, *args, **kwargs):
        super(AutoscoutListSpider, self).__init__(*args, **kwargs)
        make = params.get("make", None)
        model = params.get("model", None)
        mileage = params.get("mileage", None)
        init_url = "https://www.autoscout24.be/nl/resultaten?sort=standard&desc=0&ustate=N%2CU&size=20&page=1&cy=B&mmvmd0={0}&mmvmk0={1}&kmto={2}&atype=C&".format(
            model, make, mileage)
        self.start_urls = [init_url]

    def parse(self, response):
        # Get total result on list load
        init_total_results = int(response.css('.cl-filters-summary-counter::text').extract_first().replace('.', ''))
        if init_total_results > 400:
            yield {"message": "There are MORE then 400 results"}
        else:
            yield {"message": "There are LESS then 400 results"}


def main(params):
    process = CrawlerProcess()
    try:
        process.crawl(AutoscoutListSpider, params)
        process.start()
        return {"Success ": "The crawler (make: {0}, model: {1}, mileage: {2}) is successfully executed.".format(
            params['make'], params['model'], params['mileage'])}
    except Exception as e:
        return {"Error ": e, "params ": params}
The whole process of adding this function is as follows:
zip -r ascrawler.zip __main__.py common.py
// So I create a zip file to upload. (There is also a common.py file; I left it out here for simplicity.)
ibmcloud wsk action create ascrawler --kind python:3 ascrawler.zip
// Create the function and add it to the cloud
ibmcloud wsk action invoke --blocking --result ascrawler --param make 9 --param model 1624 --param mileage 2500
// Invoke the function with parameters
After executing the third step I get the following result:
{"Success ": "The crawler (make: 9, model: 1624, mileage: 2500) is successfully executed."}
So I don't get any error, but execution never seems to enter the AutoscoutListSpider class. Why?
The result should also include {"message": "There are MORE then 400 results"}. Any ideas?
When I run it from the Python console like this:
main({"make":"9", "model":"1624", "mileage":"2500"})
it returns the correct result:
{"message": "There are MORE then 400 results"}
{"Success ": "The crawler (make: 9, model: 1624, mileage: 2500) is successfully executed."}
{"message": "There are MORE then 400 results"}
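is available in the activation logs of the invocation, not in the action result. The action result contains only the dictionary returned by main; the items yielded by the spider are written to the console log.
After running the ibmcloud wsk action invoke command, retrieve the activation identifier of the last invocation.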
$ ibmcloud wsk activation list
activations
d13bd19b196d420dbbd19b196dc20d59 ascrawler
...
This activation identifier can then be used to retrieve all console logs written to stdout and stderr during the invocation.
$ ibmcloud wsk activation logs d13bd19b196d420dbbd19b196dc20d59 | grep LESS
2018-06-29T08:27:11.094873294Z stderr: {'message': 'There are LESS then 400 results'}
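The difference between the activation log and the action result can be illustrated without Scrapy or IBM Cloud. The following is only a minimal sketch under stated assumptions: parse_like is a hypothetical stand-in for the spider's parse callback, and main_logged, main_collected and the "total" parameter are names invented for this illustration, not part of the original code. The first function only logs what the callback yields, which is roughly what happens in the question; the second gathers the yielded items into the returned dictionary, which is what would be needed for them to show up in the action result.

```python
import logging
import sys

# Log to stderr, which is where the activation log above shows the item.
logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
log = logging.getLogger("sketch")


def parse_like(total_results):
    """Hypothetical stand-in for the spider's parse(): yields one message dict."""
    if total_results > 400:
        yield {"message": "There are MORE then 400 results"}
    else:
        yield {"message": "There are LESS then 400 results"}


def main_logged(params):
    """Yielded items only reach the console log; the return value omits them."""
    for item in parse_like(params["total"]):
        log.debug(item)
    return {"Success": "done"}


def main_collected(params):
    """Yielded items are gathered and placed in the returned dictionary."""
    items = list(parse_like(params["total"]))
    return {"Success": "done", "items": items}


if __name__ == "__main__":
    print(main_logged({"total": 500}))
    print(main_collected({"total": 500}))
```

With real Scrapy code, one possible way to gather the items is to connect a handler to the item_scraped signal before starting the crawl and to append each item to a list that main then returns. That approach is a suggestion here, not something stated in the original answer.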