Python Scrapy: How do you run your spider from a separate file?

So I've created a spider in Scrapy, and it now successfully locates all the text I want.

How do you execute this crawler from another Python file? I want to be able to pass new URLs to it and store the data it finds in a dictionary, and then a dataframe.

Because currently I can only get it to run with the terminal command 'scrapy crawl SpiderName'.
from scrapy.spiders import Spider
from scrapy_splash import SplashRequest


class SpiderName(Spider):
    name = 'SpiderName'
    Page = 'https://www.urlname.com'

    def start_requests(self):
        yield SplashRequest(url=self.Page, callback=self.parse,
                            endpoint='render.html',
                            args={'wait': 0.5},
                            )

    def parse(self, response):
        for x in response.css("div.row.list"):
            yield {
                'Entry': x.css("span[data-bind]::text").getall()
            }

Thanks

In the Scrapy documentation, under Common Practices, you can see Run Scrapy from a script:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # ... Your spider definition ...

# ... run it ...

process = CrawlerProcess(settings={ ... })
process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished

If you add your own __init__:

class MySpider(scrapy.Spider):

    def __init__(self, urls, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.start_urls = urls

Then you can run it with urls as an argument:

process.crawl(MySpider, urls=['http://books.toscrape.com/', 'http://quotes.toscrape.com/'])