Why is this Daemon thread blocking?
Why does the following code block at cc.start()? crawler.py contains code similar to http://doc.scrapy.org/en/latest/topics/practices.html#run-from-script:
import scrapy
import threading
from subprocess import Popen, PIPE

def worker():
    crawler = Popen('python crawler.py', stdout=PIPE, stderr=PIPE, shell=True)
    while True:
        line = crawler.stderr.readline()
        print(line.strip())

cc = threading.Thread(target=worker())
cc.setDaemon(True)
cc.start()
print "Here" # This is not printed
# Do more stuff
crawler.py contains the following code:
from scrapy.crawler import CrawlerProcess
import scrapy

class MySpider(scrapy.Spider):
    name = 'Whosebug'
    start_urls = ['http://whosebug.com/questions?sort=votes']

    def parse(self, response):
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished
threading.Thread takes a callable as its target argument (i.e. the function name). By writing worker() with parentheses, you are actually calling the function at the moment you create the thread instance:
cc = threading.Thread(target=worker())
All you need to do is pass the function itself to the thread:
cc = threading.Thread(target=worker)
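Here is a minimal, self-contained sketch of the difference, using a short sleep in place of the real readline loop (which, in the original code, never returns, so the main thread would hang at the point where Thread is constructed):

```python
import threading
import time

done = threading.Event()

def worker():
    # Stand-in for the real worker; the original loops forever on
    # crawler.stderr.readline(), so it would never return at all.
    time.sleep(0.1)
    done.set()

# Wrong: target=worker() calls worker in the MAIN thread right here.
# Because the real worker never returns, Thread() is never even
# constructed, so cc.start() and print "Here" are never reached.
# cc = threading.Thread(target=worker())

# Right: pass the function object; the new thread calls it after start().
cc = threading.Thread(target=worker)
cc.daemon = True  # modern spelling of cc.setDaemon(True)
cc.start()
print("Here")     # prints immediately; the main thread is not blocked
cc.join()
```

As a side note, even with this fix the worker's while True loop never exits: once the subprocess closes its stderr, readline() returns an empty string, so you would normally break out of the loop on that condition.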