Python multithreading per block

Question

I am running the following Python code, but when I start many threads the remote API (Google API) returns:

 <HttpError 403 when requesting https://www.googleapis.com/prediction/v1.6/projects/project/trainedmodels/return_reason?alt=json returned "User Rate Limit Exceeded">

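I need to send about 20K objects to the API for processing in one run. The code works for a small number of objects. How can I slow the requests down or send them in chunks?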

from threading import Semaphore, Thread

# predictor, ReturnReason, file_reader and file_contents are defined elsewhere.
collection_ = []
lock_object = Semaphore(value=1)  # guards collection_

def connect_to_api(document):
    try:
        api_label = predictor.make_prediction(document)
        return_instance = ReturnReason(document=document)  # Create Return Reason object
        # Hold the lock only while appending; "with" releases it only if it was acquired.
        with lock_object:
            collection_.append(return_instance)
    except Exception as e:
        print(e)

def factory():
    """Start one thread per document and wait for all of them to finish."""
    list_of_docs = file_reader.get_file_documents(file_contents)
    threads = [Thread(target=connect_to_api, args=(doc,)) for doc in list_of_docs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Answer 1

Rate limiting is a problem in its own right; simply sleeping between requests is probably not enough for long-running tasks.

I suggest looking at queues (rq is very simple). The following article may also help: http://flask.pocoo.org/snippets/70/
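
To make the queue idea concrete, here is a minimal sketch that does not use rq (which needs a Redis server) but the standard library's queue.Queue as an in-process stand-in. A fixed number of worker threads pull documents from the queue, so at most num_workers requests are in flight at once instead of 20K. The names process_in_workers, handle, num_workers and min_interval are hypothetical, and the default values are illustrative rather than taken from any API quota.

```python
import queue
import threading
import time


def process_in_workers(documents, handle, num_workers=4, min_interval=0.0):
    """Run handle(doc) for each document using a fixed pool of worker threads."""
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            doc = tasks.get()
            if doc is None:  # sentinel: no more work for this worker
                return
            try:
                outcome = handle(doc)
                with results_lock:  # protect the shared list
                    results.append(outcome)
            except Exception as exc:
                # Report the failure and carry on with the next document.
                print("failed on %r: %s" % (doc, exc))
            time.sleep(min_interval)  # crude per-worker throttle (assumed knob)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for doc in documents:
        tasks.put(doc)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

In the question's setting, handle would be a function that calls predictor.make_prediction(document) and returns the ReturnReason object. Results arrive in completion order, not input order, and the right values for num_workers and min_interval depend on the quota the API actually enforces.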

python

multithreading

multiprocessing