How do I use the threading module in python to handle simultaneous http requests?

Question

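I have some code that calls an api twice for each item it processes. It works as expected, but takes 1-3 seconds per item. To speed it up, I tried to use the threading module to perform 10 requests at a time, but it seems to behave the same way it did before I added threading. Processing the data returned by the api takes about 0.2 ms per call, so that shouldn't be what is holding things up.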

Here is the relevant part of my code:

import threading

.
.
.

def secondary():
    global queue
    item = queue.pop()
    queue|=func1(item)# func1 returns data from an api using the requests module
    with open('data/'+item,'w+') as f:
        f.write(func2(item))# func2 also returns data from an api using the requests module
    global num_procs
    num_procs-=1
    
def primary():
    t=[]# threads
    global num_procs
    num_procs+=min(len(queue),10-num_procs)
    for i in range(min(len(queue),10-num_procs)):
        t+=[threading.Thread(target=secondary)]
    for i in t:
        i.start()
        i.join()

queue = {'initial_data'}
num_procs=0# number of currently running processes - when it reaches 10, stop creating new ones
while num_procs or len(queue):
    primary()

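What do I need to do to make these requests run concurrently? I would prefer to use threading, but if async is better, how would I implement that?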

Answer 1

Immediately after starting each thread, you wait for that thread to finish:

for i in t:
    i.start()
    i.join()

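Because join() blocks until the thread has finished, the next thread is not started until the previous one is done, so the threads never get a chance to execute in parallel. Instead, wait for the threads to finish only after you have started all of them.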
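
As a minimal sketch of that change, the loop can be split in two: one pass that starts every thread and a second pass that joins them. The names fake_request and run_batch are hypothetical, and time.sleep stands in for the api calls made by func1 and func2, so this only illustrates the start/join ordering, not the asker's full program.

```python
import threading
import time

def fake_request(item, results):
    # Hypothetical stand-in for func1/func2: sleeps instead of calling an api.
    time.sleep(0.2)
    # Each thread writes to its own key, so the threads do not overwrite each other.
    results[item] = item.upper()

def run_batch(items):
    results = {}
    threads = [
        threading.Thread(target=fake_request, args=(item, results))
        for item in items
    ]
    # First pass: start every thread so they all wait on their "request" at once.
    for t in threads:
        t.start()
    # Second pass: only now block until each thread has finished.
    for t in threads:
        t.join()
    return results

start = time.perf_counter()
batch_results = run_batch(['a', 'b', 'c', 'd', 'e'])
elapsed = time.perf_counter() - start
print(batch_results, round(elapsed, 2))
```

With the joins moved out of the start loop, the five simulated requests overlap, so the batch should take roughly as long as one request rather than the sum of all five.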

python

python-multithreading

python-3.x