How to limit the number of concurrent workers?
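I have a function that I want to execute several times in parallel, but with only a defined number of instances running at the same time.
The natural way to do this seems to be multiprocessing.Pool. Specifically, the documentation says: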
A frequent pattern (...) is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.
maxtasksperchild is defined as:
maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.
It is not clear to me what "task" means here. For example, if I want at most 4 instances of the worker running at the same time, should I start multiprocessing.Pool as
pool = multiprocessing.Pool(processes=4, maxtasksperchild=4)
How do processes and maxtasksperchild work together? Could I set processes to 10 and still have only 4 workers running (effectively leaving 6 processes idle)?
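As the docs say (and as quoted in your question):
processes is the number of worker processes that can run in parallel. If it is not set, it defaults to the number of CPUs on your machine.
maxtasksperchild is the maximum number of tasks each worker process may handle. Once a worker has completed maxtasksperchild tasks, it is killed, and a new process is started and added to the Pool in its place.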
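To make the first point concrete, here is a minimal sketch (not part of the original answer) that measures how many calls actually overlap in time. The helper names timed_task and peak_concurrency are made up for illustration, and the 0.1 s sleep stands in for real work. It suggests that processes alone caps concurrency: with processes=4 no more than 4 calls run at once, whereas processes=10 would let up to 10 run at once rather than 4 with 6 idle.

```python
import time
from multiprocessing import Pool


def timed_task(x):
    """Hypothetical stand-in for real work: sleep, then report start and end times."""
    start = time.time()
    time.sleep(0.1)
    return start, time.time()


def peak_concurrency(processes, n_tasks):
    """Run n_tasks on a Pool and return the largest number of calls that overlapped."""
    with Pool(processes=processes) as pool:
        # chunksize=1 so that every input is dispatched as its own task
        spans = pool.map(timed_task, range(n_tasks), chunksize=1)
    # Sweep over start (+1) and end (-1) events in time order.
    # At equal timestamps the -1 sorts first, so ties are not counted as overlap.
    events = sorted([(s, 1) for s, _ in spans] + [(e, -1) for _, e in spans])
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


if __name__ == '__main__':
    # Never more than 4, no matter how many tasks are queued
    print("peak with processes=4:", peak_concurrency(processes=4, n_tasks=12))
```

maxtasksperchild does not appear in this sketch at all: it controls how long each worker lives, not how many run side by side.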
Let me check this with some code:
import os
import sys
from multiprocessing import Pool

def f(x):
    # Report which worker process handles which input
    print("pid:", os.getpid(), "deal with", x)
    sys.stdout.flush()

if __name__ == '__main__':
    pool = Pool(processes=4, maxtasksperchild=2)
    keys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    result = pool.map(f, keys)
Here we use 4 processes, and each process is killed after it has executed 2 tasks. Running the code prints something like:
pid: 10899 deal with 1
pid: 10900 deal with 2
pid: 10901 deal with 3
pid: 10899 deal with 5
pid: 10900 deal with 6
pid: 10901 deal with 7
pid: 10902 deal with 4
pid: 10902 deal with 8
pid: 10907 deal with 9
pid: 10907 deal with 10
Processes 10899 to 10902 are each killed after completing 2 tasks, and a new process, 10907, is started to handle the last two inputs (9 and 10).
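On what a "task" is: as far as I can tell from CPython's pool.py, pool.map splits the input into chunks and hands each chunk to a worker as a single task, and maxtasksperchild counts those chunks rather than individual inputs. The sketch below mirrors the default chunk size calculation as I read it in that source. Treat it as an implementation detail and an assumption, not a documented API; default_chunksize and max_items_per_worker are names invented here.

```python
def default_chunksize(n_items, n_workers):
    """Chunk size Pool.map appears to pick when chunksize is not given (CPython detail)."""
    if n_items == 0:
        return 0
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def max_items_per_worker(n_items, n_workers, maxtasksperchild):
    """Upper bound on inputs one worker handles before it is replaced."""
    return default_chunksize(n_items, n_workers) * maxtasksperchild


if __name__ == '__main__':
    # 10 inputs on 4 workers: one input per task, so 2 inputs per worker lifetime
    print(default_chunksize(10, 4), max_items_per_worker(10, 4, 2))
    # 100 inputs on 4 workers: larger chunks, so more inputs per worker lifetime
    print(default_chunksize(100, 4), max_items_per_worker(100, 4, 2))
```

With 10 inputs and 4 workers the chunk size comes out as 1, which matches the run above where each PID handled exactly two inputs. Passing chunksize=1 to pool.map explicitly makes "one input, one task" hold for larger inputs too.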
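For comparison, if we use a larger maxtasksperchild, or the default (None, which means the worker processes are never killed and live as long as the Pool), as in the following code: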
# Same imports and f as above; only maxtasksperchild changes
if __name__ == '__main__':
    pool = Pool(processes=4, maxtasksperchild=10)
    keys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    result = pool.map(f, keys)
Result:
pid: 13352 deal with 1
pid: 13353 deal with 2
pid: 13352 deal with 4
pid: 13354 deal with 3
pid: 13353 deal with 6
pid: 13352 deal with 7
pid: 13355 deal with 5
pid: 13354 deal with 8
pid: 13353 deal with 9
pid: 13355 deal with 10
As you can see, no new processes were created; all tasks were completed by the original 4 processes.
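If you would rather compare the two settings without reading PIDs by eye, a small sketch along these lines counts how many distinct worker processes handled the inputs. worker_pid and distinct_workers are hypothetical helper names, and chunksize=1 is passed explicitly so that each input is one task.

```python
import os
from multiprocessing import Pool


def worker_pid(x):
    """Return the PID of the worker process that handled this input."""
    return os.getpid()


def distinct_workers(maxtasksperchild, n_items=10, processes=4):
    """Count how many different worker processes handled n_items inputs."""
    with Pool(processes=processes, maxtasksperchild=maxtasksperchild) as pool:
        # chunksize=1 makes every input its own task
        pids = pool.map(worker_pid, range(n_items), chunksize=1)
    return len(set(pids))


if __name__ == '__main__':
    # Workers are never replaced: at most 4 distinct PIDs
    print("default (None):", distinct_workers(None))
    # Each worker exits after 2 tasks: at least 5 distinct PIDs for 10 inputs
    print("maxtasksperchild=2:", distinct_workers(2))
```

In both cases the pool keeps at most 4 workers alive at a time; the second setting only replaces them more often.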
Hope this helps.