Parallel processing with Pool in Python

Question

I tried to run a locally defined function in parallel, as shown below:

import multiprocessing as mp
import numpy as np
import pdb


def testFunction():
  x = np.asarray( range(1,10) )
  y = np.asarray( range(1,10) )

  def myFunc( i ):
    return np.sum(x[0:i]) * y[i]

  p = mp.Pool( mp.cpu_count() )
  out = p.map( myFunc, range(0,x.size) )
  print( out )


if __name__ == '__main__':
  print( 'I got here' )
  testFunction()

When I do this, I get the following error:

cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

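How can I use multiprocessing to process several arrays in parallel, as I am trying to do here? x and y must be defined inside the function; I do not want to make them global variables.

Any help is appreciated.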

Answer 1

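Make the worker function global (module-level) and pass it pairs of array values, instead of indexing into the arrays inside the function: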

import multiprocessing as mp

import numpy as np


def process(inputs):
    x, y = inputs

    return x * y


def main():
    x = np.asarray(range(10))
    y = np.asarray(range(10))

    with mp.Pool(mp.cpu_count()) as pool:
        out = pool.map(process, zip(x, y))

    print(out)


if __name__ == '__main__':
    main()

Output:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
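As background on why moving the function to module level helps: Pool.map has to pickle the function in order to send it to the worker processes, and pickle stores a function by its importable name rather than by its code. A function defined inside another function has no such name, so pickling fails. The sketch below reproduces this without a pool. The names module_level, make_nested and can_pickle are made up for illustration, and the exact exception type and message differ between Python versions.

```python
import pickle


def module_level(i):
    # Defined at module level, so pickle can record it by its importable name.
    return i * 2


def make_nested():
    def nested(i):
        # Defined inside another function: a worker process has no
        # importable name it could use to look this function up.
        return i * 2
    return nested


def can_pickle(func):
    """Return True if pickle can serialize func."""
    try:
        pickle.dumps(func)
        return True
    except (pickle.PicklingError, AttributeError):
        # Which of the two is raised depends on the Python version.
        return False


if __name__ == '__main__':
    print(can_pickle(module_level))   # True
    print(can_pickle(make_nested()))  # False
```

This is the same limitation that myFunc runs into in the question, since it is defined inside testFunction.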

Update: given the new details, the arrays have to be shared between the worker processes. This is what multiprocessing.Manager is for.

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

The resulting code then looks like this:

from functools import partial
import multiprocessing as mp

import numpy as np


def process(i, x, y):
    return np.sum(x[:i]) * y[i]


def main():
    manager = mp.Manager()

    x = manager.Array('i', range(10))
    y = manager.Array('i', range(10))

    func = partial(process, x=x, y=y)

    with mp.Pool(mp.cpu_count()) as pool:
        out = pool.map(func, range(len(x)))

    print(out)


if __name__ == '__main__':
    main()

Output:

[0, 0, 2, 9, 24, 50, 90, 147, 224, 324]
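With manager proxies, every x[:i] and y[i] is a request to the manager's server process. If the workers only read the arrays and the arrays are small, a possible alternative is to bind plain NumPy arrays with functools.partial, so that each task receives a pickled copy. The following is a minimal sketch under those assumptions, not a replacement for the Manager approach when workers need to see each other's writes. The function name run and the n_workers parameter are illustrative.

```python
from functools import partial
import multiprocessing as mp

import numpy as np


def process(i, x, y):
    # Same formula as in the question: sum of x[0:i] times y[i].
    # int() turns the NumPy scalar into a plain Python int.
    return int(np.sum(x[:i]) * y[i])


def run(n_workers=2):
    # x and y stay local to this function. partial() binds them to the
    # worker function, and pickle ships a copy along with each task.
    x = np.arange(10)
    y = np.arange(10)
    func = partial(process, x=x, y=y)

    with mp.Pool(n_workers) as pool:
        return pool.map(func, range(x.size))


if __name__ == '__main__':
    print(run())  # [0, 0, 2, 9, 24, 50, 90, 147, 224, 324]
```

Because the workers operate on copies, any change a worker makes to x or y is not visible to the parent or to other workers.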
