Python 2.7 concurrent.futures.ThreadPoolExecutor does not parallelize

Question

I am running the following code on an Intel i3-based machine with 4 virtual cores (2 hyperthreads per physical core, 64-bit) and Ubuntu 14.04 installed:

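import multiprocessing
from concurrent.futures import ThreadPoolExecutor

n = multiprocessing.cpu_count()
executor = ThreadPoolExecutor(n)
tuple_mapper = lambda i: (i, func(i))  # func is defined elsewhere
results = dict(executor.map(tuple_mapper, range(10)))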

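The code does not seem to execute in parallel, because CPU utilization stays at 25%. In the utilization graph, only one of the 4 virtual cores is at 100% at any time. The core in use alternates roughly every 10 seconds.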

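However, the parallelization works well on a server machine with the same software setup. I don't know the exact number of cores or the exact processor type, but I am sure that it has several cores, that utilization reaches 100%, and that the computation is much faster (about 10 times faster with parallelization, according to some experiments I ran).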

I would expect the parallelization to work on my machine too, not only on the server.

Why doesn't it work? Does it have something to do with my operating system settings? Do I have to change them?

Thanks in advance!

Update: For background information, see the correct answer below. For the sake of completeness, here is sample code that solved the problem:

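import concurrent.futures
import multiprocessing

def tuple_mapper(i):  # module-level def: a lambda cannot be pickled for the worker processes
    return (i, func(i))

with concurrent.futures.ProcessPoolExecutor(multiprocessing.cpu_count()) as executor:
    results = dict(executor.map(tuple_mapper, range(10)))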

Before you reuse this, make sure that all functions you use are defined at the top level of a module, as described here: Python multiprocessing pickling error

Answer 1

It sounds like you're seeing the effects of Python's Global Interpreter Lock (a.k.a. the GIL).

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once.

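Since all of your threads are running pure Python code, only one of them can actually execute at any given time. That should result in only one CPU being active, which matches your description of the problem.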
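To see the effect for yourself, here is a minimal sketch that times the same pure Python work run serially and through a thread pool. The names busy and timed, the workload size, and the pool size of 4 are all made up for illustration; busy is only a stand-in for your func. On Python 2.7 it assumes the futures backport is installed. On a standard CPython build I would expect the two timings to come out roughly the same, but the exact numbers depend on your machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy(n):
    # Hypothetical CPU-bound stand-in for func: a pure Python loop,
    # so the thread running it holds the GIL while it computes.
    total = 0
    for k in range(n):
        total += k * k
    return total


def timed(label, run):
    # Call run(), print the elapsed wall-clock time, and return the result.
    start = time.time()
    out = run()
    print("%s: %.2f s" % (label, time.time() - start))
    return out


work = [200000] * 8  # eight identical tasks (arbitrary size)

serial = timed("serial", lambda: [busy(n) for n in work])

with ThreadPoolExecutor(4) as executor:
    threaded = timed("4 threads", lambda: list(executor.map(busy, work)))
```

Both runs compute the same results; only the scheduling differs. If the threaded run is not noticeably faster, the work is being serialized by the GIL.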

You can get around it by using multiple processes via ProcessPoolExecutor from the same module. Other solutions include switching to Jython or IronPython, which don't have a GIL.

The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.
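The "only picklable objects" part is what usually trips people up, so here is a small sketch of how you might check a callable before handing it to a process pool. The names square and is_picklable are hypothetical helpers, not part of any library; the check just tries pickle.dumps, which is roughly what has to succeed before a task can be sent to a worker process.

```python
import pickle


def square(i):
    # Hypothetical module-level function: pickled by reference to its name.
    return i * i


def is_picklable(obj):
    # True if pickle can serialize obj, False if it raises.
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


print(is_picklable(square))            # function defined at module level
print(is_picklable(lambda i: i * i))   # anonymous function
```

Run as a script, this should report that the module-level function can be pickled and the lambda cannot, which is why a lambda passed to ProcessPoolExecutor.map fails while a top-level def works.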

python

linux

parallel-processing

ubuntu