Python multiprocessing - how to make it more efficient

Question

Consider the following two short programs.

normal_test.py:

import time

if __name__ == '__main__':
    t_end = time.time() + 1
    loop_iterations = 0
    while time.time() < t_end:
        loop_iterations += 1

    print(loop_iterations)

Output (on my machine):

mp_test.py:

from multiprocessing import Process
from multiprocessing import Manager
import time


def loop1(ns):
    t_end = time.time() + 1
    while time.time() < t_end:
        ns.loop_iterations1 += 1


def loop2(ns):
    t_end = time.time() + 1
    while time.time() < t_end:
        ns.loop_iterations2 += 1


if __name__ == '__main__':
    manager = Manager()
    ns = manager.Namespace()
    ns.loop_iterations1 = 0
    ns.loop_iterations2 = 0

    p1 = Process(target=loop1, args=(ns,))
    p2 = Process(target=loop2, args=(ns,))
    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print(ns.loop_iterations1)
    print(ns.loop_iterations2)

Output (on my machine):

5533
5527

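I want to use Python multiprocessing on a Raspberry Pi to read values from multiple ADCs in parallel, so speed matters. The laptop I ran these two programs on has four cores, so I don't understand why each process created in the second program manages roughly 900 times fewer iterations than the single process in the first program. Am I using the Python multiprocessing library incorrectly? How can I make the processes faster?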

Answer 1

Am I using the Python multiprocessing library incorrectly?

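Incorrectly? No. Inefficiently? Yes.

Remember that multiprocessing creates cooperating, but otherwise independent, instances of Python. Think of them as workers in a factory, or friends working on a big job.

If only one person is working on a project, that person is free to move about the factory floor, pick up a tool, use it, put it down, move somewhere else, pick up the next tool, and so on. Add a second person (or worse, more people, perhaps even hundreds) and that person must now coordinate: if some area is shared, or some tool is shared, Bob can't just go and grab something; he has to ask Alice first whether she is done with it.

A Manager object is Python multiprocessing's generic wrapper for sharing. Putting variables into a manager Namespace means that they are shared, so every use automatically checks with everyone else first. (More precisely, they are held in one location, a single process, and accessed or changed from the other processes via proxies.)

Here, you have done the metaphorical equivalent of replacing "Bob: count as fast as you can" with "Bob: constantly interrupt Alice to ask if she's counting, then count; Alice: count, but be constantly interrupted by Bob." Bob and Alice now spend most of their time talking to each other rather than counting.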
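To make the cost of the proxy concrete, here is a minimal sketch that times the same number of increments on a plain local variable and on a Manager Namespace attribute. The function names (time_local, time_proxy) and the iteration count are illustrative assumptions, not part of the question's code.

```python
import time
from multiprocessing import Manager


def time_local(n):
    """Time n increments of a plain local integer."""
    counter = 0
    start = time.perf_counter()
    for _ in range(n):
        counter += 1
    return counter, time.perf_counter() - start


def time_proxy(ns, n):
    """Time n increments of an attribute stored on ns."""
    ns.counter = 0
    start = time.perf_counter()
    for _ in range(n):
        # With a manager Namespace, each += is a read request followed
        # by a write request, both sent to the manager process.
        ns.counter += 1
    return ns.counter, time.perf_counter() - start


if __name__ == '__main__':
    n = 2000  # arbitrary, small enough to finish quickly
    with Manager() as manager:
        ns = manager.Namespace()
        print('local:', time_local(n))
        print('proxy:', time_proxy(ns, n))
```

On most machines the proxy timing should come out far larger than the local one, which is the "talking instead of counting" effect described above. Because the read and the write are separate requests, the increment is also not atomic when several processes update the same attribute.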

As the documentation says:

... when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes.

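(It begins with the phrase "as mentioned above", but it is not mentioned above!)

There are plenty of standard tricks, such as batching, so that a lot of work gets done between sharing events, or using shared memory to make the sharing itself faster. With shared memory, though, you need to lock the items.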
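As a rough illustration of both tricks together, the sketch below counts on a local variable and only touches a shared-memory counter once per batch, under its lock. It assumes multiprocessing.Value is an acceptable stand-in for the Namespace; the function name counting_loop and the batch size are hypothetical choices.

```python
import time
from multiprocessing import Process, Value


def counting_loop(counter, seconds, batch=10_000):
    """Count locally and flush to the shared counter once per batch."""
    local = 0
    t_end = time.time() + seconds
    while time.time() < t_end:
        local += 1
        if local == batch:
            # The lock is taken once per batch, not once per iteration.
            with counter.get_lock():
                counter.value += local
            local = 0
    # Flush whatever is left over from the last partial batch.
    with counter.get_lock():
        counter.value += local


if __name__ == '__main__':
    # 'q' is a signed 64-bit integer held in shared memory.
    counters = [Value('q', 0) for _ in range(2)]
    procs = [Process(target=counting_loop, args=(c, 1.0)) for c in counters]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for c in counters:
        print(c.value)
```

A larger batch means fewer lock acquisitions but a staler shared value; if the count is only needed at the end, a single write after the loop is the limiting case.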

Answer 2

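It looks like a better way to get parallel processing, when shared state is not needed, is to use a multiprocessing Queue. The OP's two loops do not need shared state.

Here is a test.

Specs:

Python version: 3.7.6.
The machine has two 2.3 GHz Intel i9-9880H CPUs.

When I run normal_test.py from the question, I get: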

$ python normal_test.py
7601322

I then tested a multiprocessing Queue as follows (two processes in parallel):

import time
from multiprocessing import Process, Queue


def loop(n, q):
    n_iter = 0
    t_end = time.time() + 1
    while time.time() < t_end:
        n_iter += 1
    q.put((n, n_iter))


if __name__ == '__main__':
    results = []

    q = Queue()
    procs = []
    for i in range(2):
        procs.append(Process(target=loop, args=(i, q)))

    for proc in procs:
        proc.start()

    for proc in procs:
        n, loop_count = q.get()
        results.append((n, loop_count))

    for proc in procs:
        proc.join()

    del procs, q

    for r in results:
        print(r)

When I run this, I get:

$ python multiproc2.py
(1, 10570043)
(0, 10580648)

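It looks like running two processes in parallel gets more work done than running a single process: each of the two processes counted about 10.6 million iterations, against about 7.6 million for the single-process run.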
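The same no-shared-state idea can also be sketched with a process pool, where each worker simply returns its count and the pool carries the result back to the parent. This is an alternative to the Queue version above, not the code that produced the numbers shown; the function name timed_loop is an assumption.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def timed_loop(seconds):
    """Count loop iterations for roughly `seconds` seconds."""
    n_iter = 0
    t_end = time.time() + seconds
    while time.time() < t_end:
        n_iter += 1
    return n_iter


if __name__ == '__main__':
    # Two workers, each running the loop for one second.
    with ProcessPoolExecutor(max_workers=2) as pool:
        counts = list(pool.map(timed_loop, [1.0, 1.0]))
    for i, count in enumerate(counts):
        print((i, count))
```

The workers never touch a shared object while counting; the only inter-process traffic is the argument going in and the return value coming out.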
