Parallelizing through multi-threading and multi-processing takes significantly more time than serial

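I am trying to learn how to do parallel programming in Python. I wrote a simple int square function and then ran it serially, with multiple threads, and with multiple processes: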

import time
import multiprocessing, threading
import random


def calc_square(numbers):
    sq = 0
    for n in numbers:
        sq = n*n

def splita(list, n):
    a = [[] for i in range(n)]
    counter = 0
    for i in range(0,len(list)):
        a[counter].append(list[i])
        if len(a[counter]) == len(list)/n:
            counter = counter +1
            continue
    return a


if __name__ == "__main__":

    random.seed(1)
    arr = [random.randint(1, 11) for i in xrange(1000000)]
    print "init completed"

    start_time2 = time.time()
    calc_square(arr)
    end_time2 = time.time()

    print "serial: " + str(end_time2 - start_time2)

    newarr = splita(arr,8)
    print 'split complete'

    start_time = time.time()

    for i in range(8):
        t1 = threading.Thread(target=calc_square, args=(newarr[i],))

        t1.start()
        t1.join()

    end_time = time.time()

    print "mt: " + str(end_time - start_time)

    start_time = time.time()

    for i in range(8):
        p1 = multiprocessing.Process(target=calc_square, args=(newarr[i],))
        p1.start()
        p1.join()

    end_time = time.time()

    print "mp: " + str(end_time - start_time)

Output:

init completed
serial: 0.0640001296997
split complete
mt: 0.0599999427795
mp: 2.97099995613

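However, as you can see, something strange happened: mt takes about the same time as serial, and mp actually takes significantly longer (almost 50 times longer).

What am I doing wrong? Could someone point me in the right direction for learning parallel programming in Python?

Edit 01

Looking at the comments, I see that a function which returns nothing may seem pointless. The reason I tried this at all is that I had earlier tried the following add function: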

def addi(numbers):
    sq = 0
    for n in numbers:
        sq = sq + n
    return sq

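I tried returning the sum of each part to a serial adder, so that I could at least see some performance improvement over a pure serial implementation. However, I could not figure out how to store and use the returned values, which is why I tried something even simpler: just splitting the array and running a simple function on each part.

Thanks!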

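I think multiprocessing takes quite a long time to create and start each process. I changed the program to use an arr 10 times larger and changed the way the processes are started, and got a slight speedup:

(Also note that this is Python 3.)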

import time
import multiprocessing, threading
from multiprocessing import Queue
import random

def calc_square_q(numbers,q):
    while q.empty():
        pass
    return calc_square(numbers)

if __name__ == "__main__":

    random.seed(1)   # note how big arr is now vvvvvvv
    arr = [random.randint(1, 11) for i in range(10000000)]
    print("init completed")

    # ...
    # other stuff as before
    # ...

    processes=[]
    q=Queue()
    for arrs in newarr:
        processes.append(multiprocessing.Process(target=calc_square_q, args=(arrs,q)))

    print('start processes')
    for p in processes:
        p.start()  # even tho' each process is started it waits...

    print('join processes')
    q.put(None)   # ... for q to become not empty.
    start_time = time.time()
    for p in processes:
        p.join()

    end_time = time.time()

    print("mp: " + str(end_time - start_time))

Also note above how I create and start the processes in two separate loops, and then finally join them in a third loop.
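The difference between the two loop structures can be sketched with threads. This is only an illustration, assuming Python 3: `worker`, `run_start_join_same_loop`, and `run_start_all_then_join` are hypothetical names, and `time.sleep` stands in for real work. Calling `join()` right after `start()` waits for that worker to finish before the next one is started, so the workers run one after another.

```python
import threading
import time


def worker(delay):
    # Stand-in for real work. time.sleep releases the GIL, so threads can overlap.
    time.sleep(delay)


def run_start_join_same_loop(n, delay):
    # Pattern from the question: join() blocks until the thread finishes,
    # so the next thread is not started until the previous one is done.
    begin = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=worker, args=(delay,))
        t.start()
        t.join()
    return time.perf_counter() - begin


def run_start_all_then_join(n, delay):
    # Pattern from the answer: start every worker first, then wait for all of them.
    threads = [threading.Thread(target=worker, args=(delay,)) for _ in range(n)]
    begin = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - begin


if __name__ == "__main__":
    print("start+join in one loop:", run_start_join_same_loop(4, 0.1))
    print("start all, then join:  ", run_start_all_then_join(4, 0.1))
```

The sleep-based worker overlaps well in threads. A CPU-bound pure-Python loop such as `calc_square` typically does not, because standard CPython builds run Python bytecode under a global interpreter lock, which is consistent with the mt timings staying close to the serial ones.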

Output of the modified program:

init completed
serial: 0.53214430809021
split complete
start threads
mt: 0.5551605224609375
start processes
join processes
mp: 0.2800724506378174

Increasing the size of arr by another factor of 10:

init completed
serial: 5.8455305099487305
split complete
start threads
mt: 5.411392450332642
start processes
join processes
mp: 1.9705185890197754

Yes, I also tried this in Python 2.7, although threads seemed to be slower there.
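On the question's edit about storing and using return values: one possible approach is `multiprocessing.Pool`, whose `map` method returns each worker's result in input order. The following is a minimal sketch, assuming Python 3. `split_chunks` is a hypothetical helper, not the `splita` function from the question, and the array size and process count are arbitrary choices for illustration.

```python
import multiprocessing
import random


def addi(numbers):
    # Same partial-sum function as in the question's edit.
    total = 0
    for n in numbers:
        total = total + n
    return total


def split_chunks(items, n):
    # Hypothetical helper: split items into n contiguous chunks of near-equal size.
    size, extra = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    random.seed(1)
    arr = [random.randint(1, 11) for _ in range(100000)]
    chunks = split_chunks(arr, 4)

    # pool.map sends one chunk to each worker call and collects the return values.
    with multiprocessing.Pool(processes=4) as pool:
        partial_sums = pool.map(addi, chunks)

    total = sum(partial_sums)
    print("parallel total:", total)
    print("matches serial:", total == addi(arr))
```

Each chunk is pickled and copied to a worker process, so for work this cheap per element the copying cost can outweigh the gain. A speedup is not guaranteed and is worth measuring at the array sizes you care about.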