How to use threading on dictionaries to improve time complexity?

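I'm new to threading. All I know is that we can call threads on functions, but I want to call one on a dictionary.

I have a dictionary with random numbers under different indices, and I want to find the sum of all of these numbers. What I want to do is basically to use a thread for every single row/index of that dictionary. That single thread will find the sum of all the numbers in that specific row, and then the sums from all the threads will be added together to get the final result.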

import random
import time

li = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
      "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y"]

arr = {}

for k in range(0, 25):
    arr[li[k]] = [random.randrange(1, 10, 1) for i in range(1000000)]

start = time.perf_counter()

total = 0  # avoid naming this `sum`, which would shadow the built-in sum()
for k, v in arr.items():
    for value in v:
        total += value

end = time.perf_counter()

print(total)

print("Finished in: ", round(end-start, 2), " seconds")

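Doing it the simple way, the whole script took about 86 seconds (mostly spent filling the dictionary with numbers), of which about 5 seconds went to computing the sum.

I want to speed up those 5 seconds of summation by creating a thread for each index of the dictionary. Can anyone help me with this?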
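As a baseline, the per-row plan described in the question can first be written sequentially: compute one partial sum per dictionary entry, then add those partial sums together. This is a minimal sketch, not part of the original question; the helper names `row_sums` and `grand_total` are hypothetical. Calling the built-in `sum` on each row runs the inner loop in C, which is usually much faster than the nested Python `for` loop above.

```python
import random
from typing import Dict, List


def row_sums(data: Dict[str, List[int]]) -> Dict[str, int]:
    # "Map" step: one partial sum per key (row) of the dictionary.
    return {key: sum(values) for key, values in data.items()}


def grand_total(data: Dict[str, List[int]]) -> int:
    # "Reduce" step: add the per-row partial sums together.
    return sum(row_sums(data).values())


if __name__ == "__main__":
    # Much smaller than the question's 25 x 1,000,000 so the sketch runs quickly.
    data = {key: [random.randrange(1, 10) for _ in range(1000)] for key in "abcde"}
    print(grand_total(data))
```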

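So, here's an example of how to solve a "map-reduce"-style summation problem with multiprocessing. It uses worker processes rather than threads: each process has its own interpreter and its own GIL, so CPU-bound work can actually run in parallel.

This largely assumes that each subproblem (represented by process_key) is independent of the others.

The final reduction (adding all the per-key results together) is done by the main program.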

import multiprocessing
import os
import string
import time
from typing import Tuple, List


def get_key_data(key: str) -> List[int]:
    # Get data for a given key from a database or wherever;
    # here we just get a big blob of random bytes.
    return list(os.urandom(1_000_000))


def process_key(key: str) -> Tuple[str, int]:
    # This function is run in a separate process,
    # so it can't access global data in the same way a function
    # in the same process could.  Program accordingly.
    key_data = get_key_data(key)
    result_for_key = sum(key_data)  # Could be heavier computation here...

    # Returning a tuple makes it easier to work with the keyed data in the main program.
    return (key, result_for_key)


def main():
    start = time.perf_counter()
    keys = list(string.ascii_lowercase)
    with multiprocessing.Pool() as p:
        results = {}
        # Since result order doesn't matter, we can use `imap_unordered` to optimize performance.
        # It would also be worth adding `chunksize=...` to spend less time in serializers.
        for key, result in p.imap_unordered(process_key, keys):  # unpacking result tuples here
            print(f"Got result {result} for key {key}")
            results[key] = result
    grand_total = sum(results.values())
    end = time.perf_counter()

    print(f"Grand total: {grand_total} in {end - start:.2f} seconds")


if __name__ == '__main__':
    main()

This prints (something like):

Got result 127439637 for key y
Got result 127521766 for key z
Got result 127410016 for key a
Got result 127618358 for key b
Got result 127510624 for key c
Got result 127525228 for key d
Got result 127471359 for key e
Got result 127535553 for key f
Got result 127457231 for key m
Got result 127547738 for key n
Got result 127567059 for key o
Got result 127470823 for key g
Got result 127465435 for key h
Got result 127497010 for key i
Got result 127432593 for key j
Got result 127555330 for key k
Got result 127402226 for key l
Got result 127534939 for key p
Got result 127558057 for key q
Got result 127474231 for key r
Got result 127491137 for key v
Got result 127520358 for key w
Got result 127490582 for key x
Got result 127489005 for key s
Got result 127485159 for key t
Got result 127503702 for key u
Grand total: 3314975156 in 0.60 seconds
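For comparison, the same map-reduce shape can be written with threads using `concurrent.futures.ThreadPoolExecutor`, whose `map` method works much like `Pool.map`. This is a swapped-in technique, not the approach of the answer above, and a sketch only: `threaded_total` and `max_workers=4` are hypothetical choices. In standard CPython only one thread executes Python bytecode at a time (the GIL), and the built-in `sum` does not release it, so expect little or no speedup over the sequential version; the sketch shows the structure, not a performance win.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def threaded_total(data: Dict[str, List[int]], max_workers: int = 4) -> int:
    # One task per row; each task sums that row with the built-in sum.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the inputs.
        partial_sums = list(executor.map(sum, data.values()))
    # The main thread performs the final reduction.
    return sum(partial_sums)


if __name__ == "__main__":
    data = {"a": [1, 2, 3], "b": [4, 5, 6]}
    print(threaded_total(data))  # 21
```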

I know...we can call threads on functions.

No. You can't call a thread. When you write this:

thread = threading.Thread(target=foobar, args=(x, y, z))

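you are not calling a thread. You are calling the constructor of the Thread class. The constructor creates a new Thread object, and then, once you call thread.start(), it is the thread that does the calling: the new thread calls foobar(x, y, z).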
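A minimal sketch of that distinction: constructing the `Thread` object does not run the function; only `start()` does. `foobar` and its arguments are the answer's placeholders, and the `calls` list is a hypothetical addition used only to make the call visible.

```python
import threading

calls = []


def foobar(x, y, z):
    # Record the call so we can see when (and whether) it happened.
    calls.append((x, y, z))


# Constructing the Thread object does not run foobar yet.
thread = threading.Thread(target=foobar, args=(1, 2, 3))
assert calls == []

# start() launches the new thread, which then calls foobar(1, 2, 3).
thread.start()
# join() waits until the thread (and therefore foobar) has finished.
thread.join()
print(calls)  # [(1, 2, 3)]
```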

What I want to do is basically to use a thread for every single row/index of that dictionary. That single thread will find sum of all the numbers in that specific row and...

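Threads run code, and you have to provide the code that a thread will run in the form of a function. If you want a thread to "find the sum of all the numbers in that specific row..."*, then you have to write a function that finds that sum, and then you have to create a new Thread that will call your function.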


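* Some other answers and comments on your question explain how Python's Global Interpreter Lock (a.k.a. the GIL) prevents you from using threads to make this CPU-bound program run faster. So the rest of this answer is a fantasy, in that it won't make your program faster, but it does illustrate how to create threads.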


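You probably want to pass the dictionary and the row (that is, its key) as arguments to the function. You might also want to pass it some mutable result structure (such as a dict or list) into which the function can save its result.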

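import threading

def FindRowSum(dictionary, row, results):
    # Sum the numbers stored under one key ("row") of the dictionary.
    total = 0
    for value in dictionary[row]:
        total = total + value
    # Each thread writes to a different key, so threads don't overwrite each other.
    results[row] = total

...

allThreads = []
results = {}  # a dict, so results[row] works for any key
for row in myDictionary:
    thread = threading.Thread(target=FindRowSum, args=(myDictionary, row, results))
    thread.start()  # nothing runs until the thread is started
    allThreads.append(thread)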

Then, further down, if you want to wait for all of the threads to finish their work:

for thread in allThreads:
    thread.join()
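Putting the pieces together, a minimal end-to-end sketch of this pattern: start one thread per row, join them all, then add the per-row sums to get the final result the question asks for. The wrapper `threaded_row_sums` and the small random test dictionary are hypothetical; as noted above, under the GIL this should not be expected to beat a sequential loop.

```python
import random
import threading
from typing import Dict, List


def FindRowSum(dictionary: Dict[str, List[int]], row: str, results: Dict[str, int]) -> None:
    # Sum one row (one key) of the dictionary and store it under that key.
    total = 0
    for value in dictionary[row]:
        total = total + value
    results[row] = total


def threaded_row_sums(myDictionary: Dict[str, List[int]]) -> int:
    results: Dict[str, int] = {}
    allThreads = []
    for row in myDictionary:
        thread = threading.Thread(target=FindRowSum, args=(myDictionary, row, results))
        thread.start()
        allThreads.append(thread)

    # Wait for every thread to finish before reading the results.
    for thread in allThreads:
        thread.join()

    # Final step from the question: add the per-row sums together.
    return sum(results.values())


if __name__ == "__main__":
    myDictionary = {key: [random.randrange(1, 10) for _ in range(10_000)] for key in "abcde"}
    total = threaded_row_sums(myDictionary)
    # Cross-check against a plain sequential sum.
    assert total == sum(sum(row) for row in myDictionary.values())
    print(total)
```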