How to use threading on dictionaries to improve time complexity?
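I am new to threading. All I know is that we can call threads on functions, but I want to call one on a dictionary.
I have a dictionary that holds random numbers at different indexes. I want to find the sum of all those numbers. What I want to do is basically to use a thread for every single row/index of that dictionary. That single thread will find the sum of all the numbers in that specific row, and then the sums from all the threads will be added together to get the final result.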
import random
import time
li = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u",
      "v", "w", "x", "y"]
arr = {}
for k in range(0, 25):
    arr[li[k]] = [random.randrange(1, 10, 1) for i in range(1000000)]
start = time.perf_counter()
sum = 0
for k, v in arr.items():
    for value in v:
        sum += value
end = time.perf_counter()
print(sum)
print("Finished in: ", round(end-start, 2), " seconds")
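I did it the simple way before, and it took about 86 seconds in total (because of filling the dictionary with numbers), of which about 5 seconds went to calculating the sum.
I want to improve those 5 seconds of sum calculation by creating a thread for every index of the dictionary. Can anyone help me with this?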
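So, here is an example of how to use multiprocessing to solve a "map-reduce" style summation problem.
This relies heavily on the assumption that each subproblem (represented by process_key) is independent of the others.
The final reduction (adding all of the per-key results together) is done by the main program.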
import multiprocessing
import os
import string
import time
from typing import Tuple, List


def get_key_data(key: str) -> List[int]:
    # Get data for a given key from a database or wherever;
    # here we just get a big blob of random bytes.
    return list(os.urandom(1_000_000))


def process_key(key: str) -> Tuple[str, int]:
    # This function is run in a separate process,
    # so it can't access global data in the same way a function
    # in the same process could. Program accordingly.
    key_data = get_key_data(key)
    result_for_key = sum(key_data)  # Could be heavier computation here...
    # Returning a tuple makes it easier to work with the keyed data in the main program.
    return (key, result_for_key)


def main():
    start = time.perf_counter()
    keys = list(string.ascii_lowercase)
    with multiprocessing.Pool() as p:
        results = {}
        # Since result order doesn't matter, we can use `imap_unordered` to optimize performance.
        # It would also be worth adding `chunksize=...` to spend less time in serializers.
        for key, result in p.imap_unordered(process_key, keys):  # unpacking result tuples here
            print(f"Got result {result} for key {key}")
            results[key] = result
    grand_total = sum(results.values())
    end = time.perf_counter()
    print(f"Grand total: {grand_total} in {end - start:.2f} seconds")


if __name__ == '__main__':
    main()
This prints out (something like)
Got result 127439637 for key y
Got result 127521766 for key z
Got result 127410016 for key a
Got result 127618358 for key b
Got result 127510624 for key c
Got result 127525228 for key d
Got result 127471359 for key e
Got result 127535553 for key f
Got result 127457231 for key m
Got result 127547738 for key n
Got result 127567059 for key o
Got result 127470823 for key g
Got result 127465435 for key h
Got result 127497010 for key i
Got result 127432593 for key j
Got result 127555330 for key k
Got result 127402226 for key l
Got result 127534939 for key p
Got result 127558057 for key q
Got result 127474231 for key r
Got result 127491137 for key v
Got result 127520358 for key w
Got result 127490582 for key x
Got result 127489005 for key s
Got result 127485159 for key t
Got result 127503702 for key u
Grand total: 3314975156 in 0.60 seconds
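To apply the same pattern to a dictionary that already exists in memory, as in the question, one option is to hand each (key, list) pair to the pool. The following is only a minimal sketch under that assumption: the helper names `sum_row` and `parallel_total` are made up for illustration, and the rows are kept small on purpose. Keep in mind that every list has to be serialized and sent to a worker process, which for work as cheap as a summation may cost more than it saves.

```python
import multiprocessing
import random
from typing import Dict, List, Tuple


def sum_row(item: Tuple[str, List[int]]) -> Tuple[str, int]:
    # Runs in a worker process; receives one (key, values) pair.
    key, values = item
    return key, sum(values)


def parallel_total(data: Dict[str, List[int]]) -> int:
    # Map: one task per dictionary row. Reduce: add the per-row sums.
    with multiprocessing.Pool() as pool:
        row_sums = dict(pool.imap_unordered(sum_row, data.items()))
    return sum(row_sums.values())


if __name__ == "__main__":
    arr = {k: [random.randrange(1, 10) for _ in range(10_000)] for k in "abcde"}
    print(parallel_total(arr))
```

The `if __name__ == "__main__":` guard matters here, because on some platforms the worker processes re-import the main module.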
I know...we can call threads on functions.
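No. You can't call threads on anything. When you write this:
thread = threading.Thread(target=foobar, args=(x, y, z))
You are not calling a thread. You are calling the constructor of the Thread class. The constructor creates a new Thread object, and then, once it has been started with thread.start(), the thread does the calling: the thread calls foobar(x, y, z).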
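A small sketch may make the distinction concrete. Here `foobar` is a stand-in function and the `calls` list is a hypothetical helper used only to record what happened; the point is that constructing the Thread object runs nothing, and the function is only called after `start()`.

```python
import threading

calls = []


def foobar(x, y, z):
    # Records which thread actually ran the function.
    calls.append((threading.current_thread().name, x + y + z))


thread = threading.Thread(target=foobar, args=(1, 2, 3), name="worker")
created_only = list(calls)  # still empty: constructing does not run foobar
thread.start()              # now the new thread calls foobar(1, 2, 3)
thread.join()               # wait for it to finish
print(created_only, calls)  # prints: [] [('worker', 6)]
```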
What I want to do is basically to use a thread for every single row/index of that dictionary. That single thread will find sum of all the numbers in that specific row and...
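Threads run code, and you must provide the code that the thread will run in the form of a function. If you want a thread to "find sum of all the numbers in that specific row and..."* then you must write a function that finds the sum of all the numbers, and then you must create a new Thread that will call your function.
* Some of the other answers and comments on your question explain how Python's Global Interpreter Lock (a.k.a., GIL) will prevent you from using threads to make your program run faster. So, the rest of this answer is fantasy, because it won't make your program any faster, but it does illustrate how to create threads.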
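For readers who want to see the effect of the GIL for themselves, here is a rough sketch that times the same pure-Python summation sequentially and with a thread pool. The function name `slow_sum` and the data sizes are arbitrary choices for illustration. On a standard CPython build the two timings are usually similar, though the exact numbers depend on the machine and the interpreter build, so none are claimed here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_sum(values):
    # Pure-Python loop: CPU-bound work that holds the GIL while it runs.
    total = 0
    for v in values:
        total += v
    return total


rows = [list(range(200_000)) for _ in range(8)]

t0 = time.perf_counter()
sequential = sum(slow_sum(r) for r in rows)
t1 = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    threaded = sum(pool.map(slow_sum, rows))
t2 = time.perf_counter()

print(f"sequential: {t1 - t0:.3f}s, threaded: {t2 - t1:.3f}s")
```

Both variants compute the same total; only the elapsed time is of interest.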
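You might want to pass the dictionary and the row number as arguments to the function. Maybe you also want to pass it some mutable results structure (e.g., an array) into which the function can save its result.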
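import threading

def FindRowSum(dictionary, row, results):
    total = 0
    for value in dictionary[row]:
        total = total + value
    results[row] = total  # each thread writes only to its own key
...
allThreads = []
results = {}  # a dict, so that results[row] = ... works for any row key
for row in myDictionary:
    thread = threading.Thread(target=FindRowSum, args=(myDictionary, row, results))
    thread.start()  # a thread does nothing until it is started
    allThreads.append(thread)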
Then, further down, if you want to wait for all of the threads to finish their work:
for thread in allThreads:
    thread.join()
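Putting the pieces together, a complete version might look like the sketch below, including the final step of adding the per-row sums once every thread has been joined. The names `find_row_sum` and `data` are hypothetical, the rows are kept small, and the built-in `sum` is used inside the worker for brevity. As noted above, this illustrates the mechanics of threads rather than a way to gain speed.

```python
import random
import threading


def find_row_sum(dictionary, row, results):
    # Sum one row and store the result under that row's key.
    results[row] = sum(dictionary[row])


data = {key: [random.randrange(1, 10) for _ in range(1_000)] for key in "abcde"}
results = {}
threads = [
    threading.Thread(target=find_row_sum, args=(data, key, results))
    for key in data
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # after this loop every row sum is in `results`

grand_total = sum(results.values())
print(grand_total)
```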