What is the easiest way to make maximum cpu usage for nested for-loops?
I have code that makes unique combinations of elements. There are 6 types, and each type has about 100 elements, so there are 100^6 combinations. Every combination has to be computed, checked for relevance, and then either discarded or saved.

The relevant part of the code looks like this:
def modconffactory():
    for transmitter in totaltransmitterdict.values():
        for reciever in totalrecieverdict.values():
            for processor in totalprocessordict.values():
                for holoarray in totalholoarraydict.values():
                    for databus in totaldatabusdict.values():
                        for multiplexer in totalmultiplexerdict.values():
                            newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                            data_I_need = dosomethingwith(newconfiguration)
                            saveforlateruse_if_useful(data_I_need)
Now this takes a long time, and that is fine, but I realized that this process (making the configurations and the calculations for later use) only ever uses 1 of my 8 processor cores at a time.

I've been reading up on multithreading and multiprocessing, but I only see examples of different processes, not how to multithread one process. In my code I call two functions: 'dosomethingwith()' and 'saveforlateruse_if_useful()'. I could make those into separate processes and have them run concurrently with the for-loops, right?

But what about the for-loops themselves? Can I speed that process up? Because that is where the time consumption is. (<-- this is my main question)

Is there some cheat? For instance compiling to C and then the os multithreads automatically?
You can run your function like this:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers
I only see examples of different processes, not how to multithread one process
There is multithreading in Python, but it is very inefficient for CPU-bound work because of the GIL (Global Interpreter Lock). So if you want to use all of your processor cores and you want parallelism, you have no choice but to use multiple processes, which can be done with the multiprocessing module (well, you could also use another language that does not have such problems).
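To see the GIL effect for yourself, here is a minimal sketch (not from the original answer) that times the same CPU-bound function, a hypothetical burn() standing in for dosomethingwith(), under a thread pool and a process pool. On a multi-core machine the thread pool runs no faster than serial code, while the process pool scales across cores:

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):
    # Pure-Python CPU-bound work: the GIL serializes this across threads
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    work = [10_000_000] * 8
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        with executor_cls(max_workers=8) as ex:
            list(ex.map(burn, work))  # force all tasks to finish
        print(executor_cls.__name__, time.perf_counter() - start)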
A rough example of multiprocessing usage for your case:
import multiprocessing

WORKERS_NUMBER = 8

def modconffactoryProcess(generator, step, offset, conn):
    """
    Function to be invoked by every worker process.

    generator: iterable object, the very top one of all you are iterating over,
    in your case, totaltransmitterdict.values()

    We are passing a whole iterable object to every worker; they all will iterate
    over it. To ensure they will not waste time by doing the same things
    concurrently, we will assume this: each worker will process only every stepTH
    item, starting with the offsetTH one. step must be equal to WORKERS_NUMBER,
    and offset must be a unique number for each worker, varying from 0 to
    WORKERS_NUMBER - 1.

    conn: a multiprocessing.Connection object, allowing the worker to communicate
    with the main process
    """
    for i, transmitter in enumerate(generator):
        if i % step == offset:
            for reciever in totalrecieverdict.values():
                for processor in totalprocessordict.values():
                    for holoarray in totalholoarraydict.values():
                        for databus in totaldatabusdict.values():
                            for multiplexer in totalmultiplexerdict.values():
                                newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                                data_I_need = dosomethingwith(newconfiguration)
                                saveforlateruse_if_useful(data_I_need)
    conn.send('done')

def modconffactory():
    """
    Function to launch all the worker processes and wait until they all complete
    their tasks
    """
    processes = []
    # Note: on platforms that spawn (rather than fork) worker processes, you may
    # need list(totaltransmitterdict.values()) here so the argument can be pickled
    generator = totaltransmitterdict.values()
    for i in range(WORKERS_NUMBER):
        conn, childConn = multiprocessing.Pipe()
        process = multiprocessing.Process(target=modconffactoryProcess, args=(generator, WORKERS_NUMBER, i, childConn))
        process.start()
        processes.append((process, conn))
    # Here we have created, started and saved to a list all the worker processes
    working = True
    finishedProcessesNumber = 0
    try:
        while working:
            for process, conn in processes:
                if conn.poll(0.1):  # Wait briefly for a message from this worker
                    message = conn.recv()
                    if message == 'done':
                        finishedProcessesNumber += 1
                        if finishedProcessesNumber == WORKERS_NUMBER:
                            working = False
        for process, conn in processes:
            process.join()  # Reap the finished worker processes
    except KeyboardInterrupt:
        print('Aborted')
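To see what the step/offset striping inside the worker does, here is a tiny standalone sketch (with an assumed WORKERS_NUMBER of 4 and 10 items, numbers chosen purely for illustration) showing how the outer items get divided among the workers:

items = range(10)  # stand-in for totaltransmitterdict.values()
step = 4           # WORKERS_NUMBER
for offset in range(step):
    picked = [i for i in items if i % step == offset]
    print('worker', offset, 'handles items', picked)

# worker 0 handles items [0, 4, 8]
# worker 1 handles items [1, 5, 9]
# worker 2 handles items [2, 6]
# worker 3 handles items [3, 7]

Every item is handled by exactly one worker, so no work is duplicated.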
You can adjust WORKERS_NUMBER to your needs.
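If you prefer not to hard-code the worker count, the standard library can report it for you (a small aside, not from the original answer; note that cpu_count() counts logical cores, which may be more than physical ones):

import multiprocessing
WORKERS_NUMBER = multiprocessing.cpu_count()  # 8 on the asker's machine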
The same with multiprocessing.Pool:
import multiprocessing

WORKERS_NUMBER = 8

def modconffactoryProcess(transmitter):
    for reciever in totalrecieverdict.values():
        for processor in totalprocessordict.values():
            for holoarray in totalholoarraydict.values():
                for databus in totaldatabusdict.values():
                    for multiplexer in totalmultiplexerdict.values():
                        newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                        data_I_need = dosomethingwith(newconfiguration)
                        saveforlateruse_if_useful(data_I_need)

def modconffactory():
    pool = multiprocessing.Pool(WORKERS_NUMBER)
    pool.map(modconffactoryProcess, totaltransmitterdict.values())
    pool.close()  # no more tasks will be submitted to the pool
    pool.join()   # wait for all worker processes to exit
You might want to use .map_async instead of .map.
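For instance, a hedged sketch of what that could look like (the modconffactory_async name is just for illustration): .map_async returns an AsyncResult immediately, so the main process can keep doing other work while the pool runs.

def modconffactory_async():  # hypothetical variant, not in the original answer
    pool = multiprocessing.Pool(WORKERS_NUMBER)
    result = pool.map_async(modconffactoryProcess, totaltransmitterdict.values())
    # ... the main process is free to do other work here ...
    result.wait()  # block until every worker task has finished
    pool.close()
    pool.join()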
Both snippets do the same thing, but I would say the first one gives you more control over the program, while the second one is surely the easiest :)

The first one should give you an idea of what is happening inside the second one, though.

multiprocessing documentation: https://docs.python.org/3/library/multiprocessing.html