Run a function in parallel with different arguments - python
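I have a function, slow_function, that takes about 200 seconds to process one job_title and that reads from and writes to global variables.
Using the code below gives no performance improvement. Am I missing something? It does return the same results as the sequential version.
The code that runs the five job categories in parallel: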
import time
from threading import Thread

threads = []
start = time.time()
# Create one thread per job title
for job_title in self.job_titles:
    t = Thread(target=self.slow_function, args=(job_title,))
    threads.append(t)
# Start all threads
for x in threads:
    x.start()
# Wait for all of them to finish
for x in threads:
    x.join()
end = time.time()
print("New time taken for all jobs: %s" % (end - start))
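You need to use the multiprocessing (https://docs.python.org/2/library/multiprocessing.html) module, since the threading module is limited by the GIL (https://docs.python.org/2/glossary.html#term-global-interpreter-lock): only one thread executes Python bytecode at a time, so CPU-bound work does not get faster with threads.
But you cannot use global variables to exchange data between the spawned processes! See https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes
You should extract slow_function from the class method, because local context cannot be shared between processes. Then you can use this code: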
import time
from multiprocessing import Pool

start = time.time()
pool = Pool()  # defaults to one worker process per CPU core
results = pool.map(slow_function, self.job_titles)
pool.close()
pool.join()
for r in results:
    # update your `global` variables here
    pass
end = time.time()
print("New time taken for all jobs: %s" % (end - start))
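To make the point about global variables concrete, here is a minimal self-contained sketch. Everything in it is an assumption for illustration: the RESULTS dictionary stands in for the asker's global state, the body of slow_function is a placeholder computation, and run_all and the job titles are hypothetical names. It shows that a write made inside a worker process is not visible in the parent, and that returning values and merging them in the parent works instead.

```python
import time
from multiprocessing import Pool

# Hypothetical stand-in for the global state described in the question.
RESULTS = {}


def slow_function(job_title):
    """Hypothetical CPU-bound worker, defined at module level so it can be pickled."""
    total = sum(i * i for i in range(200000))  # placeholder for the real work
    RESULTS[job_title] = total  # changes only this worker process's copy
    return job_title, total  # return the data instead of relying on the global


def run_all(job_titles):
    """Run slow_function over job_titles in a process pool and return a dict."""
    pool = Pool()
    try:
        pairs = pool.map(slow_function, job_titles)
    finally:
        pool.close()
        pool.join()
    return dict(pairs)


if __name__ == "__main__":
    job_titles = ["engineer", "analyst", "designer", "manager", "writer"]
    start = time.time()
    merged = run_all(job_titles)
    # The parent's RESULTS is still empty: the workers wrote to their own copies.
    print("Entries written by workers and seen in parent: %d" % len(RESULTS))
    RESULTS.update(merged)  # merge the returned values in the parent process
    print("Entries after merging returned values: %d" % len(RESULTS))
    print("Time taken for all jobs: %s" % (time.time() - start))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module; without it the pool creation would run again in every worker.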