Slow pickle dump in Python when using multiprocessing
So, I'm trying to use the multiprocessing module to parallelize a function that solves pyomo instances, on Python 3.7. The code works, but the startup time is absurd (~25 seconds per process). Strangely, I tried the same code on another, much less powerful computer and there it dropped to ~2 seconds (same code, same number of parallel processes, same versions of everything except Python, which is 3.6 on that machine).
Using cProfile, I found that the pickler's dump method is what consumes all that time, but I can't figure out why it would take so long. The data is small; I used sys.getsizeof() to check whether any of the arguments of the parallelized function was bigger than expected, but they weren't.
Does anyone know what could be causing the slow pickle dump?
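Worth noting: sys.getsizeof() only reports the shallow size of the outermost object, not the objects it references, so it says little about what multiprocessing actually has to serialize. len(pickle.dumps(obj)) is a better proxy. A small illustration (the Node class is a made-up stand-in, not from the original code):

```python
import pickle
import sys

# A tiny wrapper around a large payload: getsizeof() sees only the wrapper.
class Node:
    def __init__(self, payload):
        self.payload = payload

node = Node(list(range(100_000)))

shallow = sys.getsizeof(node)          # size of the Node object itself
serialized = len(pickle.dumps(node))   # size of the full pickled object graph

print(f"getsizeof: {shallow} bytes, pickled: {serialized} bytes")
```

The shallow size is a few dozen bytes regardless of the payload, while the pickled size reflects the whole referenced structure, which is what gets sent to each child process.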
Code:
from pyomo.environ import *
from pyomo.opt import SolverFactory, TerminationCondition
from pyomo.opt.parallel import SolverManagerFactory
import cProfile
import pstats
import sys
import multiprocessing

def worker(node, data, optsolver, queue, shared_incumbent_data):
    #[pyomo instances solving and constraining]
    return

def foo(model, data, optsolver, processes = multiprocessing.cpu_count()):
    queue = multiprocessing.Queue()
    process_dict = {}
    for i_node in range(len(init_nodes)): #init_nodes is a list containing lists of pyomo instances
        for j_node in range(len(init_nodes[i_node])):
            process_name = str(i_node) + str(j_node)
            print(" - Data size:", sys.getsizeof(data)) #same for all of the args
            process_dict[process_name] = multiprocessing.Process(target=worker, args=(init_nodes[i_node][j_node], data, optsolver, queue, shared_incumbent_data))
            pr = cProfile.Profile()
            pr.enable()
            process_dict[process_name].start()
            pr.disable()
            ps = pstats.Stats(pr)
            ps.sort_stats('time').print_stats(5)
    for n_node in process_dict:
        process_dict[n_node].join(timeout=0)

#imports
#[model definition]
#[data is obtained from 3 .tab files, the biggest one has a 30 x 40 matrix, with 1 to 3 digit integers]
optsolver = SolverFactory("gurobi")

if __name__ == "__main__":
    foo(model, data, optsolver, 4)
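Background on why .start() pickles at all: on Windows, multiprocessing uses the spawn start method, so every object in args= is serialized when the child process is launched. One way to pinpoint which argument is expensive, independent of cProfile, is to time pickle.dumps() on each one directly. A diagnostic sketch, with placeholder objects standing in for the real pyomo instance and solver:

```python
import pickle
import time

def measure_pickle_cost(name, obj):
    """Report size and time of pickling one object, as multiprocessing would."""
    start = time.perf_counter()
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(payload)} bytes in {elapsed:.4f}s")
    return len(payload), elapsed

# Placeholder stand-ins for the real args (instance, data, optsolver, ...).
sample_args = {
    "data": {"matrix": [[i * j for j in range(40)] for i in range(30)]},
    "node": list(range(1000)),
}

for name, obj in sample_args.items():
    measure_pickle_cost(name, obj)
```

Running this on the actual arguments would show whether one of them (likely the pyomo instance, given the `numeric_expr.py:186(__getstate__)` calls in the profile) dominates the serialization cost.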
Sizes of the arguments obtained with sys.getsizeof(), and the profile of .start() on the first computer:
- Data size: 56
- Init_nodes size: 72
- Queue size: 56
- Shared incumbent data size: 56
7150 function calls (7139 primitive calls) in 25.275 seconds
Ordered by: internal time
List reduced from 184 to 5 due to restriction <5>
ncalls tottime percall cumtime percall filename:lineno(function)
2 25.262 12.631 25.267 12.634 {method 'dump' of '_pickle.Pickler' objects}
1 0.004 0.004 0.004 0.004 {built-in method _winapi.CreateProcess}
1265 0.002 0.000 0.004 0.000 C:\Users\OLab\AppData\Local\Continuum\anaconda3\lib\site-packages\pyomo\core\expr\numeric_expr.py:186(__getstate__)
2 0.001 0.001 0.002 0.001 <frozen importlib._bootstrap_external>:914(get_data)
1338 0.001 0.000 0.002 0.000 C:\Users\OLab\AppData\Local\Continuum\anaconda3\lib\site-packages\pyomo\core\expr\numvalue.py:545(__getstate__)
Sizes of the arguments obtained with sys.getsizeof(), and the profile of .start() on the second computer:
- Data size: 56
- Init_nodes size: 72
- Queue size: 56
- Shared incumbent data size: 56
7257 function calls (7247 primitive calls) in 1.742 seconds
Ordered by: internal time
List reduced from 184 to 5 due to restriction <5>
ncalls tottime percall cumtime percall filename:lineno(function)
2 1.722 0.861 1.730 0.865 {method 'dump' of '_pickle.Pickler' objects}
1 0.009 0.009 0.009 0.009 {built-in method _winapi.CreateProcess}
1265 0.002 0.000 0.005 0.000 C:\Users\Palbo\Anaconda2\envs\py3\lib\site-packages\pyomo\core\expr\numeric_expr.py:186(__getstate__)
1339 0.002 0.000 0.003 0.000 C:\Users\Palbo\Anaconda2\envs\py3\lib\site-packages\pyomo\core\expr\numvalue.py:545(__getstate__)
1523 0.001 0.000 0.001 0.000 {built-in method builtins.hasattr}
Cheers!
Specs of the first computer, which should be faster but isn't:
- Windows 10 Pro for Workstations
- Intel Xeon Silver 4114 CPU @ 2.20 GHz (10 cores)
- 64 GB RAM
Specs of the second computer:
- Windows 8.1
- Intel Core i3-2348M CPU @ 2.30 GHz (2 cores)
- 6 GB RAM
Finally found a solution: dump a pickle of the function's arguments to a file, pass the filename as the argument of the worker() function instead, and open the file inside each parallel process.
Dump time went from ~24 s down to ~0.005 s!
import pickle
import multiprocessing

def worker(pickled_file_name, queue, shared_incumbent):
    with open(pickled_file_name, "rb") as f:
        data_tuple = pickle.load(f, encoding='bytes')
    instance, data, optsolver, int_var_list, process_name, relaxed_incumbent = data_tuple
    #[...]
    return

def foo():
    #[...]
    with open("pickled_vars" + str(i_nodo) + str(j_nodo) + ".p", "wb") as picklefile:
        picklefile.write(pickle.dumps(variables_, -1))
    process_dict[process_name] = multiprocessing.Process(target=worker, args=("pickled_vars" + str(i_nodo) + str(j_nodo) + ".p", queue, shared_incumbent_data))
    process_dict[process_name].start()
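The pattern above, pickle once to disk in the parent and hand the child only a filename, can be sketched in a self-contained, runnable form (all names here are illustrative stand-ins, not the original pyomo objects):

```python
import multiprocessing
import os
import pickle
import tempfile

def worker(pickled_path, queue):
    # Child: load the heavy arguments from disk instead of having them
    # pickled through the Process args at spawn time.
    with open(pickled_path, "rb") as f:
        instance, data = pickle.load(f)
    queue.put(sum(data))  # stand-in for the real solve

def main():
    instance = {"name": "node00"}  # stand-in for a pyomo instance
    data = list(range(1000))

    # Parent: serialize once to a temp file with the highest protocol.
    fd, path = tempfile.mkstemp(suffix=".p")
    with os.fdopen(fd, "wb") as f:
        pickle.dump((instance, data), f, protocol=pickle.HIGHEST_PROTOCOL)

    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(path, queue))
    p.start()
    result = queue.get()  # drain the queue before join() to avoid deadlock
    p.join()
    os.remove(path)
    return result

if __name__ == "__main__":
    print(main())
```

Note the child only receives a short string and a Queue through the spawn pickle; the expensive serialization happens once, to disk, where it is not repeated per process launch.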