Why is "pickle" and "multiprocessing picklability" so different in Python?
Using Python's multiprocessing on Windows requires that many of the arguments passed to child processes be "picklable".
import multiprocessing

class Foobar:
    def __getstate__(self):
        print("I'm being pickled!")

def worker(foobar):
    print(foobar)

if __name__ == "__main__":
    # Uncomment this on Linux
    # multiprocessing.set_start_method("spawn")
    foobar = Foobar()
    process = multiprocessing.Process(target=worker, args=(foobar,))
    process.start()
    process.join()
Picklability
Ensure that the arguments to the methods of proxies are picklable.
[...]
Better to inherit than pickle/unpickle
When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.
[...]
More picklability
Ensure that all arguments to Process.__init__() are picklable. Also, if you subclass Process then make sure that instances will be picklable when the Process.start method is called.
However, I have noticed two major differences between "multiprocessing pickle" and the standard pickle module, and I have a hard time making sense of them.
multiprocessing.Queue() is not "picklable", yet it can be passed to a child process
import pickle
from multiprocessing import Queue, Process

def worker(queue):
    pass

if __name__ == "__main__":
    queue = Queue()
    # RuntimeError: Queue objects should only be shared between processes through inheritance
    pickle.dumps(queue)
    # Works fine
    process = Process(target=worker, args=(queue,))
    process.start()
    process.join()
Not picklable if defined in "main"
import pickle
from multiprocessing import Process

def worker(foo):
    pass

if __name__ == "__main__":
    class Foo:
        pass

    foo = Foo()
    # Works fine
    pickle.dumps(foo)
    # AttributeError: Can't get attribute 'Foo' on <module '__mp_main__' from 'C:\Users\Delgan\test.py'>
    process = Process(target=worker, args=(foo,))
    process.start()
    process.join()
If multiprocessing does not use pickle internally, what are the inherent differences between these two ways of serializing objects? Also, what does "inheritance" mean in a multiprocessing context, and why would I prefer it over pickling?
When a multiprocessing.Queue is passed to a child process, what is actually sent is a file descriptor (or handle) obtained from a pipe, which must have been created by the parent before creating the child. The error from pickle exists to prevent an attempt to send a Queue over another Queue (or similar channel), since by then it is too late to make use of it. (Unix systems do support sending a pipe over certain kinds of sockets, but multiprocessing does not use such features.) It is expected that pickling certain multiprocessing types "works" even though sending them to a child process in any other fashion would be useless, which is why the apparent contradiction goes unmentioned.
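The mechanism behind the RuntimeError can be observed directly: while a child is being spawned, multiprocessing sets an internal flag, and Queue.__getstate__ refuses to serialize whenever that flag is unset. A minimal sketch, relying on multiprocessing.context.get_spawning_popen() (an internal, undocumented helper) purely for illustration:

```python
import pickle
from multiprocessing import Queue, context

q = Queue()

# No child process is being created right now, so the internal
# "spawning popen" marker is unset...
assert context.get_spawning_popen() is None  # internal API, for illustration

# ...and pickling the queue is therefore rejected outright.
try:
    pickle.dumps(q)
    raised = False
except RuntimeError as e:
    raised = True
    print(e)

print("raised:", raised)
```

During Process creation that marker is temporarily set, which is why the very same pickling succeeds there.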
Because the "spawn" start method creates a new process with none of the parent's Python objects, it must re-import the main script to obtain the relevant function/class definitions. For obvious reasons it does not set __name__ to "__main__" as in the original run, so anything that depends on that setting will be unavailable. (Here, it is the unpickling that fails, which is why your manual pickling works.)
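This is easier to see once you know that pickle serializes class instances by reference: the stream records only the defining module and the qualified class name, and unpickling looks the class up again by that name. A quick sketch (the Foo name is just for illustration):

```python
import pickle

class Foo:
    pass

payload = pickle.dumps(Foo())

# The stream stores "module.qualname", not the class body; unpickling
# re-imports that module and looks Foo up by name.  Under spawn the
# re-imported main script is called __mp_main__, so a class defined
# inside the __main__ guard cannot be found there.
assert Foo.__module__.encode() in payload
assert b"Foo" in payload
print("class pickled by reference from module", Foo.__module__)
```

Pickling in the parent succeeds because the lookup target still exists there; it is the lookup in the child, against the re-imported __mp_main__ module, that fails.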
The fork start method instead starts the children with the parent's objects (as they existed at the moment of the fork) still in existence; that is what inheritance means here.
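Inheritance under fork can be sketched as follows (Unix-only): the child begins life as a copy of the parent, so the queue's underlying pipe descriptors are simply inherited and nothing is pickled to reconstruct them.

```python
import multiprocessing as mp

def child(q):
    # q is usable here because the forked child inherited the parent's
    # pipe file descriptors; no pickling was needed to obtain it.
    q.put("hello from the forked child")

# "fork" is only available on Unix; the child starts as a copy of the
# parent, so every object that existed at fork time is inherited.
ctx = mp.get_context("fork")
q = ctx.Queue()
p = ctx.Process(target=child, args=(q,))
p.start()
msg = q.get()
p.join()
print(msg)
```

Because the forked child never re-imports the main script, even objects defined inside the __main__ guard remain accessible to it, which is exactly what spawn cannot offer.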