为什么 "pickle" 和 "multiprocessing picklability" 在 Python 中如此不同?

Why is "pickle" and "multiprocessing picklability" so different in Python?

Using Python's multiprocessing on Windows requires that many arguments be "picklable" when they are passed to child processes.

import multiprocessing

class Foobar:

    def __getstate__(self):
        print("I'm being pickled!")

def worker(foobar):
    print(foobar)

if __name__ == "__main__":
    # Uncomment this on Linux
    # multiprocessing.set_start_method("spawn")

    foobar = Foobar()
    process = multiprocessing.Process(target=worker, args=(foobar, ))
    process.start()
    process.join()

The documentation mentions this explicitly several times:

Picklability

Ensure that the arguments to the methods of proxies are picklable.

[...]

Better to inherit than pickle/unpickle

When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.

[...]

More picklability

Ensure that all arguments to Process.__init__() are picklable. Also, if you subclass Process then make sure that instances will be picklable when the Process.start method is called.

However, I have noticed two major differences between "multiprocessing pickling" and the standard pickle module, and I have a hard time making sense of them.


multiprocessing.Queue() is not "picklable" yet it can be passed to a child process

import pickle
from multiprocessing import Queue, Process

def worker(queue):
    pass

if __name__ == "__main__":
    queue = Queue()

    # RuntimeError: Queue objects should only be shared between processes through inheritance
    pickle.dumps(queue)

    # Works fine
    process = Process(target=worker, args=(queue, ))
    process.start()
    process.join()
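
For completeness, the Queue passed this way is fully usable in the child; for example, a slight variation of the snippet above where the worker actually sends something back:

import multiprocessing

def worker(queue):
    # The child can use the queue it received as an argument normally.
    queue.put("hello from the child")

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=worker, args=(queue, ))
    process.start()
    print(queue.get())  # prints "hello from the child"
    process.join()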

Picklable but not passable to a child process if defined in "__main__"

import pickle
from multiprocessing import Process

def worker(foo):
    pass

if __name__ == "__main__":
    class Foo:
        pass

    foo = Foo()

    # Works fine
    pickle.dumps(foo)

    # AttributeError: Can't get attribute 'Foo' on <module '__mp_main__' from 'C:\Users\Delgan\test.py'>
    process = Process(target=worker, args=(foo, ))
    process.start()
    process.join()

If multiprocessing does not simply use pickle internally, then what are the inherent differences between these two ways of serializing objects?

Also, what does "inheritance" mean in the context of multiprocessing? Why would I prefer it over pickling?

When a multiprocessing.Queue is passed to a child process, what is actually sent is a file descriptor (or handle) obtained from a pipe, which must have been created by the parent before creating the child. The error from pickle is there to prevent attempts to send a Queue over another Queue (or a similar channel), because by then it is too late to make use of it. (Unix systems do support sending pipes over certain kinds of sockets, but multiprocessing does not use such a feature.) It is expected that certain multiprocessing types can be sent to child processes (they would otherwise be useless), so no mention is made of the apparent contradiction.
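
In other words, the Queue's state hook refuses plain pickling and only cooperates while multiprocessing itself is setting up a child. A rough sketch of that pattern (not the actual CPython code; the InheritedOnly class and its _spawning flag are invented here for illustration):

import pickle

class InheritedOnly:
    # Rough sketch of the guard that multiprocessing.Queue uses: plain
    # pickling is refused unless the (here invented) _spawning flag is
    # set, which the real spawning machinery does around its own
    # pickling pass.
    _spawning = False

    def __getstate__(self):
        if not type(self)._spawning:
            raise RuntimeError(
                "InheritedOnly objects should only be shared between "
                "processes through inheritance")
        return self.__dict__

obj = InheritedOnly()

try:
    pickle.dumps(obj)               # refused, like pickling a Queue directly
except RuntimeError as error:
    print("refused:", error)

InheritedOnly._spawning = True      # what the spawning machinery would do
data = pickle.dumps(obj)            # now allowed
InheritedOnly._spawning = False
print("pickled while 'spawning':", len(data), "bytes")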

Because the "spawn" start method cannot create the new process with any of the Python objects that have already been created, it has to re-import the main script to obtain the relevant function/class definitions. For obvious reasons it does not set __name__ the way the original run did, so anything that depends on that setting will not be available. (Here it is the unpickling that fails, which is why your manual pickling works.)
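
You can see the re-import happening by printing __name__ at module level; under "spawn" the child imports the script as "__mp_main__" rather than "__main__" (a small sketch, assuming the file is run directly as a script):

import multiprocessing

# Executed once in the parent (as "__main__") and again in each
# spawned child, where the script is re-imported as "__mp_main__".
print("module imported as", __name__)

def worker():
    print("worker runs in module", __name__)

if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")  # already the default on Windows
    process = multiprocessing.Process(target=worker)
    process.start()
    process.join()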

The fork method starts the children with the parent's objects (as of the moment of the fork) still in existence; that is what inheritance means.
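
For example, under the "fork" start method (POSIX only) a child can simply use an object that existed in the parent at fork time, without it ever being pickled or passed as an argument (a minimal sketch):

import multiprocessing

def worker():
    # Under "fork" the child is a copy of the parent at fork time,
    # so it sees `shared` even though it was never pickled or passed in.
    print("child sees:", shared)

if __name__ == "__main__":
    multiprocessing.set_start_method("fork")  # not available on Windows
    shared = {"created": "in the parent"}
    process = multiprocessing.Process(target=worker)
    process.start()
    process.join()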