Python multiprocessing Pool Interaction With Namespace At Creation

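We know that multiprocessing.Pool must be initialized after the functions it will run have been defined. But I find the behavior of the code below hard to understand: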

import os
from multiprocessing import Pool

def func(i): print('first')

pool1 = Pool(2)
pool1.map(func, range(2))         #map-1

def func(i): print('second')
func2 = func

print('------')
pool1.map(func,  range(2))        #map-2
pool1.map(func2,  range(2))       #map-3

pool2 = Pool(2)
print('------')
pool2.map(func,   range(2))       #map-4
pool2.map(func2,  range(2))       #map-5

The output (Python 2.7 and Python 3.4 on Linux) is:

first         #map-1
first
------
first         #map-2
first
first         #map-3
first
------
second        #map-4
second
second        #map-5
second

map-2 prints 'first', as we expect. But how does map-3 find the name func2? I mean, pool1 was initialized before func2 first appeared. So func2 = func is indeed executed, while def func(i): print('second') is not. Why?

If I define func2 directly with

def func2(i): print('second')

then map-3 won't find the name func2, as many posts mention, e.g. this one. What's the difference between the two cases?

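As I understand it, the arguments are passed to the worker processes by pickling, but how does the pool pass the called function to the other processes? Or how do the subprocesses find the called function?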

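tl;dr: the surprise at map-3, where the first func is called although one would expect the second, happens because Pool.map() uses pickle, and pickle serializes a function by its name, func.__name__. That name is still func even when the function is bound to the reference func2. The name is sent to the child process, which looks up func in its own namespace.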



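OK, so I count four distinct questions, listed below. I assume you have already read about namespaces and forking processes, so let's go straight to the fun part of your problem ☺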

① But how does map-3 find the name func2?

② So func2 = func is indeed executed, while def func(i): print('second') is not. Why?

③ Then map-3 won't find name func2 as mentioned by many posts, eg. this one. What's the difference between two cases?

④ As I understand the arguments are passed to the slave processes by pickling, but how does pool pass the called function to other processes? Or how do sub-processes find the called function?

So I added a bit more code to expose more of the internals:

import os
from multiprocessing import Pool

print(os.getpid(), 'parent')

def func(i):
    print(os.getpid(), 'first', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

print('------ map-1')
pool1 = Pool(2)
pool1.map(func, range(2))         #map-1

def func(i):
    print(os.getpid(), 'second', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")
func2 = func

print('------ map-2')
pool1.map(func,  range(2))        #map-2
print('------ map-3')
pool1.map(func2,  range(2))       #map-3

pool2 = Pool(2)
print('------ map-4')
pool2.map(func,   range(2))       #map-4
print('------ map-5')
pool2.map(func2,  range(2))       #map-5

Output on my system:

21512 parent
------ map-1
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-2
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-3
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-4
21518 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
21519 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
------ map-5
21518 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
21519 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>

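So we can see that, for pool1, func2 is never added to the namespace. Something fishy is definitely going on there, and it is too late at night for me to go through the multiprocessing source with a debugger to understand exactly what happens.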

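So if I had to guess an answer to ①, the pickle module somehow figures out that func2 resolves to 0x7f62d531bed8, which already exists under the label func, so it pickles the known "label" func, which on the child's side resolves to 0x7f62d67f7cf8. That is: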

func2 → 0x7f62d531bed8 → func → [PICKLE] → globals()['func'] → 0x7f62d67f7cf8
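The chain above can be reproduced in a single process, without a pool. This is only a sketch under stated assumptions: the module name demo_ns is a hypothetical stand-in for __main__, and rebinding ns.func by hand simulates a pool1 worker, whose namespace is a snapshot taken at fork time.

```python
import pickle
import sys
import types

# Hypothetical stand-in for __main__, so the sketch does not depend on how
# the script is launched.
ns = types.ModuleType("demo_ns")
sys.modules["demo_ns"] = ns

exec("def func(i): return 'first'", ns.__dict__)
first = ns.func                    # what a pool1 worker inherited at fork time

exec("def func(i): return 'second'", ns.__dict__)
second = ns.func                   # the redefinition exists in the parent only
ns.func2 = second                  # func2 = func

payload = pickle.dumps(ns.func2)   # parent side: records "demo_ns" and "func"

ns.func = first                    # simulate the worker's older namespace
resolved = pickle.loads(payload)   # child side: looks up demo_ns.func

print(resolved is first)           # True
print(resolved(0))                 # first
```

The function object handed to pickle is the second one, yet the lookup on the receiving side returns whatever is bound to the name func there.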

To test my theory, I changed your code slightly, renaming the second func() to func2(), and this is what I got:

------ map-3
Process PoolWorker-1:
Process PoolWorker-2:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
    return recv()
AttributeError: 'module' object has no attribute 'func2'
AttributeError: 'module' object has no attribute 'func2'

And then also changing func2 = func into func = func2:

------ map-2
Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process PoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
    return recv()
AttributeError: 'module' object has no attribute 'func2'
AttributeError: 'module' object has no attribute 'func2'

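So I believe I am starting to make my point. The tracebacks also show where to read the code to understand what happens on the child process side.

That gives us more clues for answering ② and ③!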
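The failure in the tracebacks above can be imitated without a pool. A minimal sketch, assuming Python 3: the module demo_ns is a hypothetical stand-in for __main__, and deleting the attribute simulates a worker that was forked before func2 was defined.

```python
import pickle
import sys
import types

ns = types.ModuleType("demo_ns")   # hypothetical stand-in for __main__
sys.modules["demo_ns"] = ns

exec("def func2(i): return 'second'", ns.__dict__)
payload = pickle.dumps(ns.func2)   # parent side: func2 exists, pickling works

del ns.func2                       # simulate a worker forked before func2 existed

try:
    pickle.loads(payload)          # child side: name lookup by "func2"
    outcome = "resolved"
except AttributeError as exc:
    outcome = "AttributeError"
    print(exc)

print(outcome)                     # AttributeError
```

The error is raised while the task is being received, before any user code runs, which matches the recv() frame at the bottom of the tracebacks.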

To go further, I added a print statement in pool.py at line 114:

    job, i, func, args, kwds = task
    print("XXX", os.getpid(), job, i, func, args, kwds)

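to show what is happening. We can see that, in the pool1 workers, func is resolved to 0x7f2d0238fcf8, which is the address of the first function as inherited from the parent: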

23432 parent
------ map-1
('XXX', 23433, 0, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 0, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-2
('XXX', 23433, 1, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 1, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-3
('XXX', 23433, 2, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 2, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-4
('XXX', 23438, 3, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (0,)),), {})
23438 second | <function func at 0x1092e60> | <function func at 0x1092e60>
('XXX', 23439, 3, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (1,)),), {})
23439 second | <function func at 0x1092e60> | <function func at 0x1092e60>
------ map-5
('XXX', 23438, 4, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (0,)),), {})
('XXX', 23439, 4, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (1,)),), {})
23438 second | <function func at 0x1092e60> | <function func at 0x1092e60>
23439 second | <function func at 0x1092e60> | <function func at 0x1092e60>

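So to answer ④, we need to dig further into the multiprocessing source, and maybe even the pickle source.

But I think my hunch about the resolution is probably right. The only remaining question is why the label is resolved to an address and then back to a label before being pushed to the child processes!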


Edit: I think I know why! The reason came to me as I was going to bed, so I went back to my keyboard:

When pickling a function, pickle takes the argument that holds the function and gets the name from the function object itself:

So even though you did create a new function object, which sits at a different address in memory:

>>> print(func)
<function func at 0x7fc6174e3ed8>

pickle does not care, because if the child does not already have access to the function, it never will. So pickle only resolves func.__name__:

>>> print("func.__name__:", func.__name__)
func.__name__: func
>>> print("func2.__name__:", func2.__name__)
func2.__name__: func

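So even if you change the body of the function in the parent process and create a new reference to it, what really gets pickled is the function's internal name, which is set when the function is defined (a lambda always gets the name '<lambda>').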

This explains why you get the old func function when you pass func2 to pool1 at the map-3 stage.
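To see what actually travels to the worker, one can inspect the pickled bytes. This is a hedged sketch assuming Python 3 and its default pickle protocol; demo_ns is a hypothetical stand-in for __main__.

```python
import pickle
import sys
import types

ns = types.ModuleType("demo_ns")   # hypothetical stand-in for __main__
sys.modules["demo_ns"] = ns

exec("def func(i): return 'second'", ns.__dict__)
ns.func2 = ns.func                 # func2 = func: a second reference, same object

payload = pickle.dumps(ns.func2)

print(ns.func2.__name__)           # func
print(b"func" in payload)          # True: the internal name is stored
print(b"func2" in payload)         # False: the alias is never stored
print(b"second" in payload)        # False: no part of the body is stored
print(pickle.dumps(ns.func) == payload)   # True: both references pickle alike
```

Nothing from the function body is in the payload, so a worker can only run whatever it already has bound to that name.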

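So, to conclude: for ①, map-3 does not find the name func2; it finds the name func stored in the function that func2 refers to. That also answers ② and ③, because the func that the worker finds is the original func function, and that is what it executes. The mechanism is that func.__name__ is used to pickle the function and to resolve its name between the two processes, which answers ④.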


Last update, quoting your comment:

In pickle._Pickler.save_global, it gets the name with

if name is None: name = getattr(obj, '__qualname__', None)

and then

if name is None: name = obj.__name__

So if obj has no __qualname__, __name__ is used.

However, it will check whether the object passed is the same as the one in the subprocess:

if obj2 is not obj: raise PicklingError(...)

where obj2, parent = _getattribute(module, name).

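Yes, but remember that what gets passed is only the (internal) name of the function, not the function itself. The child process has no way to tell whether its func() is the same as the parent's func() in memory.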


Edit from @SyrtisMajor:

OK, let's change the first code sample above:

import os
from multiprocessing import Pool

print(os.getpid(), 'parent')

def func(i):
    print(os.getpid(), 'first', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

print('------ map-1')
pool1 = Pool(2)
pool1.map(func, range(2))         #map-1

def func2(i):
    print(os.getpid(), 'second', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

func2.__qualname__ = func.__qualname__   

func = func2

print('------ map-2')
pool1.map(func,  range(2))        #map-2
print('------ map-3')
pool1.map(func2,  range(2))       #map-3

pool2 = Pool(2)
print('------ map-4')
pool2.map(func,   range(2))       #map-4
print('------ map-5')
pool2.map(func2,  range(2))       #map-5

The output is as follows:

38130 parent
------ map-1
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-2
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-3
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-4
38133 second | <function func at 0x10339b510> | <function func at 0x10339b510>
38134 second | <function func at 0x10339b510> | <function func at 0x10339b510>
------ map-5
38133 second | <function func at 0x10339b510> | <function func at 0x10339b510>
38134 second | <function func at 0x10339b510> | <function func at 0x10339b510>

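This is exactly the same as our first output. Note that func = func2 after the definition of func2 is the key step, because pickle checks whether func2 (whose name is now func) is the same object as __main__.func. If it is not, pickling fails.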
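That identity check happens on the pickling side and can be observed directly. A minimal sketch, assuming Python 3, where pickle reads __qualname__; demo_ns is again a hypothetical stand-in for __main__.

```python
import pickle
import sys
import types

ns = types.ModuleType("demo_ns")   # hypothetical stand-in for __main__
sys.modules["demo_ns"] = ns

exec("def func(i): return 'first'", ns.__dict__)
exec("def func2(i): return 'second'", ns.__dict__)
ns.func2.__qualname__ = ns.func.__qualname__   # func2 now claims the name "func"

try:
    pickle.dumps(ns.func2)         # demo_ns.func is still the first function
    before = "pickled"
except pickle.PicklingError:
    before = "PicklingError"

ns.func = ns.func2                 # the key step: func = func2
pickle.dumps(ns.func2)             # demo_ns.func is now the same object
after = "pickled"

print(before)                      # PicklingError
print(after)                       # pickled
```

Without the rebinding, the name recorded for func2 points at a different object in the pickling process, so pickle refuses to serialize it.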