Getting output from jupyter kernel in (i)python script

Question

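I'd like to open several kernels from within a single ipython session, run code on those kernels, and then collect the results. But I can't figure out how to collect the results, or even how to see stdout/stderr. How can I do these things?

What I've got so far

I've managed the first two steps (opening kernels and running code on them) with code like the following: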

from jupyter_client import MultiKernelManager
kernelmanager = MultiKernelManager()
remote_id = kernelmanager.start_kernel('python3')
remote_kernel = kernelmanager.get_kernel(remote_id)
remote = remote_kernel.client()
sent_msg_id = remote.execute('2+2')

[I welcome any suggestions for how to improve this, or for how to close these kernels and clients.]
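
On closing kernels and clients, one possible arrangement is a context manager that stops the client's channels and shuts the kernel down on exit. This is only a sketch: remote_client is a hypothetical helper name, and it assumes the manager and client expose start_kernel, get_kernel, shutdown_kernel, client, start_channels and stop_channels the way jupyter_client's MultiKernelManager and its clients do.

```python
from contextlib import contextmanager


@contextmanager
def remote_client(kernelmanager, kernel_name="python3"):
    """Start a kernel, yield a connected client, and always clean up.

    `kernelmanager` is assumed to behave like
    jupyter_client.MultiKernelManager (an assumption of this sketch).
    """
    kernel_id = kernelmanager.start_kernel(kernel_name)
    client = kernelmanager.get_kernel(kernel_id).client()
    client.start_channels()  # open the client's channels before sending anything
    try:
        yield client
    finally:
        client.stop_channels()  # close the client's sockets
        kernelmanager.shutdown_kernel(kernel_id, now=True)  # stop the kernel
```

With a real manager this could be used as `with remote_client(MultiKernelManager()) as remote: remote.execute('2+2')`, so the kernel is shut down even if the body raises.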

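Here, python3 can be the name of any of the kernels I have set up (which can be listed at the command line with jupyter-kernelspec list). And I seem to be able to run any reasonable code in place of '2+2'. For example, I can write to a file, and that file really gets created.

Now, the problem is how to get the result. I can get a message that seems to be related with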

reply = remote.get_shell_msg(sent_msg_id)

That reply is a dictionary like this:

{'buffers': [],
 'content': {'execution_count': 2,
  'payload': [],
  'status': 'ok',
  'user_expressions': {}},
 'header': {'date': datetime.datetime(2015, 10, 19, 14, 34, 34, 378577),
  'msg_id': '98e216b4-3251-4085-8eb1-bfceedbae3b0',
  'msg_type': 'execute_reply',
  'session': 'ca4d615d-82b7-487f-88ff-7076c2bdd109',
  'username': 'me',
  'version': '5.0'},
 'metadata': {'dependencies_met': True,
  'engine': '868de9dd-054b-4630-99b7-0face61915a6',
  'started': '2015-10-19T14:34:34.265718',
  'status': 'ok'},
 'msg_id': '98e216b4-3251-4085-8eb1-bfceedbae3b0',
 'msg_type': 'execute_reply',
 'parent_header': {'date': datetime.datetime(2015, 10, 19, 14, 34, 34, 264508),
  'msg_id': '2674c61a-c79a-48a6-b88a-1f2e8da68a80',
  'msg_type': 'execute_request',
  'session': '767ae562-38d6-41a3-a9dc-6faf37d83222',
  'username': 'me',
  'version': '5.0'}}

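This is documented in Messaging in Jupyter. What isn't documented is how to actually use it: which functions to use, when and where to find messages, and so on. I've seen a related question and its answer, which has useful related information but doesn't quite get me to the answer. And another answer I found doesn't get any useful output, either.

So, for example, I've also tried to get the message using the msg_id given in the result above, but it just hangs. I've tried everything I can think of, but can't figure out how to get anything back from the kernel. How do I do it? Can I transfer data back from the kernel in some sort of string? Can I see its stdout and stderr?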
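
As far as the messaging documentation describes it, the execute_reply on the shell channel only reports status; printed output, results and tracebacks are published on the IOPub channel as stream, execute_result and error messages whose parent_header carries the msg_id returned by execute. A minimal sketch of collecting them follows. collect_output and its next_msg argument are hypothetical names; with a real client, next_msg could be something like `lambda: remote.get_iopub_msg(timeout=5)`.

```python
def collect_output(next_msg, sent_msg_id):
    """Gather stdout, stderr, result and error for one execute request.

    next_msg: zero-argument callable returning the next IOPub message dict
              (hypothetical; e.g. a wrapper around client.get_iopub_msg).
    sent_msg_id: the msg_id that client.execute(...) returned.
    """
    out = {"stdout": "", "stderr": "", "result": None, "error": None}
    while True:
        msg = next_msg()
        # Skip messages that belong to other requests.
        if msg["parent_header"].get("msg_id") != sent_msg_id:
            continue
        msg_type, content = msg["msg_type"], msg["content"]
        if msg_type == "stream":
            # content["name"] is "stdout" or "stderr"
            out[content["name"]] += content["text"]
        elif msg_type == "execute_result":
            out["result"] = content["data"].get("text/plain")
        elif msg_type == "error":
            out["error"] = (content["ename"], content["evalue"])
        elif msg_type == "status" and content["execution_state"] == "idle":
            return out  # the kernel has finished handling this request
```

Matching on parent_header rather than on the reply's own msg_id is what ties each output message to the request that caused it. Passing a timeout to the real client call is advisable so the loop cannot block forever.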

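Background

I'm writing an ipython magic to run a code snippet on remote kernels. [Edit: This now exists and is available here.] The idea is that I'll have a notebook on my laptop, and gather data from several remote servers with just a little magic cell like this: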

%%remote_exec -kernels server1,server2
2+2
! hostname

I use remote_ikernel to connect to those remote kernels easily and automatically. That seems to work just fine; I've got my magic command with all its bells and whistles working great, opening up these remote kernels, and running the code. Now I want to get some of that data from the remote sent back to my laptop, presumably by serializing it in some way. At the moment, I think pickle.dumps and pickle.loads would be perfect for this part; I just have to get the bytes created and used by those functions from one kernel to the other. I'd rather not use actual files for the pickling, though that would potentially be acceptable.

Edit:

It looks like some monstrosity like this is possible:

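import pickle  # needed locally for pickle.loads below

# Import pickle on the remote kernel and wait for the reply
remote.get_shell_msg(remote.execute('import pickle'))
# Run the code and ask the kernel to evaluate an extra expression afterwards
sent_msg_id = remote.execute('a=2+2', user_expressions={'output':'pickle.dumps({"a":a})'})
reply = remote.get_shell_msg(sent_msg_id)
# text/plain holds the repr of the pickled bytes as a str
output_bytes = reply['content']['user_expressions']['output']['data']['text/plain']
variable_dict = pickle.loads(eval(output_bytes))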

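And now, variable_dict['a'] is just 4. Note, however, that output_bytes is a string representing those bytes, so it has to be evaled. This seems ridiculous (and still doesn't show how I'd get stdout). Is there a better way? And how do I get stdout?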
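
One way to avoid a bare eval here is ast.literal_eval, which only accepts Python literals and so will not execute arbitrary expressions found in the string. The sketch below simulates the string that would come back in text/plain rather than talking to a kernel; decode_repr is a hypothetical helper name. Unpickling is still only appropriate when the remote kernel is trusted.

```python
import ast
import pickle


def decode_repr(text_plain):
    """Turn the repr of a pickled bytes object back into the original data.

    ast.literal_eval only evaluates literals, so function calls or other
    expressions in the string are rejected instead of being executed.
    """
    raw = ast.literal_eval(text_plain)
    if not isinstance(raw, bytes):
        raise TypeError("expected the repr of a bytes object")
    return pickle.loads(raw)


# Simulate what reply['content']['user_expressions']['output']['data']['text/plain']
# would hold for the expression pickle.dumps({"a": a}) when a == 4.
simulated_text_plain = repr(pickle.dumps({"a": 4}))
variable_dict = decode_repr(simulated_text_plain)
```

This keeps the user_expressions approach unchanged on the remote side; only the local decoding step differs.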

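Edit 2:

Though I'm unhappy with the hackiness of the above, I have successfully used it to write a little module called remote_exec, hosted on github as noted above. The module gives me a little ipython magic that I can use to run code remotely on one or more other kernels. This is a more-or-less automatic process that I'm definitely satisfied with, except for the nagging knowledge of what's happening underneath.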

Answer 1

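You seem to be reinventing the wheel. You don't want to manage the kernels yourself. Use something like ipyparallel, which is made to spawn many kernels and scatter/gather data (basically you are reinventing how it works). You will likely also be interested in dask, and may want to read the introduction from its author. The ipyparallel and dask authors are working together to make the two projects work well together. Don't manage kernels; use ipyparallel instead.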

Answer 2

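My question may not have been clear enough, but my main use case is to run some code on multiple remote machines (clusters that compute data with lots of parallel code), so that I can run fairly simple commands on large datasets stored remotely, with minimal configuration. For that purpose, ipyparallel doesn't work: I'd basically have to rewrite the code to use it. Instead, my module remote_exec is perfect, allowing me to simply add the name of the cluster and a working directory, but otherwise use exactly the same code I'd use locally.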


