Distributed TensorFlow: about the use of tf.train.Supervisor.start_queue_runners

I am looking at the distributed Inception model in TF, and I have the following questions about the use of tf.train.Supervisor.start_queue_runners in inception_distributed_train.py:

Why do we need to explicitly call sv.start_queue_runners() on line 264 and line 269 of inception_distributed_train.py? According to the API doc for start_queue_runners, I see no need for such calls, because:

Note that the queue runners collected in the graph key QUEUE_RUNNERS are already started automatically when you create a session with the supervisor, so unless you have non-collected queue runners to start you do not need to call this explicitly.
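I noticed that the value of queue_runners passed to sv.start_queue_runners differs between line 264 and line 269 of inception_distributed_train.py. But aren't the chief_queue_runners also in the tf.GraphKeys.QUEUE_RUNNERS collection (all QUEUE_RUNNERS are fetched on line 263)? If so, line 269 is unnecessary, because the chief_queue_runners were already started on line 264.
Besides, could you explain, or point me to some references about, the queues that are created in tf.train.Supervisor?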

Thanks for your time!

Not an answer, but some general notes on how to find one :)

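First, using GitHub's blame: inception_distributed was checked in on April 13, while the comment in start_queue_runners was added on April 15, so it is possible that the functionality changed but was not updated in every place that uses it.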

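You can comment out that line and see whether things still work. If they don't, you can add import pdb; pdb.set_trace() at the place where queue runners are created (i.e. here) and see who is creating those extra unattended queue runners.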
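
If stopping in the debugger is inconvenient (for example on a remote worker), a non-interactive variant of the same idea is to record the call stack every time the object is constructed. The following is only a minimal sketch of that technique: FakeQueueRunner, build_input_pipeline, build_chief_sync_queue and trace_creations are hypothetical names made up for illustration, not TensorFlow APIs. In principle the same wrapper could be applied to the real queue runner class, but I have not tried that.

```python
import functools
import traceback


def trace_creations(cls, log):
    """Wrap cls.__init__ so every instantiation appends the caller's stack to log."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def traced_init(self, *args, **kwargs):
        # Drop the last frame (this wrapper) so the stack ends at the creator.
        log.append(traceback.extract_stack()[:-1])
        original_init(self, *args, **kwargs)

    cls.__init__ = traced_init
    return cls


class FakeQueueRunner:
    """Hypothetical stand-in for a queue runner; not the TensorFlow class."""

    def __init__(self, name):
        self.name = name


def build_input_pipeline():
    # Hypothetical code path that creates a queue runner.
    return FakeQueueRunner("input")


def build_chief_sync_queue():
    # Hypothetical second code path that also creates one.
    return FakeQueueRunner("chief_sync")


creation_log = []
trace_creations(FakeQueueRunner, creation_log)

runners = [build_input_pipeline(), build_chief_sync_queue()]

# The innermost remaining frame of each recorded stack is the function
# that constructed the object.
creators = [stack[-1].name for stack in creation_log]
print(creators)
```

Each entry in creation_log is a full stack, so "".join(traceback.format_list(stack)) prints the whole chain of callers when the innermost frame alone is not enough.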

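Also, development of Supervisor seems to have slowed down, and things are moving to FooSession (from the comment here). These provide a more robust training architecture (your workers won't crash because of a temporary network error), but there aren't many examples of how to use them yet.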