Example of tf.Estimator with model parallel execution

Question

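I am currently experimenting with distributed TensorFlow. I am using the tf.estimator.Estimator class (with a custom model function) together with tf.contrib.learn.Experiment, and I have managed to get data-parallel execution working.

However, I would now like to try model-parallel execution. I could not find any example of it, and I am not sure how to implement it with tf.estimator (for example, how should the input functions be handled?).

Does anyone have experience with this, or can anyone provide a working example?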

Answer 1

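First, you should stop using tf.contrib.learn.Estimator in favor of tf.estimator.Estimator, because contrib is an experimental module, and the contrib versions of classes that have graduated to the core API (such as Estimator) are automatically deprecated.

Now, back to your main question: you can define a model whose layers are distributed across devices and pass it via the model_fn parameter of tf.estimator.Estimator.__init__.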

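import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.losses, tf.train)

X_FEATURE = 'x'  # Name of the input feature (assumed; not defined in the original snippet).

def my_model(features, labels, mode):
  net = features[X_FEATURE]
  # Hidden layers: three dense + three dropout layers pinned to the first GPU.
  with tf.device('/device:GPU:1'):
    for units in [10, 20, 10]:
      net = tf.layers.dense(net, units=units, activation=tf.nn.relu)
      # Apply dropout only while training; the default (training=False) is a no-op.
      net = tf.layers.dropout(net, rate=0.1,
                              training=(mode == tf.estimator.ModeKeys.TRAIN))

  # Output layer, one-hot encoding and loss pinned to the second GPU.
  with tf.device('/device:GPU:2'):
    logits = tf.layers.dense(net, 3, activation=None)
    onehot_labels = tf.one_hot(labels, 3, 1, 0)
    loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels,
                                           logits=logits)

  # Only training and evaluation are covered; PREDICT mode (labels is None) is not handled.
  optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
  train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
  return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)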

[...]

classifier = tf.estimator.Estimator(model_fn=my_model)

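The model above places 6 layers (three dense and three dropout) on /device:GPU:1 and 3 further operations (the output dense layer, the one-hot encoding, and the loss) on /device:GPU:2. The return value of the my_model function must be an EstimatorSpec instance. A complete working example can be found in the TensorFlow examples.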
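
If the hard-coded device blocks become unwieldy, the layer-to-device assignment could be computed up front. The following is only a sketch in plain Python, not part of the TensorFlow API: the helper name split_layers_across_devices and its even-split policy are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def split_layers_across_devices(
    layer_units: Sequence[int], devices: Sequence[str]
) -> List[Tuple[str, List[int]]]:
    """Assign contiguous runs of layers to devices, in order.

    Hypothetical helper: when the split is uneven, earlier devices
    receive one extra layer each.
    """
    if not devices:
        raise ValueError("at least one device is required")
    base, extra = divmod(len(layer_units), len(devices))
    plan = []
    start = 0
    for index, device in enumerate(devices):
        # The first `extra` devices take one layer more than the rest.
        count = base + (1 if index < extra else 0)
        plan.append((device, list(layer_units[start:start + count])))
        start += count
    return plan


if __name__ == "__main__":
    # Hidden layer sizes and device names taken from the answer's model.
    for device, units in split_layers_across_devices(
            [10, 20, 10], ["/device:GPU:1", "/device:GPU:2"]):
        print(device, units)
```

Inside a model_fn, one could then iterate over the returned plan and open a tf.device(device) block for each entry, building the dense layers for that entry's unit sizes inside it.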

parallel-processing

machine-learning

distributed-computing

deep-learning

tensorflow