Is Tensorflow's Between-graph replication an example of data parallelism?
I have read the distributed tensorflow documentation and this Quora answer.
According to this, in the data parallelism approach (a rough sketch follows the list):
- The algorithm distributes the data between various cores.
- Each core independently tries to estimate the same parameter(s)
- Cores then exchange their estimate(s) with each other to come up with the right estimate for the step.
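To check my understanding, here is how I picture data parallelism in TensorFlow terms. This is my own rough sketch (single machine, two GPUs, a toy linear model), not code from the documentation:

```python
import tensorflow as tf

# The full batch; each device will only see its own shard of it.
x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])

# One shared set of parameters that every device tries to estimate.
w = tf.get_variable("w", [10, 1])

tower_grads = []
for i, (x_shard, y_shard) in enumerate(zip(tf.split(x, 2), tf.split(y, 2))):
    with tf.device("/gpu:%d" % i):
        # Same computation on every device, different slice of the data.
        loss = tf.reduce_mean(tf.square(tf.matmul(x_shard, w) - y_shard))
        tower_grads.append(tf.gradients(loss, [w])[0])

# "Exchange the estimates": average the per-device gradients, apply one update.
avg_grad = tf.add_n(tower_grads) / len(tower_grads)
train_op = tf.train.GradientDescentOptimizer(0.1).apply_gradients([(avg_grad, w)])
```

Each device computes a gradient for the same `w` from its own data shard, and the averaged gradient is the "right estimate for the step".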
In the model parallelism approach (again, a sketch follows the list):
- The algorithm sends the same data to all the cores.
- Each core is responsible for estimating different parameter(s)
- Cores then exchange their estimate(s) with each other to come up with the right estimate for all the parameters.
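And this is how I picture model parallelism: the same batch flows through the whole network, but each device owns the parameters of a different layer. Again, this is just my own sketch:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])  # the same data goes to every device

with tf.device("/gpu:0"):
    # The first layer's parameters live on (and are estimated by) GPU 0.
    h = tf.layers.dense(x, 256, activation=tf.nn.relu)

with tf.device("/gpu:1"):
    # The second layer's parameters live on GPU 1; the devices exchange
    # activations rather than gradients for the same parameters.
    logits = tf.layers.dense(h, 10)
```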
How do In-graph replication and Between-graph replication relate to these approaches?
This article says:
For example, different layers in a network may be trained in parallel
on different GPUs. This training procedure is commonly known as "model
parallelism" (or "in-graph replication" in the TensorFlow
documentation).
And:
In "data parallelism" (or “between-graph replication” in the
TensorFlow documentation), you use the same model for every device,
but train the model in each device using different training samples.
Is this accurate?
From the Tensorflow DevSummit video linked on the tensorflow documentation page:
It looks like the data is split up and distributed to each worker. So doesn't In-graph replication follow the data parallelism approach?
In-graph replication and between-graph replication are not directly related to data parallelism and model parallelism. Data parallelism and model parallelism are terms that divide parallelization algorithms into two categories, as described in the Quora answer you linked. In-graph replication and between-graph replication, on the other hand, are two ways of implementing parallelism in TensorFlow. For example, data parallelism can be achieved with either in-graph replication or between-graph replication.
As shown in the video, in-graph replication works by assigning different parts of a single graph to different devices. In between-graph replication there are multiple graphs running in parallel, which is achieved by using distributed TensorFlow.
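For concreteness, a between-graph replication setup looks roughly like the skeleton below. Every worker process runs the same script and builds its own copy of the graph, while tf.train.replica_device_setter pins the shared variables to the parameter-server job. The hostnames and the hard-coded job name / task index are placeholders; in a real job they would come from flags or the environment:

```python
import tensorflow as tf

# Placeholder addresses; a real cluster spec would come from configuration.
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# Each process starts a server for its own job/task (hard-coded here).
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Variables go to the ps job, ops stay on this worker; every worker builds
# the same model in its own graph -- that is the "between-graph" part.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    x = tf.placeholder(tf.float32, [None, 10])
    y = tf.placeholder(tf.float32, [None, 1])
    w = tf.get_variable("w", [10, 1])
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
    global_step = tf.train.get_or_create_global_step()
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
        loss, global_step=global_step)

with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=True) as sess:
    # Each worker would call sess.run(train_op, ...) on its own training samples.
    pass
```

In-graph replication, by contrast, needs no separate worker processes: a single client builds one graph and uses tf.device to place its different parts on different devices.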