Training a Keras model using distributed TensorFlow

Question

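I have two GPUs installed on two different machines. I want to build a cluster so that I can train a Keras model using both GPUs at the same time. The Keras blog shows two pieces of code in its distributed training section and links to the official TensorFlow documentation.

My problem is that I don't know how to train my model and put what the TensorFlow documentation describes into practice. For example, what should I do if I want to run the following code on a cluster with multiple GPUs?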

# For a single-input model with 2 classes (binary classification):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)

Answer 1

The first and second parts of the blog post explain how to use a Keras model together with TensorFlow.

I also found this Keras example of distributed training.

Here is another one, using Horovod.
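
As a rough illustration of what distributed training of the model from the question could look like, here is a minimal sketch that assumes TensorFlow 2.x and its `tf.distribute.MultiWorkerMirroredStrategy`. The worker addresses, the port, and the helper names `make_tf_config` and `train` are hypothetical and are not taken from the linked example. Every machine runs the same script and differs only in its worker index.

```python
import json
import os


def make_tf_config(workers, index):
    """Build the TF_CONFIG JSON string for one worker of the cluster."""
    if not 0 <= index < len(workers):
        raise ValueError("index must identify one of the workers")
    return json.dumps({
        "cluster": {"worker": list(workers)},
        "task": {"type": "worker", "index": index},
    })


def train(index, workers=("192.168.0.1:12345", "192.168.0.2:12345")):
    # Hypothetical addresses: replace with the real host:port of each machine.
    # TF_CONFIG has to be set before the strategy is created.
    os.environ["TF_CONFIG"] = make_tf_config(workers, index)

    # Imported here so the helper above can be used without TensorFlow.
    import numpy as np
    import tensorflow as tf

    strategy = tf.distribute.MultiWorkerMirroredStrategy()

    # Variables must be created inside the strategy scope.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(100,)),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="rmsprop",
                      loss="binary_crossentropy",
                      metrics=["accuracy"])

    # Same seed on every worker so all of them see the same dummy data.
    np.random.seed(0)
    data = np.random.random((1000, 100)).astype("float32")
    labels = np.random.randint(2, size=(1000, 1))

    # 64 is the global batch size, divided among the workers.
    dataset = tf.data.Dataset.from_tensor_slices((data, labels)).batch(64)
    model.fit(dataset, epochs=10)
```

With this sketch, the first machine would call `train(0)` and the second `train(1)`. The call blocks until every worker listed in the cluster has started, so both processes need to be launched.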
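
For comparison, the Horovod route might look roughly like the sketch below. It assumes Horovod is installed with TensorFlow support and follows the usual `horovod.tensorflow.keras` pattern. The helper name `scaled_learning_rate`, the base learning rate, and the function name `train` are illustrative choices, not something taken from the linked page.

```python
def scaled_learning_rate(base_lr, num_workers):
    """Scale the learning rate linearly with the number of workers."""
    return base_lr * num_workers


def train():
    # Imported here so the helper above can be used without Horovod.
    import numpy as np
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()

    # Pin each process to a single local GPU, if one is available.
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(100,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

    # Assumed base learning rate of 0.001, scaled by the worker count.
    optimizer = tf.keras.optimizers.RMSprop(
        learning_rate=scaled_learning_rate(0.001, hvd.size()))
    # Wrap the optimizer so gradients are averaged across workers.
    optimizer = hvd.DistributedOptimizer(optimizer)

    model.compile(optimizer=optimizer,
                  loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Dummy data; each worker generates its own random samples here.
    data = np.random.random((1000, 100))
    labels = np.random.randint(2, size=(1000, 1))

    model.fit(
        data, labels, epochs=10, batch_size=32,
        # Start every worker from the weights of rank 0.
        callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
        # Only the first worker prints progress.
        verbose=1 if hvd.rank() == 0 else 0,
    )
```

A script that calls `train()` would typically be launched on both machines at once with something like `horovodrun -np 2 -H host1:1,host2:1 python train.py`, where `host1`, `host2`, and `train.py` are placeholders.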

Tags: python, gpu, distributed-computing, cluster-computing, keras