How to accumulate gradients in TensorFlow 2.0?
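I am training a model using TensorFlow 2.0. The images in my training set have different resolutions. The model I've built can handle variable resolutions (conv layers followed by global averaging). My training set is very small, and I want to use the full training set in a single batch.
Since my images have different resolutions, I can't use model.fit(). So I plan to pass each sample through the network individually, accumulate the errors/gradients, and then apply one optimizer step. I am able to compute the loss values, but I don't know how to accumulate the losses/gradients. How can I accumulate the losses/gradients and then apply a single optimizer step?
Code: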
    for i in range(num_epochs):
        print(f'Epoch: {i + 1}')
        total_loss = 0
        for j in tqdm(range(num_samples)):
            sample = samples[j]
            # tf.GradientTape must be instantiated: note the parentheses
            with tf.GradientTape() as tape:
                prediction = self.model(sample)
                loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
            gradients = tape.gradient(loss_value, self.model.trainable_variables)
            # Currently one optimizer step per sample, not one per epoch
            self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
            total_loss += loss_value
        epoch_loss = total_loss / num_samples
        print(f'Epoch loss: {epoch_loss}')
In line with the explanation provided on the TensorFlow website, below is the code for accumulating gradients in TensorFlow 2.0:
    def train(epochs):
        for epoch in range(epochs):
            for (batch, (images, labels)) in enumerate(dataset):
                with tf.GradientTape() as tape:
                    logits = mnist_model(images, training=True)
                    tvs = mnist_model.trainable_variables
                    accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
                    zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
                    loss_value = loss_object(labels, logits)
                loss_history.append(loss_value.numpy().mean())
                grads = tape.gradient(loss_value, tvs)
                #print(grads[0].shape)
                #print(accum_vars[0].shape)
                accum_ops = [accum_vars[i].assign_add(grad) for i, grad in enumerate(grads)]
            optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))
            print('Epoch {} finished'.format(epoch))

    # Call the above function
    train(epochs=3)
The complete code can be found in this GitHub Gist.
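If I understand correctly from this statement:
How can I accumulate the losses/gradients and then apply a single optimizer step?
@Nagabhushan is trying to accumulate the gradients and then apply the optimization to the (mean) accumulated gradient. The answer provided by @TensorflowSupport does not address this.
To perform the optimization only once, while accumulating the gradients from several tapes, you can do the following: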
    for i in range(num_epochs):
        print(f'Epoch: {i + 1}')
        total_loss = 0
        # get trainable variables
        train_vars = self.model.trainable_variables
        # Create empty gradient list (not a tf.Variable list)
        accum_gradient = [tf.zeros_like(this_var) for this_var in train_vars]
        for j in tqdm(range(num_samples)):
            sample = samples[j]
            # tf.GradientTape must be instantiated: note the parentheses
            with tf.GradientTape() as tape:
                prediction = self.model(sample)
                loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
            total_loss += loss_value
            # get gradients of this tape
            gradients = tape.gradient(loss_value, train_vars)
            # Accumulate the gradients
            accum_gradient = [(acum_grad + grad) for acum_grad, grad in zip(accum_gradient, gradients)]
        # Now, after executing all the tapes you needed, we apply the optimization step
        # (but first we take the average of the gradients)
        accum_gradient = [this_grad / num_samples for this_grad in accum_gradient]
        # apply optimization step
        self.optimizer.apply_gradients(zip(accum_gradient, train_vars))
        epoch_loss = total_loss / num_samples
        print(f'Epoch loss: {epoch_loss}')
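As a sanity check on the averaging step above, here is a small pure-Python sketch (no TensorFlow) suggesting why dividing the accumulated per-sample gradients by the number of samples should match the gradient of the mean loss over the whole batch. The one-parameter model `w * x`, the squared-error loss, the data, and every function name are assumptions made up for illustration; they are not taken from the question.

```python
# Toy model: prediction = w * x, per-sample loss = (w * x - y) ** 2.

def sample_loss(w, x, y):
    return (w * x - y) ** 2

def sample_grad(w, x, y):
    # Analytic derivative of the per-sample loss with respect to w.
    return 2.0 * x * (w * x - y)

def accumulated_mean_grad(w, xs, ys):
    # Accumulate one gradient per sample, then average once at the end.
    accum = 0.0
    for x, y in zip(xs, ys):
        accum += sample_grad(w, x, y)
    return accum / len(xs)

def batch_loss(w, xs, ys):
    # Mean loss over the full batch, as a single large batch would compute it.
    return sum(sample_loss(w, x, y) for x, y in zip(xs, ys)) / len(xs)

def numerical_grad(f, w, eps=1e-6):
    # Central finite difference, used as an independent reference.
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 5.0]
w = 0.5

accumulated = accumulated_mean_grad(w, xs, ys)
full_batch = numerical_grad(lambda v: batch_loss(v, xs, ys), w)
print(accumulated, full_batch)  # the two values should agree closely
```

If the division by the number of samples is skipped, the accumulated gradient is the gradient of the summed loss instead, which acts like a learning rate scaled by the number of samples.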
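You should avoid using tf.Variable() inside the training loop, since it will produce errors when the code is executed as a graph. If you use tf.Variable() inside your training function and then decorate it with "@tf.function" or apply "tf.function(my_train_fcn)" to obtain a graph function (i.e. for improved performance), the execution will raise an error.
This happens because tracing a function that calls tf.Variable behaves differently from eager execution: under tracing the variable is expected to be re-used, whereas in eager execution a new one is created on every call. You can find more information about this on the TensorFlow help page.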
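If the accumulation step is meant to run as a graph, one possible arrangement is to create the accumulator variables once, outside the traced function, and only update them inside it. The following is a minimal sketch under stated assumptions rather than a definitive implementation: the tiny model, the SGD optimizer, the MSE loss, the random data, and the names `accumulate_step` and `train_epoch` are all hypothetical. The `input_signature` with `None` spatial dimensions is there so that images of different resolutions can share one trace.

```python
import tensorflow as tf

# Hypothetical model: conv layer + global average pooling accepts any resolution.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, 3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),
])
model(tf.zeros([1, 8, 8, 3]))  # build the weights eagerly, before any tracing
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_function = tf.keras.losses.MeanSquaredError()

train_vars = model.trainable_variables
# Accumulators are created exactly once, outside the tf.function.
accum = [tf.Variable(tf.zeros(v.shape, dtype=v.dtype), trainable=False)
         for v in train_vars]

@tf.function(input_signature=[
    tf.TensorSpec([1, None, None, 3], tf.float32),  # one image, any resolution
    tf.TensorSpec([1, 1], tf.float32),              # its label
])
def accumulate_step(image, label):
    with tf.GradientTape() as tape:
        prediction = model(image, training=True)
        loss_value = loss_function(y_true=label, y_pred=prediction)
    gradients = tape.gradient(loss_value, train_vars)
    # Only update existing variables here; never create new ones.
    for acc, grad in zip(accum, gradients):
        acc.assign_add(grad)
    return loss_value

def train_epoch(samples, labels):
    total_loss = 0.0
    for image, label in zip(samples, labels):
        total_loss += float(accumulate_step(image, label))
    num_samples = len(samples)
    # Average the accumulated gradients and apply a single optimizer step.
    mean_grads = [acc / float(num_samples) for acc in accum]
    optimizer.apply_gradients(zip(mean_grads, train_vars))
    # Reset the accumulators for the next epoch.
    for acc in accum:
        acc.assign(tf.zeros_like(acc))
    return total_loss / num_samples

# Two made-up samples with different resolutions.
samples = [tf.random.normal([1, 8, 8, 3]), tf.random.normal([1, 12, 10, 3])]
labels = [tf.constant([[1.0]]), tf.constant([[0.0]])]
print(f'Epoch loss: {train_epoch(samples, labels)}')
```

In this sketch the optimizer step and the reset run eagerly once per epoch, so only the per-sample forward and backward pass is compiled.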