How should Exponential Moving Average be used in a custom TF2.4 training loop?
I have a custom training loop that can be simplified as follows:
import tensorflow as tf

inputs = tf.keras.Input(dtype=tf.float32, shape=(None, None, 3))
model = tf.keras.Model({"inputs": inputs}, {"loss": f(inputs)})
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9, nesterov=True)

for inputs in batches:
    with tf.GradientTape() as tape:
        results = model(inputs, training=True)
    grads = tape.gradient(results["loss"], model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
The TensorFlow documentation for ExponentialMovingAverage is not clear on how it should be used in a from-scratch training loop. Has anyone used this?
Also, assuming the shadow variables are still in memory, how should they be restored into the model, and how can I check that the training variables were updated correctly?
Create the EMA object before the training loop:
ema = tf.train.ExponentialMovingAverage(decay=0.9999)
Then apply the EMA after the optimization step. The ema object keeps shadow copies of your model's variables. (A call to tf.control_dependencies is not needed here; see the note in the documentation.)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
ema.apply(model.trainable_variables)
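Putting this together with the loop from the question, a minimal sketch looks like the following (model, optimizer, and the batches placeholder are the ones from the question):

ema = tf.train.ExponentialMovingAverage(decay=0.9999)

for inputs in batches:
    with tf.GradientTape() as tape:
        results = model(inputs, training=True)
    grads = tape.gradient(results["loss"], model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    # After each optimizer step, fold the new weight values into the shadows.
    ema.apply(model.trainable_variables)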
Then, one way to use the shadow variables in your model is to assign them to your model's variables by calling the average method of the EMA object:
for var in model.trainable_variables:
    var.assign(ema.average(var))

model.save("model_with_shadow_variables.h5")
EMA with a customized model.fit
Here is a working example of Exponential Moving Average with a customized fit. Ref.
from tensorflow import keras
import tensorflow as tf


class EMACustomModel(keras.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.ema = tf.train.ExponentialMovingAverage(decay=0.999)

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        gradients = tape.gradient(loss, self.trainable_variables)
        opt_op = self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        '''About tf.control_dependencies:
        Note: In TensorFlow 2 with eager and/or Autograph, you should not
        require this method, as code executes in the expected order. Only use
        tf.control_dependencies when working with v1-style code or in a graph
        context such as inside Dataset.map.
        '''
        with tf.control_dependencies([opt_op]):
            self.ema.apply(self.trainable_variables)
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
Dummy model
import numpy as np

inputs = keras.Input(shape=(28, 28))
flat = tf.keras.layers.Flatten()(inputs)
outputs = keras.layers.Dense(1)(flat)

model = EMACustomModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
Dummy data
np.random.seed(101)
x = np.random.randint(0, 256, size=(50, 28, 28)).astype("float32")
y = np.random.random((50, 1))
print(x.shape, y.shape)
# train the model
model.fit(x, y, epochs=50, verbose=2)
...
...
Epoch 49/50
2/2 - 0s - loss: 189.8506 - mae: 10.8830
Epoch 50/50
2/2 - 0s - loss: 170.3690 - mae: 10.1046
# Inspect the first trained weight to verify that the variables were updated:
model.trainable_weights[:1]
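As a rough check for the "were the variables updated correctly?" part of the question, you can compare a trained variable against its shadow copy (a sketch assuming the EMACustomModel above, whose ema attribute holds the shadows); after training, the two should be close but not identical:

var = model.trainable_weights[0]
shadow = model.ema.average(var)
# The shadow lags behind the live weight, so the difference should be
# small but non-zero if both were updated during training.
print(float(tf.reduce_max(tf.abs(var - shadow))))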