How should Exponential Moving Average be used in a custom TF2.4 training loop?

I have a custom training loop, which can be simplified as follows:

import tensorflow as tf

inputs = tf.keras.Input(dtype=tf.float32, shape=(None, None, 3))
# f is a stand-in for the network body that produces the loss tensor
model = tf.keras.Model({"inputs": inputs}, {"loss": f(inputs)})
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9, nesterov=True)

for inputs in batches:
    with tf.GradientTape() as tape:
        results = model(inputs, training=True)
    grads = tape.gradient(results["loss"], model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

The TensorFlow documentation of ExponentialMovingAverage is not clear on how it should be used in a from-scratch training loop. Has anyone used this?

Also, if the shadow variables are still in memory, how should they be restored into the model, and how can I check that the training variables were updated correctly?

Create the EMA object before the training loop:

ema = tf.train.ExponentialMovingAverage(decay=0.9999)

Then apply the EMA after the optimization step. The ema object will keep shadow copies of your model's variables. (You don't need a call to tf.control_dependencies here; see the note in the documentation.)

optimizer.apply_gradients(zip(grads, model.trainable_variables))
ema.apply(model.trainable_variables)
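
Putting this together with the loop from the question, it amounts to one extra line after apply_gradients (a minimal sketch; batches and the model/optimizer setup are the stand-ins from above):

for inputs in batches:
    with tf.GradientTape() as tape:
        results = model(inputs, training=True)
    grads = tape.gradient(results["loss"], model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    # update the shadow variables from the freshly optimized weights
    ema.apply(model.trainable_variables)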

Then, one way to use the shadow variables in your model is to assign them to your model variables by calling the average method of the EMA object:

for var in model.trainable_variables:
    var.assign(ema.average(var))
model.save("model_with_shadow_variables.h5")
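
Note that this overwrites the trained weights in place. If you intend to keep training afterwards, one option (my own suggestion, not something from the documentation) is to snapshot the raw weights first and restore them after saving:

# snapshot the raw (non-averaged) weights so training can resume later
backup = [var.read_value() for var in model.trainable_variables]

# swap in the EMA shadow values for saving or evaluation
for var in model.trainable_variables:
    var.assign(ema.average(var))
model.save("model_with_shadow_variables.h5")

# restore the raw training weights
for var, value in zip(model.trainable_variables, backup):
    var.assign(value)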

EMA with a customized model.fit

Here is a working example of Exponential Moving Average with a customized fit (Ref).

from tensorflow import keras
import tensorflow as tf 

class EMACustomModel(keras.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # shadow variables are created on the first call to ema.apply
        self.ema = tf.train.ExponentialMovingAverage(decay=0.999)

    def train_step(self, data):
        x, y = data

        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)

        gradients = tape.gradient(loss, self.trainable_variables)
        opt_op = self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

        '''About: tf.control_dependencies: 
        Note: In TensorFlow 2 with eager and/or Autograph, you should not 
        require this method, as code executes in the expected order. Only use 
        tf.control_dependencies when working with v1-style code or in a graph 
        context such as inside Dataset.map.
        '''
        with tf.control_dependencies([opt_op]):
            # update the shadow variables after the optimizer step
            self.ema.apply(self.trainable_variables)

        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

Dummy model

import numpy as np

inputs = keras.Input(shape=(28, 28))
flat = keras.layers.Flatten()(inputs)
outputs = keras.layers.Dense(1)(flat)

model = EMACustomModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

Dummy data

np.random.seed(101)
x = np.random.randint(0, 256, size=(50, 28, 28)).astype("float32")
y = np.random.random((50, 1))
print(x.shape, y.shape)

# train the model
model.fit(x, y, epochs=50, verbose=2)
...
...
Epoch 49/50
2/2 - 0s - loss: 189.8506 - mae: 10.8830
Epoch 50/50
2/2 - 0s - loss: 170.3690 - mae: 10.1046

model.trainable_weights[:1]  # inspect the first trainable weight
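
As for checking that both the raw variables and their shadows were updated: ema.average(var) returns the shadow variable itself, so you can compare the two directly (a quick sanity check; model.ema is the attribute defined in the class above):

raw = model.trainable_variables[0]
shadow = model.ema.average(raw)
# after training, the raw weight and its shadow should both have moved
# and (with decay < 1) should generally differ slightly
print(float(tf.reduce_max(tf.abs(raw - shadow))))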