Discriminative layer training issue with callback ReduceLROnPlateau
I am trying to do discriminative layer training with the TensorFlow Addons MultiOptimizer, i.e. different learning rates for different layers, but it does not work with the ReduceLROnPlateau callback.
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.callbacks import ReduceLROnPlateau
# AdamWeightDecay is assumed to come from the Hugging Face transformers library
from transformers import AdamWeightDecay

reduce_lr = ReduceLROnPlateau(patience=5, min_delta=1e-4, min_lr=1e-7, verbose=0)

with tpu_strategy.scope():
    roberta_model = create_model(512)

    optimizers = [
        AdamWeightDecay(learning_rate=0.00001, weight_decay_rate=0.00001),
        AdamWeightDecay(learning_rate=0.0001, weight_decay_rate=0.0001)
    ]

    # specifying the optimizers and the layers on which each will operate
    optimizers_and_layers = [
        (optimizers[0], roberta_model.layers[:3]),
        (optimizers[1], roberta_model.layers[3:])
    ]

    # Using MultiOptimizer from TensorFlow Addons
    opt = tfa.optimizers.MultiOptimizer(optimizers_and_layers)

    roberta_model.compile(
        optimizer=opt,
        loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1),
        metrics=["accuracy"])

history = roberta_model.fit(train, epochs=50, validation_data=val, callbacks=[reduce_lr])
At the end of the first epoch it produced this error:
AttributeError: 'MultiOptimizer' object has no attribute 'lr'
Without the ReduceLROnPlateau callback it works fine.

I tried several ways to get around this; my last attempt was to modify the callback, i.e. to write my own reduce-learning-rate-on-plateau callback. But this is well beyond my coding skills. I have commented some of the changes I made to the original callback. I tried it like this:
import logging
import numpy as np
# These imports mirror what the stock Keras ReduceLROnPlateau uses internally.
from keras import backend
from keras.utils import io_utils


class My_ReduceLROnPlateau(tf.keras.callbacks.Callback):

    def __init__(self,
                 monitor='val_loss',
                 factor=0.1,
                 patience=10,
                 verbose=0,
                 mode='auto',
                 min_delta=1e-4,
                 cooldown=0,
                 min_lr=0,
                 **kwargs):
        super(My_ReduceLROnPlateau, self).__init__()

        self.monitor = monitor
        if factor >= 1.0:
            raise ValueError(
                f'ReduceLROnPlateau does not support a factor >= 1.0. Got {factor}')
        if 'epsilon' in kwargs:
            min_delta = kwargs.pop('epsilon')
            logging.warning('`epsilon` argument is deprecated and '
                            'will be removed, use `min_delta` instead.')
        self.factor = factor
        self.min_lr = min_lr
        self.min_delta = min_delta
        self.patience = patience
        self.verbose = verbose
        self.cooldown = cooldown
        self.cooldown_counter = 0  # Cooldown counter.
        self.wait = 0
        self.best = 0
        self.mode = mode
        self.monitor_op = None
        self._reset()

    def _reset(self):
        """Resets wait counter and cooldown counter."""
        if self.mode not in ['auto', 'min', 'max']:
            logging.warning('Learning rate reduction mode %s is unknown, '
                            'fallback to auto mode.', self.mode)
            self.mode = 'auto'
        if (self.mode == 'min' or
                (self.mode == 'auto' and 'acc' not in self.monitor)):
            self.monitor_op = lambda a, b: np.less(a, b - self.min_delta)
            self.best = np.Inf
        else:
            self.monitor_op = lambda a, b: np.greater(a, b + self.min_delta)
            self.best = -np.Inf
        self.cooldown_counter = 0
        self.wait = 0

    def on_train_begin(self, logs=None):
        self._reset()

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        logs['lr'] = backend.get_value(self.model.optimizer[1].lr)
        current = logs.get(self.monitor)
        if current is None:
            logging.warning('Learning rate reduction is conditioned on metric `%s` '
                            'which is not available. Available metrics are: %s',
                            self.monitor, ','.join(list(logs.keys())))
        else:
            if self.in_cooldown():
                self.cooldown_counter -= 1
                self.wait = 0
            if self.monitor_op(current, self.best):
                self.best = current
                self.wait = 0
            elif not self.in_cooldown():
                self.wait += 1
                if self.wait >= self.patience:
                    # Here below I tried to subscript self.model.optimizer,
                    # guessing that each index pointed to one of the optimizers,
                    # and used the same code as in the original ReduceLROnPlateau
                    # to update the optimizers.
                    old_lr1 = backend.get_value(self.model.optimizer[1].lr)
                    old_lr0 = backend.get_value(self.model.optimizer[0].lr)
                    if old_lr1 > np.float32(self.min_lr):
                        new_lr1 = old_lr1 * self.factor
                        new_lr1 = max(new_lr1, self.min_lr)
                        backend.set_value(self.model.optimizer[1].lr, new_lr1)
                        new_lr0 = old_lr0 * self.factor
                        new_lr0 = max(new_lr0, self.min_lr)
                        backend.set_value(self.model.optimizer[0].lr, new_lr0)
                        if self.verbose > 0:
                            io_utils.print_msg(
                                f'\nEpoch {epoch + 1}: '
                                f'ReduceLROnPlateau reducing learning rate '
                                f'to {new_lr0} and {new_lr1}.')
                        self.cooldown_counter = self.cooldown
                        self.wait = 0

    def in_cooldown(self):
        return self.cooldown_counter > 0
Then I created the callback:

reduce_lr = My_ReduceLROnPlateau(patience=5, min_delta=1e-4, min_lr=1e-7, verbose=0)

and started training again. At the end of the first epoch I got the following error:
TypeError: 'MultiOptimizer' object is not subscriptable
That is, you cannot subscript it like self.model.optimizer[1] or self.model.optimizer[0].

So my question is how to solve this, i.e. how to do discriminative layer training with ReduceLROnPlateau, either by some other approach or by modifying my attempt at a custom callback class.

Here is the link to the original ReduceLROnPlateau callback, i.e. without the changes I made in my custom callback.
Maybe it can be solved using this note from the tfa.optimizers.MultiOptimizer documentation:

Note: Currently, tfa.optimizers.MultiOptimizer does not support callbacks that modify optimizers. However, you can instantiate optimizer layer pairs with tf.keras.optimizers.schedules.LearningRateSchedule instead of a static learning rate.
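For reference, a schedule-based setup along the lines of that note might look like the sketch below. This is a minimal sketch, assuming plain tf.keras.optimizers.Adam with ExponentialDecay and made-up decay settings; note that such a schedule decays on a fixed step count rather than on a plateau, so it is not a drop-in replacement for ReduceLROnPlateau.

# A minimal sketch (not from the original post): each optimizer gets a
# LearningRateSchedule instead of a static learning rate, per the TFA note.
# The decay settings below are illustrative assumptions, not tuned values.
backbone_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-5, decay_steps=1000, decay_rate=0.9)
head_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4, decay_steps=1000, decay_rate=0.9)

optimizers_and_layers = [
    (tf.keras.optimizers.Adam(learning_rate=backbone_schedule), roberta_model.layers[:3]),
    (tf.keras.optimizers.Adam(learning_rate=head_schedule), roberta_model.layers[3:])
]
opt = tfa.optimizers.MultiOptimizer(optimizers_and_layers)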
Looking at the code of tfa.optimizers.MultiOptimizer (in the method create_optimizer_spec), it seems the wrapped optimizers can be accessed as

self.model.optimizer.optimizer_specs[0]["optimizer"]

and

self.model.optimizer.optimizer_specs[1]["optimizer"]

and their learning rates changed there. (This is why self.model.optimizer[1] raised an error: MultiOptimizer itself is not subscriptable.)

Then your custom callback seems to work.
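Concretely, the only change needed is in on_epoch_end, replacing the subscripting with optimizer_specs lookups. Below is a sketch of that method, under the assumption that every wrapped optimizer uses a static learning rate (a LearningRateSchedule has no variable you can set this way) and that all of them should be scaled by the same factor:

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        specs = self.model.optimizer.optimizer_specs
        # Mirror the stock callback's logs['lr'] entry using the first wrapped optimizer.
        logs['lr'] = backend.get_value(specs[0]["optimizer"].lr)
        current = logs.get(self.monitor)
        if current is None:
            logging.warning('Learning rate reduction is conditioned on metric `%s` '
                            'which is not available. Available metrics are: %s',
                            self.monitor, ','.join(list(logs.keys())))
            return
        if self.in_cooldown():
            self.cooldown_counter -= 1
            self.wait = 0
        if self.monitor_op(current, self.best):
            self.best = current
            self.wait = 0
        elif not self.in_cooldown():
            self.wait += 1
            if self.wait >= self.patience:
                # Reduce every wrapped optimizer by the same factor, clipped at min_lr.
                for spec in specs:
                    old_lr = backend.get_value(spec["optimizer"].lr)
                    if old_lr > np.float32(self.min_lr):
                        backend.set_value(spec["optimizer"].lr,
                                          max(old_lr * self.factor, self.min_lr))
                self.cooldown_counter = self.cooldown
                self.wait = 0

With that change, training with reduce_lr = My_ReduceLROnPlateau(patience=5, min_delta=1e-4, min_lr=1e-7, verbose=0) should proceed past the first epoch without the subscripting error.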