Neural network in TF 2.0: cannot train in float64 precision
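I have a working neural network (built with TensorFlow 2.0 and the Keras API) that I train in float32 precision (the default). Now I want to train it in float64 precision. Before running the network, I enable this with tensorflow.keras.backend.set_floatx('float64'). Training starts, but on the last batch of the first epoch I get the following error: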
File "Z:\Z_MASTER\DL_Reconstruction\train_stage_1.py", line 49, in train_vae
validation_split=1/19, callbacks=callbacks) # CHANGE val split
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 728, in fit
use_multiprocessing=use_multiprocessing)
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 674, in fit
steps_name='steps_per_epoch')
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 449, in model_iteration
callbacks.on_epoch_end(epoch, epoch_logs)
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\keras\callbacks.py", line 298, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\keras\callbacks.py", line 1614, in on_epoch_end
self._log_weights(epoch)
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\keras\callbacks.py", line 1696, in _log_weights
self._log_weight_as_image(weight, weight_name, epoch)
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\keras\callbacks.py", line 1721, in _log_weight_as_image
summary_ops_v2.image(weight_name, w_img, step=epoch)
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\ops\summary_ops_v2.py", line 820, in image
return summary_writer_function(name, tensor, function, family=family)
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\ops\summary_ops_v2.py", line 730, in summary_writer_function
should_record_summaries(), record, _nothing, name="")
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\framework\smart_cond.py", line 54, in smart_cond
return true_fn()
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\ops\summary_ops_v2.py", line 723, in record
with ops.control_dependencies([function(tag, scope)]):
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\ops\summary_ops_v2.py", line 818, in function
name=scope)
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\ops\gen_summary_ops.py", line 654, in write_image_summary
name=name, ctx=_ctx)
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\ops\gen_summary_ops.py", line 698, in write_image_summary_eager_fallback
attrs=_attrs, ctx=_ctx, name=name)
File "Z:\Z_MASTER\Envs\p37_new_clone\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Value for attr 'T' of double is not in the list of allowed values: uint8, float, half
; NodeDef: {{node WriteImageSummary}}; Op<name=WriteImageSummary; signature=writer:resource, step:int64, tag:string, tensor:T, bad_color:uint8 -> ; attr=max_images:int,default=3,min=1; attr=T:type,default=DT_FLOAT,allowed=[DT_UINT8, DT_FLOAT, DT_HALF]; is_stateful=true> [Op:WriteImageSummary] name: enc_0_conv/kernel_0/
Process finished with exit code 1
Long story short, I think the last line of the error message is the most helpful for tracking down the problem:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Value for attr 'T' of double is not in the list of allowed values: uint8, float, half
I tried to fix this by setting the dtype argument of the layers to float64 (snippets only):
conv = Conv2D(..., dtype='float64')(input)
...
output = ReLU(dtype='float64')(input)
...
lat_var = Lambda(..., dtype='float64')([z_mean, z_log_var])
...
The code crashes on this line:
history = model.fit(x=images, y=images, epochs=200, batch_size=32,
validation_split=1/19, callbacks=callbacks)
where images is a NumPy array of type float64, created with images = images.astype('float64').
Does anyone know how I can train in float64 precision?
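The cause of the error was the two TensorBoard callbacks, which are called at the end of every epoch to log the training. More specifically, setting the TensorBoard callbacks' parameter write_images=False solved the problem.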
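A minimal end-to-end sketch of this fix, assuming a toy convolutional autoencoder and random data as hypothetical stand-ins for the original VAE and images. Because set_floatx('float64') is called before any layer is built, the layers should pick up float64 as their default dtype without per-layer dtype arguments; the TensorBoard callback keeps histogram_freq=1 but sets write_images=False.

```python
import tempfile

import numpy as np
import tensorflow as tf

# Must run before any layers are built so that their weights are created as float64.
tf.keras.backend.set_floatx("float64")

# Hypothetical toy data standing in for `images` (values in [0, 1]).
images = np.random.rand(38, 16, 16, 1).astype("float64")

# Hypothetical toy model; no dtype='float64' arguments are passed to the layers.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(16, 16, 1)),
    tf.keras.layers.Conv2D(4, 3, padding="same"),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Conv2D(1, 3, padding="same"),
])
model.compile(optimizer="adam", loss="mse")

callbacks = [
    # write_images=False skips the weight-image summaries, whose op only
    # accepts uint8, float32 and half tensors, not float64.
    tf.keras.callbacks.TensorBoard(log_dir=tempfile.mkdtemp(), profile_batch=0,
                                   histogram_freq=1, write_images=False),
]

history = model.fit(x=images, y=images, epochs=2, batch_size=8,
                    validation_split=1/19, callbacks=callbacks, verbose=0)
```

Inspecting model.weights afterwards is a quick way to confirm that every variable really is float64.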
Here is the full working code for the callbacks:
TensorBoard(log_dir='logs_1', profile_batch=0, histogram_freq=1, write_images=False)
TensorBoard(log_dir='current_logs', profile_batch=0, histogram_freq=1, write_images=False)
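If the kernel images are still wanted, one possible workaround (my own assumption, not a TensorBoard callback option) is to write them from a small custom callback that casts each kernel to float32 first, since the error message lists only uint8, float and half as allowed types. In this sketch, kernel_to_image and Float32KernelImages are hypothetical names, and the tiling and min-max scaling are arbitrary display choices.

```python
import tempfile

import numpy as np
import tensorflow as tf

tf.keras.backend.set_floatx("float64")


def kernel_to_image(kernel):
    """Hypothetical helper: convert a Conv2D kernel of shape (h, w, in_ch, out_ch)
    into a float32 image batch of shape (out_ch, h, in_ch * w, 1) in [0, 1]."""
    k = tf.cast(kernel, tf.float32)  # image summaries reject float64
    h, w, in_ch, out_ch = k.shape
    # (h, w, in, out) -> (out, h, in, w): one image per output filter,
    # with the input-channel slices laid out side by side.
    k = tf.transpose(k, [3, 0, 2, 1])
    k = tf.reshape(k, [out_ch, h, in_ch * w, 1])
    # Min-max scale the whole kernel into [0, 1] for display.
    k_min, k_max = tf.reduce_min(k), tf.reduce_max(k)
    return (k - k_min) / tf.maximum(k_max - k_min, 1e-12)


class Float32KernelImages(tf.keras.callbacks.Callback):
    """Hypothetical callback: logs every Conv2D kernel as an image at epoch end."""

    def __init__(self, log_dir, max_outputs=3):
        super().__init__()
        self.writer = tf.summary.create_file_writer(log_dir)
        self.max_outputs = max_outputs

    def on_epoch_end(self, epoch, logs=None):
        with self.writer.as_default():
            for layer in self.model.layers:
                if isinstance(layer, tf.keras.layers.Conv2D):
                    tf.summary.image(layer.name + "_kernel",
                                     kernel_to_image(layer.kernel),
                                     step=epoch, max_outputs=self.max_outputs)
        self.writer.flush()


# Usage with a hypothetical toy float64 model and random data.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(16, 16, 1)),
    tf.keras.layers.Conv2D(4, 3, padding="same"),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Conv2D(1, 3, padding="same"),
])
model.compile(optimizer="adam", loss="mse")
images = np.random.rand(16, 16, 16, 1).astype("float64")
model.fit(images, images, epochs=1, batch_size=8, verbose=0,
          callbacks=[Float32KernelImages(tempfile.mkdtemp())])
```

The model itself still trains in float64; only the copy of each kernel that is sent to the image summary is converted to float32.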