Using tensorflow when a session is already running on the gpu

Question

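I'm training a neural network on my local machine with tensorflow 2 (gpu), and I'd like to run some other tensorflow code in parallel (just loading a model and saving its graph).

Loading the model fails with a CUDA error. How can I load and save a model with tensorflow 2 on the CPU while another tensorflow instance is training on the GPU?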

    132         self._config = config
    133         self._hyperparams['feature_extractor'] = self._get_feature_extractor(hyperparams['feature_extractor'])
--> 134         self._input_shape_tensor = tf.constant([input_shape[0], input_shape[1]])
    135         self._build(**self._hyperparams)
    136         # save parameter dict for serialization

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in constant(value, dtype, shape, name)
    225   """
    226   return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 227                         allow_broadcast=True)
    228 
    229 

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    233   ctx = context.context()
    234   if ctx.executing_eagerly():
--> 235     t = convert_to_eager_tensor(value, ctx, dtype)
    236     if shape is None:
    237       return t

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
     93     except AttributeError:
     94       dtype = dtypes.as_dtype(dtype).as_datatype_enum
---> 95   ctx.ensure_initialized()
     96   return ops.EagerTensor(value, ctx.device_name, dtype)
     97 

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/eager/context.py in ensure_initialized(self)
    490         if self._default_is_async == ASYNC:
    491           pywrap_tensorflow.TFE_ContextOptionsSetAsync(opts, True)
--> 492         self._context_handle = pywrap_tensorflow.TFE_NewContext(opts)
    493       finally:
    494         pywrap_tensorflow.TFE_DeleteContextOptions(opts)

InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

Answer 1

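You are loading the model on the GPU, and because that GPU is already being used for training, it runs out of memory. You need to place the load on the CPU instead. Try loading the model inside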

with tf.device('/CPU:0'):
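
One caveat, offered as a sketch rather than a definitive fix: `tf.device` only controls where ops are placed, and the traceback in the question fails earlier, while TensorFlow is initializing GPU:0. Hiding the GPU from the process should sidestep that initialization. This assumes TensorFlow 2.1 or later, where `tf.config.set_visible_devices` is available (2.0 exposes it under `tf.config.experimental`). The tensor below is only a stand-in for the real model-loading code.

```python
import tensorflow as tf

# Hide every GPU from this process. This has to run before TensorFlow
# initializes its runtime, i.e. before any tensor or model is created.
tf.config.set_visible_devices([], 'GPU')

# Stand-in for the real work (assumption: your model loading goes here).
# With no GPU visible, ops are placed on the CPU.
x = tf.constant([224, 224])
print(x.device)
```

If the call is made after a tensor has already been created, TensorFlow raises a RuntimeError, so keep it at the very top of the script.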

Answer 2

By default, TensorFlow 2 allocates most of the memory on GPU:0 (about 90%) at startup. If you set

import tensorflow as tf
tf.config.experimental.set_memory_growth(tf.config.experimental.list_physical_devices('GPU')[0], True)

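you will be able to use the GPU for both tasks (provided, of course, that your GPU has enough memory).
If you want finer control over GPU memory usage, you can create a virtual GPU with a hard-coded memory limit: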

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 2 GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)]) # limit in megabytes
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Answer 3

It took me a while to find this answer:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf

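Starting your code with these lines lets you run your tf code on the CPU (evidently, avoiding CUDA altogether is the solution) while another heavy GPU training job is running. Note that the environment variable has to be set before tensorflow is imported.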
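
If the load-and-save code lives in its own script, the same variable can also be set from outside instead of editing the script. A minimal sketch, assuming a hypothetical helper named `run_on_cpu`. The child command here only prints the variable it sees, as a stand-in for your own script:

```python
import os
import subprocess
import sys


def run_on_cpu(command):
    """Run `command` in a child process that cannot see any CUDA device."""
    env = dict(os.environ)
    # "-1" hides all GPUs from CUDA in the child process only; the parent
    # process (and any GPU training job already running) is unaffected.
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Stand-in child command: prints the value the child process sees.
# In practice you might pass [sys.executable, "export_graph.py"]
# (hypothetical script name) instead.
result = run_on_cpu(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
)
print(result.stdout.strip())  # -1
```

From a shell, prefixing the command with `CUDA_VISIBLE_DEVICES=-1` has the same effect.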
