Colab Pro does not provide more than 16 GB of RAM
I upgraded my account to Colab Pro today. It reports the RAM as:
Your runtime has 27.3 gigabytes of available RAM
You are using a high-RAM runtime!
Yet when I start training my model, I get the following error.
RuntimeError: CUDA out of memory. Tried to allocate 88.00 MiB (GPU 0; 15.90 GiB total capacity; 14.75 GiB already allocated; 75.75 MiB free; 14.95 GiB reserved in total by PyTorch)
My model's hyperparameters:
args_dict = dict(
    #data_dir="", # path for data files
    output_dir="", # path to save the checkpoints
    model_name_or_path='t5-large',
    tokenizer_name_or_path='t5-large',
    max_seq_length=600,
    learning_rate=3e-4,
    weight_decay=0.0,
    adam_epsilon=1e-8,
    warmup_steps=0,
    train_batch_size=4,
    eval_batch_size=4,
    num_train_epochs=2,
    gradient_accumulation_steps=16,
    n_gpu=1,
    early_stop_callback=False,
    fp_16=True, # if you want to enable 16-bit training then install apex and set this to true
    opt_level='O1', # you can find out more on optimisation levels here https://nvidia.github.io/apex/amp.html#opt-levels-and-properties
    max_grad_norm=1.0, # if you enable 16-bit training then set this to a sensible value, 0.5 is a good default
    seed=42,
)
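Colab Pro does not seem to provide all of the memory. My code only works with train_batch_size = 1. What causes this? Any ideas?
Note: when I run the code on Kaggle (16 GB), I get the same error. So what did I gain from Colab Pro?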
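Looking at your error, the 16 GB refers to the graphics card's memory (VRAM), not the system RAM.
As far as I know, Colab Pro gives you access to GPUs with at most 16 GB of VRAM.
You can check the amount of VRAM by running the following code.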
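A rough back-of-the-envelope estimate may help show why a 16 GB card fills up so quickly with this model. The sketch below assumes about 770 million parameters for t5-large, fp32 weights and gradients (apex O1 keeps the weights in fp32), and an Adam-style optimizer holding two fp32 states per parameter. These figures are assumptions, not measurements from the original run, and activations are not counted.

```python
# Rough estimate of GPU memory taken by weights, gradients and optimizer state.
# ASSUMPTIONS (not from the original post): ~770M parameters for t5-large,
# fp32 weights and gradients, Adam-style optimizer with two fp32 states per parameter.
GIB = 1024 ** 3


def training_state_gib(num_params, bytes_weights=4, bytes_grads=4, bytes_optimizer=8):
    """Memory in GiB for weights + gradients + optimizer state, excluding activations."""
    bytes_per_param = bytes_weights + bytes_grads + bytes_optimizer
    return num_params * bytes_per_param / GIB


t5_large_params = 770_000_000  # approximate parameter count (assumed)
gpu_total_gib = 15.90          # "total capacity" from the error message

static_gib = training_state_gib(t5_large_params)
headroom_gib = gpu_total_gib - static_gib

print(f"weights + grads + optimizer state: {static_gib:.1f} GiB")
print(f"left for activations and buffers:  {headroom_gib:.1f} GiB")
```

Under these assumptions only a few GiB remain for activations, which grow with both train_batch_size and max_seq_length. That would be consistent with the run fitting only at batch size 1.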
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
    print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
    print('and then re-execute this cell.')
else:
    print(gpu_info)
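The `!nvidia-smi` line above only works inside a notebook cell. As a sketch of the same check in plain Python, the snippet below calls `nvidia-smi` through `subprocess` and reads just the memory figures. The helper name `gpu_memory_mib` is hypothetical, and the function returns None when no NVIDIA driver is present.

```python
import shutil
import subprocess


def gpu_memory_mib():
    """Return [(total_mib, used_mib), ...] per GPU, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on this machine
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # nvidia-smi exists but could not talk to a GPU
    rows = []
    for line in result.stdout.strip().splitlines():
        try:
            total, used = (int(field) for field in line.split(","))
        except ValueError:
            continue  # skip lines that do not contain two integers
        rows.append((total, used))
    return rows


memory = gpu_memory_mib()
if memory is None:
    print("No GPU visible; enable one under Runtime > Change runtime type.")
else:
    for index, (total, used) in enumerate(memory):
        print(f"GPU {index}: {used} MiB used of {total} MiB")
```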
Perhaps try a batch size smaller than 4?
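If a smaller batch is the only thing that fits, the effective batch size can be kept the same by raising gradient_accumulation_steps in proportion. A minimal sketch of that arithmetic follows, using the values from the question; the helper `accumulation_steps_for` is hypothetical and not part of any library.

```python
def accumulation_steps_for(effective_batch, per_step_batch):
    """Accumulation steps needed so per_step_batch * steps == effective_batch."""
    if per_step_batch <= 0 or effective_batch % per_step_batch != 0:
        raise ValueError("effective batch must be a positive multiple of the per-step batch")
    return effective_batch // per_step_batch


# Values from the question: train_batch_size=4, gradient_accumulation_steps=16.
original_effective_batch = 4 * 16

# With train_batch_size=1 (the only size that fit), keep the same effective batch.
new_steps = accumulation_steps_for(original_effective_batch, per_step_batch=1)

print(f"effective batch size: {original_effective_batch}")
print(f"gradient_accumulation_steps for train_batch_size=1: {new_steps}")
```

This keeps the number of examples per optimizer update unchanged, at the cost of more forward and backward passes per update.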