How to deal with GPU memory leaks in Torch?
My machine's GPU has 2 GB of memory. The first time I run the code below, I get no errors. The second time I run it, however, I get an out-of-memory error. As a short-term remedy, the only thing that helped was converting the data to float32 with torch.Tensor.float(). But the underlying problem remains: the occupied memory is not released after the process finishes, or after the process is killed while running. The same happens with machine RAM. How should one prevent memory leaks, or release memory, in Torch?
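For reference, the float32 remedy mentioned above is a one-liner (applied to the trainset loaded in the snippet below); it halves the footprint of the data compared with Torch's default float64 tensors:

-- short-term remedy: keep the data in float32 rather than float64
trainset.data = trainset.data:float()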
require 'nn'
require 'image'
require 'cunn'
require 'paths'
collectgarbage(); collectgarbage() -- collect twice: the first pass runs finalizers, the second frees what they released
if (not paths.filep("cifar10torchsmall.zip")) then
    os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
    os.execute('unzip cifar10torchsmall.zip')
end
trainset = torch.load('cifar10-train.t7')
testset = torch.load('cifar10-test.t7')
classes = {'airplane', 'automobile', 'bird', 'cat',
'deer', 'dog', 'frog', 'horse', 'ship', 'truck'}
setmetatable(trainset,
    {__index = function(t, i)
        return {t.data[i], t.label[i]}
    end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.
function trainset:size()
    return self.data:size(1)
end
mean = {} -- store the mean, to normalize the test set in the future
stdv = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
    mean[i] = trainset.data[{ {}, {i}, {}, {} }]:mean() -- mean estimation
    print('Channel ' .. i .. ', Mean: ' .. mean[i])
    trainset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction
    stdv[i] = trainset.data[{ {}, {i}, {}, {} }]:std() -- std estimation
    print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
    trainset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling
end
testset.data = testset.data:double() -- convert from Byte tensor to Double tensor
for i=1,3 do -- over each image channel
    testset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction
    testset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling
end
trainset.data = trainset.data:cuda() -- move the entire training set into GPU memory
testset.data = testset.data:cuda() -- move the entire test set into GPU memory
net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 6, 5, 5)) -- 3 input image channels, 6 output channels, 5x5 convolution kernel
net:add(nn.ReLU()) -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2)) -- A max-pooling operation that looks at 2x2 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net:add(nn.ReLU()) -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5)) -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
net:add(nn.Linear(16*5*5, 120)) -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(120, 84))
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(84, 10)) -- 10 is the number of outputs of the network (the 10 CIFAR classes)
net:add(nn.LogSoftMax())
net = net:cuda()
criterion = nn.ClassNLLCriterion()
criterion = criterion:cuda()
pred = net:forward(trainset.data) -- forward pass over all 10,000 training images at once
outputEr = criterion:forward(pred, trainset.label:cuda()) -- loss over the whole set
net:zeroGradParameters()
outputGrad = criterion:backward(pred, trainset.label:cuda())
collectgarbage()
inputGrad = net:backward(trainset.data, outputGrad) -- backward pass over the whole set
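Worth noting about the snippet above: the forward/backward pass pushes all 10,000 training images through the network as a single batch, so the intermediate activations alone can exhaust a 2 GB card even without a leak. A minimal mini-batch sketch, reusing the net, criterion, and trainset defined above (batchSize is just an illustrative choice):

batchSize = 128
n = trainset.data:size(1)
for first = 1, n, batchSize do
    local last = math.min(first + batchSize - 1, n)
    local inputs = trainset.data[{ {first, last} }] -- already on the GPU
    local targets = trainset.label[{ {first, last} }]:cuda()
    local preds = net:forward(inputs)
    local err = criterion:forward(preds, targets)
    net:zeroGradParameters()
    net:backward(inputs, criterion:backward(preds, targets))
    collectgarbage() -- release this batch's intermediate buffers
end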
Side question: why does Torch initialize network parameters as doubles, even though GPUs are very slow at double-precision arithmetic and virtually no neural-network application needs 64-bit parameter values? How can I initialize a model with float (32-bit) parameter vectors?
I found the answer to the side question. You can easily set Torch's default tensor type to float by putting the following at the beginning of your code:
torch.setdefaulttensortype('torch.FloatTensor')
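A quick way to see the effect: modules created after the call allocate their parameters as FloatTensors (nn.Linear here is just an example):

require 'nn'
torch.setdefaulttensortype('torch.FloatTensor')
lin = nn.Linear(10, 2)
print(torch.type(lin.weight)) -- prints 'torch.FloatTensor'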
I was able to (almost) solve this by upgrading from CUDA 6.5 to CUDA 7.5 on the machine where I ran the experiments above. Now, in most cases, GPU memory is released when a running program crashes. Sometimes it still is not, though, and I have to restart the machine.
Additionally, I do the following to make sure the program clears GPU memory after a successful run:
-- drop every reference to tensors living on the GPU, then run the
-- garbage collector so Torch can actually free their storage
net = nil
trainset = nil
testset = nil
pred = nil
outputGrad = nil -- criterion:backward returned a CUDA tensor as well
inputGrad = nil
criterion = nil
collectgarbage()
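To check whether the memory actually came back, cutorch can report free and total memory on the current device; the MB formatting below is just for readability:

require 'cutorch'
freeBytes, totalBytes = cutorch.getMemoryUsage(cutorch.getDevice())
print(string.format('GPU memory: %.0f MB free of %.0f MB',
                    freeBytes / 2^20, totalBytes / 2^20))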