Is there a way to use torch.nn.DataParallel with CPU?

Question

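I'm trying to change some PyTorch code so that it can run on the CPU.

The model was trained with torch.nn.DataParallel(), so when I load the pre-trained model and try to use it, I have to wrap it in nn.DataParallel() as well. This is what I'm currently doing: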

device = torch.device("cuda:0")
net = nn.DataParallel(net, device_ids=[0])
net.load_state_dict(torch.load(PATH))
net.to(device)

However, after I switched my torch device to cpu:

device = torch.device('cpu')
net = nn.DataParallel(net, device_ids=[0])
net.load_state_dict(torch.load(PATH))
net.to(device)

I get this error:

File "C:\My\Program\win-py362-venv\lib\site-packages\torch\nn\parallel\data_parallel.py", line 156, in forward
    "them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

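I'm assuming it's still looking for CUDA because that's what device_ids is set to, but is there a way to make it use the CPU? This post from the PyTorch repo makes me think I can, but it doesn't explain how.

If not, is there any other way to use a model trained with DataParallel on the CPU?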

Answer 1

When you use torch.nn.DataParallel(), it implements data parallelism at the module level.

According to the doc:

The parallelized module must have its parameters and buffers on device_ids[0] before running this DataParallel module.

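So even though you call .to(torch.device('cpu')), the wrapper still expects the module's parameters and buffers to be on device_ids[0], which here is a GPU.

However, since DataParallel is a container, you can bypass it and get just the original module by doing this: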

net = net.module.to(device)

Now net refers to the original module you defined before applying the DataParallel container.
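
To make the unwrapping step concrete, here is a minimal sketch of the whole load-on-CPU flow. It rests on several assumptions: make_net() is a hypothetical stand-in for the asker's network, an in-memory buffer stands in for PATH, and map_location is added so that tensors saved from a GPU are remapped to the CPU while loading. Option B (stripping the "module." key prefix) is not part of the answer above; it is shown only as a commonly used alternative that avoids constructing DataParallel at all.

```python
import io

import torch
import torch.nn as nn


def make_net():
    # Hypothetical stand-in for the asker's network; any nn.Module behaves the same.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


# Simulate the checkpoint: a state_dict saved from a DataParallel-wrapped model.
# Its keys carry a "module." prefix, e.g. "module.0.weight".
trained = nn.DataParallel(make_net())
buffer = io.BytesIO()  # stands in for PATH
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

device = torch.device("cpu")

# map_location remaps any tensors that were saved on a GPU onto the CPU.
state_dict = torch.load(buffer, map_location=device)

# Option A (the approach from the answer): load into the wrapper, then unwrap it.
wrapped = nn.DataParallel(make_net())
wrapped.load_state_dict(state_dict)
net = wrapped.module.to(device)  # plain module, no DataParallel around it

# Option B (alternative, not from the answer): strip the "module." prefix
# and load the weights straight into an unwrapped model.
prefix = "module."
plain_state = {
    (key[len(prefix):] if key.startswith(prefix) else key): value
    for key, value in state_dict.items()
}
net_b = make_net()
net_b.load_state_dict(plain_state)
net_b.to(device)

# Run inference on the CPU with the unwrapped model.
net.eval()
with torch.no_grad():
    out = net(torch.randn(3, 4, device=device))
print(out.shape)  # torch.Size([3, 2])
```

Either option leaves you with an ordinary nn.Module on the CPU, so the device_ids check in DataParallel.forward is never reached.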
