Possible causes of CUDA get device properties error with Python3 / Theano?

Question

I am trying to use multiple GPUs with multiprocessing in Python3. I can run a simple test case, as follows:

import theano
import theano.tensor as T
import multiprocessing as mp
import time
# import lasagne

def target():
    import theano.sandbox.cuda
    print("target about to use")
    theano.sandbox.cuda.use('gpu1')
    print("target is using")
    import lasagne
    time.sleep(15)
    print("target is exiting")

x = T.scalar('x', dtype='float32')

p = mp.Process(target=target)

p.start()

time.sleep(1)
import theano.sandbox.cuda
print("master about to use")
theano.sandbox.cuda.use('gpu0')
print("master is using")
import lasagne
time.sleep(4)
print("master will join")

p.join()
print("master is exiting")

When I run this, both the master and the spawned process successfully get a GPU:

>> target about to use
>> master about to use
>> Using gpu device 1: GeForce GTX 1080 (CNMeM is enabled with initial size: 50.0% of memory, cuDNN 5105)
>> target is using
>> Using gpu device 0: GeForce GTX 1080 (CNMeM is enabled with initial size: 50.0% of memory, cuDNN 5105)
>> master is using
>> master will join
>> target is exiting
>> master is exiting

But in a more complicated code base, when I try to set up the same scheme, the spawned worker fails with:

ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device 1 failed:
Unable to get properties of gpu 1: initialization error
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
Not able to select available GPU from 2 cards (initialization error).

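I am having a hard time tracking down what causes this. In the code snippet above, the problem is reproduced if lasagne is imported at the top, before forking. However, I have managed to keep my code from importing lasagne until after forking and attempting to use a GPU (I checked sys.modules.keys()), and the problem still persists. I do not see anything Theano-related being imported before the fork other than theano itself and theano.tensor, and in the example above those are fine.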

Has anyone else chased down something similar?

Answer 1

I ran into a similar problem before when trying to configure Theano with Python3 on a Windows PC with a GTX-980. It worked fine on the CPU, but it simply would not use the GPU.

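After that, I tried a Python2/Theano setup and the problem went away. My guess is that the CUDA version might have been the issue. You could try Python2/Theano (in a virtual environment if needed).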

Answer 2

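OK, this turned out to be pretty simple. I had a stray import theano.sandbox.cuda in a pre-fork location, and that import must happen only after forking. In case it helps anyone else: it was still necessary to move the lasagne import to after the fork as well.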
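One way to catch a stray pre-fork import like this is to inspect sys.modules just before starting the workers. The sketch below is only an illustration using the standard library: the helper names and the prefix list are assumptions, with the prefixes taken from the two modules named in this thread.

```python
import sys

# Assumption: these are the modules that must not be imported before forking,
# as reported in this thread. Extend the tuple for other code bases.
GPU_MODULE_PREFIXES = ("theano.sandbox.cuda", "lasagne")


def gpu_modules_loaded(modules=None):
    """Return the sorted names of imported modules that match the prefixes."""
    names = sys.modules.keys() if modules is None else modules
    return sorted(
        name
        for name in names
        if any(name == p or name.startswith(p + ".") for p in GPU_MODULE_PREFIXES)
    )


def assert_safe_to_fork(modules=None):
    """Raise if a GPU-related module is already imported in this process."""
    loaded = gpu_modules_loaded(modules)
    if loaded:
        raise RuntimeError("imported before fork: " + ", ".join(loaded))
```

Calling assert_safe_to_fork() in the master right before p.start() would then fail loudly instead of producing the initialization error later in the worker.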

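(In my case, I actually needed information from the lasagne-based code before forking, so I had to spawn a throw-away process that loads it and returns the relevant values to the master. The master can then build the shared objects accordingly and fork, after which each process builds its own lasagne-based objects, which run on its own GPU.)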
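The throw-away process pattern could look roughly like the sketch below. It uses only the standard library; the function names and the returned dictionary are hypothetical placeholders for whatever the lasagne-based code would actually provide, and the "fork" start method is assumed (it is not available on Windows).

```python
import multiprocessing as mp


def _probe(queue):
    # Runs only in the short-lived child. The real version would import
    # lasagne here and collect whatever the master needs to know.
    info = {"param_shapes": [(3, 4), (4,)]}  # hypothetical placeholder values
    queue.put(info)


def fetch_info_in_child(timeout=30):
    """Run _probe in a throw-away process and return its result."""
    ctx = mp.get_context("fork")  # assumption: a fork-capable platform
    queue = ctx.Queue()
    child = ctx.Process(target=_probe, args=(queue,))
    child.start()
    # Read before joining so a large payload cannot block the child's exit.
    info = queue.get(timeout=timeout)
    child.join()
    return info
```

Because the heavy import happens only inside the child, the master stays free of any GPU state and can fork the real workers afterwards.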

