GPUs attached to Google Cloud instance no longer findable

Question

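I've been using Google Cloud Platform for the past few months without any problems. However, I've run into a rather confusing issue. I have a GPU attached to the instance that we use for deep learning models, and for some reason this GPU no longer shows up on the instance.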

When I run

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 963983047914027708, name: "/device:XLA_CPU:0"
 device_type: "XLA_CPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 11201145405798739252
 physical_device_desc: "device: XLA_CPU device"]

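the output I get shows that no GPU is available. When I try to train a model, it is clearly not using the GPU, because training has slowed down dramatically.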

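The only change I've made recently is installing Miniconda and creating a new conda env for a different project. Is there any way this could interfere with my current code's ability to recognize the GPU?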

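While creating the conda env, I ran into some issues with the current CUDA driver and CUDA version, but all of that happened inside the dedicated conda env, so I don't see how I could have broken anything that would prevent the GPU from being recognized.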
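One quick way to rule the conda setup in or out is to confirm which interpreter the training code actually runs under. The Miniconda installer can optionally run `conda init`, which by default activates the `base` env in new shells, so `python` may no longer point where it used to. A minimal sketch, assuming the code is launched from a shell where conda may or may not be active; `describe_python_environment` is a hypothetical helper name:

```python
import os
import sys


def describe_python_environment():
    """Report which interpreter and conda env (if any) this process is using."""
    return {
        # Full path of the Python interpreter running this code.
        "executable": sys.executable,
        # Root of the active environment (conda env, virtualenv, or base install).
        "prefix": sys.prefix,
        # conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # None means no conda env appears to be active in this shell.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_python_environment().items():
        print(f"{key}: {value}")
```

If `executable` points into a Miniconda directory when you expected the system or Docker Python, the training script is picking up a different set of packages.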

Thanks in advance, Noah

Answer 1

When using GPUs, you need to take the restrictions into account, so I would recommend reading through them and trying to determine whether any restriction has affected your particular case. As far as I know, installing new libraries does not make your code unable to recognize the GPUs. If you want to restore them, however, you can refer to this documentation link.
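Before digging into Python packages, it can help to check whether the VM itself still sees the GPU. If `nvidia-smi -L` lists the device, the GPU is attached and the driver is loaded, so the problem is more likely in the software stack; if it fails or lists nothing, the attachment or driver is the place to look. A minimal sketch, assuming the NVIDIA driver's `nvidia-smi` utility is installed on the instance; `parse_gpu_list` and `list_gpus_via_driver` are hypothetical helper names:

```python
import shutil
import subprocess


def parse_gpu_list(output):
    """Keep only the lines of `nvidia-smi -L` output that describe a GPU."""
    return [line.strip() for line in output.splitlines() if line.strip().startswith("GPU ")]


def list_gpus_via_driver():
    """Ask the NVIDIA driver which GPUs it sees, independent of any ML framework.

    Returns None if nvidia-smi is missing or fails to run (e.g. no driver loaded).
    """
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True, timeout=30
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return None
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus_via_driver()
    if gpus is None:
        print("nvidia-smi unavailable: check the driver and that a GPU is attached")
    else:
        print(gpus or "driver is running but reports no GPUs")
```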

Answer 2

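It turned out the problem was that the default TensorFlow version had been updated in the Docker file we use. The new version is a CPU-only build, which by default does not look for a GPU.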
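A CPU-only build like this can be detected directly instead of inferring it from slow training. A minimal sketch, assuming a TensorFlow release recent enough to have the `tf.config` device APIs; `describe_tensorflow_gpu_support` is a hypothetical helper name:

```python
def describe_tensorflow_gpu_support():
    """Report whether the installed TensorFlow was built with CUDA and which GPUs it sees."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"installed": False}

    # Newer TF 2.x exposes list_physical_devices directly; older releases
    # only have the tf.config.experimental variant.
    list_devices = getattr(tf.config, "list_physical_devices", None) or (
        tf.config.experimental.list_physical_devices
    )
    return {
        "installed": True,
        "version": tf.__version__,
        # False for a CPU-only build, which never looks for a GPU.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [device.name for device in list_devices("GPU")],
    }


if __name__ == "__main__":
    print(describe_tensorflow_gpu_support())
```

If `built_with_cuda` is `False`, the image ships a CPU-only TensorFlow. Pinning an explicit GPU-enabled version or image tag in the Dockerfile (the official `tensorflow/tensorflow` images, for instance, publish GPU builds under `-gpu` tags) helps avoid a silent switch when the default changes.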


python

google-compute-engine

tensorflow