Tensorflow 无法打开 libcuda.so.1
Tensorflow cannot open libcuda.so.1
我有一台配备 GeForce 940 MX 的笔记本电脑。我想在 gpu 上启动 Tensorflow 和 运行。我从他们的教程页面安装了所有内容,现在当我导入 Tensorflow 时,我得到
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: workLaptop
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1092] LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1093] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libnvidia-fatbinaryloader.so.367.57: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
>>>
之后我认为它只是在 cpu 上切换到 运行。
编辑:在我核对了一切之后,从头开始。现在我明白了:
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: workLaptop
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1092] LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1093] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libnvidia-fatbinaryloader.so.367.57: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
libcuda.so.1 是特定于您的 NVIDIA 驱动程序版本的文件的符号 link。它可能指向错误的版本或者它可能不存在。
# See where the link is pointing.
ls /usr/lib/x86_64-linux-gnu/libcuda.so.1 -la
# My result:
# lrwxrwxrwx 1 root root 19 Feb 22 20:40 \
# /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> ./libcuda.so.375.39
# Make sure it is pointing to the right version.
# Compare it with the installed NVIDIA driver.
nvidia-smi
# Replace libcuda.so.1 with a link to the correct version
cd /usr/lib/x86_64-linux-gnu
sudo ln -f -s libcuda.so.<yournvidia.version> libcuda.so.1
现在以同样的方式,从 libcuda.so.1 到 LD_LIBRARY_PATH directory.[=12= 中同名的 link 创建另一个 symlink ]
您可能还发现需要在 /usr/lib/x86_64-linux-gnu 中创建一个 link 到 libcuda.so.1,名为 libcuda.so
我刚刚解决的案例,是更新GPU驱动到最新,安装cuda工具包。首先,添加 ppa 并安装 GPU 驱动程序:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-390
添加 ppa 后,显示驱动程序版本选项,390 是显示的最新 'stable' 版本。
然后安装cuda工具包:
sudo apt install nvidia-cuda-toolkit
然后重启:
sudo reboot
它将驱动程序更新到比第一步中最初安装的 390 更新的版本(它是 410;这是 AWS 上的一个 p2.xlarge 实例)。
以防万一还有人遇到这个问题。首先确保添加 --runtime=nvidia
参数以便 运行 您的容器。
docker run --runtime=nvidia -t tensorflow/serving:latest-gpu
其中 tensorflow/serving:latest-gpu
是 docker 图片的名称。
我有一台配备 GeForce 940 MX 的笔记本电脑。我想在 gpu 上启动 Tensorflow 和 运行。我从他们的教程页面安装了所有内容,现在当我导入 Tensorflow 时,我得到
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: workLaptop
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1092] LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1093] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libnvidia-fatbinaryloader.so.367.57: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
>>>
之后我认为它只是在 cpu 上切换到 运行。
编辑:在我核对了一切之后,从头开始。现在我明白了:
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: workLaptop
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1092] LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1093] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libnvidia-fatbinaryloader.so.367.57: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
libcuda.so.1 是特定于您的 NVIDIA 驱动程序版本的文件的符号 link。它可能指向错误的版本或者它可能不存在。
# See where the link is pointing.
ls /usr/lib/x86_64-linux-gnu/libcuda.so.1 -la
# My result:
# lrwxrwxrwx 1 root root 19 Feb 22 20:40 \
# /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> ./libcuda.so.375.39
# Make sure it is pointing to the right version.
# Compare it with the installed NVIDIA driver.
nvidia-smi
# Replace libcuda.so.1 with a link to the correct version
cd /usr/lib/x86_64-linux-gnu
sudo ln -f -s libcuda.so.<yournvidia.version> libcuda.so.1
现在以同样的方式,从 libcuda.so.1 到 LD_LIBRARY_PATH directory.[=12= 中同名的 link 创建另一个 symlink ]
您可能还发现需要在 /usr/lib/x86_64-linux-gnu 中创建一个 link 到 libcuda.so.1,名为 libcuda.so
我刚刚解决的案例,是更新GPU驱动到最新,安装cuda工具包。首先,添加 ppa 并安装 GPU 驱动程序:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-390
添加 ppa 后,显示驱动程序版本选项,390 是显示的最新 'stable' 版本。
然后安装cuda工具包:
sudo apt install nvidia-cuda-toolkit
然后重启:
sudo reboot
它将驱动程序更新到比第一步中最初安装的 390 更新的版本(它是 410;这是 AWS 上的一个 p2.xlarge 实例)。
以防万一还有人遇到这个问题。首先确保添加 --runtime=nvidia
参数以便 运行 您的容器。
docker run --runtime=nvidia -t tensorflow/serving:latest-gpu
其中 tensorflow/serving:latest-gpu
是 docker 图片的名称。