Python.NET & TensorFlow & CUDA: Could not load dynamic library 'cublas64_11.dll'

I am currently using Python.NET to build C# environments for interaction with TensorFlow Agents, and I get TensorFlow errors when it tries to load the CUDA DLLs.

When I run a pure Python example, TensorFlow loads the CUDA DLLs without any problem:

2021-04-19 03:22:41.062449: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-19 03:22:41.062943: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-19 03:22:41.063347: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-19 03:22:41.063709: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-19 03:22:41.064088: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-19 03:22:41.064455: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-19 03:22:41.064832: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-19 03:22:41.065202: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll

However, when I run a Python environment that is essentially a wrapper around an environment written in C# using Python.NET, I get errors saying the CUDA DLLs were not found:

2021-04-19 03:15:14.884746: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2021-04-19 03:15:14.885031: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2021-04-19 03:15:14.885281: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2021-04-19 03:15:14.885586: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2021-04-19 03:15:14.885851: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-04-19 03:15:14.886174: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2021-04-19 03:15:14.886454: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found

Minimal code to reproduce the problem:


import tensorflow as tf
from TicTacToeSharpEnvironmentWrapper import TicTacToeEnvironment
env = TicTacToeEnvironment()
physical_devices = tf.config.list_physical_devices('GPU')

with TicTacToeSharpEnvironmentWrapper.py:

import tensorflow as tf
from tf_agents.environments import py_environment
from tf_agents.specs import BoundedArraySpec
from tf_agents.trajectories.time_step import StepType
from tf_agents.trajectories.time_step import TimeStep
import numpy as np

assembly_path1 = r"C:\DesktopGym\bin\Debug"
import sys

sys.path.append(assembly_path1)
import clr
clr.AddReference("GymSharp")
from GymSharp import TicTacToeSharpEnvironment


class TicTacToeEnvironment(py_environment.PyEnvironment):
  """A state-settable C# environment for the Tic-Tac-Toe game."""

  def __init__(self):
    super(TicTacToeEnvironment, self).__init__()
    # Creating the C# object runs its static constructor, which initializes Python.NET.
    self.sharp_env = TicTacToeSharpEnvironment()

TicTacToeSharpEnvironment is a C# class library compiled as a 64-bit DLL:

public class TicTacToeSharpEnvironment
{
    static TicTacToeSharpEnvironment()
    {
        PythonInitiliazer.InitializePython();
    }
}    

and PythonInitiliazer is used to initialize Python.NET:

public class PythonInitiliazer
{
    static PythonInitiliazer()
    {
        InitializePython();
    }
    static bool initialized;
    public static void InitializePython()
    {
        if (!initialized)
        {
            initPython();
            initialized = true;
        }
    }
    private static void initPython()
    {

        string pathToVirtualEnv = @"C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\";

        Environment.SetEnvironmentVariable("PATH", pathToVirtualEnv, EnvironmentVariableTarget.Process);
        Environment.SetEnvironmentVariable("PYTHONHOME", pathToVirtualEnv, EnvironmentVariableTarget.Process);
        // Verbatim strings (@) are needed so the backslashes are not treated as escape sequences.
        Environment.SetEnvironmentVariable("PYTHONPATH", $@"{pathToVirtualEnv}\Lib\site-packages;{pathToVirtualEnv}\Lib;{pathToVirtualEnv}\scripts", EnvironmentVariableTarget.Process);
        Runtime.PythonDLL = @"C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\python37.dll";

        PythonEngine.PythonHome = pathToVirtualEnv;
        PythonEngine.PythonPath = Environment.GetEnvironmentVariable("PYTHONPATH", EnvironmentVariableTarget.Process);
        PythonEngine.Initialize();
    }
}    

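The full code works. The Python wrapper for the C# environment passes the TensorFlow Agents unit tests for the Tic Tac Toe environment. The C# environment can be wrapped as a Python or TensorFlow environment, and various agents can be trained against it.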

I don't think this is a compatibility issue with using an x64 .NET DLL, since I am using 64-bit Python, but I can't be sure.

Other details:

  1. GeForce RTX 3080 (compute capability: 8.6)
  2. Windows 10 20H2
  3. .NET Framework 4.8 (compiled with Visual Studio 2019)
  4. Python 3.7 (64-bit)
  5. NVIDIA CUDA 11.
  6. TensorFlow-GPU 2.4.1
  7. TF-Agents 0.7.1
  8. pythonnet 3.0.0-preview2021-04-03

What else could be causing this problem?

I solved the problem. It was caused by an error in the Python.NET wiki documentation that shows how to use Python.NET in a virtual environment.

For anyone else facing this or a very similar problem, the fix is to not use this code from the wiki:

var pathToVirtualEnv = @"path\to\env";

Environment.SetEnvironmentVariable("PATH", pathToVirtualEnv, EnvironmentVariableTarget.Process);

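Doing so overwrites your PATH environment variable for the process, so every directory that was on it before, including the one holding the CUDA DLLs, is no longer searched.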
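
The difference between overwriting and appending can be illustrated without touching the real environment. The sketch below is in Python rather than C#, works on a plain dictionary, and uses made-up directory names; the helper names overwrite_path and append_to_path are hypothetical and only mirror what the two styles of SetEnvironmentVariable call do to the PATH string.

```python
SEP = ";"  # PATH separator on Windows


def overwrite_path(env, new_dir):
    """Mimic SetEnvironmentVariable("PATH", new_dir): all old entries are lost."""
    updated = dict(env)
    updated["PATH"] = new_dir
    return updated


def append_to_path(env, new_dir):
    """Keep the existing entries and add new_dir at the end."""
    updated = dict(env)
    existing = updated.get("PATH", "")
    updated["PATH"] = existing + SEP + new_dir if existing else new_dir
    return updated


# Hypothetical directories, for illustration only.
cuda_bin = r"C:\cuda\bin"
venv = r"C:\path\to\env"
env = {"PATH": SEP.join([r"C:\Windows\System32", cuda_bin])}

print(cuda_bin in overwrite_path(env, venv)["PATH"].split(SEP))  # False
print(cuda_bin in append_to_path(env, venv)["PATH"].split(SEP))  # True
```

After the overwrite, the CUDA directory is gone from the search list, which matches the "not found" warnings in the question.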

Instead, append the path of the Python virtual environment to the existing PATH environment variable:

string pathToVirtualEnv = @"path\to\env";

var path = Environment.GetEnvironmentVariable("PATH");
Environment.SetEnvironmentVariable("PATH", path + ";" + pathToVirtualEnv, EnvironmentVariableTarget.Process);
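
To see which PATH directories actually supply the CUDA DLLs (and would therefore be lost by an overwrite), a small Python check along these lines may help. The helper name find_on_path is hypothetical. Note that os.environ is a snapshot taken when the interpreter starts, so it reflects the PATH that Python was launched with and is not expected to show changes made afterwards by native code such as the C# initializer.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory in `path` that contains `filename`."""
    if path is None:
        # Snapshot of PATH taken at interpreter startup.
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries are sometimes quoted
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


# DLL names taken from the log output in the question.
for dll in ("cudart64_110.dll", "cublas64_11.dll", "cudnn64_8.dll"):
    print(dll, "->", find_on_path(dll) or "not found on PATH")
```

Any directory this reports should still be present in PATH after the C# initializer has run.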