How to build CUDA custom C++ extension for PyTorch without CUDA?

My task is to create a CI workflow that builds a PyTorch CUDA extension for this application. So far, the application has been deployed by creating the target AWS VM with a CUDA GPU, pushing all the sources there, and running setup.py. I would like to do the build in our CI system instead and deploy prebuilt binaries to production.

When running setup.py in the CI system, I get the error "No CUDA GPUs are available" - which is true, there is no CUDA GPU in the CI system. Is there a way to build a CUDA extension without a CUDA GPU?

This is the error message:

gcc -pthread -shared -B /usr/local/miniconda/envs/build/compiler_compat -L/usr/local/miniconda/envs/build/lib -Wl,-rpath=/usr/local/miniconda/envs/build/lib -Wl,--no-as-needed -Wl,--sysroot=/ /app/my-app/build/temp.linux-x86_64-3.6/my-extension/my-module.o -L/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.6/my-extension/my-module.cpython-36m-x86_64-linux-gnu.so
building 'my-extension.my-module._cuda_ext' extension
creating /app/my-app/build/temp.linux-x86_64-3.6/my-extension/src
Traceback (most recent call last):
  File "setup.py", line 128, in <module>
    'build_ext': BuildExtension
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/command/build_ext.py", line 339, in run
    self.build_extensions()
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 653, in build_extensions
    build_ext.build_extensions(self)
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
    self._build_extensions_serial()
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
    _build_ext.build_extension(self, ext)
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 468, in unix_wrap_ninja_compile
    cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 377, in unix_cuda_flags
    cflags + _get_cuda_arch_flags(cflags) +
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1407, in _get_cuda_arch_flags
    capability = torch.cuda.get_device_capability()
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/cuda/__init__.py", line 291, in get_device_capability
    prop = get_device_properties(device)
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/cuda/__init__.py", line 296, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/cuda/__init__.py", line 172, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

I am not very familiar with CUDA and only half-proficient in Python (I am here as the "ops" part of "devops").

This is not a complete solution, since I lack the details to work it out fully. But it should help you or your teammates.


So first, according to the source code, torch._C._cuda_init() is never reached if you set the CUDA arch flags yourself.

This means PyTorch is trying to figure out the CUDA arch on its own because it was not specified by the user.
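That fallback can be sketched roughly like this (a simplified illustration of what `_get_cuda_arch_flags` does, not PyTorch's actual implementation):

```python
import os

def get_cuda_arch_flags_sketch():
    """Simplified sketch (not PyTorch's actual code) of the logic in
    torch.utils.cpp_extension._get_cuda_arch_flags: it falls back to
    querying a physical GPU only when no arch list was supplied."""
    arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST")
    if not arch_list:
        # This is the branch that crashes on a GPU-less CI machine:
        # torch.cuda.get_device_capability() -> torch._C._cuda_init()
        raise RuntimeError("No CUDA GPUs are available")
    return arch_list.split(";")

os.environ["TORCH_CUDA_ARCH_LIST"] = "7.0;7.5"
print(get_cuda_arch_flags_sketch())  # ['7.0', '7.5']
```

So the build only touches the (absent) GPU when the architecture list is missing; supply it explicitly and that code path is skipped.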

Here is a related thread. As you can see there, setting the TORCH_CUDA_ARCH_LIST environment variable to something suitable for your environment should work for you.
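In the CI job that would look something like the following (the architecture values are placeholders; pick the ones matching your production GPUs, e.g. 7.0 for V100, 7.5 for T4, 8.0 for A100):

```shell
# Export the target architectures so the build does not query a GPU.
# "7.0;7.5+PTX" is an example; "+PTX" additionally embeds forward-
# compatible PTX code for newer GPUs.
export TORCH_CUDA_ARCH_LIST="7.0;7.5+PTX"
# Then run the build as before, e.g.:
#   python setup.py bdist_wheel
```

The resulting wheel can then be deployed to the production VMs without compiling there.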