PyTorch custom CUDA extension build fails for torch 1.6.0 or higher
I have a custom CUDA extension for pytorch (https://pytorch.org/tutorials/advanced/cpp_extension.html) that used to work fine with pytorch 1.4, CUDA 10.1 and Titan Xp GPUs. However, we recently changed our system to new A40 GPUs and CUDA 11.1. When I try to build my custom pytorch extension with CUDA 11.1, pytorch 1.8.1, gcc 9.3.0 and Ubuntu 20.04, I get the following error:
$ python3 setup.py install
running install
running bdist_egg
running egg_info
creating cuda_test.egg-info
writing cuda_test.egg-info/PKG-INFO
writing dependency_links to cuda_test.egg-info/dependency_links.txt
writing top-level names to cuda_test.egg-info/top_level.txt
writing manifest file 'cuda_test.egg-info/SOURCES.txt'
reading manifest file 'cuda_test.egg-info/SOURCES.txt'
writing manifest file 'cuda_test.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'cuda_test' extension
creating /path/to/code/cuda/test/build
creating /path/to/code/cuda/test/build/temp.linux-x86_64-3.7
Emitting ninja build file /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] /cm/shared/apps/cuda11.1/toolkit/11.1.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o.d -I/path/to/code/venv/lib/python3.7/site-packages/torch/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/TH -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/THC -I/cm/shared/apps/cuda11.1/toolkit/11.1.1/include -I/path/to/code/venv/include/python3.7m -c -c /path/to/code/cuda/test/test_cuda.cu -o /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cuda_test -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o
/cm/shared/apps/cuda11.1/toolkit/11.1.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o.d -I/path/to/code/venv/lib/python3.7/site-packages/torch/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/TH -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/THC -I/cm/shared/apps/cuda11.1/toolkit/11.1.1/include -I/path/to/code/venv/include/python3.7m -c -c /path/to/code/cuda/test/test_cuda.cu -o /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cuda_test -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/arithmetic.h(256): error: identifier "FLT_MIN" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/arithmetic.h(274): error: identifier "DBL_MIN" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(190): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(228): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(243): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(293): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(406): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(498): error: identifier "DBL_MAX" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(562): error: identifier "DBL_MAX_EXP" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(565): error: identifier "DBL_MANT_DIG" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(630): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(119): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(137): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(147): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(170): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(249): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(327): error: identifier "FLT_MAX" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(375): error: identifier "FLT_MAX_EXP" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(377): error: identifier "FLT_MANT_DIG" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(420): error: identifier "FLT_EPSILON" is undefined
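I also wrote a simple test code to verify that my larger CPP/CUDA code is not the culprit; the test code produces the same error messages. I also checked whether arithmetic.h and catrig.h include the float limits header (`<cfloat>`/`<float.h>`), which should provide the {FLT,DBL}_{MIN,MAX,EPSILON,MANT_DIG} definitions, but this looks normal, as it is standard NVIDIA code.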
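To confirm that every failing identifier really is one of the float-limit macros, the build log can be tallied with a few lines of Python. This is only a sketch: the function name `tally_undefined` and the variable `sample_log` are made up for illustration, and the sample contains just three of the error lines shown above.

```python
import re
from collections import Counter

# Matches nvcc diagnostics of the form:
#   /path/file.h(256): error: identifier "FLT_MIN" is undefined
UNDEFINED_RE = re.compile(
    r'^(?P<file>.+?)\((?P<line>\d+)\): error: identifier "(?P<name>\w+)" is undefined'
)


def tally_undefined(log_text):
    """Count how often each undefined identifier appears in an nvcc log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = UNDEFINED_RE.match(line.strip())
        if match:
            counts[match.group("name")] += 1
    return counts


# Three lines copied from the build output above (hypothetical sample).
sample_log = """\
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/arithmetic.h(256): error: identifier "FLT_MIN" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(190): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(228): error: identifier "DBL_EPSILON" is undefined
"""

if __name__ == "__main__":
    for name, count in sorted(tally_undefined(sample_log).items()):
        print(name, count)
```

If the tally only contains `FLT_*` and `DBL_*` names, the problem is most likely that the float limits header is not being picked up correctly, rather than anything in the extension code itself.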
If anyone has run into a similar issue or knows a solution, please let me know.
----Update----
Here are some other things I have tried:
- The CUDA code compiles when I use CUDA 10.1, pytorch 1.4.0, gcc 9.3.0 and Ubuntu 20.04.
- Switching to pytorch 1.5.1 instead produces the following error:
/usr/include/c++/9/bits/stl_function.h(437): error: identifier "__builtin_is_constant_evaluated" is undefined
But this can be resolved by downgrading gcc to version 7.5.
- Using pytorch 1.6.0 or higher always results in the error reported at the beginning, even with gcc-7.
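I found the problem. The Intel MKL module was not loaded properly, and that caused the errors. After fixing this, compilation also works with CUDA 11.1 and pytorch 1.8.1!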
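For anyone debugging something similar, here is a small diagnostic sketch. It assumes (this is a guess, not something confirmed above) that a misconfigured environment module can break the build by putting a directory on the compiler's include path that contains its own `float.h` or `cfloat`. The helper name `find_shadowing_headers` is hypothetical; the three environment variables are the ones GCC reads for extra include directories.

```python
import os
from pathlib import Path

# Environment variables that GCC consults for additional include directories.
INCLUDE_VARS = ("CPATH", "C_INCLUDE_PATH", "CPLUS_INCLUDE_PATH")


def find_shadowing_headers(env, header_names=("float.h", "cfloat")):
    """Return (variable, path) pairs for headers found via include-path variables.

    Assumption: a header with one of these names in a non-system directory
    could take the place of the standard one during compilation.
    """
    hits = []
    for var in INCLUDE_VARS:
        for directory in env.get(var, "").split(os.pathsep):
            if not directory:
                continue  # skip empty entries such as a trailing separator
            for name in header_names:
                candidate = Path(directory) / name
                if candidate.is_file():
                    hits.append((var, str(candidate)))
    return hits


if __name__ == "__main__":
    for var, path in find_shadowing_headers(os.environ):
        print(f"{var}: {path}")
```

Running it before and after loading the modules shows whether the include-path variables changed in a way that could affect which header gets used. An empty result does not rule out other causes.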