How to speed up the 'Adding visible gpu devices' step in TensorFlow with a 30-series card?
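Every time I run my code, it hangs for about 2 minutes. Many people online say that only the first run takes a long time, but that is not the case for me. It doesn't cause anything to fail, but it is annoying. While it is stuck, system utilization is very low across CPU, system RAM, GPU, and video memory.
I'm using an Nvidia GeForce RTX 3070 on Windows 10 x64 20H2. Here is my environment: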
# Name Version Build Channel
blas 1.0 mkl defaults
boto3 1.16.47 pypi_0 pypi
botocore 1.19.47 pypi_0 pypi
ca-certificates 2020.12.8 haa95532_0 defaults
certifi 2020.12.5 py38haa95532_0 defaults
click 7.1.2 pypi_0 pypi
cudatoolkit 11.0.221 h74a9793_0 defaults
freetype 2.10.4 hd328e21_0 defaults
intel-openmp 2020.2 254 defaults
jmespath 0.10.0 pypi_0 pypi
joblib 1.0.0 pypi_0 pypi
jpeg 9b hb83a4c4_2 defaults
keras 2.4.3 pypi_0 pypi
libpng 1.6.37 h2a8f88b_0 defaults
libtiff 4.1.0 h56a325e_1 defaults
libuv 1.40.0 he774522_0 defaults
lz4-c 1.9.2 hf4a77e7_3 defaults
mkl 2020.2 256 defaults
mkl-service 2.3.0 py38h196d8e1_0 defaults
mkl_fft 1.2.0 py38h45dec08_0 defaults
mkl_random 1.1.1 py38h47e9c7a_0 defaults
ninja 1.10.2 py38h6d14046_0 defaults
numpy 1.19.2 py38hadc3359_0 defaults
numpy-base 1.19.2 py38ha3acd2a_0 defaults
olefile 0.46 py_0 defaults
openssl 1.1.1i h2bbff1b_0 defaults
pillow 8.0.1 py38h4fa10fc_0 defaults
pip 20.3.3 py38haa95532_0 defaults
python 3.8.5 h5fd99cc_1 defaults
pytorch 1.7.1 py3.8_cuda110_cudnn8_0 pytorch
regex 2020.11.13 pypi_0 pypi
s3transfer 0.3.3 pypi_0 pypi
sacremoses 0.0.43 pypi_0 pypi
scikit-learn 0.24.0 pypi_0 pypi
scipy 1.6.0 pypi_0 pypi
sentencepiece 0.1.94 pypi_0 pypi
setuptools 51.0.0 py38haa95532_2 defaults
six 1.15.0 py38haa95532_0 defaults
sklearn 0.0 pypi_0 pypi
sqlite 3.33.0 h2a8f88b_0 defaults
tb-nightly 2.5.0a20210101 pypi_0 pypi
threadpoolctl 2.1.0 pypi_0 pypi
thulac 0.2.1 pypi_0 pypi
tk 8.6.10 he774522_0 defaults
torchaudio 0.7.2 py38 pytorch
torchvision 0.8.2 py38_cu110 pytorch
transformers 2.1.1 pypi_0 pypi
typing_extensions 3.7.4.3 py_0 defaults
vc 14.2 h21ff451_1 defaults
vs2015_runtime 14.27.29016 h5e58377_2 defaults
wheel 0.36.2 pyhd3eb1b0_0 defaults
wincertstore 0.2 py38_0 defaults
xz 5.2.5 h62dcd97_0 defaults
zlib 1.2.11 h62dcd97_4 defaults
zstd 1.4.5 h04227a9_0 defaults
Although I'm using PyTorch, the logs suggest that TensorFlow, not PyTorch, is to blame. I ran into the same problem with pure TensorFlow 2.3:
2021-01-03 01:17:50.516100: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.622054: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-01-03 01:17:52.645796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-03 01:17:52.645998: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.649575: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-03 01:17:52.649707: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-03 01:17:52.649827: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-03 01:17:52.649928: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-03 01:17:52.651954: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-03 01:17:52.660165: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-03 01:17:52.660416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-03 01:17:52.660971: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-03 01:17:52.668967: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x19659fe67d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-03 01:17:52.669132: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-01-03 01:17:52.669395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-03 01:17:52.669576: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.669683: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-03 01:17:52.669790: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-03 01:17:52.669896: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-03 01:17:52.670072: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-03 01:17:52.670201: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-03 01:17:52.670365: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-03 01:17:52.670542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-03 01:18:37.097681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-03 01:18:37.097876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-01-03 01:18:37.098025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-01-03 01:18:37.098301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6591 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3070, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-01-03 01:18:37.101296: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1960330d0d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-03 01:18:37.101474: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 3070, Compute Capability 8.6
args:
Namespace(articles_per_title=10, device='0,1,2,3', length=1000, model_config='config/model_config_small.json', model_path='model/final_model', no_wordpiece=False, repetition_penalty=1.0, save_path='generated/', segment=False, temperature=2.0, titles='我', titles_file='', tokenizer_path='cache/vocab_small.txt', topk=10, topp=0)
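For reference, the stall in this particular log can be measured directly from the timestamps. The sketch below is only an illustration: the two timestamps are copied from the second "Adding visible gpu devices: 0" line and the line printed right after it, and the helper name `stall_seconds` is made up for this example. For this run it comes out at roughly 44 seconds.

```python
from datetime import datetime

# Timestamps copied from the log above: the second "Adding visible gpu devices"
# line and the first line printed after the pause.
STALL_START = "2021-01-03 01:17:52.670542"
STALL_END = "2021-01-03 01:18:37.097681"


def stall_seconds(start: str, end: str) -> float:
    """Return the gap between two TensorFlow log timestamps, in seconds."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


if __name__ == "__main__":
    print(f"stalled for {stall_seconds(STALL_START, STALL_END):.1f} s")
```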
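I noticed that the TensorFlow installation guide for GPU users says Ampere-architecture GPUs may run into this problem, and that it can be resolved by enlarging the default JIT cache with export CUDA_CACHE_MAXSIZE=2147483648. That does not work on Windows. I searched my environment variables and none of them is named CUDA_CACHE_MAXSIZE.
I tried adding it myself, but it still takes a long time to get past "Adding visible gpu devices: 0".
What should I do?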
Just go to the Windows Environment Variables dialog and set CUDA_CACHE_MAXSIZE=2147483648 under System variables.
You then need to REBOOT, after which everything should be fine.
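If changing the system-wide setting is not convenient, a per-process alternative may also work. This is a minimal sketch, not something tested on the asker's machine: it assumes the CUDA driver reads CUDA_CACHE_MAXSIZE when CUDA is first initialized, so the variable has to be set before TensorFlow is imported. The helper name `configure_cuda_jit_cache` is hypothetical.

```python
import os


def configure_cuda_jit_cache(max_bytes: int = 2147483648) -> str:
    """Set CUDA_CACHE_MAXSIZE for the current process and return its value.

    Assumption: the driver picks the variable up at CUDA initialization,
    so this must run before any library touches the GPU.
    """
    os.environ["CUDA_CACHE_MAXSIZE"] = str(max_bytes)
    return os.environ["CUDA_CACHE_MAXSIZE"]


if __name__ == "__main__":
    configure_cuda_jit_cache()
    try:
        # Import TensorFlow only after the variable is in place.
        import tensorflow as tf

        print(tf.config.list_physical_devices("GPU"))
    except ImportError:
        print("TensorFlow is not installed in this environment")
```

The first run after the change can still be slow while the cache is being filled; the improvement, if any, should show up on later runs.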
You are lucky to have gotten an Ampere card, since they are out of stock everywhere.
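To check whether the setting actually reached a Python process, something like the following can help. It is a rough diagnostic sketch under stated assumptions: the cache locations used here (%APPDATA%\NVIDIA\ComputeCache on Windows, ~/.nv/ComputeCache on Linux, or whatever CUDA_CACHE_PATH points to) are the defaults described in NVIDIA's documentation and may differ on a given driver version; the function names are made up for this example.

```python
import os
from pathlib import Path


def default_cache_dir() -> Path:
    """Guess the CUDA JIT cache directory (assumed defaults, see lead-in)."""
    override = os.environ.get("CUDA_CACHE_PATH")
    if override:
        return Path(override)
    if os.name == "nt":
        return Path(os.environ.get("APPDATA", "")) / "NVIDIA" / "ComputeCache"
    return Path.home() / ".nv" / "ComputeCache"


def dir_size_bytes(path: Path) -> int:
    """Total size of all files under path, or 0 if the directory is missing."""
    if not path.is_dir():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


if __name__ == "__main__":
    print("CUDA_CACHE_MAXSIZE =", os.environ.get("CUDA_CACHE_MAXSIZE", "<not set>"))
    cache = default_cache_dir()
    print(f"{cache}: {dir_size_bytes(cache)} bytes")
```

If the variable prints as not set in a freshly opened terminal, the process was probably started before the change took effect, which is consistent with the advice to reboot.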