How to speed up the 'Adding visible gpu devices' process in TensorFlow with a 30 series card?

Question

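Every time I run my code, it stalls for about 2 minutes. Many people online say that only the first run takes a long time, but that is not the case for me. It doesn't cause any errors, but it is annoying. While it is stuck, system utilization is very low across the CPU, system RAM, GPU, and video memory. I'm using an Nvidia GeForce RTX 3070 on Windows 10 x64 20H2. Here is my environment: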

# Name                    Version                   Build  Channel
blas                      1.0                         mkl    defaults
boto3                     1.16.47                  pypi_0    pypi
botocore                  1.19.47                  pypi_0    pypi
ca-certificates           2020.12.8            haa95532_0    defaults
certifi                   2020.12.5        py38haa95532_0    defaults
click                     7.1.2                    pypi_0    pypi
cudatoolkit               11.0.221             h74a9793_0    defaults
freetype                  2.10.4               hd328e21_0    defaults
intel-openmp              2020.2                      254    defaults
jmespath                  0.10.0                   pypi_0    pypi
joblib                    1.0.0                    pypi_0    pypi
jpeg                      9b                   hb83a4c4_2    defaults
keras                     2.4.3                    pypi_0    pypi
libpng                    1.6.37               h2a8f88b_0    defaults
libtiff                   4.1.0                h56a325e_1    defaults
libuv                     1.40.0               he774522_0    defaults
lz4-c                     1.9.2                hf4a77e7_3    defaults
mkl                       2020.2                      256    defaults
mkl-service               2.3.0            py38h196d8e1_0    defaults
mkl_fft                   1.2.0            py38h45dec08_0    defaults
mkl_random                1.1.1            py38h47e9c7a_0    defaults
ninja                     1.10.2           py38h6d14046_0    defaults
numpy                     1.19.2           py38hadc3359_0    defaults
numpy-base                1.19.2           py38ha3acd2a_0    defaults
olefile                   0.46                       py_0    defaults
openssl                   1.1.1i               h2bbff1b_0    defaults
pillow                    8.0.1            py38h4fa10fc_0    defaults
pip                       20.3.3           py38haa95532_0    defaults
python                    3.8.5                h5fd99cc_1    defaults
pytorch                   1.7.1           py3.8_cuda110_cudnn8_0    pytorch
regex                     2020.11.13               pypi_0    pypi
s3transfer                0.3.3                    pypi_0    pypi
sacremoses                0.0.43                   pypi_0    pypi
scikit-learn              0.24.0                   pypi_0    pypi
scipy                     1.6.0                    pypi_0    pypi
sentencepiece             0.1.94                   pypi_0    pypi
setuptools                51.0.0           py38haa95532_2    defaults
six                       1.15.0           py38haa95532_0    defaults
sklearn                   0.0                      pypi_0    pypi
sqlite                    3.33.0               h2a8f88b_0    defaults
tb-nightly                2.5.0a20210101           pypi_0    pypi
threadpoolctl             2.1.0                    pypi_0    pypi
thulac                    0.2.1                    pypi_0    pypi
tk                        8.6.10               he774522_0    defaults
torchaudio                0.7.2                      py38    pytorch
torchvision               0.8.2                py38_cu110    pytorch
transformers              2.1.1                    pypi_0    pypi
typing_extensions         3.7.4.3                    py_0    defaults
vc                        14.2                 h21ff451_1    defaults
vs2015_runtime            14.27.29016          h5e58377_2    defaults
wheel                     0.36.2             pyhd3eb1b0_0    defaults
wincertstore              0.2                      py38_0    defaults
xz                        5.2.5                h62dcd97_0    defaults
zlib                      1.2.11               h62dcd97_4    defaults
zstd                      1.4.5                h04227a9_0    defaults

Although I'm using PyTorch, the log suggests that TensorFlow rather than PyTorch is to blame. I ran into the same problem with pure TensorFlow 2.3:

2021-01-03 01:17:50.516100: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.622054: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-01-03 01:17:52.645796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-03 01:17:52.645998: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.649575: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-03 01:17:52.649707: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-03 01:17:52.649827: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-03 01:17:52.649928: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-03 01:17:52.651954: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-03 01:17:52.660165: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-03 01:17:52.660416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-03 01:17:52.660971: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-03 01:17:52.668967: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x19659fe67d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-03 01:17:52.669132: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-03 01:17:52.669395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-03 01:17:52.669576: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.669683: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-03 01:17:52.669790: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-03 01:17:52.669896: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-03 01:17:52.670072: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-03 01:17:52.670201: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-03 01:17:52.670365: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-03 01:17:52.670542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-03 01:18:37.097681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-03 01:18:37.097876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-01-03 01:18:37.098025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-01-03 01:18:37.098301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6591 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3070, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-01-03 01:18:37.101296: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1960330d0d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-03 01:18:37.101474: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 3070, Compute Capability 8.6
args:
Namespace(articles_per_title=10, device='0,1,2,3', length=1000, model_config='config/model_config_small.json', model_path='model/final_model', no_wordpiece=False, repetition_penalty=1.0, save_path='generated/', segment=False, temperature=2.0, titles='我', titles_file='', tokenizer_path='cache/vocab_small.txt', topk=10, topp=0)
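
For reference, the length of the stall in this particular run can be read off the log: it is the gap between the second "Adding visible gpu devices: 0" line and the next line. The snippet below is only a small sketch of that arithmetic; the helper name `stall_seconds` is made up for illustration, and the two timestamps are copied from the log above.

```python
from datetime import datetime

# Timestamp layout used by the TensorFlow log lines above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def stall_seconds(before: str, after: str) -> float:
    """Return the number of seconds between two log timestamps."""
    start = datetime.strptime(before, LOG_TIME_FORMAT)
    end = datetime.strptime(after, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


# Second "Adding visible gpu devices: 0" line and the line that follows it.
gap = stall_seconds("2021-01-03 01:17:52.670542", "2021-01-03 01:18:37.097681")
print(f"{gap:.1f} s")  # prints "44.4 s"
```

In this run, about 44 seconds pass with nothing logged, while every other step finishes within milliseconds.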

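I noticed that the TensorFlow installation guide for GPU users says that GPUs with the Ampere architecture may run into this problem, and that it can be solved by enlarging the default JIT cache with export CUDA_CACHE_MAXSIZE=2147483648. That command does not work on Windows. I searched my environment variables and none of them is named CUDA_CACHE_MAXSIZE. I tried adding it myself, but it still takes a long time to get past "Adding visible gpu devices: 0". What should I do?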

Answer 1

Just go to the Windows Environment Variables dialog and set CUDA_CACHE_MAXSIZE=2147483648 under System variables. You then need to REBOOT, and after that everything should be fine.
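
If you would rather not change system-wide settings, a per-process variant might also work: set the variable from Python before TensorFlow is imported. This is only a sketch and rests on an assumption I have not verified on Windows, namely that the CUDA driver reads CUDA_CACHE_MAXSIZE from the process environment when it initializes. The helper name `enlarge_cuda_jit_cache` is made up for illustration.

```python
import os

# 2 GiB, the value suggested in the TensorFlow GPU installation guide.
JIT_CACHE_BYTES = 2147483648


def enlarge_cuda_jit_cache(size_bytes: int = JIT_CACHE_BYTES) -> str:
    """Set CUDA_CACHE_MAXSIZE for this process unless it is already set.

    Returns the value that is in effect afterwards.
    """
    # setdefault keeps a value configured system-wide, if there is one.
    os.environ.setdefault("CUDA_CACHE_MAXSIZE", str(size_bytes))
    return os.environ["CUDA_CACHE_MAXSIZE"]


enlarge_cuda_jit_cache()
# Assumption: the variable must be set before CUDA is initialized,
# so import TensorFlow only after the call above.
# import tensorflow as tf
```

Even with a larger cache, the first run still has to compile the kernels and fill the cache, so only later runs would be expected to start faster.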

You're lucky to have gotten an Ampere card, since they're out of stock everywhere.

Tags: python, gpu, tensorflow, pytorch