Blas GEMM launch failed:
I am trying to build a dense classifier on top of a pre-trained CNN model. A working GPU is configured and TensorFlow runs its operations on the GPU. My environment was not created with Anaconda; it has the following setup:
IDE - PyCharm, TF = 2.4.0, CUDA = 11.0
However, I cannot get any output because of
Blas GEMM launch failed : a.shape=(20, 8192), b.shape=(8192, 256), m=20, n=256, k=8192
It also shows
failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
I have checked the website and tried setting
configuration.gpu_options.allow_growth = True
but that did not help. I also tried the following code:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
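For reference, the allow_growth flag mentioned above is usually applied through the TF1-compat session config; this is only a minimal sketch, and the exact session wiring in my script is assumed:
import tensorflow as tf

# Rough sketch of the allow_growth workaround via the TF1-compat API
# (assumed setup; the way the session is attached may differ).
configuration = tf.compat.v1.ConfigProto()
configuration.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=configuration)
tf.compat.v1.keras.backend.set_session(session)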
But the output error is still the same. I also found a few other solutions in various Stack Overflow answers, and none of them worked either.
My code is as follows:
import os
import numpy as np
import shutil
import matplotlib.pyplot as plt
import tensorflow as tf
print(tf.test.gpu_device_name())
print(tf.__version__)
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16
conv_base = VGG16(weights = 'imagenet', include_top = False, input_shape = (150,150,3))
base_dir = 'C:/Users/emamu/Downloads/cat_and_dog_small'
train_dir = os.path.join(base_dir, 'train')
val_dir = os.path.join(base_dir, 'val')
test_dir = os.path.join(base_dir, 'test')
data_generator_unaugmented = ImageDataGenerator(1./255)
batch_size = 20
def extract_feature(directory, sample_count):
    features = np.zeros(shape=(sample_count, 4, 4, 512))
    labels = np.zeros(shape=(sample_count))
    generator = data_generator_unaugmented.flow_from_directory(directory,
                                                               target_size = (150, 150),
                                                               batch_size = batch_size,
                                                               class_mode = 'binary')
    i = 0
    for input_batch, label_batch in generator:
        feature_batch = conv_base.predict(input_batch)
        features[i*batch_size:(i+1)*batch_size] = feature_batch
        labels[i*batch_size:(i+1)*batch_size] = label_batch
        i = i + 1
        if i*batch_size >= sample_count:
            break
    return features, labels
train_feature, train_label = extract_feature(train_dir, 2000)
val_feature, val_label = extract_feature(val_dir, 1000)
test_feature, test_label = extract_feature(test_dir, 1000)
train_feature = np.reshape(train_feature, (2000, 4*4*512))
val_feature = np.reshape(val_feature, (1000, 4*4*512))
test_feature = np.reshape(test_feature, (1000, 4*4*512))
# Creating a classifier model on top
model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_dim=4*4*512))
model.add(layers.Dropout(0.4))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer=optimizers.RMSprop(lr=1e-5), loss='binary_crossentropy', metrics=['acc'])
history = model.fit(train_feature, train_label,
                    epochs = 30, batch_size = 20,
                    validation_data=(val_feature, val_label))
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc)+1)
plt.plot(epochs, acc, label = 'training accuracy')
plt.plot(epochs, val_acc, label = 'val accuracy')
plt.figure()
plt.show()
plt.plot(epochs, loss, label = 'training loss')
plt.plot(epochs, val_loss, label = 'val loss')
plt.figure()
plt.show()
Whatever I try, I get the following error:
C:\Users\emamu\PycharmProjects\gput_test_01\venv\Scripts\python.exe C:/Users/emamu/PycharmProjects/gput_test_01/cnn_transfer_learning.py
2021-05-06 12:00:03.592087: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-05-06 12:00:05.532458: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-06 12:00:05.534984: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-05-06 12:00:05.559708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-05-06 12:00:05.559871: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-05-06 12:00:05.568725: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-05-06 12:00:05.568814: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-05-06 12:00:05.572032: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-05-06 12:00:05.573585: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-05-06 12:00:05.580959: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-05-06 12:00:05.583500: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-05-06 12:00:05.584151: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-05-06 12:00:05.584333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
/device:GPU:0
2.4.0
2021-05-06 12:00:06.133316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-06 12:00:06.133441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-05-06 12:00:06.133513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-05-06 12:00:06.133740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 2903 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-05-06 12:00:06.134445: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-06 12:00:06.145654: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-06 12:00:06.146140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-05-06 12:00:06.146343: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-05-06 12:00:06.146433: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-05-06 12:00:06.146522: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-05-06 12:00:06.146610: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-05-06 12:00:06.146699: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-05-06 12:00:06.146787: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-05-06 12:00:06.146880: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-05-06 12:00:06.146972: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-05-06 12:00:06.147095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-05-06 12:00:06.147534: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-05-06 12:00:06.147732: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-05-06 12:00:06.147840: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-05-06 12:00:06.147933: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-05-06 12:00:06.148021: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-05-06 12:00:06.148106: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-05-06 12:00:06.148193: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-05-06 12:00:06.148276: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-05-06 12:00:06.148359: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-05-06 12:00:06.148464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-05-06 12:00:06.148562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-06 12:00:06.148646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-05-06 12:00:06.148699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-05-06 12:00:06.148810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2903 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-05-06 12:00:06.148979: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Found 2000 images belonging to 2 classes.
C:\Users\emamu\PycharmProjects\gput_test_01\venv\lib\site-packages\keras_preprocessing\image\image_data_generator.py:720: UserWarning: This ImageDataGenerator specifies `featurewise_center`, but it hasn't been fit on any training data. Fit it first by calling `.fit(numpy_data)`.
warnings.warn('This ImageDataGenerator specifies '
2021-05-06 12:00:06.750321: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-06 12:00:06.876918: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-05-06 12:00:08.024040: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0
2021-05-06 12:00:08.087066: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0
2021-05-06 12:00:08.222607: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-05-06 12:00:08.720199: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:09.177801: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.66GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-05-06 12:00:09.178227: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.66GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-05-06 12:00:09.257660: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:09.587331: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:10.196301: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:10.486611: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:11.055390: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:11.328883: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:11.828937: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:12.031743: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
Found 1000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Epoch 1/30
2021-05-06 12:00:45.989644: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:45.990645: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:45.991702: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:45.992462: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:45.993468: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:46.000724: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-05-06 12:00:46.000869: W tensorflow/stream_executor/stream.cc:1455] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "C:/Users/emamu/PycharmProjects/gput_test_01/cnn_transfer_learning.py", line 92, in <module>
history = model.fit(train_feature, train_label,
File "C:\Users\emamu\PycharmProjects\gput_test_01\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "C:\Users\emamu\PycharmProjects\gput_test_01\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "C:\Users\emamu\PycharmProjects\gput_test_01\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 888, in _call
return self._stateless_fn(*args, **kwds)
File "C:\Users\emamu\PycharmProjects\gput_test_01\venv\lib\site-packages\tensorflow\python\eager\function.py", line 2942, in __call__
return graph_function._call_flat(
File "C:\Users\emamu\PycharmProjects\gput_test_01\venv\lib\site-packages\tensorflow\python\eager\function.py", line 1918, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "C:\Users\emamu\PycharmProjects\gput_test_01\venv\lib\site-packages\tensorflow\python\eager\function.py", line 555, in call
outputs = execute.execute(
File "C:\Users\emamu\PycharmProjects\gput_test_01\venv\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(20, 8192), b.shape=(8192, 256), m=20, n=256, k=8192
[[node sequential/dense/MatMul (defined at /Users/emamu/PycharmProjects/gput_test_01/cnn_transfer_learning.py:92) ]] [Op:__inference_train_function_11890]
Function call stack:
train_function
Process finished with exit code 1
What is the actual cause of this error? Is there a permanent fix for this problem?
I have solved the problem. The procedure is the same as the tensorflow website states. However, if any older CUDA or cuDNN is installed, first remove all CUDA drivers from the PC and install Visual Studio. Then the most important part is choosing the correct CUDA and cuDNN versions. If the TF version is 2.4.x, choose CUDA 11.0.x and cuDNN 8.0.5 (or any cuDNN release that supports only CUDA 11.0.x).
For more details about the versions, check this page. Once again, the versions of tensorflow, CUDA and cuDNN are the most critical part of the tensorflow-gpu installation process.
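To double-check which CUDA and cuDNN versions a given TensorFlow wheel was built against, the build info can be printed; this is a minimal sketch, and the exact keys in the returned dict may vary between TF releases:
import tensorflow as tf

# Print the CUDA/cuDNN versions this TensorFlow build expects,
# plus the GPUs that are actually visible at runtime.
build = tf.sysconfig.get_build_info()
print("TF version:   ", tf.__version__)
print("CUDA version: ", build.get("cuda_version"))
print("cuDNN version:", build.get("cudnn_version"))
print("Visible GPUs: ", tf.config.list_physical_devices("GPU"))
If the printed CUDA/cuDNN versions do not match what is installed on the machine, that mismatch is usually the first thing to fix.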