Pre-trained models work well with ResNet and InceptionNet but fail to run with VGG16 and VGG19
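I ran into this problem while using some pre-trained models for object classification. The code works with ResNet and Inception, but when I use VGG16 or VGG19, cuDNN fails to initialize.
I run my code in a conda virtual environment with tensorflow-gpu=2.2.0, cuda=10.1, cudnn=7.6.5.
The cuDNN installed on my OS is 8.0.4. Could that be the problem? This setup has worked fine with many other models, but not in this case.
Here is my code: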
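# Imports are assumed here; the original post did not show them.
import argparse
import numpy as np
from tensorflow.keras.applications import (VGG16, VGG19, InceptionV3,
                                           ResNet50, Xception, imagenet_utils)
from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# Parse the command line arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to the input image")
ap.add_argument("-model", "--model", type=str, default="vgg16",
    help="name of pre-trained network to use")
args = vars(ap.parse_args())

# Map each model name to its Keras class
MODELS = {
    "vgg16": VGG16,
    "vgg19": VGG19,
    "inception": InceptionV3,
    "xception": Xception,  # TensorFlow ONLY
    "resnet": ResNet50
}

if args["model"] not in MODELS.keys():
    raise AssertionError("The --model command line argument should "
        "be a key in the `MODELS` dictionary")

# VGG and ResNet take 224x224 inputs; Inception and Xception take
# 299x299 inputs and use a different preprocessing function
inputShape = (224, 224)
preprocess = imagenet_utils.preprocess_input
if args["model"] in ("inception", "xception"):
    inputShape = (299, 299)
    preprocess = preprocess_input

# Load the network with ImageNet weights
Network = MODELS[args["model"]]
model = Network(weights="imagenet")
#model = Network()
model.summary()

# Load the image and turn it into a preprocessed batch of one
image = load_img(args["image"], target_size=inputShape)
image = img_to_array(image)
image = np.expand_dims(image, axis=0)
image = preprocess(image)

# Classify the image and print the top predictions
preds = model.predict(image)
P = imagenet_utils.decode_predictions(preds)
for (i, (imagenetID, label, prob)) in enumerate(P[0]):
    print("{}. {}: {:.2f}%".format(i + 1, label, prob * 100))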
Here is the log output:
2020-11-08 11:14:31.324751: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-11-08 11:14:31.334392: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
File "Classify_keras_applications.py", line 92, in <module>
preds = model.predict(image)
File "/home/phat/anaconda3/envs/DL/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 88, in _method_wrapper
return method(self, *args, **kwargs)
File "/home/phat/anaconda3/envs/DL/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1268, in predict
tmp_batch_outputs = predict_function(iterator)
File "/home/phat/anaconda3/envs/DL/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
result = self._call(*args, **kwds)
File "/home/phat/anaconda3/envs/DL/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 650, in _call
return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds) # pylint: disable=protected-access
File "/home/phat/anaconda3/envs/DL/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1661, in _filtered_call
return self._call_flat(
File "/home/phat/anaconda3/envs/DL/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1745, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/home/phat/anaconda3/envs/DL/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 593, in call
outputs = execute.execute(
File "/home/phat/anaconda3/envs/DL/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node vgg19/block1_conv1/Conv2D (defined at Classify_keras_applications.py:92) ]] [Op:__inference_predict_function_763]
Function call stack:
predict_function
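Have you checked this issue: https://github.com/tensorflow/tensorflow/issues/34888
They suggest adding this code at the top of your script:
import tensorflow as tf
# Let TensorFlow grow GPU memory on demand instead of reserving it all at startup
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)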
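This stops TensorFlow from allocating all of your GPU's memory up front; instead, the allocation grows as the process needs more.
However, I bet VGGx does not fit in your GPU memory, and I don't think it will fit even with this extra code.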
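If you want something a bit more defensive than the snippet above, here is a minimal sketch that applies the same setting to every visible GPU rather than only `gpus[0]`. The helper name `enable_memory_growth` is my own, not part of TensorFlow, and the guarded import is only there so the function degrades gracefully on a machine without TensorFlow. It has to run before any model is built, because TensorFlow refuses to change this setting once the GPU has been initialized.

```python
def enable_memory_growth():
    """Enable memory growth on every visible GPU.

    Hypothetical helper (not a TensorFlow API). Returns the number of
    GPUs that were configured successfully.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed here, so there is nothing to configure.
        return 0

    gpus = tf.config.experimental.list_physical_devices("GPU")
    configured = 0
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError as err:
            # Raised when the GPU was already initialized before this call.
            print("Could not set memory growth for {}: {}".format(gpu.name, err))
    return configured


if __name__ == "__main__":
    print("GPUs with memory growth enabled:", enable_memory_growth())
```

If it reports 0 on a machine that does have a GPU, TensorFlow is not seeing the device at all, which is a different problem from running out of memory.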
For reference, check this doc:
- VGG16: 528 MB
- VGG19: 549 MB
And:
- ResNet50: 98 MB
- InceptionV3: 92 MB
VGGx is more than 5 times larger than the others.
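To make that comparison concrete, here is a small sketch that computes the ratio for each pair using only the sizes listed above. The names `SIZES_MB` and `size_ratios` are made up for this illustration. Keep in mind that these are model sizes, not runtime GPU usage, so treat the ratios as a rough indication only.

```python
# Model sizes in MB, copied from the list above.
SIZES_MB = {"VGG16": 528, "VGG19": 549, "ResNet50": 98, "InceptionV3": 92}


def size_ratios(sizes, large=("VGG16", "VGG19"), small=("ResNet50", "InceptionV3")):
    """Return {(large_model, small_model): size ratio} for every pair."""
    return {(a, b): sizes[a] / sizes[b] for a in large for b in small}


RATIOS = size_ratios(SIZES_MB)
for (big, little), ratio in sorted(RATIOS.items()):
    print("{} / {} = {:.2f}".format(big, little, ratio))
```

Every ratio lands between roughly 5.4 and 6.0, so the VGG models need several times more memory for their weights alone.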