Bad model detected when deploying model on Google ML Engine
I trained my model on Google ML Engine with the following configuration:
JOB_NAME=object_detection"_$(date +%m_%d_%Y_%H_%M_%S)"
echo $JOB_NAME
gcloud ml-engine jobs submit training $JOB_NAME \
--job-dir=gs:///train \
--scale-tier BASIC_GPU \
--runtime-version 1.12 \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
--module-name object_detection.model_main \
--region europe-west1 \
-- \
--model_dir=gs:///train \
--pipeline_config_path=gs:///data/fast_rcnn_resnet101_coco.config
After training, I downloaded the latest checkpoint from GCP and exported the model with the following command:
python export_inference_graph.py --input_type encoded_image_string_tensor --pipeline_config_path training/fast_rcnn_resnet101_coco.config --trained_checkpoint_prefix training/model.ckpt-11127 --output_directory exported_graphs
My model's configuration (its SavedModel signature) looks like this:
The given SavedModel SignatureDef contains the following input(s):
  inputs['inputs'] tensor_info:
      dtype: DT_UINT8
      shape: (-1, -1, -1, 3)
      name: image_tensor:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['detection_boxes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 4)
      name: detection_boxes:0
  outputs['detection_classes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300)
      name: detection_classes:0
  outputs['detection_features'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, -1, -1)
      name: detection_features:0
  outputs['detection_multiclass_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 2)
      name: detection_multiclass_scores:0
  outputs['detection_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300)
      name: detection_scores:0
  outputs['num_detections'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1)
      name: num_detections:0
  outputs['raw_detection_boxes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 4)
      name: raw_detection_boxes:0
  outputs['raw_detection_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 2)
      name: raw_detection_scores:0
Method name is: tensorflow/serving/predict
After that, I deployed this model on ML Engine with the following configuration:
Python version 2.7
Framework TensorFlow
Framework version 1.12.3
Runtime version 1.12
Machine type Single core CPU
I get the following error:
Error
Create Version failed. Bad model detected with error: "Failed to load model: Loading servable: {name: default version: 1} failed: Not found: Op type not registered 'FusedBatchNormV3' in binary running on localhost. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.\n\n (Error code: 0)"
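Most likely there is an incompatible TensorFlow version somewhere, for example between the model and the runtime. Is the TensorFlow version you created the model with actually the same as the one the runtime uses?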
Several other posts seem to confirm my answer:
Bad model deploying to GCP Cloudml
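One way to check this before deploying is to list the op types stored in the exported graph and look for ones the serving runtime rejects, such as the 'FusedBatchNormV3' named in the error. The following is only a rough sketch: the helper names are made up for illustration, the loader assumes TensorFlow is installed in the environment where you run it, and the path in the usage comment assumes the exporter wrote its SavedModel under exported_graphs/saved_model/.

```python
from collections import Counter


def saved_model_op_names(saved_model_pb_path):
    """Yield the op type of every node stored in a saved_model.pb file.

    Hypothetical helper. TensorFlow is imported lazily so that
    unsupported_ops() below can be used without it.
    """
    from tensorflow.core.protobuf import saved_model_pb2

    saved_model = saved_model_pb2.SavedModel()
    with open(saved_model_pb_path, "rb") as f:
        saved_model.ParseFromString(f.read())
    for meta_graph in saved_model.meta_graphs:
        for node in meta_graph.graph_def.node:
            yield node.op
        # Ops can also be stored inside library functions of the graph.
        for function in meta_graph.graph_def.library.function:
            for node in function.node_def:
                yield node.op


def unsupported_ops(op_names, suspects):
    """Return {op: count} for every suspect op that occurs in op_names."""
    counts = Counter(op_names)
    return {op: counts[op] for op in suspects if counts[op] > 0}


# Usage (path is an assumption, adjust it to your export directory):
# ops = saved_model_op_names("exported_graphs/saved_model/saved_model.pb")
# print(unsupported_ops(ops, ["FusedBatchNormV3"]))
```

A non-empty result for an op that the runtime reports as "not registered" suggests the graph was exported with a newer TensorFlow than the one serving it.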
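I was able to figure it out: I was using a different TensorFlow version when exporting the model. To stay consistent and avoid this kind of error, make sure the TensorFlow version is the same during training, export, and deployment.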
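As a small illustration of that check, here is a sketch that compares the major.minor version used at each stage. The function names and the stage labels are hypothetical, and the export version in the example is a made-up placeholder, since the actual version used for the export is not stated above. In the export environment you would fill it in from tf.__version__.

```python
def major_minor(version):
    """Return (major, minor) as integers from a string such as '1.12.3'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def mismatched_stages(versions, reference="deploy"):
    """Return the stages whose major.minor differs from the reference stage.

    `versions` maps a stage name to a TensorFlow version string.
    """
    expected = major_minor(versions[reference])
    return sorted(
        stage
        for stage, version in versions.items()
        if major_minor(version) != expected
    )


versions = {
    "train": "1.12",     # --runtime-version of the training job
    "export": "1.14.0",  # placeholder: use tf.__version__ from the export machine
    "deploy": "1.12.3",  # framework version chosen for the model version
}
print(mismatched_stages(versions))
```

If the list is non-empty, re-export the checkpoint in an environment whose TensorFlow matches the deployment runtime, or pick a runtime version that matches the export.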