Bad model detected when deploying model on google ml engine

Question

I trained my model on Google ML Engine with the following configuration:

JOB_NAME=object_detection"_$(date +%m_%d_%Y_%H_%M_%S)"
echo $JOB_NAME
gcloud ml-engine jobs submit training $JOB_NAME \
        --job-dir=gs:///train \
        --scale-tier BASIC_GPU \
        --runtime-version 1.12 \
        --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
        --module-name object_detection.model_main \
        --region europe-west1 \
        -- \
        --model_dir=gs:///train \
        --pipeline_config_path=gs:///data/fast_rcnn_resnet101_coco.config

After training, I downloaded the latest checkpoint from GCP and exported the model with the following command:

python export_inference_graph.py --input_type encoded_image_string_tensor --pipeline_config_path training/fast_rcnn_resnet101_coco.config --trained_checkpoint_prefix training/model.ckpt-11127 --output_directory exported_graphs

The exported model's signature looks like this:

The given SavedModel SignatureDef contains the following input(s):
  inputs['inputs'] tensor_info:
      dtype: DT_UINT8
      shape: (-1, -1, -1, 3)
      name: image_tensor:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['detection_boxes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 4)
      name: detection_boxes:0
  outputs['detection_classes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300)
      name: detection_classes:0
  outputs['detection_features'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, -1, -1)
      name: detection_features:0
  outputs['detection_multiclass_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 2)
      name: detection_multiclass_scores:0
  outputs['detection_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300)
      name: detection_scores:0
  outputs['num_detections'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1)
      name: num_detections:0
  outputs['raw_detection_boxes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 4)
      name: raw_detection_boxes:0
  outputs['raw_detection_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 2)
      name: raw_detection_scores:0
Method name is: tensorflow/serving/predict

After that, I deployed the model on ML Engine with the following configuration:

Python version 2.7
Framework TensorFlow
Framework version 1.12.3
Runtime version 1.12
Machine type Single core CPU

I get the following error:

Error

Create Version failed. Bad model detected with error: "Failed to load model: Loading servable: {name: default version: 1} failed: Not found: Op type not registered 'FusedBatchNormV3' in binary running on localhost. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.\n\n (Error code: 0)"
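The informative part of this message is `Op type not registered 'FusedBatchNormV3'`: the serving binary for the 1.12 runtime does not know an op that the exported graph uses. When triaging several such failures, a minimal sketch like the one below can pull the op names out of the message. The function name `unregistered_ops` is hypothetical, and the regular expression assumes the message wording shown above.

```python
import re

# Matches the wording TensorFlow uses when a graph references an op that the
# running binary has not registered (assumed stable, based on the message above).
_UNREGISTERED_OP = re.compile(r"Op type not registered '([^']+)'")


def unregistered_ops(error_message):
    """Return the names of all unregistered ops mentioned in an error message."""
    return _UNREGISTERED_OP.findall(error_message)


if __name__ == "__main__":
    msg = ("Failed to load model: Loading servable: {name: default version: 1} "
           "failed: Not found: Op type not registered 'FusedBatchNormV3' in binary "
           "running on localhost.")
    print(unregistered_ops(msg))  # ['FusedBatchNormV3']
```

An op name that the runtime does not recognize is a strong hint that the graph was written by a different TensorFlow version than the one serving it.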

Answer 1

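Most likely there is a TensorFlow version mismatch somewhere, for example between the model and the runtime. Is the TensorFlow version you used to create the model actually the one the runtime is running?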

Several posts seem to confirm this:

Bad model deploying to GCP Cloudml

Answer 2

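I was able to figure this out: I had used a different TensorFlow version when exporting the model. To stay consistent and avoid errors like this, make sure the same TensorFlow version is used for training, export, and deployment.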
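One way to act on this advice is to record the TensorFlow version used at each stage and compare them before creating a model version. The sketch below is a hypothetical helper: `major_minor` and `version_mismatches` are illustrative names, not part of any Google Cloud or TensorFlow API. It treats the deployment runtime as the reference and compares only major.minor, on the assumption that new ops such as `FusedBatchNormV3` are not introduced in patch releases.

```python
def major_minor(version):
    """Reduce a version string such as '1.12.3' to a (major, minor) tuple."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def version_mismatches(stage_versions):
    """Return the stages whose TensorFlow major.minor differs from deployment.

    stage_versions maps a stage name ("train", "export", "deploy") to the
    TensorFlow version used there; the "deploy" entry is the reference.
    """
    reference = major_minor(stage_versions["deploy"])
    return sorted(stage for stage, version in stage_versions.items()
                  if major_minor(version) != reference)


if __name__ == "__main__":
    # Hypothetical values: the answer does not say which version the export used.
    # In practice the "export" entry would come from tensorflow.__version__ in the
    # environment that runs export_inference_graph.py.
    stages = {"train": "1.12", "export": "1.14.0", "deploy": "1.12.3"}
    print(version_mismatches(stages))  # ['export']
```

An empty result means all three stages agree on major.minor; any stage listed should be re-run (or the model re-exported) with the runtime's TensorFlow version.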


google-cloud-ml