Cloud Machine Learning Engine fails to deploy model

Question

I have trained both my own model and the model from the official tutorial.

I have followed the steps for deploying a model to serve predictions. However, it keeps failing with this error:

"create version failed. internal error happened"

when I try to deploy the model by running:

gcloud ml-engine versions create v1 \
--model $MODEL_NAME \
--origin $MODEL_BINARIES \
--python-version 3.5 \
--runtime-version 1.13

The model binaries should be correct, since I point MODEL_BINARIES at the folder that contains model.pb and the variables folder, e.g. MODEL_BINARIES=gs://$BUCKET_NAME/results/20190404_020134/saved_model/1554343466.
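One way to double-check that claim is to list the export directory and confirm it has the layout of a TensorFlow SavedModel, which normally consists of a saved_model.pb file plus a variables/ folder. The sketch below is only an illustration: check_savedmodel_listing is a hypothetical helper, and it assumes `gsutil ls` prints one path per line with folders ending in "/".

```bash
#!/usr/bin/env bash
# Sketch: check that a `gsutil ls` listing of the export directory looks like
# a TensorFlow SavedModel. check_savedmodel_listing is a hypothetical helper.
# Assumption: gsutil ls prints one path per line, with folders ending in "/".
check_savedmodel_listing() {
  local listing
  listing="$(cat)"
  # The graph definition of a SavedModel is stored in saved_model.pb.
  if ! printf '%s\n' "$listing" | grep -q '/saved_model\.pb$'; then
    echo "missing saved_model.pb" >&2
    return 1
  fi
  # Trained weights live in the variables/ subfolder.
  if ! printf '%s\n' "$listing" | grep -q '/variables/$'; then
    echo "missing variables/ folder" >&2
    return 1
  fi
  echo "SavedModel layout looks OK"
}

# Usage: gsutil ls "$MODEL_BINARIES" | check_savedmodel_listing
```

If this check passes and deployment still fails with the same internal error, the problem is likely elsewhere, as the answer below suggests.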

I have also tried changing the model's region, but that did not help.

Answer 1

It turns out that your GCS bucket and the trained model need to be in the same region. This is not explained well in the Cloud ML tutorial, which only says:

Note: Use the same region where you plan on running Cloud ML Engine jobs. The example uses us-central1 because that is the region used in the getting-started instructions.

Also note that many regions cannot be used for both the bucket and model training (for example, asia-east1).
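A minimal sketch of enforcing this before deploying: read the bucket's location, compare it with the intended model region, and only then create the model and version in that region. The helper names same_region and deploy_in_bucket_region are hypothetical, and the script assumes `gsutil ls -L -b` prints a "Location constraint:" line with the location in upper case (e.g. US-CENTRAL1), while gcloud expects lower-case region names such as us-central1.

```bash
#!/usr/bin/env bash
# Sketch: deploy only when the model region matches the bucket's region.
# same_region and deploy_in_bucket_region are hypothetical helper names.

# gsutil reports locations in upper case (US-CENTRAL1); gcloud regions are
# lower case (us-central1), so compare case-insensitively.
same_region() {
  local a b
  a="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  b="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
  [ "$a" = "$b" ]
}

# Usage: deploy_in_bucket_region BUCKET_NAME MODEL_NAME MODEL_BINARIES REGION
deploy_in_bucket_region() {
  local bucket="$1" model="$2" binaries="$3" region="$4" location
  # Assumption: `gsutil ls -L -b` prints a "Location constraint:" line.
  location="$(gsutil ls -L -b "gs://$bucket" \
    | awk -F':[[:space:]]*' '/Location constraint/ {print $2}')"
  if ! same_region "$location" "$region"; then
    echo "bucket gs://$bucket is in $location, not $region" >&2
    return 1
  fi
  # Create the model resource in the same region as the bucket.
  gcloud ml-engine models create "$model" --regions "$region"
  # Same version settings as in the question.
  gcloud ml-engine versions create v1 \
    --model "$model" \
    --origin "$binaries" \
    --python-version 3.5 \
    --runtime-version 1.13
}
```

For example, `deploy_in_bucket_region "$BUCKET_NAME" "$MODEL_NAME" "$MODEL_BINARIES" us-central1`. If the model resource already exists in the right region, the `models create` line can be dropped. If the bucket itself is in the wrong region, one option is to create a bucket in the model's region (`gsutil mb -l us-central1 gs://NEW_BUCKET`) and copy the export there with `gsutil cp -r` before deploying.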
