Google Cloud MLEngine Prediction Error 429
I'm new to TensorFlow and ML Engine.
Using the Object Detection API, I trained a model on custom data, based on faster_rcnn_resnet101_coco_11_06_2017.
The exported model is 190.5 MB.
Local prediction works fine, but ML Engine gives me the following error when using gcloud:
"error": {
"code": 429,
"message": "Prediction server is out of memory, possibly because model size
is too big.",
"status": "RESOURCE_EXHAUSTED"
}
And the following error when using the NodeJS client library:
code: 429,
errors:
[ { message: 'Prediction server is out of memory, possibly because model size is too big.',
domain: 'global',
reason: 'rateLimitExceeded' } ] }
The images I use to test prediction are PNGs, 700px × 525px (365 KB) and 373px × 502px (90 KB).
I'm not sure how to proceed.
Does object detection need more memory than ML Engine provides?
Is the model size really the problem here? How can I improve this?
Thanks for any help and ideas!
There is a page in the documentation explaining how HTTP status codes can be interpreted in the context of online prediction. In this particular case, the nodes running your model ran out of memory (see also this answer to an older question by a Googler working on ML Engine). The suggested solutions are to reduce your model size and/or to use a smaller batch size (by default set to 64 records per batch). Considering that your model is already smaller than the maximum of 250 MB, you may want to try the latter option first.
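For online prediction, "a smaller batch size" effectively means sending fewer instances per request. A minimal sketch of how you might prepare one-instance-per-request files for gcloud ml-engine predict; the "b64" input key is an assumption based on how Object Detection API models are commonly exported, and your model's input name may differ:

```python
import base64
import json


def make_instance_files(image_paths, prefix="instance"):
    """Write one newline-delimited JSON file per image so that each
    gcloud ml-engine predict call carries a single record instead of
    a large batch."""
    files = []
    for i, path in enumerate(image_paths):
        with open(path, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("utf-8")
        # Exported Object Detection API models typically take raw image
        # bytes base64-encoded under a "b64" key (assumption; check your
        # model's serving signature).
        instance = {"b64": encoded}
        out_path = "%s_%d.json" % (prefix, i)
        with open(out_path, "w") as out:
            out.write(json.dumps(instance) + "\n")
        files.append(out_path)
    return files
```

You could then run something like `gcloud ml-engine predict --model MY_MODEL --json-instances instance_0.json` for each file (model name here is a placeholder), so each request stays small.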