TensorFlow Object Detection API: How to disable loading from checkpoint

Question

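I created a custom variant of the MobileNetV2 feature extractor architecture by changing expansion_size from 6 to 4 in research/slim/nets/mobilenet/mobilenet_v2.py of the tensorflow/models repository.

I want to train an SSD + Mobilenet_v2 model (with this change) using the model_main.py script, as described in the running_locally tutorial of the Object Detection API.

When I do so, I see the following error, which makes sense: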

`InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.`

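To fix this, I tried the following:

I removed the fine_tune_checkpoint specification from pipeline.config.
In object_detection/model_hparams.py, I changed load_pretrained=True to load_pretrained=False.
I added --hparams_overrides='load_pretrained=false' as a command-line argument to model_main.py.

Despite all this, I still see the same error.

Why is TensorFlow still trying to restore a checkpoint? How can I make it stop?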

Answer 1

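Found the solution myself. It turns out that even though I had configured my pipeline config file not to restore from a checkpoint, the internal tf.Estimator object can still pick up a checkpoint from the specified model_dir. This happens even though the primary purpose of model_dir is to serve as the output directory where checkpoints are written.

I found this information in the official documentation for tf.Estimator. Here is the relevant excerpt for reference: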

`model_dir: Directory to save model parameters, graph and etc. This can also be used to load checkpoints from the directory into an estimator to continue training a previously saved model. If PathLike object, the path will be resolved. If None, the model_dir in config will be used if set. If both are set, they must be same. If both are None, a temporary directory will be used.`

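My original model_dir contained an old checkpoint that was architecturally incompatible with my custom model, hence the error. To fix it, I simply pointed model_dir at a new, empty directory, and it worked. I hope this helps anyone who runs into a similar problem.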
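
For anyone who wants to guard against this before launching training, here is a minimal sketch that checks whether a directory already looks like it holds checkpoints and, if so, creates a fresh one. It uses only the Python standard library. The helper names has_checkpoint and fresh_model_dir are my own (hypothetical, not part of the Object Detection API), and the check assumes the usual layout where TensorFlow writes a state file named "checkpoint" next to "<prefix>.index" files.

```python
import glob
import os
import tempfile


def has_checkpoint(model_dir):
    """Return True if model_dir appears to contain TensorFlow checkpoint files.

    Hypothetical helper: it only inspects file names and does not load anything.
    """
    if not os.path.isdir(model_dir):
        return False
    # Assumption: the saver keeps a small state file literally named "checkpoint".
    if os.path.isfile(os.path.join(model_dir, "checkpoint")):
        return True
    # Assumption: each saved checkpoint has a matching "<prefix>.index" file.
    return bool(glob.glob(os.path.join(model_dir, "*.index")))


def fresh_model_dir(parent_dir, prefix="run_"):
    """Create and return a new, empty directory under parent_dir."""
    os.makedirs(parent_dir, exist_ok=True)
    return tempfile.mkdtemp(prefix=prefix, dir=parent_dir)
```

If has_checkpoint returns True for the directory you planned to use, pass the path returned by fresh_model_dir as --model_dir to model_main.py instead, so the Estimator has nothing stale to restore.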

Tags: python, tensorflow, object-detection-api