TFLite model overflows on GPU, ok on CPU. What are the differences internally?

Question

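On Android, I have a model that runs fine on the CPU but overflows (the results are 'Infinity') when I switch to the GPU delegate. If I rescale the inputs, the overflow goes away, so this looks like a difference in internal range/precision between CPU and GPU. I was under the impression that both CPU and GPU use 32-bit floats by default, so the results should be identical. Does anyone know TFLite's internals well enough to offer some insight?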

Answer 1

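Regarding floating-point precision, the TFLite GPU delegate on Android can run in two modes, which you select in the GpuDelegate.Options class through the following method (copied from the GpuDelegate source):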

    /**
     * Sets whether precision loss is allowed.
     *
     * @param precisionLossAllowed When `true` (default), the GPU may quantify tensors, downcast
     *     values, process in FP16. When `false`, computations are carried out in 32-bit floating
     *     point.
     */
    public Options setPrecisionLossAllowed(boolean precisionLossAllowed) {
      this.precisionLossAllowed = precisionLossAllowed;
      return this;
    }
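
To make the range side of "process in FP16" concrete, here is a minimal plain-Java sketch. It is not how the delegate is implemented: `toHalfRange` and `squareAsHalf` are hypothetical helpers that model only the FP16 range limit and ignore FP16's reduced precision and exact rounding behavior.

```java
class Fp16RangeSketch {
    // Largest finite IEEE 754 half-precision (FP16) value.
    static final float HALF_MAX = 65504.0f;

    // Hypothetical helper: models only the FP16 range, not its precision.
    // Any magnitude above HALF_MAX is treated as overflowing to infinity.
    static float toHalfRange(float x) {
        if (Float.isNaN(x)) {
            return x;
        }
        if (Math.abs(x) > HALF_MAX) {
            return Math.copySign(Float.POSITIVE_INFINITY, x);
        }
        return x;
    }

    // Squares a value and stores the result as a range-limited FP16 value.
    static float squareAsHalf(float x) {
        float h = toHalfRange(x);
        return toHalfRange(h * h);
    }

    public static void main(String[] args) {
        System.out.println(300.0f * 300.0f);      // FP32: 90000.0
        System.out.println(squareAsHalf(300.0f)); // simulated FP16: Infinity
        System.out.println(squareAsHalf(30.0f));  // input rescaled by 0.1: 900.0
    }
}
```

The same intermediate value that is unremarkable in FP32 exceeds the FP16 range, while the rescaled input stays finite, which matches the behavior described in the question.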

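Because this precisionLossAllowed option defaults to true, your model runs in FP16 mode by default when using the GPU. The largest finite FP16 value is 65504, so any intermediate value above that becomes Infinity, which is why rescaling the inputs makes the overflow disappear. If you want to force FP32 execution as on the CPU, you should explicitly set this option to false when creating the delegate: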

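    import org.tensorflow.lite.Interpreter;
    import org.tensorflow.lite.gpu.GpuDelegate;

    // Disallow precision loss so the GPU delegate computes in FP32.
    GpuDelegate.Options gpuOptions = new GpuDelegate.Options();
    gpuOptions.setPrecisionLossAllowed(false);
    GpuDelegate gpuDelegate = new GpuDelegate(gpuOptions);

    // Attach the delegate to the interpreter before loading the model.
    Interpreter.Options interpreterOptions = new Interpreter.Options();
    interpreterOptions.addDelegate(gpuDelegate);
    Interpreter interpreter = new Interpreter(tflite_model_file, interpreterOptions);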

This should give you the same results as CPU mode, but execution will be slower than in FP16 mode.
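
One way to check that claim on your own model is to run the same input through a CPU interpreter and a GPU interpreter and compare the output arrays. The sketch below covers only the comparison step in plain Java. The class and method names are hypothetical, and the arrays in `main` are made-up placeholders, not real model outputs.

```java
class OutputComparisonSketch {
    // Returns true if any element is NaN or +/-Infinity.
    static boolean hasNonFinite(float[] values) {
        for (float v : values) {
            if (Float.isNaN(v) || Float.isInfinite(v)) {
                return true;
            }
        }
        return false;
    }

    // Largest absolute element-wise difference between two equal-length arrays.
    static float maxAbsDiff(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        float max = 0.0f;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        // Placeholder values standing in for real interpreter outputs.
        float[] cpuOut = {0.10f, 0.70f, 0.20f};
        float[] gpuOut = {0.10f, Float.POSITIVE_INFINITY, 0.20f};

        System.out.println("GPU output overflowed: " + hasNonFinite(gpuOut));
        System.out.println("Max abs diff: " + maxAbsDiff(cpuOut, gpuOut));
    }
}
```

Checking for non-finite values first is useful because a single Infinity dominates any difference metric. Once both outputs are finite, a small tolerance on the maximum difference is a more realistic expectation than bit-for-bit equality.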


precision

gpu

tensorflow-lite