Keras inconsistent prediction time

Question

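I tried to estimate the prediction time of my Keras model and noticed something strange. Predictions are normally fairly fast, but every once in a while the model needs a long time to produce one. On top of that, those long times grow the longer the model runs. I have added a minimal working example to reproduce the problem.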

import time
import numpy as np
from sklearn.datasets import make_classification
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Make a dummy classification problem
X, y = make_classification()

# Make a dummy model
model = Sequential()
model.add(Dense(10, activation='relu',name='input',input_shape=(X.shape[1],)))
model.add(Dense(2, activation='softmax',name='predictions'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(X, y, verbose=0, batch_size=20, epochs=100)

for i in range(1000):
    # Pick a random sample
    sample = np.expand_dims(X[np.random.randint(99), :], axis=0)
    # Record the prediction time 10x and then take the average
    start = time.time()
    for j in range(10):
        y_pred = model.predict_classes(sample)
    end = time.time()
    print('%d, %0.7f' % (i, (end-start)/10))

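The time does not depend on the sample (it is picked at random). If the test is repeated, the indices in the for loop at which the prediction takes longer are (nearly) the same again.

I am using: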

tensorflow 2.0.0
python 3.7.4

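For my application I need to guarantee execution within a certain time. Given this behavior, that is impossible. What is going wrong? Is it a bug in Keras or a bug in the TensorFlow backend?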

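Edit: predict_on_batch shows the same behavior, but more sparsely.

y_pred = model(sample, training=False).numpy() also shows some heavy outliers, but they do not increase.

Edit 2: I downgraded to the latest TensorFlow 1 version (1.15). Not only is the problem gone, the "normal" prediction time also improved significantly! I do not consider the two remaining spikes a problem, because they did not appear when I repeated the test (at least not at the same indices and not increasing linearly), and in percentage terms they are not as large as in the first plot.

We can therefore conclude that this seems to be a problem inherent to TensorFlow 2.0, which shows similar behavior in other situations, as @OverLordGoldDragon mentions.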

Answer 1

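TF2 has generally exhibited poor and bug-like memory management in several instances I've encountered - brief description here and here. With prediction in particular, the most performant feeding method is via model(x) directly - see here, and its linked discussions.

In a nutshell: model(x) acts via its __call__ method (which it inherits from base_layer.Layer), whereas predict(), predict_classes(), etc. involve a dedicated loop function via _select_training_loop(). Each uses different data pre- and post-processing methods suited to different use cases, and model(x) in 2.1 was designed specifically to yield the fastest small-model / small-batch (and maybe any-size) performance (and it is still the fastest in 2.0).

Quoting a TensorFlow dev from the linked discussions: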

You can predict the output using model call, not model predict, i.e., calling model(x) would make this much faster because there are no "conversion to dataset" part, and also it's directly calling a cached tf.function.
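To make the two call styles concrete, here is a minimal sketch. The small model and the random 20-feature sample are stand-ins that mirror the question's setup, not the asker's trained model, and np.argmax is my substitute for predict_classes(), assuming a softmax output layer.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the question's model (assumption): 20 input features, 2 classes.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# One sample with a leading batch dimension, as in the question.
sample = np.random.rand(1, 20).astype("float32")

# Direct call: goes through the model's __call__ and returns a tensor.
probs_call = model(sample, training=False).numpy()

# predict(): goes through the dedicated prediction loop.
probs_predict = model.predict(sample, verbose=0)

# Class label from the softmax probabilities (what predict_classes() would give).
label = int(np.argmax(probs_call, axis=-1)[0])
```

Both paths should give the same probabilities up to floating-point noise; only the overhead around the forward pass differs.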

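Note: this should be less of an issue in 2.1, and especially in 2.2 - but test each method anyway. I also realize this doesn't directly answer your question about the time spikes; I suspect they are related to Eager caching mechanisms, but the surest way to find out is via TF Profiler, which is broken in 2.1.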
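For testing each method, a small harness along these lines may help. It is only a sketch: time_calls is a hypothetical helper name, it uses the standard library only, and it reports the median next to the tail (99th percentile and maximum), since an average over 10 calls can hide or smear the spikes.

```python
import statistics
import time


def time_calls(fn, n_calls=1000, warmup=10):
    """Call fn() n_calls times and return latency statistics in seconds."""
    # Warm-up calls are not timed, so one-off setup cost (tracing, caching) is excluded.
    for _ in range(warmup):
        fn()

    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        durations.append(time.perf_counter() - start)

    durations.sort()
    p99_index = min(len(durations) - 1, int(0.99 * len(durations)))
    return {
        "median": statistics.median(durations),
        "p99": durations[p99_index],
        "max": durations[-1],
    }
```

With a Keras model in scope it could be used as time_calls(lambda: model(sample, training=False)) and time_calls(lambda: model.predict(sample)), then comparing the two result dictionaries.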

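Update: regarding the increasing spikes, it is possibly GPU throttling; you've done ~1000 iterations, try 10,000 instead - eventually, the increase should stop. As you noted in your comments, this doesn't happen with model(x); that makes sense, as one less GPU step is involved ("conversion to dataset").

Update2: you could bug the devs here about it if you face this issue; it's mostly me singing there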
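To check whether the spikes really keep growing over a longer run, the recorded per-call times can be inspected with something like the following. This is a rough sketch: find_spikes and spikes_growing are hypothetical helper names, and the 5x-median threshold is an arbitrary assumption.

```python
import statistics


def find_spikes(durations, factor=5.0):
    """Return (index, duration) pairs whose duration exceeds factor * median."""
    threshold = factor * statistics.median(durations)
    return [(i, d) for i, d in enumerate(durations) if d > threshold]


def spikes_growing(durations, factor=5.0):
    """Crude check: is the largest spike in the second half of the run
    bigger than the largest spike in the first half?"""
    spikes = find_spikes(durations, factor)
    half = len(durations) // 2
    early = [d for i, d in spikes if i < half]
    late = [d for i, d in spikes if i >= half]
    if not early or not late:
        return False  # not enough spikes on both sides to compare
    return max(late) > max(early)
```

If spikes_growing returns True for 1,000 iterations but False for the later part of a 10,000-iteration run, that would be consistent with the growth levelling off.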

Answer 2

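While I can't explain the inconsistencies in execution time, I can recommend that you try converting your model to TensorFlow Lite to speed up predictions on single data records or small batches.

I ran a benchmark on this model: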

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(384, activation='elu', input_shape=(256,)),
    tf.keras.layers.Dense(384, activation='elu'),
    tf.keras.layers.Dense(256, activation='elu'),
    tf.keras.layers.Dense(128, activation='elu'),
    tf.keras.layers.Dense(32, activation='tanh')
])

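The prediction times for single records were:

model.predict(input): 18ms
model(input): 1.3ms
Model converted to TensorFlow Lite: 43us

The time to convert the model was 2 seconds.

The class below shows how to convert and use the model, and provides a predict method just like the Keras model. Note that it would need to be modified for use with models that don't just have a single 1-D input and a single 1-D output.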

import numpy as np
import tensorflow as tf

class LiteModel:

    @classmethod
    def from_file(cls, model_path):
        # Load an already converted .tflite model from disk
        return cls(tf.lite.Interpreter(model_path=model_path))

    @classmethod
    def from_keras_model(cls, kmodel):
        # Convert an in-memory Keras model to TensorFlow Lite
        converter = tf.lite.TFLiteConverter.from_keras_model(kmodel)
        tflite_model = converter.convert()
        return cls(tf.lite.Interpreter(model_content=tflite_model))

    def __init__(self, interpreter):
        self.interpreter = interpreter
        self.interpreter.allocate_tensors()
        input_det = self.interpreter.get_input_details()[0]
        output_det = self.interpreter.get_output_details()[0]
        self.input_index = input_det["index"]
        self.output_index = output_det["index"]
        self.input_shape = input_det["shape"]
        self.output_shape = output_det["shape"]
        self.input_dtype = input_det["dtype"]
        self.output_dtype = output_det["dtype"]

    def predict(self, inp):
        inp = inp.astype(self.input_dtype)
        count = inp.shape[0]
        out = np.zeros((count, self.output_shape[1]), dtype=self.output_dtype)
        for i in range(count):
            self.interpreter.set_tensor(self.input_index, inp[i:i+1])
            self.interpreter.invoke()
            out[i] = self.interpreter.get_tensor(self.output_index)[0]
        return out

    def predict_single(self, inp):
        """ Like predict(), but only for a single record. The input data can be a Python list. """
        inp = np.array([inp], dtype=self.input_dtype)
        self.interpreter.set_tensor(self.input_index, inp)
        self.interpreter.invoke()
        out = self.interpreter.get_tensor(self.output_index)
        return out[0]

The complete benchmark code and plots can be found here: https://medium.com/@micwurm/using-tensorflow-lite-to-speed-up-predictions-a3954886eb98
