Can I measure the execution time of individual operations with TensorFlow?

Question

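I know I can measure the execution time of a call to sess.run(), but is it possible to get a finer granularity and measure the execution time of individual operations?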

Answer 1

There is not yet a way to do this in the public release. We are aware that it is an important feature and we are working on it.

Answer 2

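To update this answer: we do have some functionality for CPU profiling, focused on inference. If you look at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/benchmark you will see a program that you can run on a model to get per-op timings.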

Answer 3

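I have used the Timeline object to get the execution time of each node in the graph:

You use a classic sess.run() but also specify the optional arguments options and run_metadata
You then create a Timeline object from the run_metadata.step_stats data

Here is an example program that measures the performance of a matrix multiplication: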

import tensorflow as tf
from tensorflow.python.client import timeline

x = tf.random_normal([1000, 1000])
y = tf.random_normal([1000, 1000])
res = tf.matmul(x, y)

# Run the graph with full trace option
with tf.Session() as sess:
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(res, options=run_options, run_metadata=run_metadata)

    # Create the Timeline object, and write it to a json
    tl = timeline.Timeline(run_metadata.step_stats)
    ctf = tl.generate_chrome_trace_format()
    with open('timeline.json', 'w') as f:
        f.write(ctf)

You can then open Google Chrome, go to the page chrome://tracing, and load the timeline.json file. The trace view shows when each op ran and how long it took.
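If you would rather have numbers than a picture, the same file can be summarised in a few lines. The following is only a minimal sketch: it assumes the file follows the Chrome trace event format, with a top-level "traceEvents" list whose complete events carry "ph": "X", a "name", and a "dur" in microseconds. The helper name total_time_per_op and the sample data are made up for this illustration.

```python
import json
from collections import defaultdict


def total_time_per_op(trace):
    """Sum the durations (microseconds) of complete events, grouped by name.

    Assumes `trace` is a dict in Chrome trace event format. Events whose
    "ph" is not "X" (metadata, flow events, ...) are skipped.
    """
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") == "X":
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    # Slowest names first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hand-written stand-in for the contents of timeline.json (values are made up).
sample_json = """
{"traceEvents": [
    {"ph": "M", "name": "process_name", "pid": 0},
    {"ph": "X", "name": "RandomStandardNormal", "ts": 0, "dur": 120},
    {"ph": "X", "name": "RandomStandardNormal", "ts": 10, "dur": 130},
    {"ph": "X", "name": "MatMul", "ts": 150, "dur": 900}
]}
"""

summary = total_time_per_op(json.loads(sample_json))
for name, micros in summary:
    print(f"{name}: {micros} us")
```

For a real trace, pass the result of json.load() on the opened timeline.json instead of the sample string.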

Answer 4

You can extract this information using runtime statistics. You will need to do something like this (see the full example at the link above):

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(<values_you_want_to_execute>, options=run_options, run_metadata=run_metadata)
your_writer.add_run_metadata(run_metadata, 'step%d' % i)

Better than just printing it, you can see it in TensorBoard:

Additionally, clicking on a node will display the exact total memory, compute time, and tensor output sizes.

Answer 5

Regarding fat-lobyte's comment under Olivier Moindrot's answer: if you want to gather the timeline over all sessions, you can change "open('timeline.json', 'w')" to "open('timeline.json', 'a')".
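One caveat with append mode: each write adds a complete JSON document, so the file ends up holding several documents back to back, which a strict JSON parser rejects. A possible alternative, sketched below under the assumption that every trace is a dict with a "traceEvents" list, is to merge the event lists into a single document before writing. The name merge_traces and the sample events are hypothetical.

```python
import json


def merge_traces(traces):
    """Combine several Chrome-trace dicts into one by concatenating events."""
    merged_events = []
    for trace in traces:
        merged_events.extend(trace.get("traceEvents", []))
    return {"traceEvents": merged_events}


# Two tiny made-up traces standing in for two sess.run() calls.
step0 = {"traceEvents": [{"ph": "X", "name": "MatMul", "ts": 0, "dur": 900}]}
step1 = {"traceEvents": [{"ph": "X", "name": "MatMul", "ts": 2000, "dur": 880}]}

# Writing both documents one after the other mimics open(..., 'a').
appended = json.dumps(step0) + json.dumps(step1)
try:
    json.loads(appended)
    appended_is_valid = True
except json.JSONDecodeError:
    appended_is_valid = False

# Merging first yields a single JSON document that parses cleanly.
merged = merge_traces([step0, step1])
merged_text = json.dumps(merged)
print(appended_is_valid, len(json.loads(merged_text)["traceEvents"]))
```

Writing one file per step (for example with the step number in the file name) is another way to keep every trace loadable on its own.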

Answer 6

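Since this ranks high when googling for "Tensorflow Profiling", note that the current (late 2017, TensorFlow 1.4) way of getting the Timeline is to use a ProfilerHook. This works with the MonitoredSessions in tf.Estimator, where tf.RunOptions are not available.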

estimator = tf.estimator.Estimator(model_fn=...)
hook = tf.train.ProfilerHook(save_steps=10, output_dir='.')
estimator.train(input_fn=..., steps=..., hooks=[hook])

Answer 7

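The recently released SBNet custom op library from Uber (http://www.github.com/uber/sbnet) has an implementation of CUDA event-based timers, which can be used in the following manner: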

with tf.control_dependencies([input1, input2]):
    dt0 = sbnet_module.cuda_timer_start()
with tf.control_dependencies([dt0]):
    input1 = tf.identity(input1)
    input2 = tf.identity(input2)

### portion of subgraph to time goes in here

with tf.control_dependencies([result1, result2, dt0]):
    cuda_time = sbnet_module.cuda_timer_end(dt0)
with tf.control_dependencies([cuda_time]):
    result1 = tf.identity(result1)
    result2 = tf.identity(result2)

py_result1, py_result2, dt = session.run([result1, result2, cuda_time])
print("Milliseconds elapsed=", dt)

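Note that any portion of the subgraph can be asynchronous, so you should be very careful to specify all the input and output dependencies for the timer ops. Otherwise, the timer might get inserted into the graph out of order and you can get erroneous times. I found both the timeline and time.time() timing to be of very limited utility for profiling TensorFlow graphs. Also note that the cuda_timer APIs will synchronize on the default stream, which is currently by design because TF uses multiple streams.

Having said that, I personally recommend switching to PyTorch :) Development iteration is faster, code runs faster, and everything is a lot less painful.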

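Another somewhat hacky and arcane approach to subtracting the overhead of tf.Session (which can be enormous) is to replicate the graph N times and run it for a variable N, solving an equation for the unknown fixed overhead. That is, you would measure around session.run() with N1=10 and N2=20, where the time for one copy of the graph is t and the fixed overhead is x. So something like

N1*t + x = t1
N2*t + x = t2

Solve for x and t. The downsides are that this might require a lot of memory and is not necessarily accurate :) Also make sure that your inputs are completely different/random/independent, otherwise TF will fold the entire subgraph and not run it N times... Have fun with TensorFlow :)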
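To make the algebra concrete, here is a small sketch. It takes t as the time for one copy of the graph and x as the fixed session.run() overhead, so a run with N copies is assumed to cost N*t + x. The function name and the sample timings are invented for illustration.

```python
def solve_overhead(n1, t1, n2, t2):
    """Solve n1*t + x = t1 and n2*t + x = t2 for (t, x).

    t is the time per graph copy and x is the fixed per-call overhead,
    both in the same units as the measured totals t1 and t2.
    """
    if n1 == n2:
        raise ValueError("need two different replication counts")
    t = (t2 - t1) / (n2 - n1)  # subtracting the equations cancels x
    x = t1 - n1 * t            # back-substitute into the first equation
    return t, x


# Made-up measurements: 10 copies took 25 ms in total, 20 copies took 45 ms.
per_copy_ms, overhead_ms = solve_overhead(10, 25.0, 20, 45.0)
print(f"per-copy time: {per_copy_ms} ms, fixed overhead: {overhead_ms} ms")
```

With noisy measurements, averaging several runs for each N before solving should give a steadier estimate than a single pair of timings.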

Answer 8

As of TensorFlow 1.8, there is a really good example of using tf.profiler.Profiler here.

Answer 9

2.0-compatible answer: you can use profiling in a Keras Callback.

The code is:

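from datetime import datetime

import tensorflow as tf

# Write the TensorBoard logs (including the profile) to a timestamped directory.
log_dir = "logs/profile/" + datetime.now().strftime("%Y%m%d-%H%M%S")

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch=3)

# model and train_data are assumed to be defined elsewhere.
model.fit(train_data,
          steps_per_epoch=20,
          epochs=5,
          callbacks=[tensorboard_callback])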

For more details on how to profile, see this TensorBoard link.

Answer 10

This works for TensorFlow 2 (tested with TF 2.5 and 2.8):

import tensorflow as tf

tf.profiler.experimental.start(r'/path/to/logdir')
with tf.profiler.experimental.Trace("My cool model", _r=1):
    run_model_that_you_want_to_profile()
tf.profiler.experimental.stop()

You can then view the trace in TensorBoard (tensorboard --logdir /path/to/logdir, then open http://localhost:6006/#profile in your browser).

Possibly also useful:

Guide: Optimize TensorFlow performance using the Profiler
tf.summary.trace_on() (have not tried it myself)
This colab tutorial on using the TensorBoard profiler
