-n and -r arguments to IPython's %timeit magic

Question

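I want to time a block of code in a Jupyter notebook using the timeit magic command. According to the documentation, timeit takes several arguments. Two in particular control the number of loops and the number of repeats. What isn't clear to me is the distinction between these two arguments. For example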

import numpy
N = 1000000
v = numpy.arange(N)

%timeit -n 10 -r 500 pass; w = v + v

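will run 10 loops and 500 repeats. My question is:

Can this be interpreted as roughly equivalent to the following? (With obvious differences in the actual timing results.)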

import time
n = 10
r = 500
T = numpy.empty(r)
for j in range(r):
    t0 = time.time()
    for i in range(n):
        w = v + v
    T[j] = (time.time() - t0)/n

print('Best time is {:.4f} ms'.format(min(T)*1000))  # the best time is the smallest one

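An assumption I am making, which may well be incorrect, is that the time for the inner loop is averaged over the n iterations through this loop. Then the best of the 500 repeats of this loop is taken.

I have searched the documentation and haven't found anything that specifies exactly what this is doing. For example, the documentation says: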

Options: -n<N>: execute the given statement <N> times in a loop. If this value is not given, a fitting value is chosen.

-r<R>: repeat the loop iteration <R> times and take the best result. Default: 3

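Nothing is really said about how the inner loop is timed. And the final result is the "best" of what?

The code I want to time doesn't involve any randomness, so I am wondering whether I should set this inner loop to n=1. The r repeats would then take care of any system variability.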

Answer 1

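It looks like recent versions of %timeit take the average of the r n-loop averages, not the best of the averages.

Evidently, this has changed from earlier versions of IPython. The best of the r averages can still be obtained from the TimeitResult return value (available with the -o option), but it is no longer the value that is displayed.

Comment: I recently ran the code from above and found that the following syntax no longer works: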

n = 1
r = 50
tr = %timeit -n $n -r $r -q -o pass; compute_mean(x,np)

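That is, it no longer seems possible to use $var to pass a variable to the timeit magic command. Does this mean that this magic command should be retired and replaced by the timeit module?

I am using Python 3.7.4.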
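If variable expansion in the magic is not available, one workaround is to call the timeit module directly, where n and r are ordinary Python arguments. This is a minimal sketch: compute_mean and x below are hypothetical stand-ins for the function and data in the comment, whose definitions are not shown.

```python
import statistics
import timeit


def compute_mean(values):
    # Hypothetical stand-in for the function timed in the comment above.
    return sum(values) / len(values)


x = list(range(1000))  # hypothetical input data

n = 1   # executions per timing (what -n controls)
r = 50  # number of timings (what -r controls)

# timeit.repeat accepts a callable and returns one total time per repeat.
all_runs = timeit.repeat(lambda: compute_mean(x), number=n, repeat=r)

best = min(all_runs) / n
average = statistics.mean(all_runs) / n
print(f"best: {best:.3e} s, average: {average:.3e} s per execution")
```

Wrapping the call in a lambda adds a small, constant call overhead to every measurement.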

Answer 2

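number and repeat are separate arguments because they serve different purposes. number controls how many executions are done for each timing, and it is used to get representative timings. repeat controls how many timings are done, and its purpose is to get accurate statistics. IPython takes the mean of the run times of all repeats and then divides that by number, so it measures the average of the averages. In earlier versions it took the minimum time (min()) of all repeats, divided it by number, and reported it as "best".

To understand why there are two arguments controlling the number of executions and the number of repeats, you have to understand what you are timing and how you can measure time.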

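The granularity of the clock and the number of executions

A computer has different "clocks" for measuring time. These clocks have different "ticks" (depending on the OS). For example, a clock might measure seconds, milliseconds, or nanoseconds. The size of these ticks is called the granularity of the clock.

If the duration of the execution is smaller than or roughly equal to the granularity of the clock, you cannot get representative timings. Suppose your operation takes 100 ns (= 0.0000001 seconds) but the clock only measures milliseconds (= 0.001 seconds). Then most measurements will read 0 milliseconds and a few will read 1 millisecond, depending on where in the clock cycle the execution started and finished. That is not really representative of the duration you want to time.

This is on Windows, where time.time has a granularity of 1 millisecond: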

import time

def fast_function():
    return None

r = []
for _ in range(10000):
    start = time.time()
    fast_function()
    r.append(time.time() - start)

import matplotlib.pyplot as plt
plt.title('measuring time of no-op-function with time.time')
plt.ylabel('number of measurements')
plt.xlabel('measured time [s]')
plt.yscale('log')
plt.hist(r, bins='auto')
plt.tight_layout()

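A histogram of the measured times in this example shows that almost all measurements were 0 milliseconds and three were 1 millisecond.

There are clocks with much finer granularity on Windows; this was just to illustrate the effect of granularity. Every clock has some granularity, even if it is smaller than one millisecond.

To overcome the restriction of the granularity, you can increase the number of executions so that the expected duration is significantly higher than the granularity of the clock. So instead of running the execution once, it is run number times. Taking the numbers from above and using a number of 100 000, the expected run time would be 100 ns × 100 000 = 0.01 seconds. Neglecting everything else, the clock would now measure 10 milliseconds in almost all cases, which closely matches the expected execution time.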
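To see which clocks are available and how fine they are on a given machine, the standard library exposes time.get_clock_info. The sketch below prints the resolution the platform reports and also estimates the smallest step actually observed on time.perf_counter. The helper smallest_step is a hypothetical name introduced here for illustration, and the numbers will differ between systems.

```python
import time

# Resolution as reported by the platform for several standard clocks.
reported = {}
for name in ("time", "perf_counter", "monotonic", "process_time"):
    info = time.get_clock_info(name)
    reported[name] = info.resolution
    print(f"{name:>12}: resolution={info.resolution:g} s ({info.implementation})")


def smallest_step(clock, samples=20):
    """Estimate the smallest non-zero increment observed on a clock."""
    steps = []
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        while t1 == t0:  # spin until the clock visibly advances
            t1 = clock()
        steps.append(t1 - t0)
    return min(steps)


observed = smallest_step(time.perf_counter)
print(f"smallest observed perf_counter step: {observed:g} s")
```

The reported resolution is what the operating system claims, so the observed step can be larger than that value.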

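In short, specifying a number measures the sum of number executions. You need to divide the time measured this way by number again to get the "time per execution".

Other processes and the repetitions of the execution

Your OS typically has a lot of active processes. Some of them can run in parallel (on different processors or using hyper-threading), but most of them run sequentially, with the OS scheduling time for each process to run on the CPU. Most clocks don't care which process is currently running, so the measured time will differ depending on the scheduling. There are also clocks that measure process time instead of system time. However, they measure the complete time of the Python process, which sometimes includes garbage collection or other Python threads. Besides, the Python process is not stateless and not every operation is always exactly identical. There are also memory allocations, re-allocations, and clears happening (sometimes behind the scenes), and the time these memory operations take can vary for many reasons.

Again I use a histogram, this time measuring the time it takes to sum ten thousand ones on my computer (using only repeat, with number set to 1):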
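As a small illustration of that division, the sketch below times a no-op function with timeit.timeit, which returns the total time for all number executions. The value 100 000 simply mirrors the example above and is not a recommendation.

```python
import timeit


def fast_function():
    return None


number = 100_000

# timeit.timeit returns the total time for `number` executions, in seconds.
total = timeit.timeit(fast_function, number=number)

# Divide by `number` to get the time per single execution.
per_execution = total / number

print(f"total for {number} executions: {total:.6f} s")
print(f"time per execution: {per_execution * 1e9:.1f} ns")
```

Because a callable is passed, each measurement also includes the overhead of one function call.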
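The difference between a system-wide clock and a process-time clock can be seen by sleeping inside the timed region. This is a sketch using time.perf_counter (wall-clock time) and time.process_time (CPU time of the current process); the sleep duration and workload are arbitrary choices for illustration.

```python
import time

wall_start = time.perf_counter()  # includes time spent sleeping or descheduled
cpu_start = time.process_time()   # counts only CPU time used by this process

time.sleep(0.1)                    # the process is idle during this period
checksum = sum(range(100_000))     # a little real CPU work

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"wall-clock time: {wall_elapsed:.4f} s")
print(f"process time:    {cpu_elapsed:.4f} s")
```

The wall-clock figure includes the idle period, while the process-time figure should be much smaller because sleeping consumes almost no CPU time.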

import timeit
r = timeit.repeat('sum(1 for _ in range(10000))', number=1, repeat=1_000)

import matplotlib.pyplot as plt
plt.title('measuring summation of 10_000 1s')
plt.ylabel('number of measurements')
plt.xlabel('measured time [s]')
plt.yscale('log')
plt.hist(r, bins='auto')
plt.tight_layout()

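This histogram shows a sharp cutoff at just below ~5 milliseconds, which indicates that this is the "optimal" time in which the operation can be executed. The higher timings are measurements where the conditions were not optimal or where other processes/threads took some of the time.

The typical approach to avoid these fluctuations is to repeat the timings very often and then use statistics to get the most accurate numbers. Which statistic to use depends on what you want to measure. I'll go into this in more detail below.

Using both number and repeat

Essentially, %timeit is a wrapper around timeit.repeat, which is roughly equivalent to: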

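import timeit

timer = timeit.default_timer  # the clock function itself, not a single reading

def function_or_statement_to_time():
    sum(range(1000))  # placeholder workload

number = 1_000  # executions per timing
repeat = 7      # number of timings

results = []
for _ in range(repeat):
    start = timer()
    for _ in range(number):
        function_or_statement_to_time()
    results.append(timer() - start)  # total time of `number` executions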

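But %timeit has some convenience functionality compared to timeit.repeat. For example, it calculates the best and average time of one execution based on the timings it got via repeat and number.

These are calculated roughly like this: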

import statistics
best = min(results) / number
average = statistics.mean(results) / number

You can also use the TimeitResult (returned if you use the -o option) to inspect all results:

>>> r = %timeit -o ...
7.46 ns ± 0.0788 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
>>> r.loops  # the "number" is called "loops" on the result
100000000
>>> r.repeat
7
>>> r.all_runs
[0.7445439999999905, 0.7611092000000212, 0.7249667000000102, 0.7238135999999997, 0.7385598000000186, 0.7338551999999936, 0.7277425999999991]
>>> r.best
7.238135999999997e-09
>>> r.average
7.363701571428618e-09
>>> min(r.all_runs) / r.loops  # calculated best by hand
7.238135999999997e-09
>>> from statistics import mean
>>> mean(r.all_runs) / r.loops  # calculated average by hand
7.363701571428619e-09

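General advice regarding the values of number and repeat

If you want to modify either number or repeat, you should set number to the smallest value that does not run into the granularity of the timer. In my experience number should be set so that number executions of the function take at least 10 microseconds (0.00001 seconds); otherwise you might only "time" the minimum resolution of the "timer".

repeat should be set as high as possible. More repeats make it more likely that you find the true best or average. However, more repeats also take longer, so there is a trade-off.

IPython adjusts number but keeps repeat constant. I often do the opposite: I adjust number so that number executions of the statement take ~10 µs, and then I adjust repeat so that I get a good representation of the statistics (often in the range 100-10000). But your mileage may vary.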
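The advice above can be sketched with the timeit module as follows. The helper pick_number is a hypothetical name introduced here, and the doubling strategy and the choice of 200 repeats are assumptions for illustration, not what IPython does internally.

```python
import timeit


def pick_number(stmt, min_total=10e-6, namespace=None):
    """Return the first power of two for which `number` executions of
    `stmt` take at least `min_total` seconds (best of three timings)."""
    timer = timeit.Timer(stmt, globals=namespace)
    number = 1
    while True:
        total = min(timer.repeat(repeat=3, number=number))
        if total >= min_total:
            return number
        number *= 2


stmt = 'sum(1 for _ in range(100))'

# Step 1: choose number so that one timing takes at least ~10 microseconds.
number = pick_number(stmt)

# Step 2: use a large repeat to get a good picture of the distribution.
runs = timeit.repeat(stmt, number=number, repeat=200)
per_execution = [total / number for total in runs]

print(f"number={number}, best={min(per_execution):.3e} s per execution")
```

The standard library also provides timeit.Timer.autorange, which picks number so that the total time is at least 0.2 seconds, a much more conservative threshold than the 10 µs used here.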

Which statistic is best?

The documentation of timeit.repeat mentions this:

Note

It’s tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.

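For example, one typically wants to find out how fast an algorithm can be, and then one could use the minimum of these repeats. If one is more interested in the average or median of the timings, one could use those measures instead. In most cases the number of most interest is the minimum, because the minimum shows how fast the execution can be: it is probably the one execution where the process was least interrupted (by other processes or by GC) or had the most favorable memory operations.

To illustrate the differences I repeated the above timing again, but this time I included the minimum, mean, and median: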

import timeit
r = timeit.repeat('sum(1 for _ in range(10000))', number=1, repeat=1_000)

import numpy as np
import matplotlib.pyplot as plt
plt.title('measuring summation of 10_000 1s')
plt.ylabel('number of measurements')
plt.xlabel('measured time [s]')
plt.yscale('log')
plt.hist(r, bins='auto', color='black', label='measurements')
plt.tight_layout()
plt.axvline(np.min(r), c='lime', label='min')
plt.axvline(np.mean(r), c='red', label='mean')
plt.axvline(np.median(r), c='blue', label='median')
plt.legend()

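Contrary to this "advice" (see the quoted documentation above), IPython's %timeit reports the average instead of the min(). However, it also only uses a repeat of 7 by default, which I think is too few to accurately determine the minimum, so using the average in this case is actually sensible. It's a great tool for doing "quick and dirty" timing.

If you need something that can be customized to your needs, you could use timeit.repeat directly or even a third-party module. For example: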

pyperf

perfplot

simple_benchmark (my own library)
