Testing Erlang function performance with timer
I am testing the performance of a function in a tight loop (say 5000 iterations) using timer:tc/3:
{Duration_us, _Result} = timer:tc(M, F, [A])
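This returns both the duration (in microseconds) and the result of the function. For argument's sake, say the duration is N microseconds.
I then compute a simple average over the results of the iterations.
If I place a timer:sleep(1) call before the timer:tc/3 call, the average duration over all iterations is always greater than the average without the sleep: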
timer:sleep(1),
timer:tc(M, F, [A]).
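This doesn't make much sense to me, since the timer:tc/3 function should be atomic and should not care about anything that happened before it.
Can anyone explain this strange behavior? Is it somehow related to scheduling and reductions?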
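Measuring performance is a complex task, especially on new hardware and modern operating systems. Many things can fiddle with your results.
First, you are not alone. When you measure on a desktop or laptop, other processes, including system processes, can interfere with your measurement.
Second, there is the hardware itself. Modern CPUs have many cool features that control performance and power consumption. They can boost performance for a short time before overheating, and they can boost performance when the other CPUs on the same chip, or the other hyperthread on the same CPU, are idle. On the other hand, they can enter a power-saving mode when there is not enough work, and the CPU does not react fast enough to sudden changes.
It is hard to tell whether this is your case, but it is naive to think that previous work, or the lack of it, cannot affect your measurement. You should always take care to measure in a steady state for long enough (at least several seconds) and remove as many other factors that could affect the measurement as possible. (And do not forget GC in Erlang.)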
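The steady-state advice above could be sketched roughly as follows. This is a minimal illustration, not a definitive benchmark harness: the module name `bench_sketch`, the helper `per_call/4` and the warm-up size are hypothetical choices, and a single forced garbage collection does not rule out GC during the timed run.

```erlang
-module(bench_sketch).
-export([per_call/4]).

%% Hypothetical helper: average time (in microseconds) of one call to
%% M:F(Args), measured over Iterations back-to-back calls.
per_call(M, F, Args, Iterations) when Iterations > 0 ->
    %% Warm up first so code loading and caches settle before measuring.
    ok = repeat(M, F, Args, Iterations div 10 + 1),
    %% Collect garbage now so leftover heap data is less likely to
    %% trigger a collection in the middle of the timed run.
    erlang:garbage_collect(),
    %% Time the whole loop with one timer:tc/1 call; this amortizes the
    %% cost of the timing call itself over all iterations.
    {TotalUs, ok} = timer:tc(fun() -> repeat(M, F, Args, Iterations) end),
    TotalUs / Iterations.

%% Call M:F(Args) N times and discard the results.
repeat(_M, _F, _Args, 0) -> ok;
repeat(M, F, Args, N) ->
    _ = apply(M, F, Args),
    repeat(M, F, Args, N - 1).
```

For a function as short as the ones discussed here, `Iterations` would need to be large enough that the timed loop runs for seconds rather than microseconds, for example `bench_sketch:per_call(foo, baz, [1000], 1000000)`.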
Did you mean something like this:
4> foo:foo(10000).
where:
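-module(foo).
-export([foo/1, baz/1]).

%% Returns {Timings, Average} for N timed calls of baz(1000).
foo(N) ->
    TL = bar(N),
    {TL, sum(TL) / N}.

%% Collects N timings in microseconds; comment out the sleep to compare.
bar(0) -> [];
bar(N) ->
    timer:sleep(1),
    {D, _} = timer:tc(?MODULE, baz, [1000]),
    [D | bar(N - 1)].

%% The function under test: a simple countdown loop.
baz(0) -> ok;
baz(N) -> baz(N - 1).

sum([]) -> 0;
sum([H | T]) -> H + sum(T).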
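I tried it, and it is interesting. With the sleep statement, the mean time returned by timer:tc/3 is 19 to 22 microseconds; with the sleep commented out, the average drops to 4 to 6 microseconds. Quite dramatic!
I notice there are artefacts in the timings, so events like this (these numbers being the individual microsecond timings returned by timer:tc/3) are not uncommon: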
---- snip ----
5,5,5,6,5,5,5,6,5,5,5,6,5,5,5,5,4,5,5,5,5,5,4,5,5,5,5,6,5,5,
5,6,5,5,5,5,5,6,5,5,5,5,5,6,5,5,5,6,5,5,5,5,5,5,5,5,5,5,4,5,
5,5,5,6,5,5,5,6,5,5,7,8,7,8,5,6,5,5,5,6,5,5,5,5,4,5,5,5,5,
14,4,5,5,4,5,5,4,5,4,5,5,5,4,5,5,4,5,5,4,5,4,5,5,5,4,5,5,4,
5,5,4,5,4,5,5,4,4,5,5,4,5,5,4,4,4,4,4,5,4,5,5,4,5,5,5,4,5,5,
4,5,5,4,5,4,5,5,5,4,5,5,4,5,5,4,5,4,5,4,5,4,5,5,4,4,4,4,5,4,
5,5,54,22,26,21,22,22,24,24,32,31,36,31,33,27,25,21,22,21,
24,21,22,22,24,21,22,21,24,21,22,22,24,21,22,21,24,21,22,21,
23,27,22,21,24,21,22,21,24,22,22,21,23,22,22,21,24,22,22,21,
24,21,22,22,24,22,22,21,24,22,22,22,24,22,22,22,24,22,22,22,
24,22,22,22,24,22,22,21,24,22,22,21,24,21,22,22,24,22,22,21,
24,21,23,21,24,22,23,21,24,21,22,22,24,21,22,22,24,21,22,22,
24,22,23,21,24,21,23,21,23,21,21,21,23,21,25,22,24,21,22,21,
24,21,22,21,24,22,21,24,22,22,21,24,22,23,21,23,21,22,21,23,
21,22,21,23,21,23,21,24,22,22,22,24,22,22,41,36,30,33,30,35,
21,23,21,25,21,23,21,24,22,22,21,23,21,22,21,24,22,22,22,24,
22,22,21,24,22,22,22,24,22,22,21,24,22,22,21,24,22,22,21,24,
22,22,21,24,21,22,22,27,22,23,21,23,21,21,21,23,21,21,21,24,
21,22,21,24,21,22,22,24,22,22,22,24,21,22,22,24,21,22,21,24,
21,23,21,23,21,22,21,23,21,23,22,24,22,22,21,24,21,22,22,24,
21,23,21,24,21,22,22,24,21,22,22,24,21,22,21,24,21,22,22,24,
22,22,22,24,22,22,21,24,22,21,21,24,21,22,22,24,21,22,22,24,
24,23,21,24,21,22,24,21,22,21,23,21,22,21,24,21,22,21,32,31,
32,21,25,21,22,22,24,46,5,5,5,5,5,4,5,5,5,5,6,5,5,5,5,5,5,4,
6,5,5,5,6,5,5,5,5,5,5,5,6,5,5,5,5,4,5,4,5,5,5,5,6,5,5,5,5,5,
5,5,6,5,5,5,5,5,5,5,6,5,5,5,5,4,6,4,6,5,5,5,5,5,5,4,6,5,5,5,
5,4,5,5,5,5,5,5,6,5,5,5,5,4,5,5,5,5,5,5,6,5,5,5,5,5,5,5,6,5,
5,5,5,4,5,5,6,5,5,5,6,5,5,5,5,5,5,5,6,5,5,5,6,5,5,5,5,5,5,5,
6,5,5,5,5,4,5,4,5,5,5,5,6,5,5,5,5,5,5,4,5,4,5,5,5,5,5,6,5,5,
5,5,4,5,4,5,5,5,5,6,5,5,5,5,5,5,5,6,5,5,5,5,5,5,5,6,5,5,5,5,
---- snip ----
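I assume this is the effect you are referring to, though when you say always > N, is it always, or just mostly? It is not always the case for me, anyway.
The extract of results above was taken without the sleep. Typically, without the sleep, timer:tc/3 returns low times such as 4 or 5 most of the time, but sometimes high times such as 22. With the sleep in place it usually returns high times such as 22, with occasional batches of low times.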
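Because the timings fall into a low band and a high band, a plain mean hides what is going on. A minimal sketch of a summary that reports the minimum, median and maximum alongside the mean might look like this. The module `timing_summary` and the function `summarize/1` are hypothetical names, and the sketch assumes an OTP release with map support.

```erlang
-module(timing_summary).
-export([summarize/1]).

%% Hypothetical helper: summarize a non-empty list of microsecond timings.
%% For an even number of samples the lower of the two middle values is
%% reported as the median.
summarize(Timings) when Timings =/= [] ->
    Sorted = lists:sort(Timings),
    Len = length(Sorted),
    #{min => hd(Sorted),
      median => lists:nth((Len + 1) div 2, Sorted),
      max => lists:last(Sorted),
      mean => lists:sum(Sorted) / Len}.
```

It could be applied to the first element of the tuple returned by `foo:foo/1`, for example `{TL, _} = foo:foo(10000), timing_summary:summarize(TL).`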
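It is certainly not obvious why this would happen, since sleep really just means yield. I wonder whether all of this comes down to the CPU cache. After all, especially on a machine that is not busy, one might expect the case without the sleep to execute most of the code in one go, without being moved to another core and without the core doing much else in between, thus making the most of the caches... but when you sleep, and thus yield, and come back later, the chances of cache hits might be considerably lower.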
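One way to probe the "sleep just means yield" idea is to time the same function after no pause, after `erlang:yield()`, and after `timer:sleep(1)`. The following is only a sketch of such an experiment: the module `pause_compare` and its functions are hypothetical, and the result would not by itself confirm or refute the cache explanation.

```erlang
-module(pause_compare).
-export([compare/1, baz/1]).

%% Hypothetical experiment: mean time in microseconds of baz(1000),
%% measured N times after each of three kinds of pause.
compare(N) when N > 0 ->
    [{Label, mean_us(Pause, N)}
     || {Label, Pause} <- [{none, fun() -> ok end},
                           {yield, fun() -> erlang:yield(), ok end},
                           {sleep, fun() -> timer:sleep(1) end}]].

%% Run the pause followed by one timed call, N times, and average.
mean_us(Pause, N) ->
    Total = lists:sum([timed(Pause) || _ <- lists:seq(1, N)]),
    Total / N.

timed(Pause) ->
    _ = Pause(),
    {D, _} = timer:tc(?MODULE, baz, [1000]),
    D.

%% Same countdown loop as in the module above.
baz(0) -> ok;
baz(N) -> baz(N - 1).
```

Calling `pause_compare:compare(5000).` returns a list of `{Label, MeanMicroseconds}` tuples. If the `yield` figure stays close to `none` while `sleep` is clearly higher, that would suggest the effect depends on the process actually being suspended rather than on yielding alone.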