为什么 "if not (a and b)" 比 "if not a or not b" 快?
Why is "if not (a and b)" faster than "if not a or not b"?
最近心血来潮,用timeit
测试了这两种方法,看看哪种评估方法更快:
import timeit
"""Test method returns True if either argument is falsey, else False."""
def and_chk((a, b)):
if not (a and b):
return True
return False
def not_or_chk((a, b)):
if not a or not b:
return True
return False
...并得到这些结果:
VALUES FOR a,b -> 0,0 0,1 1,0 1,1
method
and_chk(a,b) 0.95559 0.98646 0.95138 0.98788
not_or_chk(a,b) 0.96804 1.07323 0.96015 1.05874
...seconds per 1,111,111 cycles.
效率的差异在 1% 到 9% 之间,总是有利于 if not (a and b)
,这与我的预期相反,因为我知道 if not a or not b
将评估其条款(if not a
然后 if not b
) 按顺序,运行 一旦 if
遇到真表达式(并且没有 and
子句)就会阻塞。相比之下,and_chk
方法需要评估 both 子句,然后才能 return 任何结果到包装它的 if not..
。
然而,计时结果反驳了这种理解。那么,如何评估 if
条件?我完全清楚这样一个事实,即这种程度的微优化实际上(如果不是完全的话)毫无意义。我只是想了解 Python 是怎么回事。
为了完成,我是这样设置的timeit
...
cyc = 1111111
bothFalse_and = iter([(0,0)] * cyc)
zeroTrue_and = iter([(1,0)] * cyc)
oneTrue_and = iter([(0,1)] * cyc)
bothTrue_and = iter([(1,1)] * cyc)
bothFalse_notor = iter([(0,0)] * cyc)
zeroTrue_notor = iter([(1,0)] * cyc)
oneTrue_notor = iter([(0,1)] * cyc)
bothTrue_notor = iter([(1,1)] * cyc)
time_bothFalse_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import bothFalse_and as tups, and_chk')
time_zeroTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import zeroTrue_and as tups, and_chk')
time_oneTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import oneTrue_and as tups, and_chk')
time_bothTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import bothTrue_and as tups, and_chk')
time_bothFalse_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import bothFalse_notor as tups, not_or_chk')
time_zeroTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import zeroTrue_notor as tups, not_or_chk')
time_oneTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import oneTrue_notor as tups, not_or_chk')
time_bothTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import bothTrue_notor as tups, not_or_chk')
...然后 运行 每个 timeit.Timer(..)
函数与 .timeit(cyc)
一起获取结果。
TL;DR
not_or_chk
函数需要加中的两个一元运算进行两次跳转(最坏情况下),而and_chk
函数只有两个跳跃(同样,在最坏的情况下)。
详情
dis module 来救援! dis
模块让您可以查看代码的 Python 字节码反汇编。例如:
import dis
"""Test method returns True if either argument is falsey, else False."""
def and_chk((a, b)):
if not (a and b):
return True
return False
def not_or_chk((a, b)):
if not a or not b:
return True
return False
print("And Check:\n")
print(dis.dis(and_chk))
print("Or Check:\n")
print(dis.dis(not_or_chk))
产生这个输出:
And Check:
5 0 LOAD_FAST 0 (.0)
3 UNPACK_SEQUENCE 2
6 STORE_FAST 1 (a)
9 STORE_FAST 2 (b)
6 12 LOAD_FAST 1 (a) * This block is the *
15 JUMP_IF_FALSE_OR_POP 21 * disassembly of *
18 LOAD_FAST 2 (b) * the "and_chk" *
>> 21 POP_JUMP_IF_TRUE 28 * function *
7 24 LOAD_GLOBAL 0 (True)
27 RETURN_VALUE
8 >> 28 LOAD_GLOBAL 1 (False)
31 RETURN_VALUE
None
Or Check:
10 0 LOAD_FAST 0 (.0)
3 UNPACK_SEQUENCE 2
6 STORE_FAST 1 (a)
9 STORE_FAST 2 (b)
11 12 LOAD_FAST 1 (a) * This block is the *
15 UNARY_NOT * disassembly of *
16 POP_JUMP_IF_TRUE 26 * the "not_or_chk" *
19 LOAD_FAST 2 (b) * function *
22 UNARY_NOT
23 POP_JUMP_IF_FALSE 30
12 >> 26 LOAD_GLOBAL 0 (True)
29 RETURN_VALUE
13 >> 30 LOAD_GLOBAL 1 (False)
33 RETURN_VALUE
None
看看我用星号标记的 Python 字节码的两个块。这些块是您的两个反汇编函数。注意and_chk
只有两次跳转,函数中的计算是决定是否跳转。
另一方面,not_or_chk
函数在最坏的情况下需要执行两次not
操作,另外解释器决定是否跳槽。
我刚刚通过你的 Meta SO 问题注意到了这个问题:Is it appropriate to share the results of my research toward solving my own minor questions?. As you mention in that question (and one of the tags on this question indicates), this sort of thing falls into the category of micro-optimization. Ideally, one shouldn't need to worry about such minor performance differences, and as Knuth says, premature optimization is the root of all evil。但我想调查这些问题很有趣也很有启发性,因为它可以让你更好地了解 Python 是如何工作的 "under the hood".
提示我查看 if
-less 版本的函数的时间差异是什么。恕我直言,结果很有趣。我也借此机会简化了您的测试程序。
#!/usr/bin/env python
''' Do timeit tests on various implementations of NAND
NAND returns True if either argument is falsey, else False.
From
Written by PM 2Ring 2015.04.09
'''
from timeit import Timer
import dis
def and_chk(a, b):
return not (a and b)
def and_chk_if(a, b):
if not (a and b):
return True
else:
return False
def not_or_chk(a, b):
return not a or not b
def not_or_chk_if(a, b):
if not a or not b:
return True
else:
return False
#All the functions
funcs = (
and_chk,
and_chk_if,
not_or_chk,
not_or_chk_if,
)
#Argument tuples to test the functions with
bools = (0, 1)
bool_tups = [(u, v) for u in bools for v in bools]
def show_dis():
''' Show the disassembly for each function '''
print 'Disassembly'
for func in funcs:
fname = func.func_name
print '\n%s' % fname
dis.dis(func)
print
def verify():
''' Verify that the functions actually perform as intended '''
print 'Verifying...'
for func in funcs:
fname = func.func_name
print '\n%s' % fname
for args in bool_tups:
print args, func(*args)
print
def time_test(loops, reps):
''' Print timing stats for all the functions '''
print 'Timing tests\nLoops = %d, Repetitions = %d' % (loops, reps)
for func in funcs:
fname = func.func_name
print '\n%s' % fname
setup = 'from __main__ import %s' % fname
for args in bool_tups:
t = Timer('%s%s' % (fname, args), setup)
r = t.repeat(reps, loops)
r.sort()
print args, r
show_dis()
verify()
time_test(loops=520000, reps=3)
输出
Disassembly
and_chk
13 0 LOAD_FAST 0 (a)
3 JUMP_IF_FALSE 4 (to 10)
6 POP_TOP
7 LOAD_FAST 1 (b)
>> 10 UNARY_NOT
11 RETURN_VALUE
and_chk_if
16 0 LOAD_FAST 0 (a)
3 JUMP_IF_FALSE 4 (to 10)
6 POP_TOP
7 LOAD_FAST 1 (b)
>> 10 JUMP_IF_TRUE 5 (to 18)
13 POP_TOP
17 14 LOAD_GLOBAL 0 (True)
17 RETURN_VALUE
>> 18 POP_TOP
19 19 LOAD_GLOBAL 1 (False)
22 RETURN_VALUE
23 LOAD_CONST 0 (None)
26 RETURN_VALUE
not_or_chk
22 0 LOAD_FAST 0 (a)
3 UNARY_NOT
4 JUMP_IF_TRUE 5 (to 12)
7 POP_TOP
8 LOAD_FAST 1 (b)
11 UNARY_NOT
>> 12 RETURN_VALUE
not_or_chk_if
25 0 LOAD_FAST 0 (a)
3 UNARY_NOT
4 JUMP_IF_TRUE 8 (to 15)
7 POP_TOP
8 LOAD_FAST 1 (b)
11 UNARY_NOT
12 JUMP_IF_FALSE 5 (to 20)
>> 15 POP_TOP
26 16 LOAD_GLOBAL 0 (True)
19 RETURN_VALUE
>> 20 POP_TOP
28 21 LOAD_GLOBAL 1 (False)
24 RETURN_VALUE
25 LOAD_CONST 0 (None)
28 RETURN_VALUE
Verifying...
and_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False
and_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False
not_or_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False
not_or_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False
Timing tests
Loops = 520000, Repetitions = 3
and_chk
(0, 0) [0.36773014068603516, 0.37793493270874023, 0.38387489318847656]
(0, 1) [0.36292791366577148, 0.37119913101196289, 0.37146902084350586]
(1, 0) [0.38673520088195801, 0.39340090751647949, 0.39670205116271973]
(1, 1) [0.38938498497009277, 0.39505791664123535, 0.40034103393554688]
and_chk_if
(0, 0) [0.4021449089050293, 0.40345501899719238, 0.41558098793029785]
(0, 1) [0.40686416625976562, 0.41213202476501465, 0.44800615310668945]
(1, 0) [0.4281308650970459, 0.42916202545166016, 0.43681907653808594]
(1, 1) [0.46246123313903809, 0.46759700775146484, 0.47544980049133301]
not_or_chk
(0, 0) [0.35435700416564941, 0.36368083953857422, 0.36867713928222656]
(0, 1) [0.35602092742919922, 0.35732793807983398, 0.36237406730651855]
(1, 0) [0.39550518989562988, 0.40660715103149414, 0.40977287292480469]
(1, 1) [0.4060060977935791, 0.4112389087677002, 0.41296815872192383]
not_or_chk_if
(0, 0) [0.4308779239654541, 0.44109201431274414, 0.45827698707580566]
(0, 1) [0.43600606918334961, 0.4370419979095459, 0.47623395919799805]
(1, 0) [0.48452401161193848, 0.48769593238830566, 0.49147915840148926]
(1, 1) [0.53107500076293945, 0.54048299789428711, 0.55434417724609375]
这些计时是在 2GHz Pentium 4(单核 32 位)运行 Mepis 11(Debian 系列 Linux 发行版)上使用 Python 2.6.6 执行的。
您会注意到我避免使用您的 next(tups)
策略来获取每个函数调用的参数,而是直接将参数作为常量传递。执行 next(tups)
所花费的时间应该 相当恒定,但最好在可行时消除此类开销,以便报告的时间测量值更准确地反映代码的性能我们真的感兴趣。
另外,通常是多次重复计时循环,取最小值; FWIW,我通常做 3 到 5 次。来自 timeit docs:
Note
It’s tempting to calculate mean and standard deviation from the result
vector and report these. However, this is not very useful. In a
typical case, the lowest value gives a lower bound for how fast your
machine can run the given code snippet; higher values in the result
vector are typically not caused by variability in Python’s speed, but
by other processes interfering with your timing accuracy. So the min()
of the result is probably the only number you should be interested in.
After that, you should look at the entire vector and apply common
sense rather than statistics.
你的 Meta post 说你想执行和报告其他微优化实验,所以你可能有兴趣看一下我 post 几个月前编辑的一些代码对 函数的各种实现进行时间测试。
最近心血来潮,用timeit
测试了这两种方法,看看哪种评估方法更快:
import timeit
"""Test method returns True if either argument is falsey, else False."""
def and_chk((a, b)):
if not (a and b):
return True
return False
def not_or_chk((a, b)):
if not a or not b:
return True
return False
...并得到这些结果:
VALUES FOR a,b -> 0,0 0,1 1,0 1,1
method
and_chk(a,b) 0.95559 0.98646 0.95138 0.98788
not_or_chk(a,b) 0.96804 1.07323 0.96015 1.05874
...seconds per 1,111,111 cycles.
效率的差异在 1% 到 9% 之间,总是有利于 if not (a and b)
,这与我的预期相反,因为我知道 if not a or not b
将评估其条款(if not a
然后 if not b
) 按顺序,运行 一旦 if
遇到真表达式(并且没有 and
子句)就会阻塞。相比之下,and_chk
方法需要评估 both 子句,然后才能 return 任何结果到包装它的 if not..
。
然而,计时结果反驳了这种理解。那么,如何评估 if
条件?我完全清楚这样一个事实,即这种程度的微优化实际上(如果不是完全的话)毫无意义。我只是想了解 Python 是怎么回事。
为了完成,我是这样设置的timeit
...
cyc = 1111111
bothFalse_and = iter([(0,0)] * cyc)
zeroTrue_and = iter([(1,0)] * cyc)
oneTrue_and = iter([(0,1)] * cyc)
bothTrue_and = iter([(1,1)] * cyc)
bothFalse_notor = iter([(0,0)] * cyc)
zeroTrue_notor = iter([(1,0)] * cyc)
oneTrue_notor = iter([(0,1)] * cyc)
bothTrue_notor = iter([(1,1)] * cyc)
time_bothFalse_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import bothFalse_and as tups, and_chk')
time_zeroTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import zeroTrue_and as tups, and_chk')
time_oneTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import oneTrue_and as tups, and_chk')
time_bothTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import bothTrue_and as tups, and_chk')
time_bothFalse_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import bothFalse_notor as tups, not_or_chk')
time_zeroTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import zeroTrue_notor as tups, not_or_chk')
time_oneTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import oneTrue_notor as tups, not_or_chk')
time_bothTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import bothTrue_notor as tups, not_or_chk')
...然后 运行 每个 timeit.Timer(..)
函数与 .timeit(cyc)
一起获取结果。
TL;DR
not_or_chk
函数需要加中的两个一元运算进行两次跳转(最坏情况下),而and_chk
函数只有两个跳跃(同样,在最坏的情况下)。
详情
dis module 来救援! dis
模块让您可以查看代码的 Python 字节码反汇编。例如:
import dis
"""Test method returns True if either argument is falsey, else False."""
def and_chk((a, b)):
if not (a and b):
return True
return False
def not_or_chk((a, b)):
if not a or not b:
return True
return False
print("And Check:\n")
print(dis.dis(and_chk))
print("Or Check:\n")
print(dis.dis(not_or_chk))
产生这个输出:
And Check:
5 0 LOAD_FAST 0 (.0)
3 UNPACK_SEQUENCE 2
6 STORE_FAST 1 (a)
9 STORE_FAST 2 (b)
6 12 LOAD_FAST 1 (a) * This block is the *
15 JUMP_IF_FALSE_OR_POP 21 * disassembly of *
18 LOAD_FAST 2 (b) * the "and_chk" *
>> 21 POP_JUMP_IF_TRUE 28 * function *
7 24 LOAD_GLOBAL 0 (True)
27 RETURN_VALUE
8 >> 28 LOAD_GLOBAL 1 (False)
31 RETURN_VALUE
None
Or Check:
10 0 LOAD_FAST 0 (.0)
3 UNPACK_SEQUENCE 2
6 STORE_FAST 1 (a)
9 STORE_FAST 2 (b)
11 12 LOAD_FAST 1 (a) * This block is the *
15 UNARY_NOT * disassembly of *
16 POP_JUMP_IF_TRUE 26 * the "not_or_chk" *
19 LOAD_FAST 2 (b) * function *
22 UNARY_NOT
23 POP_JUMP_IF_FALSE 30
12 >> 26 LOAD_GLOBAL 0 (True)
29 RETURN_VALUE
13 >> 30 LOAD_GLOBAL 1 (False)
33 RETURN_VALUE
None
看看我用星号标记的 Python 字节码的两个块。这些块是您的两个反汇编函数。注意and_chk
只有两次跳转,函数中的计算是决定是否跳转。
另一方面,not_or_chk
函数在最坏的情况下需要执行两次not
操作,另外解释器决定是否跳槽。
我刚刚通过你的 Meta SO 问题注意到了这个问题:Is it appropriate to share the results of my research toward solving my own minor questions?. As you mention in that question (and one of the tags on this question indicates), this sort of thing falls into the category of micro-optimization. Ideally, one shouldn't need to worry about such minor performance differences, and as Knuth says, premature optimization is the root of all evil。但我想调查这些问题很有趣也很有启发性,因为它可以让你更好地了解 Python 是如何工作的 "under the hood".
if
-less 版本的函数的时间差异是什么。恕我直言,结果很有趣。我也借此机会简化了您的测试程序。
#!/usr/bin/env python
''' Do timeit tests on various implementations of NAND
NAND returns True if either argument is falsey, else False.
From
Written by PM 2Ring 2015.04.09
'''
from timeit import Timer
import dis
def and_chk(a, b):
return not (a and b)
def and_chk_if(a, b):
if not (a and b):
return True
else:
return False
def not_or_chk(a, b):
return not a or not b
def not_or_chk_if(a, b):
if not a or not b:
return True
else:
return False
#All the functions
funcs = (
and_chk,
and_chk_if,
not_or_chk,
not_or_chk_if,
)
#Argument tuples to test the functions with
bools = (0, 1)
bool_tups = [(u, v) for u in bools for v in bools]
def show_dis():
''' Show the disassembly for each function '''
print 'Disassembly'
for func in funcs:
fname = func.func_name
print '\n%s' % fname
dis.dis(func)
print
def verify():
''' Verify that the functions actually perform as intended '''
print 'Verifying...'
for func in funcs:
fname = func.func_name
print '\n%s' % fname
for args in bool_tups:
print args, func(*args)
print
def time_test(loops, reps):
''' Print timing stats for all the functions '''
print 'Timing tests\nLoops = %d, Repetitions = %d' % (loops, reps)
for func in funcs:
fname = func.func_name
print '\n%s' % fname
setup = 'from __main__ import %s' % fname
for args in bool_tups:
t = Timer('%s%s' % (fname, args), setup)
r = t.repeat(reps, loops)
r.sort()
print args, r
show_dis()
verify()
time_test(loops=520000, reps=3)
输出
Disassembly
and_chk
13 0 LOAD_FAST 0 (a)
3 JUMP_IF_FALSE 4 (to 10)
6 POP_TOP
7 LOAD_FAST 1 (b)
>> 10 UNARY_NOT
11 RETURN_VALUE
and_chk_if
16 0 LOAD_FAST 0 (a)
3 JUMP_IF_FALSE 4 (to 10)
6 POP_TOP
7 LOAD_FAST 1 (b)
>> 10 JUMP_IF_TRUE 5 (to 18)
13 POP_TOP
17 14 LOAD_GLOBAL 0 (True)
17 RETURN_VALUE
>> 18 POP_TOP
19 19 LOAD_GLOBAL 1 (False)
22 RETURN_VALUE
23 LOAD_CONST 0 (None)
26 RETURN_VALUE
not_or_chk
22 0 LOAD_FAST 0 (a)
3 UNARY_NOT
4 JUMP_IF_TRUE 5 (to 12)
7 POP_TOP
8 LOAD_FAST 1 (b)
11 UNARY_NOT
>> 12 RETURN_VALUE
not_or_chk_if
25 0 LOAD_FAST 0 (a)
3 UNARY_NOT
4 JUMP_IF_TRUE 8 (to 15)
7 POP_TOP
8 LOAD_FAST 1 (b)
11 UNARY_NOT
12 JUMP_IF_FALSE 5 (to 20)
>> 15 POP_TOP
26 16 LOAD_GLOBAL 0 (True)
19 RETURN_VALUE
>> 20 POP_TOP
28 21 LOAD_GLOBAL 1 (False)
24 RETURN_VALUE
25 LOAD_CONST 0 (None)
28 RETURN_VALUE
Verifying...
and_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False
and_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False
not_or_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False
not_or_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False
Timing tests
Loops = 520000, Repetitions = 3
and_chk
(0, 0) [0.36773014068603516, 0.37793493270874023, 0.38387489318847656]
(0, 1) [0.36292791366577148, 0.37119913101196289, 0.37146902084350586]
(1, 0) [0.38673520088195801, 0.39340090751647949, 0.39670205116271973]
(1, 1) [0.38938498497009277, 0.39505791664123535, 0.40034103393554688]
and_chk_if
(0, 0) [0.4021449089050293, 0.40345501899719238, 0.41558098793029785]
(0, 1) [0.40686416625976562, 0.41213202476501465, 0.44800615310668945]
(1, 0) [0.4281308650970459, 0.42916202545166016, 0.43681907653808594]
(1, 1) [0.46246123313903809, 0.46759700775146484, 0.47544980049133301]
not_or_chk
(0, 0) [0.35435700416564941, 0.36368083953857422, 0.36867713928222656]
(0, 1) [0.35602092742919922, 0.35732793807983398, 0.36237406730651855]
(1, 0) [0.39550518989562988, 0.40660715103149414, 0.40977287292480469]
(1, 1) [0.4060060977935791, 0.4112389087677002, 0.41296815872192383]
not_or_chk_if
(0, 0) [0.4308779239654541, 0.44109201431274414, 0.45827698707580566]
(0, 1) [0.43600606918334961, 0.4370419979095459, 0.47623395919799805]
(1, 0) [0.48452401161193848, 0.48769593238830566, 0.49147915840148926]
(1, 1) [0.53107500076293945, 0.54048299789428711, 0.55434417724609375]
这些计时是在 2GHz Pentium 4(单核 32 位)运行 Mepis 11(Debian 系列 Linux 发行版)上使用 Python 2.6.6 执行的。
您会注意到我避免使用您的 next(tups)
策略来获取每个函数调用的参数,而是直接将参数作为常量传递。执行 next(tups)
所花费的时间应该 相当恒定,但最好在可行时消除此类开销,以便报告的时间测量值更准确地反映代码的性能我们真的感兴趣。
另外,通常是多次重复计时循环,取最小值; FWIW,我通常做 3 到 5 次。来自 timeit docs:
Note
It’s tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.
你的 Meta post 说你想执行和报告其他微优化实验,所以你可能有兴趣看一下我 post 几个月前编辑的一些代码对