为什么 "if not (a and b)" 比 "if not a or not b" 快?

Why is "if not (a and b)" faster than "if not a or not b"?

最近心血来潮,用timeit测试了这两种方法,看看哪种评估方法更快:

import timeit

"""Test method returns True if either argument is falsey, else False."""

def and_chk((a, b)):
    if not (a and b):
        return True
    return False

def not_or_chk((a, b)):
    if not a or not b:
        return True
    return False

...并得到这些结果:

 VALUES FOR a,b ->      0,0         0,1         1,0         1,1
        method
    and_chk(a,b)    0.95559     0.98646     0.95138     0.98788
 not_or_chk(a,b)    0.96804     1.07323     0.96015     1.05874
                                            ...seconds per 1,111,111 cycles.

效率的差异在 1% 到 9% 之间,总是有利于 if not (a and b),这与我的预期相反,因为我知道 if not a or not b 将评估其条款(if not a 然后 if not b) 按顺序,运行 一旦 if 遇到真表达式(并且没有 and 子句)就会阻塞。相比之下,and_chk 方法需要评估 both 子句,然后才能 return 任何结果到包装它的 if not..

然而,计时结果反驳了这种理解。那么,如何评估 if 条件?我完全清楚这样一个事实,即这种程度的微优化实际上(如果不是完全的话)毫无意义。我只是想了解 Python 是怎么回事。


为了完成,我是这样设置的timeit...

cyc = 1111111

bothFalse_and = iter([(0,0)] * cyc)
zeroTrue_and = iter([(1,0)] * cyc)
oneTrue_and = iter([(0,1)] * cyc)
bothTrue_and = iter([(1,1)] * cyc)

bothFalse_notor = iter([(0,0)] * cyc)
zeroTrue_notor = iter([(1,0)] * cyc)
oneTrue_notor = iter([(0,1)] * cyc)
bothTrue_notor = iter([(1,1)] * cyc)

time_bothFalse_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import bothFalse_and as tups, and_chk')
time_zeroTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import zeroTrue_and as tups, and_chk')
time_oneTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import oneTrue_and as tups, and_chk')
time_bothTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import bothTrue_and as tups, and_chk')

time_bothFalse_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import bothFalse_notor as tups, not_or_chk')
time_zeroTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import zeroTrue_notor as tups, not_or_chk')
time_oneTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import oneTrue_notor as tups, not_or_chk')
time_bothTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import bothTrue_notor as tups, not_or_chk')

...然后 运行 每个 timeit.Timer(..) 函数与 .timeit(cyc) 一起获取结果。

TL;DR

not_or_chk函数需要中的两个一元运算进行两次跳转(最坏情况下),而and_chk函数只有两个跳跃(同样,在最坏的情况下)。

详情

dis module 来救援! dis 模块让您可以查看代码的 Python 字节码反汇编。例如:

import dis

"""Test method returns True if either argument is falsey, else False."""

def and_chk((a, b)):
    if not (a and b):
        return True
    return False

def not_or_chk((a, b)):
    if not a or not b:
        return True
    return False

print("And Check:\n")
print(dis.dis(and_chk))

print("Or Check:\n")
print(dis.dis(not_or_chk))

产生这个输出:

And Check:

  5           0 LOAD_FAST                0 (.0)
              3 UNPACK_SEQUENCE          2
              6 STORE_FAST               1 (a)
              9 STORE_FAST               2 (b)

  6          12 LOAD_FAST                1 (a)    * This block is the *
             15 JUMP_IF_FALSE_OR_POP    21        * disassembly of    *
             18 LOAD_FAST                2 (b)    * the "and_chk"     *
        >>   21 POP_JUMP_IF_TRUE        28        * function          *

  7          24 LOAD_GLOBAL              0 (True)
             27 RETURN_VALUE

  8     >>   28 LOAD_GLOBAL              1 (False)
             31 RETURN_VALUE
None
Or Check:

 10           0 LOAD_FAST                0 (.0)
              3 UNPACK_SEQUENCE          2
              6 STORE_FAST               1 (a)
              9 STORE_FAST               2 (b)

 11          12 LOAD_FAST                1 (a)    * This block is the *
             15 UNARY_NOT                         * disassembly of    *
             16 POP_JUMP_IF_TRUE        26        * the "not_or_chk"  *
             19 LOAD_FAST                2 (b)    * function          *
             22 UNARY_NOT
             23 POP_JUMP_IF_FALSE       30

 12     >>   26 LOAD_GLOBAL              0 (True)
             29 RETURN_VALUE

 13     >>   30 LOAD_GLOBAL              1 (False)
             33 RETURN_VALUE
None

看看我用星号标记的 Python 字节码的两个块。这些块是您的两个反汇编函数。注意and_chk只有两次跳转,函数中的计算是决定是否跳转

另一方面,not_or_chk函数在最坏的情况下需要执行两次not操作,另外解释器决定是否跳槽。

我刚刚通过你的 Meta SO 问题注意到了这个问题:Is it appropriate to share the results of my research toward solving my own minor questions?. As you mention in that question (and one of the tags on this question indicates), this sort of thing falls into the category of micro-optimization. Ideally, one shouldn't need to worry about such minor performance differences, and as Knuth says, premature optimization is the root of all evil。但我想调查这些问题很有趣也很有启发性,因为它可以让你更好地了解 Python 是如何工作的 "under the hood".

提示我查看 if-less 版本的函数的时间差异是什么。恕我直言,结果很有趣。我也借此机会简化了您的测试程序。

#!/usr/bin/env python

''' Do timeit tests on various implementations of NAND

    NAND returns True if either argument is falsey, else False.
    From 
    Written by PM 2Ring 2015.04.09
'''

from timeit import Timer
import dis

def and_chk(a, b):
    return not (a and b)

def and_chk_if(a, b):
    if not (a and b):
        return True
    else:
        return False

def not_or_chk(a, b):
    return not a or not b

def not_or_chk_if(a, b):
    if not a or not b:
        return True
    else:
        return False


#All the functions
funcs = (
    and_chk,
    and_chk_if,
    not_or_chk,
    not_or_chk_if,
)

#Argument tuples to test the functions with
bools = (0, 1)
bool_tups = [(u, v) for u in bools for v in bools]


def show_dis():
    ''' Show the disassembly for each function '''
    print 'Disassembly'
    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        dis.dis(func)
    print


def verify():
    ''' Verify that the functions actually perform as intended '''
    print 'Verifying...'
    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        for args in bool_tups:
            print args, func(*args)
    print


def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    print 'Timing tests\nLoops = %d, Repetitions = %d' % (loops, reps)

    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        setup = 'from __main__ import %s' % fname
        for args in bool_tups:
            t = Timer('%s%s' % (fname, args), setup)
            r = t.repeat(reps, loops)
            r.sort()
            print args, r


show_dis()
verify()
time_test(loops=520000, reps=3)

输出

Disassembly

and_chk
 13           0 LOAD_FAST                0 (a)
              3 JUMP_IF_FALSE            4 (to 10)
              6 POP_TOP             
              7 LOAD_FAST                1 (b)
        >>   10 UNARY_NOT           
             11 RETURN_VALUE        

and_chk_if
 16           0 LOAD_FAST                0 (a)
              3 JUMP_IF_FALSE            4 (to 10)
              6 POP_TOP             
              7 LOAD_FAST                1 (b)
        >>   10 JUMP_IF_TRUE             5 (to 18)
             13 POP_TOP             

 17          14 LOAD_GLOBAL              0 (True)
             17 RETURN_VALUE        
        >>   18 POP_TOP             

 19          19 LOAD_GLOBAL              1 (False)
             22 RETURN_VALUE        
             23 LOAD_CONST               0 (None)
             26 RETURN_VALUE        

not_or_chk
 22           0 LOAD_FAST                0 (a)
              3 UNARY_NOT           
              4 JUMP_IF_TRUE             5 (to 12)
              7 POP_TOP             
              8 LOAD_FAST                1 (b)
             11 UNARY_NOT           
        >>   12 RETURN_VALUE        

not_or_chk_if
 25           0 LOAD_FAST                0 (a)
              3 UNARY_NOT           
              4 JUMP_IF_TRUE             8 (to 15)
              7 POP_TOP             
              8 LOAD_FAST                1 (b)
             11 UNARY_NOT           
             12 JUMP_IF_FALSE            5 (to 20)
        >>   15 POP_TOP             

 26          16 LOAD_GLOBAL              0 (True)
             19 RETURN_VALUE        
        >>   20 POP_TOP             

 28          21 LOAD_GLOBAL              1 (False)
             24 RETURN_VALUE        
             25 LOAD_CONST               0 (None)
             28 RETURN_VALUE        

Verifying...

and_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

and_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

not_or_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

not_or_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

Timing tests
Loops = 520000, Repetitions = 3

and_chk
(0, 0) [0.36773014068603516, 0.37793493270874023, 0.38387489318847656]
(0, 1) [0.36292791366577148, 0.37119913101196289, 0.37146902084350586]
(1, 0) [0.38673520088195801, 0.39340090751647949, 0.39670205116271973]
(1, 1) [0.38938498497009277, 0.39505791664123535, 0.40034103393554688]

and_chk_if
(0, 0) [0.4021449089050293, 0.40345501899719238, 0.41558098793029785]
(0, 1) [0.40686416625976562, 0.41213202476501465, 0.44800615310668945]
(1, 0) [0.4281308650970459, 0.42916202545166016, 0.43681907653808594]
(1, 1) [0.46246123313903809, 0.46759700775146484, 0.47544980049133301]

not_or_chk
(0, 0) [0.35435700416564941, 0.36368083953857422, 0.36867713928222656]
(0, 1) [0.35602092742919922, 0.35732793807983398, 0.36237406730651855]
(1, 0) [0.39550518989562988, 0.40660715103149414, 0.40977287292480469]
(1, 1) [0.4060060977935791, 0.4112389087677002, 0.41296815872192383]

not_or_chk_if
(0, 0) [0.4308779239654541, 0.44109201431274414, 0.45827698707580566]
(0, 1) [0.43600606918334961, 0.4370419979095459, 0.47623395919799805]
(1, 0) [0.48452401161193848, 0.48769593238830566, 0.49147915840148926]
(1, 1) [0.53107500076293945, 0.54048299789428711, 0.55434417724609375]

这些计时是在 2GHz Pentium 4(单核 32 位)运行 Mepis 11(Debian 系列 Linux 发行版)上使用 Python 2.6.6 执行的。

您会注意到我避免使用您的 next(tups) 策略来获取每个函数调用的参数,而是直接将参数作为常量传递。执行 next(tups) 所花费的时间应该 相当恒定,但最好在可行时消除此类开销,以便报告的时间测量值更准确地反映代码的性能我们真的感兴趣。

另外,通常是多次重复计时循环,取最小值; FWIW,我通常做 3 到 5 次。来自 timeit docs:

Note

It’s tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.

你的 Meta post 说你想执行和报告其他微优化实验,所以你可能有兴趣看一下我 post 几个月前编辑的一些代码对 函数的各种实现进行时间测试。