Why are bitwise operators slower than multiplication/division/modulo?

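It is well known that multiplication, integer division, and modulo by powers of two can be rewritten more efficiently as bitwise operations: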

>>> from random import randint
>>> x = randint(50000, 100000)
>>> x << 2 == x * 4
True
>>> x >> 2 == x // 4
True
>>> x & 3 == x % 4
True
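As a quick sanity check, the identities above can be verified over a range of values rather than a single random one. This is only a minimal sketch; the helper name `check_identities` is made up for illustration. It relies on Python's floor semantics for `//` and `%`, which is why the identities also hold for negative integers.

```python
def check_identities(values, k):
    """Return True if shifting/masking agrees with *, // and % by 2**k for every value."""
    m = 1 << k  # the power of two, 2**k
    return all(
        (x << k) == x * m            # left shift  == multiplication
        and (x >> k) == x // m       # right shift == floor division
        and (x & (m - 1)) == x % m   # mask        == modulo
        for x in values
    )

print(check_identities(range(-1000, 1000), 2))
```

The explicit parentheses are not strictly required in Python, since shifts and `&` bind more tightly than `==`, but they make the intent obvious.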

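In compiled languages such as C/C++ and Java, tests have shown that bitwise operations are generally faster than arithmetic operations (see here and here). However, when I test these in Python, I get the opposite result: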

In [1]: from random import randint
   ...: nums = [randint(0, 1000000) for _ in range(100000)]

In [2]: %timeit [i * 8 for i in nums]
7.73 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [3]: %timeit [i << 3 for i in nums]
8.22 ms ± 368 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %timeit [i // 8 for i in nums]
7.05 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit [i >> 3 for i in nums]
7.55 ms ± 367 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit [i % 8 for i in nums]
5.96 ms ± 503 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit [i & 7 for i in nums]
8.29 ms ± 816 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

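As you can see, the bitwise operations are slower than their arithmetic counterparts, especially for modulo. I repeated this test with another set of numbers and got the same result. Is there a reason for this? In case it matters, these tests were run on CPython 3.6.7.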

*, %, and / all have fast paths for single-"limb" integers. <<, >>, and & don't. They go through the general-purpose arbitrary-precision code path.
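To make the "limb" idea concrete: CPython stores an integer as an array of fixed-width digits, and the width is exposed as `sys.int_info.bits_per_digit`. The sketch below estimates how many digits a value needs. The helper `limb_count` is a hypothetical name introduced here, and treating 0 as one limb is a simplification, not a statement about CPython's internal representation.

```python
import sys

# Width of one internal digit ("limb"); 30 on typical 64-bit builds, 15 on some others.
BITS = sys.int_info.bits_per_digit

def limb_count(n):
    """Estimate how many internal digits CPython needs to store abs(n)."""
    bits = abs(n).bit_length()
    # Ceiling division; 0 is counted as one limb for simplicity.
    return max(1, (bits + BITS - 1) // BITS)

print(BITS)
print(limb_count(1_000_000))  # upper bound of the values benchmarked in the question
print(limb_count(10**64))
```

On a build with 30-bit digits, every value in the question's benchmark fits in a single limb, which is the case the fast paths are said to cover.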

I tested with large numbers, and there the bitwise operators are faster.

python -m timeit '[i for i in range(10**64, 10**64+1000) if i & 0b10==0]'
1000 loops, best of 3: 238 usec per loop

python -m timeit '[i for i in range(10**64, 10**64+1000) if i % 2==0]'
1000 loops, best of 3: 303 usec per loop
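For anyone who wants to repeat the comparison on both small and large integers in one run, here is a rough sketch using the standard `timeit` module. The function name `compare` and the chosen ranges are my own assumptions, and the numbers will vary by machine and Python version, so no expected output is shown. Note that the mask used here is 1, which tests the lowest bit and therefore matches `i % 2`.

```python
import timeit

def compare(expr_a, expr_b, values, number=100, repeat=3):
    """Time two list comprehensions over the same values; return best seconds for each."""
    env = {"nums": values}
    results = []
    for expr in (expr_a, expr_b):
        stmt = f"[{expr} for i in nums]"
        # Take the minimum of several repeats to reduce noise.
        results.append(min(timeit.repeat(stmt, globals=env, number=number, repeat=repeat)))
    return tuple(results)

small = list(range(1000))                    # small integers
large = list(range(10**64, 10**64 + 1000))   # multi-limb integers

for label, values in (("small", small), ("large", large)):
    t_mod, t_and = compare("i % 2 == 0", "i & 1 == 0", values)
    print(f"{label}: % took {t_mod:.4f}s, & took {t_and:.4f}s")
```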