Python: 求除数的表现

Python: performance of finding divisors

对这些函数进行基准测试:

def divisors_optimized(number):
    square_root = int(math.sqrt(number))

    for divisor in range(1, square_root):
        if number % divisor == 0:
            yield divisor
            yield number / divisor

    if square_root ** 2 == number:
        yield square_root

def number_of_divisors_optimized(number):
    count = 0
    square_root = int(math.sqrt(number))

    for divisor in range(1, square_root):
        if number % divisor == 0:
            count += 2

    if square_root ** 2 == number:
        count += 1

    return count

可以看出两者的基本结构是一样的。

基准代码:

number = 9999999
for i in range(10):
    print(f"iteration {i}:")
    start = time.time()
    result = list(utils.divisors_optimized(number))
    end = time.time()
    print(f'len(divisors_optimized) took {end - start} seconds and found {len(result)} divisors.')

    start = time.time()
    result = utils.number_of_divisors_optimized(number)
    end = time.time()
    print(f'number_of_divisors_optimized took {end - start} seconds and found {result} divisors.')

    print()

输出:

iteration 0:
len(divisors_optimized) took 0.00019598007202148438 seconds and found 12 divisors.
number_of_divisors_optimized took 0.0001919269561767578 seconds and found 12 divisors.

iteration 1:
len(divisors_optimized) took 0.00019121170043945312 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020599365234375 seconds and found 12 divisors.

iteration 2:
len(divisors_optimized) took 0.000179290771484375 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00019049644470214844 seconds and found 12 divisors.

iteration 3:
len(divisors_optimized) took 0.00019025802612304688 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020170211791992188 seconds and found 12 divisors.

iteration 4:
len(divisors_optimized) took 0.0001785755157470703 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00017905235290527344 seconds and found 12 divisors.

iteration 5:
len(divisors_optimized) took 0.00022721290588378906 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020170211791992188 seconds and found 12 divisors.

iteration 6:
len(divisors_optimized) took 0.0001919269561767578 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00018930435180664062 seconds and found 12 divisors.

iteration 7:
len(divisors_optimized) took 0.00017881393432617188 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00017905235290527344 seconds and found 12 divisors.

iteration 8:
len(divisors_optimized) took 0.00017976760864257812 seconds and found 12 divisors.
number_of_divisors_optimized took 0.0001785755157470703 seconds and found 12 divisors.

iteration 9:
len(divisors_optimized) took 0.00024819374084472656 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020766258239746094 seconds and found 12 divisors.

您可以看到执行时间非常接近,每次都有利。

有人可以向我解释一下,为什么从生成器创建列表并检索其长度与迭代时计数一样快?我的意思是,内存分配 (list()) 不应该比分配昂贵得多吗?

我正在使用 Python 3.6.3 .

您测试的东西远远 多于您生产的东西。与完成的总工作量相比,int 与 "found factor" 案例的生成器操作成本 list 相形见绌。你正在执行超过 3000 个试验分区; 12 yields 与 12 次加法对这类工作来说是小菜一碟。将 addition/yields 替换为 pass(什么都不做),你会发现它 still 运行 在(大致)相同数量的时间:

def ignore_divisors_optimized(number):
    square_root = int(math.sqrt(number))

    for divisor in range(1, square_root):
        if number % divisor == 0:
            pass

    if square_root ** 2 == number:
        pass

并使用 ipython%timeit 魔法进行微基准测试:

>>> %timeit -r5 number_of_divisors_optimized(9999999)
266 µs ± 1.85 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %timeit -r5 list(divisors_optimized(9999999))
267 µs ± 1.29 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %timeit -r5 ignore_divisors_optimized(9999999)
267 µs ± 1.43 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)

number_of_divisors 快一微秒这一事实无关紧要(重复测试的抖动高于一微秒);它们的速度基本相同,因为 >99% 的工作是循环和测试,而不是测试通过后所做的工作。

这是 90/10 优化规则的一个例子:大约 90% 的时间花在了 10% 的代码上(在这种情况下,就是试用版本身); 10% 用于其他 90% 的代码。你只优化了 90% 的代码中的一小部分 运行s 10% 的时间,这没有帮助,因为绝大多数时间都花在了 if number % divisor == 0:线。如果您删除该测试以支持只循环 range 什么都不做,运行 时间在我的本地微基准测试中下降到 ~78 µs,这意味着该测试占用了 [=40 的近 200 µs =]时间,是所有其余代码加在一起所需时间的两倍多。

如果你想对此进行优化,你需要考虑加快试验分割线本身的方法(这基本上意味着不同的 Python 解释器或使用 Cython 将其编译为 C),或 运行 减少该行次数的方法(例如,预先计算可能达到某个界限的素因数,因此对于任何给定的输入,您都可以避免测试非素因数,然后 producing/computing 非素因数的数量来自已知质因数及其重数的质因数)。