Python: performance of finding divisors
Benchmarking these functions:
import math

def divisors_optimized(number):
    square_root = int(math.sqrt(number))
    for divisor in range(1, square_root):
        if number % divisor == 0:
            yield divisor
            yield number / divisor
    if square_root ** 2 == number:
        yield square_root

def number_of_divisors_optimized(number):
    count = 0
    square_root = int(math.sqrt(number))
    for divisor in range(1, square_root):
        if number % divisor == 0:
            count += 2
    if square_root ** 2 == number:
        count += 1
    return count
As you can see, the basic structure of both functions is the same.
Benchmark code:
import time
import utils  # the module containing the two functions above

number = 9999999
for i in range(10):
    print(f"iteration {i}:")
    start = time.time()
    result = list(utils.divisors_optimized(number))
    end = time.time()
    print(f'len(divisors_optimized) took {end - start} seconds and found {len(result)} divisors.')
    start = time.time()
    result = utils.number_of_divisors_optimized(number)
    end = time.time()
    print(f'number_of_divisors_optimized took {end - start} seconds and found {result} divisors.')
    print()
Output:
iteration 0:
len(divisors_optimized) took 0.00019598007202148438 seconds and found 12 divisors.
number_of_divisors_optimized took 0.0001919269561767578 seconds and found 12 divisors.
iteration 1:
len(divisors_optimized) took 0.00019121170043945312 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020599365234375 seconds and found 12 divisors.
iteration 2:
len(divisors_optimized) took 0.000179290771484375 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00019049644470214844 seconds and found 12 divisors.
iteration 3:
len(divisors_optimized) took 0.00019025802612304688 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020170211791992188 seconds and found 12 divisors.
iteration 4:
len(divisors_optimized) took 0.0001785755157470703 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00017905235290527344 seconds and found 12 divisors.
iteration 5:
len(divisors_optimized) took 0.00022721290588378906 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020170211791992188 seconds and found 12 divisors.
iteration 6:
len(divisors_optimized) took 0.0001919269561767578 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00018930435180664062 seconds and found 12 divisors.
iteration 7:
len(divisors_optimized) took 0.00017881393432617188 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00017905235290527344 seconds and found 12 divisors.
iteration 8:
len(divisors_optimized) took 0.00017976760864257812 seconds and found 12 divisors.
number_of_divisors_optimized took 0.0001785755157470703 seconds and found 12 divisors.
iteration 9:
len(divisors_optimized) took 0.00024819374084472656 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020766258239746094 seconds and found 12 divisors.
You can see that the execution times are very close, with each one coming out slightly ahead in different iterations.
Can someone explain to me why creating a list from the generator and then taking its length is just as fast as counting while iterating? I mean, shouldn't the memory allocation (list()) be much more expensive than simple assignments?
I'm using Python 3.6.3.
You're testing far more than you're producing. The cost of the generator machinery (and the list built from it) in the "found a factor" case, versus the cost of incrementing an int, is dwarfed by the total amount of work being done. You're performing over 3000 trial divisions; 12 yields vs. 12 additions is peanuts next to that kind of work. Replace the additions/yields with pass (doing nothing at all) and you'll find it still runs in (roughly) the same amount of time:
def ignore_divisors_optimized(number):
    square_root = int(math.sqrt(number))
    for divisor in range(1, square_root):
        if number % divisor == 0:
            pass
    if square_root ** 2 == number:
        pass
And microbenchmarking with IPython's %timeit magic:
>>> %timeit -r5 number_of_divisors_optimized(9999999)
266 µs ± 1.85 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %timeit -r5 list(divisors_optimized(9999999))
267 µs ± 1.29 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %timeit -r5 ignore_divisors_optimized(9999999)
267 µs ± 1.43 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
The fact that number_of_divisors is a microsecond faster is irrelevant (the jitter across repeated tests is larger than a microsecond); they're all essentially the same speed, because >99% of the work is the loop and the test, not the work done when the test passes.
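If you don't have IPython handy, a roughly equivalent measurement can be made with the standard-library timeit module. This is only a sketch, assuming the three functions above are defined in the current namespace; exact numbers will of course vary by machine:

import timeit

# Mirror %timeit -r5: 5 repeats of 1000 calls each, reporting the best repeat.
for stmt in ('number_of_divisors_optimized(9999999)',
             'list(divisors_optimized(9999999))',
             'ignore_divisors_optimized(9999999)'):
    per_call = min(timeit.repeat(stmt, globals=globals(), repeat=5, number=1000)) / 1000
    print(f'{stmt}: {per_call * 1e6:.1f} µs per call')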
This is an example of the 90/10 rule of optimization: roughly 90% of the time is spent in 10% of the code (in this case, the trial division itself); the other 10% of the time is spent in the other 90% of the code. You've optimized a small piece of the 90% of the code that only accounts for 10% of the runtime, and it didn't help, because the overwhelming majority of the time is spent on the if number % divisor == 0: line. If you remove that test in favor of just looping over the range doing nothing at all, the runtime drops to ~78 µs in my local microbenchmarks, which means the test accounts for nearly 200 µs of the runtime, more than twice the cost of everything else combined.
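For reference, the "just loop over the range doing nothing" variant mentioned above would look something like this (my own sketch; the function name is made up, not from the answer):

def loop_only(number):
    # Same loop bounds as before, but with the trial-division test removed,
    # so only the cost of iterating the range itself remains.
    square_root = int(math.sqrt(number))
    for divisor in range(1, square_root):
        pass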
If you want to optimize this, you need to look at ways of speeding up the trial division line itself (which basically means a different Python interpreter, or compiling it down to C with Cython), or at ways of running that line fewer times (e.g. precomputing the primes up to some bound, so for any given input you only ever test prime divisors, and then producing/computing the full divisor count from the known prime factors and their multiplicities).
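To illustrate the last suggestion: once you know the prime factorization number = p1**a1 * p2**a2 * ..., the divisor count is simply (a1 + 1) * (a2 + 1) * .... A minimal sketch of that counting step (the function name is hypothetical, and it still factorizes by plain trial division rather than a precomputed prime table):

def number_of_divisors_from_factorization(number):
    # Multiply (exponent + 1) over all prime factors of number.
    count = 1
    remaining = number
    factor = 2
    while factor * factor <= remaining:
        if remaining % factor == 0:
            exponent = 0
            while remaining % factor == 0:
                remaining //= factor
                exponent += 1
            count *= exponent + 1
        factor += 1
    if remaining > 1:  # whatever is left over is itself prime
        count *= 2
    return count

print(number_of_divisors_from_factorization(9999999))  # 9999999 = 3**2 * 239 * 4649 -> 12 divisors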