Improve execution time with multithreading in Python

Question

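How can I use multithreading to make this process faster? The program generates 1 million random numbers and writes them to a file. It takes just over 2 seconds, but I'm wondering whether multithreading would make it faster.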

import random
import time

startTime = time.time()

data = open("file2.txt", "a+")

for i in range(1000000):
  number = str(random.randint(1, 9999))
  data.write(number + '\n')
data.close()

executionTime = (time.time() - startTime)
print('Execution time in seconds: ' + str(executionTime))

Answer 1

Short answer: not easily.

Here is an example that uses a multiprocessing pool to speed up your code:

import random
import time
from multiprocessing import Pool

startTime = time.time()

def f(_):
    number = str(random.randint(1, 9999))
    data.write(number + '\n')

data = open("file2.txt", "a+")
with Pool() as p:
    p.map(f, range(1000000))
data.close()

executionTime = (time.time() - startTime)
print(f'Execution time in seconds: {executionTime}')

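Looks good? Wait! This is not a drop-in replacement: it lacks process synchronization, so not all 1,000,000 lines get written (some are overwritten in the same buffer)! See "Python multiprocessing safely writing to a file".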

So we need to separate computing the numbers (in parallel) from writing them (serially). We can do it like this:

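import random
import time
from multiprocessing import Pool

def f(_):
    # Each task only generates one number; the workers never touch the file
    return str(random.randint(1, 9999))

# The guard is required on platforms that start workers by re-importing
# the main module (e.g. Windows and macOS)
if __name__ == "__main__":
    startTime = time.time()

    # Compute the numbers in parallel...
    with Pool() as p:
        output = p.map(f, range(1000000))

    # ...then write them serially from the parent process only
    with open("file2.txt", "a+") as data:
        data.write('\n'.join(output) + '\n')

    executionTime = (time.time() - startTime)
    print(f'Execution time in seconds: {executionTime}')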
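The code above collects the whole list before writing anything. A hedged variant of the same split (parallel compute, single writer) is sketched below, assuming it is acceptable to stream results instead: `Pool.imap` returns results in input order as chunks finish, so the parent can start writing early. The names `make_line` and `write_numbers`, the `chunksize` of 10,000 and the `pool_cls` parameter are illustrative choices, not part of the answer, and I have not timed this version.

```python
import random
from multiprocessing import Pool


def make_line(_):
    # Worker task: build one output line; workers never touch the file
    return f"{random.randint(1, 9999)}\n"


def write_numbers(path, n=1_000_000, pool_cls=Pool, chunksize=10_000):
    """Generate n random numbers in a pool and write them from this process only.

    Pool.imap() yields results in input order as chunks complete, so the
    caller can start writing without first building the complete list.
    """
    # "a" appends, matching the question's "a+" mode
    with pool_cls() as pool, open(path, "a") as out:
        for line in pool.imap(make_line, range(n), chunksize=chunksize):
            # Single writer: no lock is needed because only the caller writes
            out.write(line)
```

With the default process `Pool`, call `write_numbers("file2.txt")` from under an `if __name__ == "__main__":` guard, for the same reason as in the answer's code. Passing `multiprocessing.pool.ThreadPool` as `pool_cls` runs the same logic on threads.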

With that fixed, note that this is not multithreading but multiple processes. You can switch it to multithreading by using a different pool object:

from multiprocessing.pool import ThreadPool as Pool

On my system, the process pool reduces the processing time from 1 second to 0.35 seconds. With ThreadPool, however, it takes up to 2 seconds!

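The reason is that Python's global interpreter lock (GIL) prevents multiple threads from working on your task efficiently at the same time; see "What is the global interpreter lock (GIL) in CPython?"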
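To reproduce the Pool versus ThreadPool comparison on your own machine, a small timing harness could look like the sketch below. It reuses the answer's per-item function `f`; `time_pool` and `main` are hypothetical helper names, and the timings you get will depend on your CPU count and platform.

```python
import random
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def f(_):
    # Same per-item task as in the answer: one random number as a string
    return str(random.randint(1, 9999))


def time_pool(pool_cls, n=1_000_000):
    """Map f over range(n) with pool_cls; return (elapsed_seconds, results)."""
    start = time.perf_counter()
    with pool_cls() as p:
        results = p.map(f, range(n))
    return time.perf_counter() - start, results


def main():
    # Run both pool types on identical work and report wall-clock time
    for pool_cls in (Pool, ThreadPool):
        elapsed, _ = time_pool(pool_cls)
        print(f"{pool_cls.__name__}: {elapsed:.2f} s")
```

Call `main()` under an `if __name__ == "__main__":` guard so the process pool can start its workers safely. `time.perf_counter()` is used instead of `time.time()` because it is intended for measuring short intervals.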

In summary, multithreading is not always the right answer:

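In your scenario, one limitation is file access: only one thread can write to the file at a time, otherwise you would need to introduce locking, which makes any performance gain moot.
Also, in Python, multithreading only helps for specific tasks, e.g. long computations that happen in libraries below the Python level (which release the GIL) and can therefore run in parallel. In your scenario, the overhead of multithreading cancels out the small potential performance gain.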
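As an illustration of the "libraries below Python" case, CPython's `hashlib` documentation states that the GIL is released while hashing data larger than 2047 bytes, so hashing large buffers is one task where threads can genuinely overlap. The sketch below assumes that behavior; `digest`, `hash_blocks`, `compare`, the 4 threads and the 32 MiB block size are arbitrary illustrative choices, and I have not measured the resulting speedup.

```python
import hashlib
import os
import time
from multiprocessing.pool import ThreadPool


def digest(block: bytes) -> str:
    # CPython's hashlib releases the GIL while hashing inputs larger than
    # 2047 bytes, so these calls can run on several cores at once
    return hashlib.sha256(block).hexdigest()


def hash_blocks(blocks, threads=4):
    """Hash every block in a thread pool; results keep the input order."""
    with ThreadPool(threads) as pool:
        return pool.map(digest, blocks)


def compare(n_blocks=8, block_size=32 * 1024 * 1024):
    """Return (sequential_seconds, threaded_seconds) for identical work."""
    blocks = [os.urandom(block_size) for _ in range(n_blocks)]
    t0 = time.perf_counter()
    sequential = [digest(b) for b in blocks]
    t1 = time.perf_counter()
    threaded = hash_blocks(blocks)
    t2 = time.perf_counter()
    # Same digests either way; only the scheduling differs
    if sequential != threaded:
        raise RuntimeError("threaded hashing produced different results")
    return t1 - t0, t2 - t1
```

On a multi-core machine the threaded time from `compare()` would be expected to come out lower than the sequential one, in contrast to the `random.randint` workload above, where every call needs the GIL.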

Bonus: yes, using multiprocessing instead of multithreading gave me a 3x speedup on my system.

Answer 2

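Write the whole string to the file at once instead of writing each number individually. I tested this with multithreading and it actually reduced performance: if you use threads to write to the same file, you also have to synchronize them, which hurts performance.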

Code (execution time in seconds: 1.943817138671875):

import random
import time

startTime = time.time()

size = 1_000_000
# Pre-allocate the list so it is not resized while it is being filled
l = [None] * size
for i in range(size):
    # Build each line (number plus newline) in memory
    l[i] = f"{random.randint(1, 99999)}\n"

# Write all the data to the file at once
with open('file2.txt', 'w+') as data:
    data.write(''.join(l))

executionTime = (time.time() - startTime)

print('Execution time in seconds: ' + str(executionTime))

Output:

Execution time in seconds: 1.943817138671875
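For reference, the synchronization this answer mentions might look like the sketch below, assuming several threads share one text file object (text-mode file objects are not documented as thread-safe, so writes go through a lock). Each thread builds its chunk of numbers (1 to 9999, as in the question) without holding the lock and only takes the lock to write. `write_lock`, `write_chunk` and `threaded_write` are hypothetical names. Because `random.randint` still runs under the GIL, this shows the shape of the locking rather than a speedup.

```python
import random
import threading
from multiprocessing.pool import ThreadPool

# One lock shared by all threads; it guards every write to the file object
write_lock = threading.Lock()


def write_chunk(out, count):
    # Build this thread's chunk locally; no lock is needed for this part
    text = "".join(f"{random.randint(1, 9999)}\n" for _ in range(count))
    # Only one thread at a time may write, so chunks never interleave
    with write_lock:
        out.write(text)


def threaded_write(path, n=1_000_000, threads=4):
    """Split n numbers across threads that all write to the same file."""
    base, extra = divmod(n, threads)
    # Spread the remainder so the counts add up to exactly n
    counts = [base + (1 if i < extra else 0) for i in range(threads)]
    with open(path, "w") as out, ThreadPool(threads) as pool:
        pool.starmap(write_chunk, [(out, c) for c in counts])
```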
