Fill a bytearray of known size from a generator

Question

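Starting from a previously given or calculated, very long array of small byte values (MB, GB, TB in size, which is why I use bytearray), I need to calculate the follow-up bytearray in the next iteration step. The size needed for the next step's bytearray can be calculated in advance, so I can pre-allocate the memory using one of the bytearray constructors: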

# A is the current/former bytearray
# sizes of array: 1 -> 2 -> 8 -> 48 -> 480 -> 5_760 -> 92_160 -> 1_658_880 -> 
# 36_495_360 -> 1_021_870_080 -> 30_656_102_400 [ -> 1_103_619_686_400 ... ]
ls = NextLenArray(A)
L = bytearray(ls)

# generator will create new values out of the current existing
for i,j in enumerate(gen_values(A)):
    L[i] = j

# need to assign it back into A for next iteration
A = L

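Alternatively, the next bytearray can obviously be created directly by passing a generator expression to the constructor. In that case I don't know how and when (step by step?) the memory is reserved.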

A = bytearray(j for j in gen_values(A))

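This seems to run a little faster, but according to Task Manager it uses more memory while generating, and in the later iteration steps it stops one step earlier with a MemoryError.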

Is there an easy way to combine the pre-reservation (allocating a bytearray of the needed size) with filling it from a generator or comprehension?

Answer 1

It looks like it runs a little faster, but monitoring Task Manager it uses more memory while generating, and in later iteration steps it gets stopped one step earlier because of a MemoryError.

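This is because a generator has no known length. Python cannot iterate over the generator just to find out its length, because doing so would consume it. So the bytearray has to be resized more or less dynamically while it is being filled. Depending on the implementation (for example, a dynamic array that keeps growing, or a dynamic array made of independent large chunks), this can require significantly more memory than allocating the bytearray directly at the right size. On my machine, with CPython 3.9.2, I cannot reproduce your problem, because that version uses a memory-efficient implementation.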
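To make the difference concrete, here is a minimal sketch that compares the final allocation of a pre-sized bytearray with one built directly from a generator. The `gen_values` below is a hypothetical stand-in for the generator in the question (it simply yields `n` small integers), and `sys.getsizeof` is assumed to reflect the allocated buffer, which holds for CPython but may not for other interpreters. It only shows the final size, not the peak memory used while the buffer is growing.

```python
import sys


def gen_values(n):
    """Hypothetical stand-in for the question's generator: yields n byte values."""
    for i in range(n):
        yield i % 256


n = 1_000_000

# Size known up front: the buffer is allocated once at the needed size.
preallocated = bytearray(n)
for i, value in enumerate(gen_values(n)):
    preallocated[i] = value

# Size unknown to the constructor: the buffer has to grow while the
# generator is being consumed, which may leave spare capacity behind.
from_generator = bytearray(gen_values(n))

# Both hold the same data; only the allocated size may differ.
print("preallocated:  ", sys.getsizeof(preallocated))
print("from generator:", sys.getsizeof(from_generator))
```

How large the gap is depends on the interpreter version and its growth strategy, so the printed numbers should be taken as an observation on one machine rather than a guarantee.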

Is there an easy way to combine the pre-reservation by assigning a bytearray with needed size and use this with generator/comprehension-list?

Yes, you can use a chunk-based copy. Here is an example:

import itertools

ls = NextLenArray(A)
L = bytearray(ls)
gen = gen_values(A)
chunkSize = 65536

for i in range(0, ls, chunkSize):
    # Copy a chunk. This can (and does) allocate memory because of a 
    # potential internal copy. But the amount is bounded by the chunk size.
    L[i:i+chunkSize] = itertools.islice(gen, chunkSize)
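One caveat with plain slice assignment is that a bytearray silently changes size if the generator yields fewer values than expected. As a possible variation on the loop above (not part of the original answer), the sketch below writes through a `memoryview`, which refuses assignments of a different length and keeps the bytearray from being resized while the view is open. The helper name `fill_from_generator` and its `chunk_size` parameter are made up for this illustration.

```python
import itertools


def fill_from_generator(target, values, chunk_size=65536):
    """Fill the preallocated bytearray `target` in place from `values`.

    Raises ValueError if `values` yields fewer or more items than len(target).
    """
    it = iter(values)
    total = len(target)
    pos = 0
    # While the memoryview is open, the bytearray cannot be resized.
    with memoryview(target) as view:
        while pos < total:
            # Temporary copy, bounded by chunk_size bytes.
            chunk = bytes(itertools.islice(it, min(chunk_size, total - pos)))
            if not chunk:
                raise ValueError(f"generator ended after {pos} of {total} values")
            view[pos:pos + len(chunk)] = chunk
            pos += len(chunk)
    # Detect a generator that produces more values than the target can hold.
    sentinel = object()
    if next(it, sentinel) is not sentinel:
        raise ValueError("generator yielded more values than the target size")
    return target
```

With the names from the question, usage would look like `A = fill_from_generator(bytearray(NextLenArray(A)), gen_values(A))`. The extra memory per step stays bounded by the chunk size, as in the loop above.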

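Note that operating on large amounts of memory in pure Python is not efficient (especially with CPython). Consider using high-performance Python packages such as NumPy and Numba, or writing some parts in a native language such as C or C++ (for example via Cython). Alternatively, you may be interested in using PyPy.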

arrays

performance

out-of-memory

python-3.x