Fastest way to concatenate large numpy arrays
I am doing some optical flow analysis. The goal is to iterate through every frame of a long movie, compute the dense optical flow, and append the resulting angles and magnitudes to a continuously growing numpy array. I have found that each successive pass through the loop takes longer and longer to complete, and I am not sure why. Here is a simple example loop that reproduces the problem:
import numpy as np
import time

arraySize = (1, 256, 256)            # correct array size
emptyArray = np.zeros(arraySize)     # empty array to fill with angles from every image pair
timeElapsed = []                     # empty list to fill with time values

for i in range(100):                 # iterates through the frames in the image stack
    start = time.time()              # start the time
    newArray = np.zeros(arraySize)   # makes an example new array
    emptyArray = np.concatenate((emptyArray, newArray))  # concats new and growing arrays
    end = time.time()                # stop the time
    timeElapsed.append(end - start)  # append the total time for the loop to the growing list
If I then plot the time elapsed for each pass, I get a linear increase from loop to loop. In this example it is still tolerable, but for my real data set it is not.
I am guessing that the larger arrays take more time to handle, but I am not sure how to avoid this. Is there a better, faster, or more Pythonic way to do this?
------------ EDIT ------------
Following mathfux's suggestion, I modified the loop as follows:
arraySize = (1, 256, 256)            # correct array size
emptyArray = np.concatenate([np.zeros(arraySize) for i in range(100)])  # preallocate the full array to fill with angles from every image pair
timeElapsed = []                     # empty list to fill with time values

for i in range(100):                 # iterates through the frames in the image stack
    start = time.time()              # start the time
    newArray = np.zeros(arraySize)   # makes an example new array
    emptyArray[i] = newArray[0]      # overwrites the preallocated array with newArray values at the relevant position
    end = time.time()                # stop the time
    timeElapsed.append(end - start)  # append the total time for the loop to the growing list
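Since the preallocation only needs an all-zeros array of the final shape, it could presumably also be written with a single np.zeros call instead of concatenating 100 zero blocks; a minimal sketch, assuming the number of frames (here 100) is known up front:

import numpy as np

numFrames = 100                               # assumed to be known in advance
emptyArray = np.zeros((numFrames, 256, 256))  # same shape and contents as concatenating 100 (1, 256, 256) zero blocks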
With this modified loop, the time per loop is now very consistent across iterations.
Thanks!
Every time you append a new array, new memory is allocated to create a bigger array and copy the data into it. This is very expensive. A better solution is to allocate memory of the required size once, and then use np.concatenate only a single time to record your data:
np.concatenate([np.zeros(arraySize) for i in range(100)])
This way it seems to be about 28 times faster on my computer:
start = time.time()              # start the time
arrays = []
for i in range(100):             # iterates through the frames in the image stack
    arrays.append(np.zeros(arraySize))
# concatenate all of them in one go
newArray = np.concatenate(arrays)
end = time.time()                # stop the time
timeElapsed2 = end - start
print("Elapsed:", timeElapsed2)
print("sum elapsed times of first method:", np.sum(timeElapsed))
Elapsed: 0.021436214447021484
sum elapsed times of first method: 0.6163454055786133
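The same comparison can presumably also be reproduced with timeit, which averages over several runs instead of timing a single pass; a minimal sketch (the function names are illustrative):

import timeit
import numpy as np

arraySize = (1, 256, 256)

def grow_by_concatenate(n=100):
    # the original approach: repeatedly concatenate onto a growing array
    out = np.zeros(arraySize)
    for _ in range(n):
        out = np.concatenate((out, np.zeros(arraySize)))
    return out

def concatenate_once(n=100):
    # the suggested approach: build all blocks first, concatenate a single time
    return np.concatenate([np.zeros(arraySize) for _ in range(n)])

print("growing concatenate:", timeit.timeit(grow_by_concatenate, number=10))
print("single concatenate: ", timeit.timeit(concatenate_once, number=10))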
Using accelerators: you can speed the code up by making use of GPU or TPU capabilities, for example with the jax library. Your code then runs about 1000 times faster than the other answers (roughly 40 to 50 µs per loop) using a Google Colab TPU:
import time
import numpy as np
from jax import jit

@jit
def zac():
    arraySize = (1, 256, 256)            # correct array size
    emptyArray = np.zeros(arraySize)     # empty array to fill with angles from every image pair
    timeElapsed = []                     # empty list to fill with time values
    for i in range(100):                 # iterates through the frames in the image stack
        start = time.time()              # start the time
        newArray = np.zeros(arraySize)   # makes an example new array
        emptyArray = np.concatenate((emptyArray, newArray))  # concats new and growing arrays
        end = time.time()                # stop the time
        timeElapsed.append(end - start)  # append the total time for the loop to the growing list

%timeit -n10000 zac()
The result of the computation is as follows:
10000 loops, best of 5: 47.7 µs per loop
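Since zac takes no arguments, jit can trace it once and reuse the result on later calls, which is presumably a large part of why the timed loop above is so fast. A minimal sketch of what the single-concatenate version might look like written against jax.numpy directly (assuming jax is installed; jnp.zeros and jnp.concatenate mirror the numpy calls):

import jax
import jax.numpy as jnp

@jax.jit
def build_stack():
    # build 100 zero frames and join them along the first axis in one call
    frames = [jnp.zeros((1, 256, 256)) for _ in range(100)]
    return jnp.concatenate(frames)

result = build_stack()        # first call compiles; later calls reuse the compiled code
result.block_until_ready()    # jax runs asynchronously, so wait before timing or using the result
print(result.shape)           # (100, 256, 256)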