Optimize 4D Numpy array construction

Question

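I have a 4D array data of shape (50,8,2048,256), which holds 50 groups of 8 images of 2048x256 pixels each. times is an array of shape (50,8) giving the time at which each image was taken.

I calculate a first-order polynomial fit at each pixel across all the images in each group, which gives me an array of shape (50,2048,256,2). This is essentially a vector plot for each of the 50 groups. The code I use to store the polynomials is: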

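import numpy as np

# One (slope, intercept) pair per pixel per group
fits = np.ones((50,2048,256,2))
# Expand times from (50,8) to (50,8,2048,256) so it can be indexed like data
times = times.reshape(50,8,1).repeat(2048,2).reshape(50,8,2048,1).repeat(256,3)
for group in range(50):
    for xpos in range(2048):
        for ypos in range(256):
            # xpos indexes the 2048-long axis, ypos the 256-long axis
            fits[group,xpos,ypos,:] = np.polyfit(times[group,:,xpos,ypos],data[group,:,xpos,ypos],1)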

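The challenge now is that I want to generate an array new_data of shape (50,12,2048,256), using the polynomial coefficients from fits and the times in new_time to produce 50 groups of 12 images.

I figure I can use something like np.polyval(fits, new_time) to generate the images, but I'm confused about how to phrase it. It should be something like: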

new_data = np.ones((50,12,2048,256))
for i,(times,fit) in enumerate(zip(new_times,fits)):
    new_data[i] = np.polyval(fit,times)

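But I get a broadcasting error. Any help would be greatly appreciated!

Update: OK, so I changed the code a bit so that it now works and does exactly what I want, but it is terribly slow with all these loops (about 1 minute per group, which means the full run would take almost an hour!). Can anyone suggest a way to optimize this and speed it up?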

# Generate the polynomials for each pixel in each group
fits = np.ones((50,2048,256,2))
times = np.arange(0,50*8*grptme,grptme).reshape(50,8)
times = times.reshape(50,8,1).repeat(2048,2).reshape(50,8,2048,1).repeat(256,3)
for group in range(50):
    for xpos in range(2048):
        for ypos in range(256):
            fits[group,xpos,ypos] = np.polyfit(times[group,:,xpos,ypos],data[group,:,xpos,ypos],1)

# Create new array of 12 images per group using the polynomials for each pixel
new_data = np.ones((50,12,2048,256))
times = np.arange(0,50*12*grptme,grptme).reshape(50,12)
times = times.reshape(50,12,1).repeat(2048,2).reshape(50,12,2048,1).repeat(256,3)
for group in range(50):
    for img in range(12):
        for xpos in range(2048):
            for ypos in range(256):
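                # np.polyfit returns [slope, intercept]; polynomial.polyval expects the lowest degree first, so reverse
                new_data[group,img,xpos,ypos] = np.polynomial.polynomial.polyval(times[group,img,xpos,ypos],fits[group,xpos,ypos,::-1])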

Answer 1

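Regarding speed, I see a lot of loops, which should and usually can be avoided thanks to the beauty of numpy. If I understand your problem correctly, you want to fit a first-order polynomial to 8 data points, 2048 * 256 times, in each of 50 groups. The shape of your images therefore plays no role in the fit. So my suggestion is to flatten your images, because np.polyfit can fit several sets of y values against a single set of x values at the same time.

From the docstring: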

x : array_like, shape (M,)
    x-coordinates of the M sample points ``(x[i], y[i])``.
y : array_like, shape (M,) or (M, K)
    y-coordinates of the sample points. Several data sets of sample
    points sharing the same x-coordinates can be fitted at once by
    passing in a 2D-array that contains one dataset per column.

So I would go for:

# Generate the polynomials for each pixel in each group
fits = np.ones((50,2048*256,2))
times = np.arange(0,50*8*grptme,grptme).reshape(50,8)
data_fit = data.reshape((50,8,2048*256))
for group in range(50):
    fits[group] = np.polyfit(times[group],data_fit[group],1).T
fits_original_shape = fits.reshape((50,2048,256,2))

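The transpose is necessary because you want the parameters in the last index, whereas np.polyfit puts them first, followed by the different datasets.

Evaluating it is then basically the same trick again: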
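To make the shapes concrete, here is a minimal sketch on toy data. The array sizes and the names `t`, `y`, `coeffs` and `per_pixel` are made up for illustration and are not part of the original code; only the np.polyfit call matches the approach above.

```python
import numpy as np

# Toy stand-in for one group: 8 time samples and 6 "pixels" (a flattened 2x3 image)
t = np.arange(8, dtype=float)
slopes = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
intercepts = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# One dataset per column, as the docstring requires: shape (8, 6)
y = t[:, None] * slopes + intercepts

# A single call fits all 6 pixels; the result has shape (2, 6):
# row 0 holds the slopes, row 1 the intercepts
coeffs = np.polyfit(t, y, 1)

# Transposing moves the coefficients to the last axis: shape (6, 2)
per_pixel = coeffs.T
```

Reshaping `per_pixel` to (2, 3, 2) would restore the toy image layout, in the same way that fits is reshaped to (50,2048,256,2) above.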

# Create new array of 12 images per group using the polynomials for each pixel
new_data = np.zeros((50,12,2048*256))
times = np.arange(0,50*12*grptme,grptme).reshape(50,12)
#times = times.reshape(50,12,1).repeat(2048,2).reshape(50,12,2048,1).repeat(256,3)
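for group in range(50):
    # np.polyfit stores the highest-degree coefficient first, while
    # np.polynomial.polynomial.polyval expects the lowest first, so reverse the last axis
    new_data[group] = np.polynomial.polynomial.polyval(times[group],fits[group,:,::-1].T).T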
new_data_original_shape = new_data.reshape((50,12,2048,256))

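Again, two transposes are needed because of how the parameters and the different datasets are ordered, so that the result matches the shape of the arrays.

Maybe one could also avoid looping over the groups with some advanced numpy magic, but with this the code already runs much faster.

Hope that helps!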
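One detail worth checking when combining these two functions is the coefficient order: np.polyfit returns the highest power first, while np.polynomial.polynomial.polyval expects the lowest power first, with the coefficients along axis 0. The following sketch uses made-up toy data (the names `fit`, `c`, `vals` and `images` are illustrative only) to show the order and the resulting shapes.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Toy data: 8 time samples, 3 "pixels", all with slope 3 and intercepts 1, 2, 5
t = np.arange(8, dtype=float)
true_intercepts = np.array([1.0, 2.0, 5.0])
y = 3.0 * t[:, None] + true_intercepts           # shape (8, 3)

# np.polyfit: highest degree first, so each row of fit is [slope, intercept]
fit = np.polyfit(t, y, 1).T                      # shape (3, 2)

new_t = np.linspace(0.0, 7.0, 12)

# P.polyval wants [intercept, slope] along axis 0, so reverse and transpose
c = fit[:, ::-1].T                               # shape (2, 3)

# With a 2D coefficient array the result has shape c.shape[1:] + new_t.shape
vals = P.polyval(new_t, c)                       # shape (3, 12)
images = vals.T                                  # shape (12, 3): time first, pixels last
```

Passing the coefficients without the reversal would evaluate slope + intercept * t instead of intercept + slope * t, which still runs without error but gives different values.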
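As one possible way to drop the loop over groups, a first-order fit has a closed-form least-squares solution that can be written with broadcasting. This is only a sketch under that assumption, not the np.polyfit approach used above: the helper names `linear_fit_all` and `evaluate_all` are hypothetical, and the small array sizes are chosen just to keep the demonstration light.

```python
import numpy as np

def linear_fit_all(times, data):
    """Least-squares line per pixel and group.

    times: (G, T) sample times, data: (G, T, H, W) images.
    Returns slope and intercept, each of shape (G, H, W).
    """
    t_mean = times.mean(axis=1, keepdims=True)           # (G, 1)
    t_centered = times - t_mean                          # (G, T)
    # sum_t (t - t_mean) * d equals the covariance sum, because the
    # centered times add up to zero within each group
    cov = np.einsum('gt,gthw->ghw', t_centered, data)    # (G, H, W)
    var = (t_centered ** 2).sum(axis=1)[:, None, None]   # (G, 1, 1)
    slope = cov / var
    intercept = data.mean(axis=1) - slope * t_mean[:, :, None]
    return slope, intercept

def evaluate_all(slope, intercept, new_times):
    """Evaluate the lines at new_times of shape (G, N); returns (G, N, H, W)."""
    return slope[:, None] * new_times[:, :, None, None] + intercept[:, None]

# Small demonstration: 3 groups, 8 input images, 12 output images, 4x5 pixels
rng = np.random.default_rng(0)
times = np.arange(3 * 8, dtype=float).reshape(3, 8)
data = rng.normal(size=(3, 8, 4, 5))
slope, intercept = linear_fit_all(times, data)
new_times = np.arange(3 * 12, dtype=float).reshape(3, 12)
new_data = evaluate_all(slope, intercept, new_times)     # shape (3, 12, 4, 5)
```

The same two calls should work for arrays of shape (50,8,2048,256), although the intermediate arrays are then large, so memory use is worth keeping an eye on.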

