Saving a 3D NumPy array to disk at high speed

Question

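I have a NumPy array of shape (192, 192, 4000) that I want to write to disk quickly. I don't care about the format; I can convert it afterwards.

Right now I save it as CSV, which takes a long time: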

for i in range(0, 192):
    np.savetxt(folder + "/{}_{}.csv".format(filename, i), data[i], fmt="%i", delimiter=", ")

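This takes 20-25 seconds. I have tried the pandas DataFrame and Panel approaches I found in Stack Overflow questions, as well as numpy.save. All of them appear to run without errors, but when I open the folder it is empty.

Any idea how to improve the speed?

And why does code such as numpy.save run without errors yet save nothing?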

Answer 1

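The fastest way to save a large array is usually to write it as a binary file, which NumPy's save function does. For example, the following creates a 3D array filled with zeros, writes it to a file, and then reads it back: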

import numpy

a = numpy.zeros((192, 192, 4000))
numpy.save("mydata.npy", a)
b = numpy.load("mydata.npy")

After the save call, the file "mydata.npy" should of course be in the current working directory.
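
If the target folder looks empty after saving, one possible cause (an assumption, since the question does not show the paths used) is that a relative path was resolved against a different working directory than expected. A minimal sketch for checking where the file actually lands follows. The temporary directory, the file name, and the reduced array shape are illustrative stand-ins, not part of the original question.

```python
import os
import tempfile

import numpy as np

# Hypothetical output folder; a temporary directory keeps the sketch self-contained.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "mydata.npy")

# Smaller stand-in for the (192, 192, 4000) array from the question.
data = np.zeros((192, 192, 40), dtype=np.int32)

# Write the array in NumPy's binary .npy format.
np.save(out_path, data)

# Report where the file ended up and how large it is.
print("working directory:", os.getcwd())
print("saved to:", os.path.abspath(out_path))
print("exists:", os.path.exists(out_path))
print("size in bytes:", os.path.getsize(out_path))

# Read the array back to confirm the round trip.
loaded = np.load(out_path)
print("round trip equal:", np.array_equal(loaded, data))
```

Printing the absolute path makes it easy to compare the folder being inspected with the folder that was actually written to.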

Answer 2

You can also reshape the array from 3D to 2D before saving. See the following code for an example.

import numpy as np

arr = np.random.rand(5, 4, 3)

# Reshape the 3D array into a 2D matrix,
# because savetxt only handles 1D and 2D arrays.
arr_reshaped = arr.reshape(arr.shape[0], -1)

# Save the reshaped array to a text file.
np.savetxt("geekfile.txt", arr_reshaped)

# Read the data back from the file.
loaded_arr = np.loadtxt("geekfile.txt")

# loaded_arr is 2D, so reshape it back
# to the original 3D shape.
load_original_arr = loaded_arr.reshape(
    loaded_arr.shape[0], loaded_arr.shape[1] // arr.shape[2], arr.shape[2])

# Check the shapes:
print("shape of arr: ", arr.shape)
print("shape of load_original_arr: ", load_original_arr.shape)

# Check whether both arrays are the same:
if (load_original_arr == arr).all():
    print("Yes, both the arrays are same")
else:
    print("No, both the arrays are not same")
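
For comparison, here is a rough sketch that writes the same array both ways: as text after reshaping to 2D, and as binary with np.save, which needs no reshape because the .npy header records the shape and dtype. The array size, temporary directory, and file names are illustrative assumptions. Actual sizes and timings depend on the data and the machine.

```python
import os
import tempfile

import numpy as np

# Small stand-in array; the shape and file names are illustrative only.
arr = np.random.rand(20, 20, 30)
out_dir = tempfile.mkdtemp()
txt_path = os.path.join(out_dir, "arr.txt")
npy_path = os.path.join(out_dir, "arr.npy")

# Text route: flatten to 2D, write, read back, restore the 3D shape.
np.savetxt(txt_path, arr.reshape(arr.shape[0], -1))
from_txt = np.loadtxt(txt_path).reshape(arr.shape)

# Binary route: np.save stores the shape and dtype in the file header.
np.save(npy_path, arr)
from_npy = np.load(npy_path)

# Compare file sizes and check both round trips.
print("text file bytes:  ", os.path.getsize(txt_path))
print("binary file bytes:", os.path.getsize(npy_path))
print("binary round trip exact:", np.array_equal(from_npy, arr))
print("text round trip close:", np.allclose(from_txt, arr))
```

For float data the text file is typically several times larger than the binary one, which is part of why the text route tends to be slower for large arrays.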

Tags: python, numpy, save, multidimensional-array