Avoid buffering when shuffling data

Question

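I have struggled to find a good name for this question, and therefore a good answer (which probably already exists somewhere :/), so I am open to any suggestions for renaming it.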

I am working with a numpy array in which each row holds the data for one object, typically something like features = [feature0, feature1].

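When I use this array, I first shuffle it and then use it for learning. While using it (after shuffling), I increasingly need the features of the few rows that precede the current row.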

To do this, I have been using a buffer: I build a new array whose row N looks like [featuresN-i, ..., featuresN-1, featuresN], and then shuffle that array.

I am wondering whether there is a way to shuffle the indices instead, and get a 3D array like the following from some something_function applied to my 2D array:

original_array.something_function(shuffled_index[N:M]) 
-> [
    [[features of shuffled_index[ N ] - i],
                   ...                    ,
     [features of shuffled_index[ N ]    ]], 
    [[features of shuffled_index[N+1] - i],
                   ...                    ,
     [features of shuffled_index[N+1]    ]],
                  .....                    ,
    [[features of shuffled_index[ M ] - i],
                   ...                    ,
     [features of shuffled_index[ M ]    ]]
   ]

If there is, is it worth calling it in order to reduce the size of my buffered array by a factor of i?

Any hints are welcome.

Answer 1

As you realised yourself: don't shuffle the array. Shuffle the indices.

import numpy as np

# create data
nrows = 100
ncols = 4
arr = np.random.rand(nrows, ncols)

# create indices and shuffle
indices = np.arange(nrows)
np.random.shuffle(indices) # in-place operation!

# loop over shuffled indices, do stuff with array
for ii in indices:
    # (ii+1) % nrows wraps the last row around to row 0;
    # ii-1 == -1 already wraps to the last row via negative indexing
    print(ii, arr[[ii-1, ii, (ii+1) % nrows]])
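
The loop above fetches one window at a time. For the batched form asked about in the question (a slice of shuffled indices in, a 3D array out), one possible sketch uses integer array indexing with broadcasting. The helper name `gather_windows` is hypothetical, not a numpy function, and the sketch assumes each window holds rows n-i through n. It also assumes the first i rows are never used as window ends, since they lack a full history and negative indices would otherwise wrap around to the end of the array.

```python
import numpy as np

def gather_windows(arr, idx, i):
    """Hypothetical helper: for each n in idx, return rows n-i .. n of arr."""
    idx = np.asarray(idx)
    offsets = np.arange(-i, 1)       # [-i, ..., -1, 0]
    rows = idx[:, None] + offsets    # shape (len(idx), i + 1)
    return arr[rows]                 # shape (len(idx), i + 1, ncols)

nrows, ncols, i = 100, 4, 3
arr = np.random.rand(nrows, ncols)

# Only rows with i predecessors are valid window ends (assumption of this sketch).
shuffled_index = np.arange(i, nrows)
np.random.shuffle(shuffled_index)

# Take a batch of 10 shuffled positions and gather their windows.
batch = gather_windows(arr, shuffled_index[0:10], i)
print(batch.shape)
```

With this approach only the original 2D array and the index vector stay in memory permanently. The (i + 1)-times larger windowed data exists only for the current batch, which is where the saving over a fully buffered array would come from.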
