Is there a possibility for numpy to replace a for-loop for performance improvements?

Question

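In general, is there a way to improve the performance of this code using numpy or plain Python? The goal is to build a training set. features is the raw data. I want to use a moving-window approach with a step size of 1 to 'enrich' the data. Finally, I want to reshape the data from a 2D array into a 3D array, because a single training input has the shape (windowSize, features.shape[1]).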

import numpy as np

windowSize = 4
features = np.array([[1,2],[3,4],[5,6],[7,8],[9,10],[11,12],[13,14],[15,16],[17,18],[19,20]])
featuresReshaped = features[:windowSize]

for i in range(1, features.shape[0], 1):
    featuresReshaped = np.vstack((featuresReshaped, features[i:i+windowSize]))

maxindex = int(featuresReshaped.shape[0]/windowSize) * windowSize
featuresReshaped = featuresReshaped[:maxindex]
featuresReshaped = featuresReshaped.reshape(int(featuresReshaped.shape[0]/windowSize), windowSize, featuresReshaped.shape[1])

Answer 1

This solution avoids explicit Python loops entirely by using NumPy integer-array indexing.

import numpy as np

windowSize = 4
features = np.array(
    [[ 1,  2], 
     [ 3,  4],
     [ 5,  6],
     [ 7,  8],
     [ 9, 10],
     [11, 12],
     [13, 14],
     [15, 16],
     [17, 18],
     [19, 20]]
)

indices = np.add.outer(np.arange(len(features) - windowSize + 1), np.arange(windowSize))
# indices:
# [[0 1 2 3]
#  [1 2 3 4]
#  [2 3 4 5]
#  [3 4 5 6]
#  [4 5 6 7]
#  [5 6 7 8]
#  [6 7 8 9]]
features[indices] # indices must be of type np.ndarray or this won't work
# features[indices]:
# [[[ 1  2]
#   [ 3  4]
#   [ 5  6]
#   [ 7  8]]

#  [[ 3  4]
#   [ 5  6]
#   [ 7  8]
#   [ 9 10]]

#  [[ 5  6]
#   [ 7  8]
#   [ 9 10]
#   [11 12]]

#  [[ 7  8]
#   [ 9 10]
#   [11 12]
#   [13 14]]

#  [[ 9 10]
#   [11 12]
#   [13 14]
#   [15 16]]

#  [[11 12]
#   [13 14]
#   [15 16]
#   [17 18]]

#  [[13 14]
#   [15 16]
#   [17 18]
#   [19 20]]]
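
As a cross-check on the index-matrix approach above, here is a minimal sketch of a different technique: `numpy.lib.stride_tricks.sliding_window_view`, which is available in NumPy 1.20 and later. It is not part of the original answer. It assumes the same `features` and `windowSize` as above and should produce the same (7, 4, 2) array.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

windowSize = 4
features = np.arange(1, 21).reshape(10, 2)  # same values as the array above

# Index-matrix approach from the answer, kept here for comparison
indices = np.add.outer(np.arange(len(features) - windowSize + 1), np.arange(windowSize))
by_indexing = features[indices]

# Sliding along axis 0 appends the window dimension at the end,
# giving shape (7, 2, 4); transpose to get (n_windows, windowSize, n_features)
windows = sliding_window_view(features, window_shape=windowSize, axis=0).transpose(0, 2, 1)

print(windows.shape)
print(np.array_equal(windows, by_indexing))
```

Note that `sliding_window_view` returns a read-only view of the original data, so call `.copy()` on the result if the windows need to be modified independently.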

Note that your code's output differs from mine. I believe this may be a bug, because your last slice is:

print(featuresReshaped[-1])
# [[15 16]
#  [17 18]
#  [19 20]
#  [17 18]]

This is inconsistent with the 'moving window' description you gave.
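
The mismatch appears to come from the loop bound: the loop in the question runs up to `features.shape[0]`, so the last few slices are shorter than `windowSize`, and the later reshape mixes them into one window. The sketch below illustrates this and shows one possible loop-based fix, assuming the intent is to keep only full windows; the names `slice_lengths` and `n_windows` are introduced here for illustration.

```python
import numpy as np

windowSize = 4
features = np.arange(1, 21).reshape(10, 2)  # same values as in the question

# Lengths of the slices taken by the original loop (i = 1 .. 9):
# the last three are shorter than windowSize
slice_lengths = [len(features[i:i + windowSize]) for i in range(1, features.shape[0])]
print(slice_lengths)  # [4, 4, 4, 4, 4, 4, 3, 2, 1]

# Stopping at the last full window gives only complete windows
n_windows = features.shape[0] - windowSize + 1
windows = np.stack([features[i:i + windowSize] for i in range(n_windows)])
print(windows.shape)  # (7, 4, 2)
```

This version still uses a Python loop, so it is meant to show the corrected window bounds, not to be the fastest option.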


arrays

optimization

numpy

dataset

python-3.x