Understanding Keras LSTMs: Role of Batch-size and Statefulness

Question

Sources

There are several sources that explain stateful/stateless LSTMs and the role of batch_size, and I have already read them. I'll refer to them later in my post:

[1] https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/

[2] https://machinelearningmastery.com/stateful-stateless-lstm-time-series-forecasting-python/

[3] http://philipperemy.github.io/keras-stateful-lstm/

[4] https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/

There are also other SO threads, such as … and …, but they didn't fully explain what I am looking for.

My Problem

I am still unsure what the correct approach for my task is, both regarding statefulness and regarding how to determine batch_size.

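I have about 1000 independent time series (samples). Each one is about 600 days long (timesteps). The lengths actually vary, but I considered trimming the data to a constant timeframe. Each timestep has 8 features (or input_dim). Some features are identical for every sample, and some are individual to each sample.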

Input shape = (1000, 600, 8)

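One of the features is the one I want to predict, while the others are (supposed to be) supportive for the prediction of this "master feature". I will do this for each of the 1000 time series. What is the best strategy to model this problem?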

Output shape = (1000, 600, 1)
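The trimming idea above can be sketched as follows. This is a minimal sketch that assumes each series is a list of per-day feature vectors (8 floats), and that keeping the most recent `timesteps` days is acceptable. The helper names `trim_and_stack` and `shape_of` are hypothetical, and so is the toy data. The result uses the (samples, timesteps, features) layout Keras expects. In practice you would convert it with numpy.asarray.

```python
def trim_and_stack(series_list, timesteps):
    """Keep the last `timesteps` rows of every series and stack them.

    series_list: list of series, each a list of per-timestep feature lists.
    Series shorter than `timesteps` are dropped (an assumption; padding
    plus masking would be the alternative).
    """
    batch = []
    for series in series_list:
        if len(series) >= timesteps:
            batch.append(series[-timesteps:])
    return batch


def shape_of(nested):
    """Return the shape of a regular nested list, e.g. (samples, timesteps, features)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


# Toy data (hypothetical): 3 series of different lengths, 8 features per timestep.
toy = [[[float(t)] * 8 for t in range(n)] for n in (650, 600, 720)]
X = trim_and_stack(toy, timesteps=600)
print(shape_of(X))  # (3, 600, 8)
```

With the real data the same call would give (1000, 600, 8), assuming no series is shorter than 600 days.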

What is a batch?

From [4]:

Keras uses fast symbolic mathematical libraries as a backend, such as TensorFlow and Theano.

A downside of using these libraries is that the shape and size of your data must be defined once up front and held constant regardless of whether you are training your network or making predictions.

[…]

This does become a problem when you wish to make fewer predictions than the batch size. For example, you may get the best results with a large batch size, but are required to make predictions for one observation at a time on something like a time series or sequence problem.

This sounds to me like a "batch" would split the data along the timesteps dimension.

However, [3] states:

Said differently, whenever you train or test your LSTM, you first have to build your input matrix X of shape nb_samples, timesteps, input_dim where your batch size divides nb_samples. For instance, if nb_samples=1024 and batch_size=64, it means that your model will receive blocks of 64 samples, compute each output (whatever the number of timesteps is for every sample), average the gradients and propagate it to update the parameters vector.

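When I look deeper into the examples in [1] and [4], Jason always splits his time series into several samples that contain only 1 timestep. In his examples, that single predecessor fully determines the next element in the sequence. So I think the batches are actually split along the samples axis. (However, his way of splitting the time series doesn't make sense to me for a long-term dependency problem.)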
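The following minimal sketch contrasts the two framings on a toy univariate series. The first is the lag-1 framing used in [1] and [4], where every sample is a single timestep. The second keeps the whole series as one sample. The function names are hypothetical.

```python
def lag1_samples(series):
    """Frame a series as (samples, 1 timestep, 1 feature) inputs with next-value targets."""
    X = [[[series[i]]] for i in range(len(series) - 1)]  # each sample: 1 timestep, 1 feature
    y = [series[i + 1] for i in range(len(series) - 1)]  # target: the following value
    return X, y


def whole_series_sample(series):
    """Keep the series as one sample: (1 sample, len-1 timesteps, 1 feature)."""
    X = [[[v] for v in series[:-1]]]
    y = [[[v] for v in series[1:]]]  # one target per timestep (sequence-to-sequence)
    return X, y


series = [1, 2, 3, 4, 5, 6]
X_lag, y_lag = lag1_samples(series)
X_seq, y_seq = whole_series_sample(series)
print(len(X_lag), len(X_lag[0]), len(X_lag[0][0]))  # 5 1 1
print(len(X_seq), len(X_seq[0]), len(X_seq[0][0]))  # 1 5 1
```

With the lag-1 framing, a stateless LSTM sees only one step per sample. Any longer-range dependency could then only be carried through the state between batches. The whole-series framing keeps such dependencies inside each sample.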

Conclusion

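So let's say I pick batch_size=10. During one epoch, the weights are then updated 1000 / 10 = 100 times. Each update uses 10 randomly picked, complete time series containing 600 x 8 values. When I later want to make predictions with the model, I'll always have to feed it batches of 10 complete time series. Alternatively, I could use solution 3 from [4] and copy the weights to a new model with a different batch_size.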

I understand the principles of batch_size, but I still don't know what a good value for batch_size would be, or how to determine it.
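The batch_size=10 arithmetic above can be sketched in a few lines. Batches are slices along the samples axis, so each batch holds complete series. This is a minimal sketch. It assumes that, when batch_size does not divide the number of samples, the final batch is simply smaller, as with Keras model.fit for stateless models. It also ignores shuffling: fit shuffles the samples each epoch by default (shuffle=True). The helper name `batch_slices` is hypothetical.

```python
import math


def batch_slices(n_samples, batch_size):
    """Return (start, stop) index pairs along the samples axis for one epoch."""
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]


n_samples, timesteps, features = 1000, 600, 8
slices = batch_slices(n_samples, batch_size=10)
print(len(slices))                           # 100 weight updates per epoch
start, stop = slices[0]
print((stop - start, timesteps, features))   # each batch: (10, 600, 8)

# When batch_size does not divide n_samples, the last batch is smaller.
print(len(batch_slices(1000, 32)), math.ceil(1000 / 32))  # 32 32
```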

Statefulness

The Keras documentation tells us:

You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch.

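Suppose I split my time series into several samples (as in [1] and [4]), so that the dependencies I'd like to model span several batches. Or suppose samples in different batches are otherwise correlated with each other. In those cases I may need a stateful net; otherwise I don't. Is that a correct and complete conclusion?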
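The Keras description implies a specific data layout. The final state of sample i in one batch becomes the initial state of sample i in the next batch. So each long series has to be cut into consecutive chunks, and position i of every batch must continue the same series. This is a minimal sketch of that ordering. It assumes equal-length series whose length is divisible by the chunk length, and `stateful_order` is a hypothetical helper. Real stateful training would also need a fixed batch size (batch_input_shape on the first layer) and shuffle=False in model.fit.

```python
def stateful_order(series_list, chunk_len):
    """Cut each series into consecutive chunks and order them for stateful training.

    Returns a list of batches; batch k holds chunk k of every series, so
    batches[k][i] directly continues batches[k - 1][i] (same series i).
    Assumes all series have the same length, divisible by chunk_len.
    """
    n_chunks = len(series_list[0]) // chunk_len
    batches = []
    for k in range(n_chunks):
        batch = [s[k * chunk_len:(k + 1) * chunk_len] for s in series_list]
        batches.append(batch)
    return batches


# Two toy series of length 6, cut into chunks of 3 timesteps.
series_a = [1, 2, 3, 4, 5, 6]
series_b = [10, 20, 30, 40, 50, 60]
batches = stateful_order([series_a, series_b], chunk_len=3)
print(batches[0])  # [[1, 2, 3], [10, 20, 30]]
print(batches[1])  # [[4, 5, 6], [40, 50, 60]]
```

Here the batch size equals the number of series (2). The state carried from batches[0][i] into batches[1][i] therefore belongs to the same series. If every sample is already a complete 600-step series, there is nothing to carry over, which supports the stateless conclusion.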

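So for my problem, I suppose I won't need a stateful net. I'd build my training data as a 3D array of shape (samples, timesteps, features) and then call model.fit with a batch_size that is yet to be determined. Sample code could look like this: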

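    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    batch_size = 32  # placeholder: the value is still to be determined
    model = Sequential()
    # All but the last LSTM need return_sequences=True, so the next LSTM gets 3D input
    model.add(LSTM(32, input_shape=(600, 8), return_sequences=True))  # (timesteps, features)
    model.add(LSTM(32, return_sequences=True))
    model.add(LSTM(32, return_sequences=True))
    model.add(LSTM(32))  # returns only the last timestep, so y needs one value per series, e.g. (1000, 1)
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)  # X: (1000, 600, 8)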

Answer 1

Let me explain it with an example:

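Say you have the following series: 1,2,3,4,5,6,...,100. You have to decide how many timesteps your LSTM will learn from, and reshape your data accordingly, as shown below.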

If you decide on time_steps = 5, you have to reshape your time series into a matrix of samples like this:

1,2,3,4,5 -> sample1

2,3,4,5,6 -> sample2

3,4,5,6,7 -> sample3

etc...

This way you end up with a matrix of shape (96 samples x 5 timesteps).

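Reshape this matrix to (96 x 5 x 1), which tells Keras that you have just 1 time series. If you have more time series in parallel (as in your case), do the same operation on each of them. You end up with n matrices (one per time series), each of shape (96 samples x 5 timesteps).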
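The windowing described above looks like this in plain Python. With NumPy you would typically use numpy.lib.stride_tricks.sliding_window_view and reshape instead. The function name `sliding_windows` is hypothetical.

```python
def sliding_windows(series, time_steps):
    """Return all overlapping windows of length time_steps (stride 1)."""
    return [series[i:i + time_steps] for i in range(len(series) - time_steps + 1)]


series = list(range(1, 101))          # 1, 2, ..., 100
windows = sliding_windows(series, 5)  # 96 samples x 5 timesteps
print(len(windows), windows[0], windows[1])  # 96 [1, 2, 3, 4, 5] [2, 3, 4, 5, 6]

# Add the trailing feature axis: (96, 5) -> (96, 5, 1) for a single time series.
X = [[[v] for v in w] for w in windows]
print(len(X), len(X[0]), len(X[0][0]))  # 96 5 1
```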

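For the sake of argument, say you have 3 time series. Concatenate all three matrices into one tensor of shape (96 samples x 5 timesteps x 3 time series). The first LSTM layer for this example would be: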
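The concatenation step puts the three windowed series side by side on the last (feature) axis. This sketch uses the three example series given later in this answer. The helper names are hypothetical.

```python
def sliding_windows(series, time_steps):
    """All overlapping windows of length time_steps (stride 1)."""
    return [series[i:i + time_steps] for i in range(len(series) - time_steps + 1)]


def stack_features(windowed_series):
    """Combine n windowed series, each (samples, timesteps), into (samples, timesteps, n)."""
    n_samples = len(windowed_series[0])
    time_steps = len(windowed_series[0][0])
    return [[[ws[s][t] for ws in windowed_series] for t in range(time_steps)]
            for s in range(n_samples)]


main = list(range(1, 101))
support1 = [2 * v for v in main]
support2 = [3 * v for v in main]
X = stack_features([sliding_windows(s, 5) for s in (main, support1, support2)])
print(len(X), len(X[0]), len(X[0][0]))  # 96 5 3
print(X[0][0])  # [1, 2, 3]: first timestep of sample 1 across the three series
```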

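    from keras.models import Sequential
    from keras.layers import LSTM
    model = Sequential()
    model.add(LSTM(32, input_shape=(5, 3)))  # input: 5 timesteps x 3 time series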

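The 32 as the first parameter is entirely up to you. It means that at every point in time, your 3 time series are mapped to 32 different variables in the output space. It is easiest to think of each timestep as a fully connected layer with 3 inputs and 32 outputs, but with a different computation than an FC layer.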
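Counting weights makes the "like a fully connected layer, but with a different computation" remark concrete. A standard Keras LSTM layer with bias has four gates. Each gate has an input kernel, a recurrent kernel and a bias, so the trainable parameter count is 4 * (units * (input_dim + units) + units). The figure computed here should match what model.summary() reports for LSTM(32, input_shape=(5, 3)). The same weights are reused at all 5 timesteps.

```python
def lstm_param_count(input_dim, units):
    """Trainable weights of a standard LSTM layer with bias (4 gates)."""
    kernel = input_dim * 4 * units        # input -> gates
    recurrent_kernel = units * 4 * units  # previous output -> gates
    bias = 4 * units
    return kernel + recurrent_kernel + bias


def dense_param_count(input_dim, units):
    """Trainable weights of a fully connected layer with bias."""
    return input_dim * units + units


print(lstm_param_count(3, 32))   # 4608
print(dense_param_count(3, 32))  # 128
```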

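If you want to stack multiple LSTM layers, pass return_sequences=True to every LSTM layer except the last one. That layer then outputs the whole predicted sequence instead of just the last value, which gives the next LSTM the 3D input it needs.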
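Shapes propagate through such a stack as follows, per sample and without the batch axis. This is a minimal sketch with a hypothetical helper. It mimics the shape check Keras performs, but its error message is not Keras's own.

```python
def stacked_lstm_shapes(time_steps, input_dim, layers):
    """Compute per-sample output shapes for a stack of LSTM layers.

    layers: list of (units, return_sequences) tuples.
    Raises ValueError if an LSTM receives a non-sequence input, which is
    what happens when return_sequences=False is followed by another LSTM.
    """
    shape = (time_steps, input_dim)
    shapes = []
    for units, return_sequences in layers:
        if len(shape) != 2:
            raise ValueError(f"LSTM expects (timesteps, features), got {shape}")
        shape = (shape[0], units) if return_sequences else (units,)
        shapes.append(shape)
    return shapes


# The two-layer model in this answer: LSTM(32, return_sequences=True), then LSTM(8).
print(stacked_lstm_shapes(5, 3, [(32, True), (8, False)]))  # [(5, 32), (8,)]

# Stacking LSTMs without return_sequences=True fails at the second layer.
try:
    stacked_lstm_shapes(600, 8, [(32, False), (32, False)])
except ValueError as err:
    print(err)  # LSTM expects (timesteps, features), got (32,)
```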

Your target should be the next value of the series you want to predict.

Putting it all together, say you have the following time series:

Time series 1 (main): 1,2,3,4,5,6,..., 100

Time series 2 (support): 2,4,6,8,10,12,..., 200

Time series 3 (support): 3,6,9,12,15,18,..., 300

Create the input and target tensors:

x     -> y
1,2,3,4,5 -> 6

2,3,4,5,6 -> 7

3,4,5,6,7 -> 8

Reformat the rest of the time series the same way, but leave out their targets, since you don't want to predict those series.
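Building the input and target tensors for the three example series can be sketched like this. The supporting series appear only as input features, and the target is the next value of the main series. Pairing windows with a next-value target leaves 95 samples rather than 96, because the last window (96..100) has no following value. The helper name `make_supervised` is hypothetical.

```python
def make_supervised(main, supports, time_steps):
    """Build (samples, time_steps, n_series) inputs and next-value targets of `main`.

    Supporting series are only used as inputs; they get no targets.
    """
    all_series = [main] + list(supports)
    n_samples = len(main) - time_steps  # the last window has no "next value"
    X, y = [], []
    for i in range(n_samples):
        X.append([[s[i + t] for s in all_series] for t in range(time_steps)])
        y.append(main[i + time_steps])
    return X, y


main = list(range(1, 101))        # time series 1 (main)
support1 = [2 * v for v in main]  # time series 2 (support)
support2 = [3 * v for v in main]  # time series 3 (support)
X, y = make_supervised(main, [support1, support2], time_steps=5)
print(len(X), len(X[0]), len(X[0][0]), len(y))  # 95 5 3 95
print([step[0] for step in X[0]], y[0])         # [1, 2, 3, 4, 5] 6
```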

Create your model:

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    model = Sequential()
    model.add(LSTM(32, input_shape=(5, 3), return_sequences=True))  # input: (5 timesteps x 3 time series); output: (5 timesteps x 32 variables) because return_sequences=True
    model.add(LSTM(8))  # output: (8 variables) for the last timestep only, because return_sequences=False
    model.add(Dense(1, activation='linear'))  # output: 1 unit, which is compared with the target
    model.compile(loss='mean_squared_error', optimizer='adam')

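Compile and train. A batch size of 32 is a good starting point (it is also the default batch_size of model.fit). The batch size is the size of the chunks the sample matrix is split into for faster computation. Just don't use stateful.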
