Which axis does Keras SimpleRNN / LSTM use as the temporal axis by default?

Question

When using SimpleRNN or LSTM for a classic sentiment analysis task (applied here to sentences of length <= 250 words/tokens):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

model = Sequential()
model.add(Embedding(5000, 32, input_length=250))   # Output shape: (None, 250, 32)
model.add(SimpleRNN(100))                          # Output shape: (None, 100)
model.add(Dense(1, activation='sigmoid'))          # Output shape: (None, 1)

Where is it specified which axis of the RNN's input is used as the "temporal" axis?

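More precisely, after the Embedding layer, a given input sentence, e.g. "the cat sat on the mat", is encoded into a matrix x of shape (250, 32), where 250 is the maximum length of the input text (in words) and 32 is the embedding dimension. Then, where in Keras is it specified whether it uses this: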

h[t] = activation( W_h * x[:, t] + U_h * h[t-1] + b_h )

or this:

h[t] = activation( W_h * x[t, :] + U_h * h[t-1] + b_h )

(In both cases, y[t] = activation( W_y * h[t] + b_y ).)

TL;DR: If the input to a Keras RNN layer has shape (250, 32), which axis does it use as the temporal axis by default? Where is this documented in the Keras or TensorFlow docs?

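PS: How can the number of parameters (reported by model.summary()) be 13300? W_h has 100x32 coefficients, U_h has 100x100 coefficients and b_h has 100x1 coefficients, i.e. we already have 13300! That leaves no coefficients for W_y and b_y! How can this be explained?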

Answer 1

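Time axis: always dim 1, unless time_major=True, in which case it is dim 0; the Embedding layer outputs a 3D tensor of shape (batch_size, timesteps, features). This can be seen here, where step_input_shape is the shape of the input fed to the RNN cell at each step of the recurrent loop. In your case, timesteps=250, and the SimpleRNN cell "sees" a tensor of shape (batch_size, 32) at each step.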
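To make the time axis concrete, here is a minimal NumPy sketch (not Keras' actual implementation) of a tanh SimpleRNN forward pass, assuming the question's sizes plus a hypothetical batch of 4 sentences; the weights are random placeholders. It walks along axis 1 of a batch-major (batch_size, timesteps, features) input, so each step sees a (batch_size, 32) slice, i.e. row x[t, :] of each sentence's (250, 32) matrix. The time_major branch only illustrates what time_major=True means for the layout. Note that Keras multiplies row vectors by a (32, 100) kernel, the transpose of the question's 100x32 W_h.

```python
import numpy as np

# Hypothetical sizes: 4 sentences, 250 tokens each, 32 embedding dims, 100 units.
batch_size, timesteps, input_dim, units = 4, 250, 32, 100
rng = np.random.default_rng(0)

x = rng.standard_normal((batch_size, timesteps, input_dim))  # batch-major Embedding output
W = rng.standard_normal((input_dim, units)) * 0.1  # placeholder "kernel"
U = rng.standard_normal((units, units)) * 0.1      # placeholder "recurrent_kernel"
b = np.zeros(units)                                 # placeholder "bias"

def simple_rnn_last_state(x, time_major=False):
    """Run a tanh SimpleRNN along the time axis and return the last state."""
    time_axis = 0 if time_major else 1
    batch = x.shape[1 - time_axis]              # the other leading axis is the batch
    h = np.zeros((batch, units))                # initial state h_0 = 0
    for t in range(x.shape[time_axis]):
        x_t = np.take(x, t, axis=time_axis)     # (batch_size, input_dim): one token per sentence
        h = np.tanh(x_t @ W + b + h @ U)
    return h

h_batch_major = simple_rnn_last_state(x)
h_time_major = simple_rnn_last_state(np.transpose(x, (1, 0, 2)), time_major=True)
print(h_batch_major.shape)                        # (4, 100)
print(np.allclose(h_batch_major, h_time_major))   # True
```

Both layouts give the same result because only the position of the time axis changes, not the recurrence itself.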

# of params: you can see how they are derived by inspecting each layer's .build() code (Embedding, SimpleRNN, Dense), or equivalently by calling .weights on each layer. In your case, with l = model.layers[1]:

l.weights[0].shape == (32, 100) --> 3200 params (kernel)
l.weights[1].shape == (100, 100) --> 10000 params (recurrent_kernel)
l.weights[2].shape == (100,) --> 100 params (bias) (total: 13,300)
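The arithmetic behind these shapes, and the total model.summary() reports for the question's model, can be checked in a few lines of plain Python. The LSTM line is added only for comparison; it relies on Keras' LSTM packing its four gates into one (32, 400) kernel, one (100, 400) recurrent kernel and one (400,) bias.

```python
# Parameter counts for the question's model, derived from the weight shapes.
vocab_size, embed_dim, units, dense_units = 5000, 32, 100, 1

embedding_params = vocab_size * embed_dim           # embeddings: (5000, 32)
simple_rnn_params = (embed_dim * units              # kernel: (32, 100)
                     + units * units                # recurrent_kernel: (100, 100)
                     + units)                       # bias: (100,)
dense_params = units * dense_units + dense_units    # kernel (100, 1) + bias (1,)
lstm_params = 4 * simple_rnn_params                 # same shapes, times 4 gates

print(embedding_params, simple_rnn_params, dense_params)   # 160000 13300 101
print(embedding_params + simple_rnn_params + dense_params) # 173401
print(lstm_params)                                         # 53200
```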

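Computation logic: there is no W_y or b_y; the "y" is the hidden state h, and this holds for all recurrent layers. What you're quoting probably comes from a generic RNN formulation. Regarding "in both cases...": that is wrong; to see what actually happens, inspect the .call() code.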
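A minimal NumPy sketch of what one SimpleRNN step returns, with random placeholder weights and one hypothetical embedded sentence: the cell's output and its new state are the same array, so there is nothing inside the layer for a W_y or b_y to act on. Reading the following Dense(1, activation='sigmoid') layer as the question's "W_y, b_y" (101 parameters) is this sketch's interpretation, not something stated in the answer.

```python
import numpy as np

rng = np.random.default_rng(1)
embed_dim, units = 32, 100

# SimpleRNN weights: the layer's only 13,300 parameters (random placeholders)
W = rng.standard_normal((embed_dim, units)) * 0.1
U = rng.standard_normal((units, units)) * 0.1
b = np.zeros(units)
# Dense(1, activation='sigmoid') weights: 100 + 1 = 101 parameters
W_y = rng.standard_normal((units, 1)) * 0.1
b_y = np.zeros(1)

def simple_rnn_cell(x_t, h_prev):
    """One step: the returned value is both the output and the new state."""
    h_t = np.tanh(x_t @ W + b + h_prev @ U)
    return h_t, h_t  # (output, new_state): no separate output projection

sentence = rng.standard_normal((250, embed_dim))  # one embedded sentence, (timesteps, features)
h = np.zeros(units)
for x_t in sentence:  # iterating a (250, 32) array yields the rows x[t, :]
    out, h = simple_rnn_cell(x_t, h)

y = 1.0 / (1.0 + np.exp(-(out @ W_y + b_y)))  # the Dense layer plays the role of W_y, b_y
print(out.shape, y.shape)  # (100,) (1,)
```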

P.S. For debugging, I recommend defining the model with a full batch_shape, since it removes the ambiguous None shapes.

SimpleRNN formula vs. code: as requested. Note that the h in the source code is misleading: in formulas it is usually called z ("pre-activation").
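Assuming the default activation='tanh' and use_bias=True, the recurrence computed by the SimpleRNN cell's .call() can be sketched in row-vector form as follows, where x_t is row x[t, :] of the (250, 32) input, W is the kernel, U the recurrent_kernel and b the bias. z_t is the pre-activation the answer refers to; the symbol names are chosen here for illustration.

```latex
\begin{aligned}
z_t &= x_t W + b + h_{t-1} U, && x_t \in \mathbb{R}^{1 \times 32},\; W \in \mathbb{R}^{32 \times 100},\; U \in \mathbb{R}^{100 \times 100},\; b \in \mathbb{R}^{100} \\
h_t &= \tanh(z_t), && h_0 = 0 \\
\mathrm{output}_t &= h_t, && t = 1, \dots, 250
\end{aligned}
```

With return_sequences=False the layer returns only h_250, which is why the parameter count contains W, U and b and nothing else.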

return_sequences=True -> the outputs of all timesteps are returned: (batch_size, timesteps, channels)
return_sequences=False -> only the output of the last timestep is returned: (batch_size, channels). See here
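The difference can be sketched in NumPy with small hypothetical sizes (2 sequences, 5 timesteps) and random placeholder weights. This mimics the shapes only, not Keras internals: stacking every step's state along axis 1 gives the return_sequences=True shape, while keeping only the final state gives the 2D return_sequences=False shape, which equals the last time slice of the full sequence.

```python
import numpy as np

rng = np.random.default_rng(2)
batch_size, timesteps, input_dim, units = 2, 5, 32, 100  # small hypothetical sizes
x = rng.standard_normal((batch_size, timesteps, input_dim))
W = rng.standard_normal((input_dim, units)) * 0.1
U = rng.standard_normal((units, units)) * 0.1
b = np.zeros(units)

def simple_rnn(x, return_sequences=False):
    h = np.zeros((x.shape[0], units))
    outputs = []
    for t in range(x.shape[1]):             # axis 1 is time
        h = np.tanh(x[:, t, :] @ W + b + h @ U)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)    # (batch_size, timesteps, units)
    return h                                # (batch_size, units): last step only

all_steps = simple_rnn(x, return_sequences=True)
last_step = simple_rnn(x)
print(all_steps.shape, last_step.shape)           # (2, 5, 100) (2, 100)
print(np.allclose(all_steps[:, -1], last_step))   # True
```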


Tags: python, keras, tensorflow, recurrent-neural-network