Why does the output shape in a simple Elman RNN depend on the sequence length (while the hidden state shape doesn't)?

Question

I'm learning about RNNs and trying to write a program with PyTorch. I'm having some trouble understanding the output dimensions.

Here is the code for a simple RNN architecture:

import numpy as np
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_dim, n_layers):
        super(RNN, self).__init__()
        self.hidden_dim = hidden_dim
        # batch_first=True: input and output tensors are shaped (batch, seq_len, features)
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)

    def forward(self, x, hidden):
        r_out, hidden = self.rnn(x, hidden)

        return r_out, hidden

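So my understanding is that hidden_dim is the number of units I will have in the hidden layer, which is essentially the number of features in both the output and the hidden state.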
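That understanding can be checked against the layer's parameters. The sketch below (assuming a reasonably recent PyTorch) builds an nn.RNN with the same configuration as the question and prints the shapes of its weights and biases; every one of them is sized by input_size and hidden_size, and none depends on the sequence length.

```python
import torch.nn as nn

# Same configuration as in the question: 1 input feature, 4 hidden units, 1 layer
rnn = nn.RNN(input_size=1, hidden_size=4, num_layers=1, batch_first=True)

# Input-to-hidden weights: map 1 input feature to 4 hidden units
print(rnn.weight_ih_l0.shape)  # torch.Size([4, 1])
# Hidden-to-hidden weights: map the previous 4-dim state to the next 4-dim state
print(rnn.weight_hh_l0.shape)  # torch.Size([4, 4])
# One bias per hidden unit for each of the two projections
print(rnn.bias_ih_l0.shape, rnn.bias_hh_l0.shape)  # torch.Size([4]) torch.Size([4])
```

Because the same weights are reused at every time step, the sequence length only shows up in the shapes of the tensors that flow through the layer, not in the layer itself.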

I created some dummy data to test it:

test_rnn = RNN(input_size=1, hidden_dim=4, n_layers=1)

# generate evenly spaced, test data pts
time_steps = np.linspace(0, 6, 3)
data = np.sin(time_steps)
data.resize((3, 1))

test_input = torch.Tensor(data).unsqueeze(0) # give it a batch_size of 1 as first dimension
print('Input size: ', test_input.size())

# test out rnn sizes
test_out, test_h = test_rnn(test_input, None)
print('Hidden state size: ', test_h.size())
print('Output size: ', test_out.size())

This is what I get:

Input size:  torch.Size([1, 3, 1])
Hidden state size:  torch.Size([1, 1, 4])
Output size:  torch.Size([1, 3, 4])

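So I understand that the shape of x is determined as x = (batch_size, seq_length, input_size), i.e. a batch size of 1, 1 input feature, and 3 time steps (the sequence length).
For the hidden state it is hidden = (n_layers, batch_size, hidden_dim), i.e. 1 layer, a batch size of 1, and 4 units in the hidden layer.
What I don't get is the RNN output. According to the documentation, r_out = (batch_size, time_step, hidden_size). Shouldn't the output be the same as the hidden state produced by the hidden units? That is, if I have 4 units in my hidden layer, I'd expect it to output 4 numbers for the hidden state and 4 numbers for the output. Why is the time step a dimension of the output? After all, each hidden unit takes in some numbers and outputs a state S and an output Y, and the two are equal, right? I tried drawing a diagram of how I think it works. Help me understand where I'm going wrong.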

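So, TL;DR:
Why does the output shape in a simple Elman RNN depend on the sequence length (while the hidden state shape doesn't)? In the diagram I drew, the two look the same.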

Answer 1

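In the PyTorch API, the output is the sequence of hidden states produced during the RNN computation, i.e. one hidden-state vector per input vector. The hidden state is the last hidden state, the one the RNN ends up in after processing the whole input. So, for a unidirectional RNN, test_out[:, -1, :] equals test_h[-1] (the final state of the last layer; test_h itself has an extra leading layer dimension).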
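A quick way to see this is to run a layer on an input shaped like the one in the question and compare the last output step with the final hidden state. This is a sketch with random weights and a random input (torch.manual_seed(0) is only for reproducibility); the second, 2-layer model is an extra illustration that the output always has one row per time step from the top layer, while the hidden state has one row per layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same shapes as in the question: batch 1, 3 time steps, 1 feature, 4 hidden units
rnn = nn.RNN(input_size=1, hidden_size=4, num_layers=1, batch_first=True)
x = torch.randn(1, 3, 1)

out, h_n = rnn(x)  # the initial hidden state defaults to zeros when omitted
print(out.shape)   # torch.Size([1, 3, 4]): one 4-dim state per time step
print(h_n.shape)   # torch.Size([1, 1, 4]): only the final state, per layer

# h_n is indexed by layer first, so compare against h_n[-1], shape (1, 4)
print(torch.allclose(out[:, -1, :], h_n[-1]))  # True

# With 2 layers the output is still (batch, seq_len, hidden): it holds the top layer's states
rnn2 = nn.RNN(input_size=1, hidden_size=4, num_layers=2, batch_first=True)
out2, h2 = rnn2(x)
print(out2.shape)  # torch.Size([1, 3, 4])
print(h2.shape)    # torch.Size([2, 1, 4]): one final state per layer
print(torch.allclose(out2[:, -1, :], h2[-1]))  # True
```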

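The vector y in your diagram is the same as the hidden state H_t. It does have 4 numbers, but the state is different at every time step, so you get 4 numbers per time step: with 3 time steps, that is the (1, 3, 4) output you see.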
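To make "one state per time step" concrete, the sketch below unrolls a single-layer tanh nn.RNN by hand, using the update rule from the PyTorch nn.RNN documentation, h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh). The helper manual_elman is hypothetical, written only for this illustration. Each iteration produces one 4-dim vector, which serves both as y_t and as the state carried to the next step; stacking them over time is exactly where the sequence-length dimension of the output comes from.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=1, hidden_size=4, num_layers=1, batch_first=True)
x = torch.randn(1, 3, 1)  # (batch, seq_len, input_size)

def manual_elman(rnn, x):
    """Hypothetical helper: unroll a single-layer tanh nn.RNN (batch_first layout)."""
    batch, seq_len, _ = x.shape
    h = torch.zeros(batch, rnn.hidden_size)  # h_0 = 0, as when hidden=None
    outputs = []
    for t in range(seq_len):
        # h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)
        h = torch.tanh(x[:, t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        outputs.append(h)  # the "output" y_t is just h_t
    # One (batch, hidden) state per step -> (batch, seq_len, hidden); final state gets a layer dim
    return torch.stack(outputs, dim=1), h.unsqueeze(0)

with torch.no_grad():
    out_manual, h_manual = manual_elman(rnn, x)
    out_torch, h_torch = rnn(x)

print(out_manual.shape)                                  # torch.Size([1, 3, 4])
print(torch.allclose(out_manual, out_torch, atol=1e-5))  # True
print(torch.allclose(h_manual, h_torch, atol=1e-5))      # True
```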

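The reason PyTorch returns the output sequence (= the hidden states) separately from the final hidden state (and in an LSTM the two differ anyway, since the final state also includes the cell state) is that you can have a batch of sequences of different lengths. In that case the final state is not simply test_out[:, -1, :], because you need to select each sequence's final state based on its own length.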
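A sketch of the variable-length case, assuming sequences are fed as a packed batch via torch.nn.utils.rnn.pack_padded_sequence: the batch below is hypothetical, two sequences zero-padded to length 3 where the second has only 2 real steps. After unpacking, out[:, -1, :] is padding for the shorter sequence, while h_n already holds each sequence's state at its own last real step. The last lines illustrate the LSTM remark: the output contains only h_t, and the cell state c_n is returned separately.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
rnn = nn.RNN(input_size=1, hidden_size=4, num_layers=1, batch_first=True)

# Hypothetical batch: 2 sequences padded to length 3; the second has only 2 real steps
x = torch.randn(2, 3, 1)
x[1, 2] = 0.0
lengths = torch.tensor([3, 2])

packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, h_n = rnn(packed)
out, _ = pad_packed_sequence(packed_out, batch_first=True)  # back to (batch, seq_len, hidden)

# Pick each sequence's own last real step instead of out[:, -1, :]
last_steps = out[torch.arange(out.size(0)), lengths - 1]  # (batch, hidden_size)
print(torch.allclose(last_steps, h_n[-1]))  # True
print(out[1, -1])  # all zeros: a padding position, not the second sequence's final state

# LSTM: the output holds h_t only; the final state is the pair (h_n, c_n)
lstm = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
lstm_out, (h_lstm, c_lstm) = lstm(torch.randn(1, 3, 1))
print(torch.allclose(lstm_out[:, -1, :], h_lstm[-1]))  # True
```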

recurrent-neural-network

pytorch