Features and time steps in an LSTM on the MNIST dataset

Question

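I have been working with LSTMs for a while and I think I have grasped the main concepts. I have been experimenting with Keras to get a better feel for how LSTMs work, so I decided to train a network to recognize digits from the MNIST dataset.

I know that when I train an LSTM, I should pass a tensor of shape (number of samples, time steps, features) as input. I reshaped each image from 28x28 into a single vector of 784 elements (1x784), which gives input_shape = (60000, 1, 784). I then tried changing the number of time steps, and my new input_shape became (60000, 16, 49).

What I don't understand is why the feature vector goes from 784 to 49 when I change the number of time steps. I think I don't really understand the concept of a time step in an LSTM. Could you explain it, ideally with reference to this particular case? Also, when I increase the number of time steps the accuracy goes down. Why is that? Shouldn't it go up? Thanks.

Edit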

from __future__ import print_function
import numpy as np
import struct
from keras.models import Sequential
from keras.layers import Dense, LSTM, Activation
from keras.utils import np_utils
train_im = open('train-images-idx3-ubyte','rb')
train_la = open('train-labels-idx1-ubyte','rb')
test_im = open('t10k-images-idx3-ubyte','rb')
test_la = open('t10k-labels-idx1-ubyte','rb')

##training images and labels

magic,num_ima = struct.unpack('>II', train_im.read(8))
rows,columns = struct.unpack('>II', train_im.read(8))
img = np.fromfile(train_im,dtype=np.uint8).reshape(rows*columns, num_ima) #784*60000

magic_l, num_l = struct.unpack('>II', train_la.read(8))
lab = np.fromfile(train_la, dtype=np.int8) #1*60000

## test images and labels

magic, num_test = struct.unpack('>II', test_im.read(8))
rows,columns = struct.unpack('>II', test_im.read(8))
img_test = np.fromfile(test_im,dtype=np.uint8).reshape(rows*columns, num_test) #784x10000

magic_l, num_l = struct.unpack('>II', test_la.read(8))
lab_test = np.fromfile(test_la, dtype=np.int8) #1*10000

batch = 50
epoch=15
hidden_units = 10
classes = 1
a, b = img.T.shape[0:]

img = img.reshape(img.T.shape[0],-1,784)
img_test = img_test.reshape(img_test.T.shape[0],-1,784)
lab = np_utils.to_categorical(lab, 10)
lab_test = np_utils.to_categorical(lab_test, 10)
print(img.shape[0:])
model = Sequential()
model.add(LSTM(40,input_shape =img.shape[1:], batch_size = batch))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer = 'RMSprop', loss='mean_squared_error', metrics = ['accuracy'])
model.fit(img, lab, batch_size = batch,epochs=epoch,verbose=1)


scores = model.evaluate(img_test, lab_test, batch_size=batch)
predictions = model.predict(img_test, batch_size = batch)
print('LSTM test score:', scores[0])
print('LSTM test accuracy:', scores[1])

Edit 2

Thank you very much. When I do that, I get the following error:

ValueError: Input arrays should have the same number of samples as target arrays. Found 3750 input samples and 60000 target samples.

I know I should reshape the output as well, but I don't know what shape it should have.

Answer 1

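Time steps represent states in time, like frames extracted from a video. The input passed to an LSTM should have the shape (num_samples, timesteps, input_dim). If you want 16 time steps, you should reshape your data to (num_samples // timesteps, timesteps, input_dim):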

img = img.reshape(3750, 16, 784)  # 60000 // 16 = 3750 sequences of 16 images each
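To make the idea of a time step concrete, here is a minimal sketch of how a recurrent layer walks through one sample of shape (timesteps, features). It uses a plain tanh RNN cell in NumPy rather than a real LSTM, and the function name `simple_rnn_forward`, the weight initialization, and the hidden size are assumptions chosen for illustration, not Keras internals.

```python
import numpy as np

def simple_rnn_forward(x, hidden_units=10, seed=0):
    """Run a plain tanh RNN over one sample of shape (timesteps, features)."""
    rng = np.random.default_rng(seed)
    timesteps, features = x.shape
    # The input weights depend on the feature size, not on the number of steps.
    w_in = rng.normal(scale=0.1, size=(features, hidden_units))
    w_rec = rng.normal(scale=0.1, size=(hidden_units, hidden_units))
    h = np.zeros(hidden_units)
    for t in range(timesteps):
        # One time step: read x[t] and update the hidden state.
        h = np.tanh(x[t] @ w_in + h @ w_rec)
    return h  # final hidden state, the value a Dense layer would consume

# Hypothetical 28x28 image filled with random pixel values.
image = np.random.default_rng(1).random((28, 28))

h_1 = simple_rnn_forward(image.reshape(1, 784))    # 1 step of 784 features
h_16 = simple_rnn_forward(image.reshape(16, 49))   # 16 steps of 49 features
print(h_1.shape, h_16.shape)
```

In both cases the layer ends with one hidden vector per sample. What changes is how many steps it takes to get there and how many pixels it sees per step.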

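With batch_size=50, each batch then contains 50*16 images. If you instead keep num_samples unchanged, the reshape has to split input_dim: 784 / 16 = 49, which is why your feature dimension became 49.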
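The two options can be checked with shapes alone. The sketch below uses an all-zero array as a stand-in for the flattened MNIST training set (an assumption, so that it runs without the data files). It shows that the total number of elements is fixed, so adding time steps must shrink either the feature dimension or the number of samples.

```python
import numpy as np

# Stand-in for the flattened MNIST training set: 60000 images of 784 pixels.
img = np.zeros((60000, 784), dtype=np.uint8)
timesteps = 16

# Option A: keep all 60000 samples, so each image is cut into 16 chunks.
split_features = img.reshape(60000, timesteps, 784 // timesteps)

# Option B: keep 784 features per step, so 16 whole images form one sequence.
grouped_images = img.reshape(60000 // timesteps, timesteps, 784)

print(split_features.shape)   # (60000, 16, 49)
print(grouped_images.shape)   # (3750, 16, 784)

# Reshaping never changes the number of elements.
print(split_features.size == grouped_images.size == img.size)  # True
```

Option A is what produced (60000, 16, 49) in the question. Option B is the reshape suggested in this answer.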

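Edit: The target array must have the same number of samples as the input, i.e. 3750 in your case. All time steps in a sequence share one label. You have to decide what you want to do with these MNIST sequences. Your current model classifies these sequences (not individual digits) into 10 classes.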
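The mismatch reported in the error message can be reproduced with shapes alone. The sketch below uses random placeholder labels (an assumption, not the real MNIST labels) to show that grouping 16 images per sequence leaves 3750 sequences with 16 digit labels each, so a single label per sequence is only well defined if you choose a rule for it. As an alternative that is not part of the answer above, splitting each image into 16 chunks of 49 pixels keeps one label per sample.

```python
import numpy as np

rng = np.random.default_rng(0)
num_images, timesteps = 60000, 16
labels = rng.integers(0, 10, size=num_images)  # placeholder digit labels

# Grouping 16 images per sequence leaves 3750 sequences but 60000 labels.
num_sequences = num_images // timesteps
labels_per_sequence = labels.reshape(num_sequences, timesteps)
print(labels_per_sequence.shape)  # (3750, 16)

# A sequence has one unambiguous label only if all 16 of its digits agree.
uniform = (labels_per_sequence == labels_per_sequence[:, :1]).all(axis=1)
print(uniform.sum(), "of", num_sequences, "sequences contain a single digit")

# Splitting each image instead keeps one label per sample.
img = np.zeros((num_images, 784), dtype=np.uint8)
per_image = img.reshape(num_images, timesteps, 784 // timesteps)
print(per_image.shape[0] == labels.shape[0])  # True
```

With the per-image split, the original 60000 one-hot targets can be passed to the model unchanged, since the number of input samples and target samples match again.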
