Why Bother With Recurrent Neural Networks For Structured Data?

I have been developing feedforward neural networks (FNNs) and recurrent neural networks (RNNs) in Keras with structured data of the shape [instances, time, features], and the performance of the FNNs and RNNs has been identical (except that the RNNs require more computation time).

I have also simulated tabular data (code below) where I expected an RNN to outperform an FNN because the next value in the series depends on the previous value in the series; however, both architectures predict correctly.

With NLP data, I have seen RNNs outperform FNNs, but not with tabular data. Generally, when would one expect an RNN to outperform an FNN with tabular data? Specifically, could someone post simulation code with tabular data demonstrating an RNN outperforming an FNN?

Thank you! If my simulation code is not ideal for my question, please adapt it or share more ideal code!

from keras import models
from keras import layers

from keras.layers import Dense, LSTM

import numpy as np
import matplotlib.pyplot as plt

Two features were simulated over 10 time steps, where the value of the second feature depends on the value of both features in the prior time step.

## Simulate data.

np.random.seed(20180825)

# Start with one random feature, duplicated so both features exist at step 0.
X = np.random.randint(50, 70, size = (11000, 1)) / 100

X = np.concatenate((X, X), axis = 1)

# At each of 10 time steps, append a new random first feature and a second
# feature equal to the average of both features from the prior time step.
for i in range(10):

    X_next = np.random.randint(50, 70, size = (11000, 1)) / 100

    X = np.concatenate((X, X_next, (0.50 * X[:, -1].reshape(len(X), 1)) 
        + (0.50 * X[:, -2].reshape(len(X), 1))), axis = 1)

print(X.shape)

## Training and validation data.

split = 10000

# The target is the final dependent feature; the inputs exclude the final
# time step (the last two columns).
Y_train = X[:split, -1:].reshape(split, 1)
Y_valid = X[split:, -1:].reshape(len(X) - split, 1)
X_train = X[:split, :-2]
X_valid = X[split:, :-2]

print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)

FNN:

## FNN model.

# Define model.

network_fnn = models.Sequential()
network_fnn.add(layers.Dense(64, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))

# Compile model.

network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_fnn = network_fnn.fit(X_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
    validation_data = (X_valid, Y_valid))

plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

LSTM:

## LSTM model.

# Reshape to [samples, time steps, features]: 10 time steps of 2 features.
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1] // 2, 2)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1] // 2, 2)

# Define model.

network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(64, activation = 'relu', input_shape = (X_lstm_train.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.

network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
    validation_data = (X_lstm_valid, Y_valid))

plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

In practice, even in NLP, you'll see that RNNs and CNNs are often competitive. Here's a 2017 review paper that shows this in more detail. In theory, it may be that RNNs can better handle the full complexity and sequential nature of language, but in practice the bigger obstacle is usually training the network properly, and RNNs are finicky.

Another problem that might have a chance of working would be something like the balanced parentheses problem (either with only parentheses in the string, or parentheses along with distractor characters). This requires processing the inputs sequentially and tracking some state, and may be easier to learn with an LSTM than with an FFN.
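To make that concrete, here is a minimal sketch of a data generator for the balanced-parentheses task (my own illustration, not code from the question): a string is labeled balanced if its running depth never goes negative and ends at zero. Encoding '(' as +1 and ')' as -1, and the sequence length, are arbitrary choices.

import numpy as np

def make_parens_data(n_samples, seq_len):
    # Encode '(' as +1 and ')' as -1.
    X = np.random.choice([1, -1], size = (n_samples, seq_len))
    # A string is balanced if the running depth never dips below zero
    # and returns to zero at the end.
    depth = np.cumsum(X, axis = 1)
    Y = ((depth.min(axis = 1) >= 0) & (depth[:, -1] == 0)).astype(int)
    # Shape [samples, time steps, 1] suits an LSTM; flatten for an FFN.
    return X.reshape(n_samples, seq_len, 1), Y

X_parens, Y_parens = make_parens_data(10000, 10)
# Note: balanced strings are rare at this length, so you may want to
# rebalance the classes before training.
print(X_parens.shape, Y_parens.mean())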

Update: Some data that looks sequential might not actually have to be processed sequentially. For example, even if you provide a sequence of numbers to add, since addition is commutative, an FFN will do just as well as an RNN. This could also be true of many health problems where the dominating information is not of a sequential nature. Suppose a patient's smoking habits are measured every year. From a behavioral standpoint the trajectory matters, but if you're predicting whether the patient will develop lung cancer, the prediction will be dominated simply by the number of years the patient smoked (perhaps restricted to the last 10 years for the FFN). A quick sanity check of the addition example follows.
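As a sketch of the commutativity point (again my own illustration, not code from the question): if the target is the sum of the sequence, permuting the time steps within each sample leaves the input-target relationship unchanged, so the sequential view gives an LSTM no information that an FFN on the flat vector lacks.

import numpy as np

rng = np.random.default_rng(0)

# The target is the sum of the sequence, which is order-invariant.
X_seq = rng.uniform(0, 1, size = (10000, 10))
Y_sum = X_seq.sum(axis = 1)

# Independently shuffling the time steps of each sample changes nothing.
# (Generator.permuted requires NumPy >= 1.20.)
X_shuffled = rng.permuted(X_seq, axis = 1)
print(np.allclose(X_shuffled.sum(axis = 1), Y_sum))  # True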

So you would want to make the toy problem more complex and require that the ordering of the data be taken into account. Maybe some kind of simulated time series where you want to predict whether there was a spike in the data, where you don't care about the absolute values, only about the relative nature of the spike.
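One way to simulate such a spike task (my own sketch): give every sample a random baseline so the absolute level carries no signal, then add a localized jump to half the samples, so that the label can only be recovered from relative changes between neighboring time steps.

import numpy as np

rng = np.random.default_rng(1)

n_samples, n_steps = 10000, 20

# Random per-sample baselines make absolute values uninformative.
X = rng.uniform(0, 100, size = (n_samples, 1)) + rng.normal(0, 1, size = (n_samples, n_steps))

# Add a jump of 5 at a random position in half of the samples.
has_spike = rng.integers(0, 2, size = n_samples)
pos = rng.integers(0, n_steps, size = n_samples)
X[np.arange(n_samples), pos] += 5 * has_spike

Y = has_spike  # predict whether a spike occurred
print(X.shape, Y.mean())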

Update 2:

I modified your code to show a case where the RNN performs better. The trick was to use more complex conditional logic that is more naturally modeled in an LSTM than in an FFN. The code is below. For 8 columns, we see that the FFN trains in 1 minute and reaches a validation loss of 6.3. The LSTM takes 3x longer to train, but its final validation loss is 6x lower, at 1.06.

As we increase the number of columns, the LSTM has a larger and larger advantage, especially if we add more complicated conditions. For 16 columns, the FFN's validation loss is 19 (and you can more clearly see the training curve, since the model isn't able to instantly fit the data). In comparison, the LSTM takes 11 times longer to train, but its validation loss of 0.31 is 30 times smaller than the FFN's! You can play around with even larger matrices to see how far this trend extends.

from keras import models
from keras import layers

from keras.layers import Dense, LSTM

import numpy as np
import matplotlib
matplotlib.use('Agg')  # select the backend before importing pyplot
import matplotlib.pyplot as plt
import time

np.random.seed(20180908)

rows = 20500
cols = 10

# Randomly generate Z
Z = 100*np.random.uniform(0.05, 1.0, size = (rows, cols))

larger = np.max(Z[:, :cols // 2], axis = 1).reshape((rows, 1))
larger2 = np.max(Z[:, cols // 2:], axis = 1).reshape((rows, 1))
smaller = np.min((larger, larger2), axis = 0)

# Append the target: the max of the first half of each row.
Z = np.append(Z, larger, axis = 1)

# Alternative target: the min of the max of each half of each row.
# Z = np.append(Z, smaller, axis = 1)

# Shuffle the rows.

np.random.shuffle(Z)

## Training and validation data.

split = 10000

X_train = Z[:split, :-1]
X_valid = Z[split:, :-1]
Y_train = Z[:split, -1:].reshape(split, 1)
Y_valid = Z[split:, -1:].reshape(rows - split, 1)

print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)

print("Now setting up the FNN")

## FNN model.

tick = time.time()

# Define model.

network_fnn = models.Sequential()
network_fnn.add(layers.Dense(32, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))

# Compile model.

network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_fnn = network_fnn.fit(X_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
    validation_data = (X_valid, Y_valid))

tock = time.time()

print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')

print("Now evaluating the FNN")

loss_fnn = history_fnn.history['loss']
val_loss_fnn = history_fnn.history['val_loss']
epochs_fnn = range(1, len(loss_fnn) + 1)
print("train loss: ", loss_fnn[-1])
print("validation loss: ", val_loss_fnn[-1])

plt.plot(epochs_fnn, loss_fnn, 'black', label = 'Training Loss')
plt.plot(epochs_fnn, val_loss_fnn, 'red', label = 'Validation Loss')
plt.title('FNN: Training and Validation Loss')
plt.legend()
plt.show()

plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training points')
plt.show()

plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('valid points')
plt.show()

print("LSTM")

## LSTM model.

# One feature per time step: shape [samples, time steps, 1].
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1], 1)

tick = time.time()

# Define model.

network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(32, activation = 'relu', input_shape = (X_lstm_train.shape[1], 1)))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.

network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
    validation_data = (X_lstm_valid, Y_valid))

tock = time.time()

print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')

print("now eval")

loss_lstm = history_lstm.history['loss']
val_loss_lstm = history_lstm.history['val_loss']
epochs_lstm = range(1, len(loss_lstm) + 1)
print("train loss: ", loss_lstm[-1])
print("validation loss: ", val_loss_lstm[-1])

plt.plot(epochs_lstm, loss_lstm, 'black', label = 'Training Loss')
plt.plot(epochs_lstm, val_loss_lstm, 'red', label = 'Validation Loss')
plt.title('LSTM: Training and Validation Loss')
plt.legend()
plt.show()

plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training')
plt.show()

plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title("validation")
plt.show()