Why is my RNN learning to classify all inputs as only 1 of 2 possible classifications?

I have written my first RNN TensorFlow implementation, which takes a random sequence of either increasing or decreasing numbers as input. The training label is a single integer corresponding to each sequence, where 1 means an increasing sequence and 0 means a decreasing one. As my model trains, it quickly tends to classify every sequence as decreasing, and I can't figure out why. Here is my code:

from __future__ import print_function
import tensorflow as tf
from tensorflow.contrib import rnn
import random

sequenceLength = 5     # Input Dimension
maxNum = 1000          # Must be >= (sequenceLength - 1)
outputDim = 1
hiddenDim = 16
learningRate = 0.1
trainingIterations = 10000
batchSize = 10
displayStep = 1000

def generateData():
    data = []
    labels = []
    for _ in range(batchSize):
        type = (1 if random.random() < 0.5 else 0)
        temp = []
        if type == 1:
            labels.append([1])
            temp.append(random.randint(0, maxNum - sequenceLength + 1))
            for i in range(1, sequenceLength):
                temp.append(random.randint(temp[i - 1] + 1, maxNum - sequenceLength + i + 1))
            data.append(temp)
        if type == 0:
            labels.append([0])
            temp.append(random.randint(0 + sequenceLength - 1, maxNum))
            for i in range(1, sequenceLength):
                temp.append(random.randint( 0 + sequenceLength - i - 1, temp[i - 1] - 1))
            data.append(temp)
    return data, labels

x = tf.placeholder(tf.float32, [batchSize, sequenceLength], name="input")
y = tf.placeholder(tf.float32, [batchSize, outputDim], name="label")

W = tf.Variable(tf.random_normal([hiddenDim, outputDim]))
b = tf.Variable(tf.random_normal([outputDim]))

cell = rnn.BasicRNNCell(hiddenDim)
outputs, states = tf.nn.static_rnn(cell, [x], dtype=tf.float32)
prediction = tf.sigmoid(tf.matmul(outputs[0], W + b))

loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=prediction, labels=y))
optimiser = tf.train.AdamOptimizer(learning_rate=learningRate).minimize(loss)

correctPrediction = tf.equal(tf.round(prediction), y)
accuracy = tf.reduce_mean(tf.cast(correctPrediction, tf.float32))

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for i in range(trainingIterations):
        batchX, batchY = generateData()
        dict = {x: batchX, y : batchY}
        session.run(optimiser, feed_dict=dict)
        if i % displayStep == 0:
            print("Predictions:\t" + str(session.run(tf.transpose(tf.round(prediction)), dict)))
            print("Labels:\t\t" + str(session.run(tf.transpose(y), dict)) + "\n")
        #     batchAccuracy = session.run(accuracy, feed_dict=dict)
        #     batchLoss = session.run(loss, feed_dict=dict)
        #     print("Iteration: " + str(i) + "\nAccuracy: " + str(batchAccuracy) + "\nLoss: " + str(batchLoss) + "\n")

As I said, this is my first TensorFlow implementation, so although I understand well enough how RNNs work, I still get lost in the high-level abstractions TensorFlow has us interact with. What I'm least sure about is my computation of prediction, loss, correctPrediction and accuracy. Is the way I use the sigmoid function twice OK: once to produce a probability for my prediction, and then again to compute the cross-entropy between my prediction (as a probability) and the labels?

EDIT

I've just noticed that, on rare occasions and without changing the code at all, the RNN quickly learns to classify the sequences correctly.

Your learning rate is too large. With Adam, steps that big can drive the logits deep into the sigmoid's saturated region, where the gradients all but vanish and the network gets stuck predicting a single class. I lowered the learning rate to

learningRate = 0.01

Also, there is no need to apply a sigmoid here:

prediction = tf.sigmoid(tf.matmul(outputs[0], W + b))

because your loss already applies the sigmoid internally (sigmoid_cross_entropy_with_logits expects raw logits):

loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=prediction, labels=y))
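
To see why this matters, here is a minimal NumPy sketch (my own illustration, not from the post) of what sigmoid_cross_entropy_with_logits computes per element, using the numerically stable formula from the TensorFlow documentation:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z, y = 2.0, 1.0  # a logit and its label

# Stable form used by tf.nn.sigmoid_cross_entropy_with_logits:
#   max(z, 0) - z * y + log(1 + exp(-|z|))
stable = max(z, 0) - z * y + np.log1p(np.exp(-abs(z)))

# Equivalent naive cross-entropy on sigmoid(z):
naive = -(y * np.log(sigmoid(z)) + (1 - y) * np.log(1 - sigmoid(z)))
print(stable, naive)   # both ~0.1269

# Passing sigmoid(z) as the "logits" applies the sigmoid twice:
double = -np.log(sigmoid(sigmoid(z)))
print(double)          # ~0.3468, a different (distorted) loss

With the sigmoid applied twice, the effective logits are squashed into (0, 1), so the loss can never approach zero and its gradients are much weaker, which slows or stalls training.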

I modified your code with the changes above (plus some extra changes in the printing section to adjust the formatting) and got the following output (you can see that the predictions become perfect from the second printout onward):

Predictions:    [[ 0.  1.  0.  0.  0.  0.  0.  1.  0.  1.]]
Labels:     [[ 1.  0.  1.  1.  1.  0.  1.  0.  0.  0.]]

Iteration: 0
Accuracy: 0.2
Loss: 3.27201

Predictions:    [[ 0.  1.  0.  0.  1.  1.  0.  0.  0.  0.]]
Labels:     [[ 0.  1.  0.  0.  1.  1.  0.  0.  0.  0.]]

Iteration: 1000
Accuracy: 1.0
Loss: 0.000647951

Predictions:    [[ 0.  1.  1.  1.  1.  1.  0.  1.  0.  1.]]
Labels:     [[ 0.  1.  1.  1.  1.  1.  0.  1.  0.  1.]]

Iteration: 2000
Accuracy: 1.0
Loss: 0.000801496

Predictions:    [[ 1.  0.  1.  1.  0.  0.  1.  0.  1.  0.]]
Labels:     [[ 1.  0.  1.  1.  0.  0.  1.  0.  1.  0.]]

Iteration: 3000
Accuracy: 1.0
Loss: 0.000515367

Predictions:    [[ 1.  1.  1.  1.  1.  1.  1.  0.  0.  0.]]
Labels:     [[ 1.  1.  1.  1.  1.  1.  1.  0.  0.  0.]]

Iteration: 4000
Accuracy: 1.0
Loss: 0.000312456

Predictions:    [[ 0.  0.  0.  0.  1.  0.  0.  1.  0.  0.]]
Labels:     [[ 0.  0.  0.  0.  1.  0.  0.  1.  0.  0.]]

Iteration: 5000
Accuracy: 1.0
Loss: 5.86302e-05

Predictions:    [[ 1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]]
Labels:     [[ 1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]]

Iteration: 6000
Accuracy: 1.0
Loss: 5.79187e-05

Predictions:    [[ 1.  0.  0.  1.  1.  0.  1.  0.  0.  1.]]
Labels:     [[ 1.  0.  0.  1.  1.  0.  1.  0.  0.  1.]]

Iteration: 7000
Accuracy: 1.0
Loss: 0.000136576

Predictions:    [[ 1.  0.  1.  1.  0.  0.  1.  1.  0.  1.]]
Labels:     [[ 1.  0.  1.  1.  0.  0.  1.  1.  0.  1.]]

Iteration: 8000
Accuracy: 1.0
Loss: 4.11543e-05

Predictions:    [[ 0.  1.  0.  0.  0.  0.  0.  1.  0.  0.]]
Labels:     [[ 0.  1.  0.  0.  0.  0.  0.  1.  0.  0.]]

Iteration: 9000
Accuracy: 1.0
Loss: 7.28511e-06

The modified code:

from __future__ import print_function
import tensorflow as tf
from tensorflow.contrib import rnn
import random

sequenceLength = 5     # Input Dimension
maxNum = 1000          # Must be >= (sequenceLength - 1)
outputDim = 1
hiddenDim = 16
learningRate = 0.01
trainingIterations = 10000
batchSize = 10
displayStep = 1000

def generateData():
    data = []
    labels = []
    for _ in range(batchSize):
        type = (1 if random.random() < 0.5 else 0)
        temp = []
        if type == 1:
            labels.append([1])
            temp.append(random.randint(0, maxNum - sequenceLength + 1))
            for i in range(1, sequenceLength):
                temp.append(random.randint(temp[i - 1] + 1, maxNum - sequenceLength + i + 1))
            data.append(temp)
        if type == 0:
            labels.append([0])
            temp.append(random.randint(0 + sequenceLength - 1, maxNum))
            for i in range(1, sequenceLength):
                temp.append(random.randint( 0 + sequenceLength - i - 1, temp[i - 1] - 1))
            data.append(temp)
    return data, labels

x = tf.placeholder(tf.float32, [batchSize, sequenceLength], name="input")
y = tf.placeholder(tf.float32, [batchSize, outputDim], name="label")

W = tf.Variable(tf.random_normal([hiddenDim, outputDim]))
b = tf.Variable(tf.random_normal([outputDim]))

cell = rnn.BasicRNNCell(hiddenDim)
outputs, states = tf.nn.static_rnn(cell, [x], dtype=tf.float32)
prediction = tf.matmul(outputs[0], W) + b  # raw logits (also fixes a misplaced parenthesis: W + b added b to W's entries, not to the product)

loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=prediction, labels=y))
optimiser = tf.train.AdamOptimizer(learning_rate=learningRate).minimize(loss)

correctPrediction = tf.equal(tf.round(tf.sigmoid(prediction)), y)
accuracy = tf.reduce_mean(tf.cast(correctPrediction, tf.float32))

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for i in range(trainingIterations):
        batchX, batchY = generateData()
        feedDict = {x: batchX, y: batchY}  # renamed to avoid shadowing the builtin dict
        session.run(optimiser, feed_dict=feedDict)
        if i % displayStep == 0:
            print("Predictions:\t" + str(session.run(tf.transpose(tf.round(tf.sigmoid(prediction))), feedDict)))
            print("Labels:\t\t" + str(session.run(tf.transpose(y), feedDict)) + "\n")
            batchAccuracy = session.run(accuracy, feed_dict=feedDict)
            batchLoss = session.run(loss, feed_dict=feedDict)
            print("Iteration: " + str(i) + "\nAccuracy: " + str(batchAccuracy) + "\nLoss: " + str(batchLoss) + "\n")