TensorFlow 2 custom loss returns nan

I have a model that I compile with binary_crossentropy; training runs fine and the loss is printed.

model = MyModel()
model.compile(optimizer="adadelta", loss="binary_crossentropy")

data1, data2, y = get_random_data(4, 3) # returns data1: (1000, 4), data2: (1000, 3), y: (1000,)
model.fit([data1, data2], y, batch_size=4)

Then I wrote a custom loss function, and the loss became nan:

import tensorflow.keras.backend as K

class MyModel():
    ...
    def batch_loss(self, y_true, y_pred_batch):
        # softmax taken over every prediction in the batch
        bottom = K.sum(K.exp(y_pred_batch))
        batch_softmax = K.exp(y_pred_batch) / bottom
        # loss is the summed log-probability of the batch
        batch_log_likelihood = K.log(batch_softmax)
        loss = K.sum(batch_log_likelihood)
        return loss

model.compile(optimizer="adadelta", loss=model.batch_loss) # change above compile code to this

I tested my loss function with batch_loss(tf.ones((1,))), and it seems to return the correct result.

But as soon as training runs, the loss becomes nan. Where should I start debugging?
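A minimal way to narrow this down, using two standard TF 2 debugging utilities (a sketch; it assumes the model and data variables from above):

import tensorflow as tf

# Raise an InvalidArgumentError at the first op whose output contains
# NaN or Inf, with a trace pointing at that op (slow; debugging only).
tf.debugging.enable_check_numerics()

# Or simply abort training as soon as the reported loss becomes NaN.
model.fit([data1, data2], y, batch_size=4,
          callbacks=[tf.keras.callbacks.TerminateOnNaN()])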


Model and data code (for anyone who wants to reproduce):

import numpy as np
import tensorflow as tf

class MyModel(tf.keras.models.Model):
    def __init__(self):
        super().__init__()
        self.t1A = tf.keras.layers.Dense(300, activation='relu', input_dim=1)
        self.t1B = tf.keras.layers.Dense(300, activation='relu', input_dim=1)
        self.t1v = tf.keras.layers.Dense(128, activation='relu')
        self.t2A = tf.keras.layers.Dense(300, activation='relu')
        self.t2B = tf.keras.layers.Dense(300, activation='relu')
        self.t2v = tf.keras.layers.Dense(128, activation='relu')
        self.out = tf.keras.layers.Dot(axes=1)

    def call(self, inputs, training=None, mask=None):
        u, i = inputs[0], inputs[1]
        u = self.t1A(u)
        u = self.t1B(u)
        u = self.t1v(u)
        i = self.t2A(i)
        i = self.t2B(i)
        i = self.t2v(i)
        out = self.out([u, i])
        return out

def get_random_data(user_feature_num, item_feature_num):
    def get_random_ndarray(data_size, dis_list, feature_num):
        data_list = []
        for i in range(feature_num):
            arr = np.random.randint(dis_list[i], size=data_size)
            data_list.append(arr)
        data = np.array(data_list)
        return np.transpose(data, axes=(1, 0))
    uf_dis, if_dis, data_size = [1000, 2, 10, 20], [10000, 50, 60], 1000
    y = np.zeros(data_size)
    for i in range(int(data_size/10)):
        y[i] = 1

    return get_random_ndarray(data_size, uf_dis, feature_num=user_feature_num), \
        get_random_ndarray(data_size, if_dis, feature_num=item_feature_num), y
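
A minimal driver that wires the pieces above together (a sketch; it assumes batch_loss from earlier has been added to this MyModel):

data1, data2, y = get_random_data(4, 3)

model = MyModel()
model.compile(optimizer="adadelta", loss=model.batch_loss)
model.fit([data1, data2], y, batch_size=4)  # the printed loss comes out as nan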

I think your error is caused by the call to exp(). This function grows very quickly and returns nan.

Your model outputs large values. Combined with the call to tf.exp in your loss function, the values quickly grow to nan. You might consider applying an activation function like sigmoid to keep the values between 0 and 1.
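
To make the overflow concrete (a sketch with made-up logit magnitudes, not the poster's actual outputs): float32 exp overflows to inf just above 88, inf / inf yields nan, and the nan survives the log. If you want to keep the batch-softmax loss itself, the standard fix is the max-subtraction trick, which tf.nn.log_softmax applies for you:

import tensorflow as tf

logits = tf.constant([[100.0], [200.0], [300.0]])  # hypothetical raw outputs

# Naive batch log-softmax, as in the question: exp(300.0) overflows
# float32 to inf, inf / inf gives nan, and log(nan) stays nan.
naive = tf.math.log(tf.exp(logits) / tf.reduce_sum(tf.exp(logits)))
print(naive.numpy().ravel())   # [nan nan nan]

# Stable version: subtract the max logit before exponentiating, which
# tf.nn.log_softmax does internally, so intermediates never overflow.
stable = tf.nn.log_softmax(logits, axis=0)
print(stable.numpy().ravel())  # [-200. -100.    0.]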