XOR Tensorflow Non-convergence

Here is a simple TensorFlow implementation of XOR.

Any idea why it does not converge when the TF random seed is 0, but does converge when the seed is 1234? How can I make it converge without changing the network architecture (i.e. keeping the hidden layer as Dense(2)) and while keeping the random seed = 0? TIA!

import tensorflow as tf
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.layers import (
    Dense,
    Input,
)

tf.random.set_seed(0)  # to reproduce non-convergence
# tf.random.set_seed(1234)  # to reproduce convergence

# XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], "float32")
Y = np.array([[0], [1], [1], [0]], "float32")

x = Input(shape=(2,))
y = Dense(2, activation="sigmoid")(x)
y = Dense(1, activation="sigmoid")(y)
model = Model(inputs=x, outputs=y)
model.compile(loss="mean_squared_error")

class logger(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if epoch % 1000 == 0:
            print("epoch=", epoch, "loss=%.3f" % logs["loss"])

model.fit(X, Y, epochs=20000, verbose=0, callbacks=[logger()])

Output with random seed = 0:

epoch= 0 loss=0.255
epoch= 1000 loss=0.235
epoch= 2000 loss=0.190
epoch= 3000 loss=0.154
epoch= 4000 loss=0.137
epoch= 5000 loss=0.130
epoch= 6000 loss=0.127
epoch= 7000 loss=0.126
epoch= 8000 loss=0.125
epoch= 9000 loss=0.125
epoch= 10000 loss=0.125
epoch= 11000 loss=0.125
epoch= 12000 loss=0.125
epoch= 13000 loss=0.125
epoch= 14000 loss=0.125
epoch= 15000 loss=0.125
epoch= 16000 loss=0.125
epoch= 17000 loss=0.125
epoch= 18000 loss=0.125
epoch= 19000 loss=0.125

Output with random seed = 1234:

epoch= 0 loss=0.275
epoch= 1000 loss=0.234
epoch= 2000 loss=0.186
epoch= 3000 loss=0.118
epoch= 4000 loss=0.059
epoch= 5000 loss=0.024
epoch= 6000 loss=0.008
epoch= 7000 loss=0.003
epoch= 8000 loss=0.001
epoch= 9000 loss=0.000
epoch= 10000 loss=0.000
epoch= 11000 loss=0.000
epoch= 12000 loss=0.000
epoch= 13000 loss=0.000
epoch= 14000 loss=0.000
epoch= 15000 loss=0.000
epoch= 16000 loss=0.000
epoch= 17000 loss=0.000
epoch= 18000 loss=0.000
epoch= 19000 loss=0.000

By default (since you did not specify one), the optimizer is "rmsprop", and it does not seem to handle this task well. The reason: I don't know. But if you use "sgd" as the optimizer and "tanh" as the hidden-layer activation, it works (the full script is in the copy/pastable version at the end):

model.compile(loss="mean_squared_error", optimizer='sgd')

epoch= 0 loss=0.425
epoch= 1000 loss=0.213
epoch= 2000 loss=0.182
epoch= 3000 loss=0.160
epoch= 4000 loss=0.130
epoch= 5000 loss=0.063
epoch= 6000 loss=0.023
epoch= 7000 loss=0.010
epoch= 8000 loss=0.006
epoch= 9000 loss=0.004
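
As for what the stuck rmsprop run has settled on: an MSE of 0.125 over the four samples means the squared errors sum to 0.5, which is what you get if, for example, two of the four outputs sit at 0.5 while the other two are exact. You can inspect your run directly (a quick diagnostic, reusing X and model from the question's script after model.fit has returned):

print(model.predict(X))  # shows which inputs the stuck model answers with ~0.5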

You can also try setting the weights manually ;)

# Hard threshold at 0.5; non-differentiable, so the network is not trained
# and the weights are instead set by hand below.
step_activation = lambda x: tf.cast(tf.greater_equal(x, 0.5), tf.float32)

x = Input(shape=(2,))
y = Dense(2, activation=step_activation)(x)
y = Dense(1, activation=step_activation)(y)
model = Model(inputs=x, outputs=y, trainable=False)
model.compile(loss="mean_squared_error")

weights = [np.array([[1, 1], [1, 1]]),  # hidden kernel
           np.array([-1.5, -0.5]),      # hidden biases: unit 0 fires only on AND, unit 1 on OR
           np.array([[-1], [1]]),       # output kernel: OR minus AND
           np.array([-0.5])]            # output bias

model.set_weights(weights)

model.evaluate(X, Y)

1/1 [==============================] - 0s 2ms/step - loss: 0.0000e+00
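
For reference, those hand-set weights are the classic two-gate construction: the first hidden unit computes AND (it only crosses the 0.5 threshold when x1 + x2 - 1.5 >= 0.5, i.e. when both inputs are 1), the second computes OR (x1 + x2 - 0.5 >= 0.5 whenever at least one input is 1), and the output unit fires only when OR is on and AND is off, which is exactly XOR. A standalone numpy check of the same arithmetic (the names W1/b1/W2/b2 are just for this sketch):

import numpy as np

step = lambda z: (z >= 0.5).astype("float32")  # same hard threshold as step_activation

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], "float32")
W1 = np.array([[1, 1], [1, 1]], "float32"); b1 = np.array([-1.5, -0.5], "float32")
W2 = np.array([[-1], [1]], "float32");      b2 = np.array([-0.5], "float32")

h = step(X @ W1 + b1)    # hidden columns: [AND, OR]
out = step(h @ W2 + b2)  # fires iff OR=1 and AND=0
print(out.ravel())       # [0. 1. 1. 0.], i.e. XOR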

Copy/pastable:

import tensorflow as tf

tf.random.set_seed(0)

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0], [1], [1], [0]]

x = tf.keras.layers.Input(shape=(2,))
y = tf.keras.layers.Dense(2, activation="tanh")(x)
y = tf.keras.layers.Dense(1, activation="tanh")(y)
model = tf.keras.Model(inputs=x, outputs=y)
model.compile(loss="mean_squared_error", optimizer='sgd')

history = model.fit(X, Y, epochs=5000)

Epoch 5000/5000
1/1 [==============================] - 0s 998us/step - loss: 0.0630
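
As a sanity check, you can inspect the raw predictions of the trained model (reusing the script above; with tanh outputs the values only approach 0 and 1, so round them before comparing against Y):

print(model.predict(X))          # values near [0, 1, 1, 0] once the loss is small
print(model.predict(X).round())  # matches Y when training has converged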