Cubic equation gets high loss

I'm trying to learn some machine learning. After working through a few tutorials, I managed to train linear regression and a second-degree equation with acceptable accuracy. Then I decided to step it up a notch and try: y = x^3 + 9x^2.

Everything had worked fine until now, but with this new setup my loss stays above 100k and the predictions are off by around ±100.

Here is a list of the things I have tried:

None of the solutions worked; the loss stays above 100k every time. I also noticed that it doesn't decrease steadily: the resulting losses look almost random, jumping from 100k to 800k, dropping back to 400k, climbing to 1M, then falling again... You can just about tell that the average loss is going down, but the randomness makes it hard to say for sure.

Some examples:

Epoch 832/10000
32/32 [==============================] - 0s 3ms/step - loss: 757260.0625 - val_loss: 624795.0000
Epoch 833/10000
32/32 [==============================] - 0s 3ms/step - loss: 784539.6250 - val_loss: 257286.3906
Epoch 834/10000
32/32 [==============================] - 0s 3ms/step - loss: 481110.4688 - val_loss: 246353.5469
Epoch 835/10000
32/32 [==============================] - 0s 3ms/step - loss: 383954.2812 - val_loss: 508324.5312
Epoch 836/10000
32/32 [==============================] - 0s 3ms/step - loss: 516217.7188 - val_loss: 543258.3750
Epoch 837/10000
32/32 [==============================] - 0s 3ms/step - loss: 1042559.3125 - val_loss: 1702137.1250
Epoch 838/10000
32/32 [==============================] - 0s 3ms/step - loss: 3192045.2500 - val_loss: 1154483.5000
Epoch 839/10000
32/32 [==============================] - 0s 3ms/step - loss: 1195508.7500 - val_loss: 4658847.0000
Epoch 840/10000
32/32 [==============================] - 0s 3ms/step - loss: 1251505.8750 - val_loss: 275300.7188
Epoch 841/10000
32/32 [==============================] - 0s 3ms/step - loss: 294105.2188 - val_loss: 330317.0000
Epoch 842/10000
32/32 [==============================] - 0s 3ms/step - loss: 528083.4375 - val_loss: 4624526.0000
Epoch 843/10000
32/32 [==============================] - 0s 4ms/step - loss: 3371695.2500 - val_loss: 2008547.0000
Epoch 844/10000
32/32 [==============================] - 0s 3ms/step - loss: 723132.8125 - val_loss: 884099.5625
Epoch 845/10000
32/32 [==============================] - 0s 3ms/step - loss: 635335.8750 - val_loss: 372132.1562
Epoch 846/10000
32/32 [==============================] - 0s 3ms/step - loss: 424794.2812 - val_loss: 349575.8438
Epoch 847/10000
32/32 [==============================] - 0s 3ms/step - loss: 266175.3125 - val_loss: 247624.6719
Epoch 848/10000
32/32 [==============================] - 0s 3ms/step - loss: 387106.7500 - val_loss: 1091736.7500

Here is my original (cleaner) code:

import tensorflow as tf
import numpy as np
from tensorflow import keras

# four hidden ReLU layers of 8 units each, all L2-regularized
model = tf.keras.Sequential([keras.layers.Dense(units=8, activation='relu', input_shape=[1], kernel_regularizer=keras.regularizers.l2(0.001)),
                             keras.layers.Dense(units=8, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
                             keras.layers.Dense(units=8, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
                             keras.layers.Dense(units=8, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
                             keras.layers.Dense(units=1)])

lr = 1e-1
decay = lr/10000

# lr/decay are the legacy Keras argument names (newer versions use learning_rate)
optimizer = keras.optimizers.Adam(lr=lr, decay=decay)
model.compile(optimizer=optimizer, loss='mean_squared_error')

# 10000 samples of x drawn uniformly from -50..50
xs = np.random.random((10000, 1)) * 100 - 50
ys = xs**3 + 9*xs**2

model.fit(xs, ys, epochs=10000, batch_size=256, validation_split=0.2)

print(model.predict(np.array([[10.0]])))

resp = input('Want to save model? y/n: ')
if resp == 'y':
    model.save('zig-zag')

I also found a question where the reported solution was to use relu, but I already have that in place, and copying the code didn't work either.

Am I missing something? What, and why?

For numerical reasons, neural networks usually don't cope well with very large numbers. So simply shrinking the range of x from -50..50 down to -5..5 is enough to train your model.
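
To put numbers on that (this quick check is my own illustration, not part of the original answer): with x in -50..50 the targets reach magnitudes around 1e5, so even a decent fit leaves a squared error in the hundreds of thousands, while -5..5 keeps the targets within roughly 0..350:

import numpy as np

xs_big = np.random.random((10000, 1)) * 100 - 50      # original -50..50 inputs
ys_big = xs_big**3 + 9*xs_big**2
print(ys_big.min(), ys_big.max())                     # roughly -1.0e5 .. 1.5e5

xs_small = np.random.random((10000, 1)) * 10 - 5      # reduced -5..5 inputs
ys_small = xs_small**3 + 9*xs_small**2
print(ys_small.min(), ys_small.max())                 # roughly 0 .. 350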

In your case you also want to remove the l2-regularizer, since you cannot overfit here, and certainly not with a decay of 1e-5. I gave it a try with lr=1e-2 and decay=lr/2:

Epoch 1000/1000
32/32 [==============================] - 0s 2ms/step - loss: 0.1471 - val_loss: 0.1370
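
As a side note on why the decay values matter (my own sketch, assuming the legacy Keras per-batch schedule lr_t = lr / (1 + decay * iterations); check the formula for your Keras version):

def effective_lr(lr, decay, steps):
    # legacy Keras decay schedule, applied once per batch
    return lr / (1 + decay * steps)

steps = 1000 * 32                                   # 1000 epochs x 32 batches per epoch
print(effective_lr(1e-1, 1e-1 / 10000, steps))      # question's setting: ~0.076, barely decayed
print(effective_lr(1e-2, 1e-2 / 2, steps))          # this setting: ~6e-5, strongly decayed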

Full code:

import tensorflow as tf
import numpy as np
from tensorflow import keras

# same architecture as before, but without the L2 regularizers
model = tf.keras.Sequential([keras.layers.Dense(units=8, activation='relu', input_shape=[1]),
                             keras.layers.Dense(units=8, activation='relu'),
                             keras.layers.Dense(units=8, activation='relu'),
                             keras.layers.Dense(units=8, activation='relu'),
                             keras.layers.Dense(units=1)])

lr = 1e-2
decay = lr/2

optimizer = keras.optimizers.Adam(lr=lr, decay=decay)
model.compile(optimizer=optimizer, loss='mean_squared_error')

# x now drawn uniformly from -5..5 instead of -50..50
xs = np.random.random((10000, 1)) * 10 - 5
ys = xs**3 + 9*xs**2
print(np.shape(xs))
print(np.shape(ys))

model.fit(xs, ys, epochs=1000, batch_size=256, validation_split=0.2)

print(model.predict(np.array([[4.0]])))
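
If you do want to keep the original -50..50 range, one common alternative (my own sketch, not part of the answer above) is to standardize both inputs and targets and invert the target scaling at prediction time:

import tensorflow as tf
import numpy as np
from tensorflow import keras

xs = np.random.random((10000, 1)) * 100 - 50
ys = xs**3 + 9*xs**2

# z-score both sides so the network only ever sees values of order 1
x_mean, x_std = xs.mean(), xs.std()
y_mean, y_std = ys.mean(), ys.std()

model = tf.keras.Sequential([keras.layers.Dense(units=8, activation='relu', input_shape=[1]),
                             keras.layers.Dense(units=8, activation='relu'),
                             keras.layers.Dense(units=8, activation='relu'),
                             keras.layers.Dense(units=8, activation='relu'),
                             keras.layers.Dense(units=1)])
model.compile(optimizer=keras.optimizers.Adam(1e-2), loss='mean_squared_error')
model.fit((xs - x_mean) / x_std, (ys - y_mean) / y_std,
          epochs=1000, batch_size=256, validation_split=0.2)

# scale the query the same way, then undo the target scaling
x_new = (np.array([[10.0]]) - x_mean) / x_std
print(model.predict(x_new) * y_std + y_mean)        # should land near 10**3 + 9*10**2 = 1900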