Why does my 2-layer neural network achieve just 50% accuracy for binary classification?

Question

Following a review post, I built this dataset for binary classification from the Fashion MNIST T-shirt and shirt classes.

import tensorflow as tf
import numpy as np

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# generate the indices
idx_digit_train = np.argwhere((y_train == 0) | (y_train == 6)).flatten()
idx_digit_test = np.argwhere((y_test == 0) | (y_test == 6)).flatten()

# construct the training set
y_train_mnist = y_train[idx_digit_train].reshape(-1,1)
x_train_mnist = x_train[idx_digit_train].reshape(-1,28*28)

# construct the test set
y_test_mnist = y_test[idx_digit_test].reshape(-1,1)
x_test_mnist = x_test[idx_digit_test].reshape(-1,28*28)

x_train_mnist = x_train_mnist/255.
x_test_mnist = x_test_mnist/255.

y_train_mnist[y_train_mnist==6]=1
y_test_mnist[y_test_mnist==6]=1

y_train_mnist = np.array(y_train_mnist, dtype=np.float32)
y_test_mnist = np.array(y_test_mnist, dtype=np.float32)

Adapting from another review post, I built a 2-layer neural network from scratch with NumPy.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

class MLPClassifier:
    def __init__(self, eta=.05, n_epoch=10, n_0=2, n_1=2, n_2=1,
          model_w1=[], model_b1=[], model_w2=[], model_b2=[]):
        self.eta = eta
        self.n_epoch = n_epoch
        self.model_w1 = model_w1
        self.model_b1 = model_b1
        self.model_w2 = model_w2
        self.model_b2 = model_b2
        self.n_1 = n_1
        self.n_2 = n_2
        
    def initialize_params(self, n_0, n_1, n_2):
        if len(self.model_w1) == 0:
            self.model_w1 = np.random.random(size=(n_0, n_1))
        if len(self.model_w2) == 0:
            self.model_w2 = np.random.random(size=(n_1, n_2))
        if len(self.model_b1) == 0:
            self.model_b1 = np.random.random(size=(1, n_1))
        if len(self.model_b2) == 0:
            self.model_b2 = np.random.random(size=(1, n_2))
        
    def predict(self, x):
        _, a2 = self.forward_propagation(x)
        yhat = a2 >= 0.5
        return 1*yhat

    def forward_propagation(self, x):
        z1 = np.dot(x, self.model_w1) + self.model_b1
        a1 = sigmoid(z1)
        z2 = np.dot(a1, self.model_w2) + self.model_b2
        a2 = sigmoid(z2)
        return a1, a2
    
    def backward_propagation(self, x, y, a1, a2):
        m = len(x)
        n_1 = self.n_1
        n_2 = self.n_2
        a1 = a1.reshape(m, -1)
        dz2 = a2 - y
        dw2 = np.dot(a1.T, dz2)/m
        dw2 = dw2.reshape(n_1, n_2)
        db2 = np.mean(dz2, keepdims = True)
        dz1 = np.dot(dz2, self.model_w2.T) * (a1*(1-a1))
        dw1 = np.dot(x.T, dz1)/m
        db1 = np.mean(dz1, axis=0)
        return dw2, db2, dw1, db1

    def update_params(self, dw2, db2, dw1, db1):
        self.model_w2 -= self.eta * dw2
        self.model_b2 -= self.eta * db2
        self.model_w1 -= self.eta * dw1
        self.model_b1 -= self.eta * db1

    def fit(self, x, y, verbose=False):
        n_0 = x.shape[-1]
        n_1 = self.n_1
        n_2 = self.n_2
        self.initialize_params(n_0, n_1, n_2)        
        for i in range(self.n_epoch):
            a1, a2 = self.forward_propagation(x)
            dw2, db2, dw1, db1 = self.backward_propagation(x, y, a1, a2)
            self.update_params(dw2, db2, dw1, db1)

According to the first post, a logistic regression model can achieve

train accuracy: 0.828583

However, when I train my handcrafted model, its accuracy on both the training set and the test set is stuck at 0.5, whether after 9 epochs or even 9999 epochs.

classifier = MLPClassifier(.1, 9999)
classifier.fit(x_train_mnist, y_train_mnist, verbose=False)
acc = np.count_nonzero(np.squeeze(classifier.predict(x_train_mnist)) == np.squeeze(y_train_mnist))
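Note that `np.count_nonzero` returns the number of correct predictions rather than a fraction. The following is a minimal sketch of turning it into an accuracy, using a hypothetical helper `accuracy` and toy labels that are not part of the original code. It also illustrates why 0.5 is a suspicious value: on a balanced two-class set, a model that always predicts the same class scores exactly 0.5.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of matching labels; both inputs are flattened first."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    return float(np.mean(y_true == y_pred))

# Toy balanced labels (an assumption for illustration only).
y_true = np.array([[0.], [1.], [0.], [1.]])

# A constant predictor that always outputs class 1.
y_const = np.ones_like(y_true)
print(accuracy(y_true, y_const))
```

With the classifier above, the same helper could be called as `accuracy(y_train_mnist, classifier.predict(x_train_mnist))`.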

I also tried a range of numbers of hidden units, and none of them works better.

acc_list=[]
for i in np.arange(2, 99):
  for j in range(9):
    classifier = MLPClassifier(.1, j, n_1=i)
    classifier.fit(x_train_mnist, y_train_mnist, verbose=False)
    acc = np.count_nonzero(np.squeeze(classifier.predict(x_train_mnist)) == np.squeeze(y_train_mnist))
    acc_list.append(acc)

In contrast, the TensorFlow version

model = tf.keras.Sequential([
  tf.keras.layers.Dense(2, activation=tf.nn.sigmoid),
  tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

model.compile(loss=tf.keras.losses.binary_crossentropy,
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(x_train_mnist, y_train_mnist, epochs=20, verbose=1)

reaches 0.85 after 30 epochs:

Epoch 30/50
375/375 [==============================] - 1s 1ms/step - loss: 0.3326 - accuracy: 0.8525

What am I missing?

How can I improve my handcrafted 2-layer model without adding more layers?

Note: my from-scratch version also uses cross-entropy.

Click the colab link to run it online.

Answer 1

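There are several issues in your code.

Since you probably want to implement stochastic gradient descent (SGD), note that the idea behind it is online training: you update the model parameters after every training example (or, in mini-batch SGD, after every batch of samples). Your training loop should therefore look like this: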
```
for i in range(self.n_epoch):
    # one parameter update per training example (online SGD)
    for j in range(len(x)):
        a1, a2 = self.forward_propagation(x[j])
        dw2, db2, dw1, db1 = self.backward_propagation(x[j], y[j], a1, a2)
        self.update_params(dw2, db2, dw1, db1)
```
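For the mini-batch variant mentioned above, a minimal sketch of the batching step could look like the following. The helper name `iterate_minibatches`, the batch size, and the toy arrays are assumptions for illustration and are not part of the original code.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs that cover the data once."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

# Toy data standing in for x_train_mnist / y_train_mnist.
rng = np.random.default_rng(0)
x = rng.random((10, 4))
y = rng.integers(0, 2, size=(10, 1)).astype(np.float32)

# The last batch is smaller when batch_size does not divide len(x).
batch_shapes = [(xb.shape, yb.shape)
                for xb, yb in iterate_minibatches(x, y, 4, rng)]
print(batch_shapes)
```

Inside `fit`, each yielded batch would go through `forward_propagation`, `backward_propagation` and `update_params` in turn. A batch size of 1 gives the online loop above, and a batch size of `len(x)` gives one full-batch update per epoch, as in the question's `fit`.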
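You don't seem to include the derivative of the sigmoid function at the network output in backward_propagation. Strictly, that factor belongs to a squared-error loss; with a sigmoid output and binary cross-entropy it cancels, so a2 - y is already the gradient with respect to z2. In addition, by writing db2 = np.mean(dz2, keepdims = True), you reduce the bias update to a single scalar value (which is probably not what you want once n_2 is larger than 1).
Furthermore, there are some issues with array shapes in your code (be careful with np.reshape: it may output something other than what you expect; my humble advice is not to use it unless you really want to reshape an array).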
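As an illustration of how silently array shapes can go wrong in this kind of code, here is a small sketch with toy values. It shows a generic NumPy broadcasting pitfall, not a bug identified in the question's code (where the labels are already reshaped to a column).

```python
import numpy as np

a2 = np.array([[0.9], [0.2], [0.7]])   # predictions, shape (3, 1)
y_col = np.array([[1.], [0.], [1.]])   # labels as a column, shape (3, 1)
y_flat = y_col.ravel()                 # labels as a flat vector, shape (3,)

good = a2 - y_col    # elementwise difference, shape (3, 1)
bad = a2 - y_flat    # broadcasts to shape (3, 3) without raising an error

print(good.shape, bad.shape)
```

Checking shapes explicitly, for example with `assert dz2.shape == a2.shape`, catches this kind of mistake early.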
In the end, this is how the backward pass should work:
```
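def backward_propagation(self, x, y, a1, a2):
    # x is a single 1-D sample; a1, a2, dz1 and dz2 each have one row
    # output error times the sigmoid derivative a2*(1-a2); this is the
    # squared-error form (with cross-entropy it reduces to a2 - y)
    dz2 = (a2 - y) * (a2*(1-a2))
    dw2 = np.dot(a1.T, dz2) / len(a1)
    db2 = dz2/len(dz2)
    dz1 = np.dot(dz2, self.model_w2.T) * (a1 * (1 - a1))
    # divide by the batch size len(dz1), not len(x), which is the feature count here
    dw1 = np.dot(np.expand_dims(x, 1), dz1) / len(dz1)
    db1 = dz1/len(dz1)
    return dw2, db2, dw1, db1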
```
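Whether the factor `a2*(1-a2)` belongs in `dz2` depends on the loss. The sketch below checks this numerically with a central finite difference. The helper names `bce`, `mse` and `numeric_grad` and the test point are assumptions for illustration. For binary cross-entropy the gradient with respect to `z2` is `a2 - y`; for a squared-error loss it is `(a2 - y) * a2 * (1 - a2)`.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(z, y):
    """Binary cross-entropy of sigmoid(z) against the label y."""
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def mse(z, y):
    """Squared error 0.5 * (sigmoid(z) - y)**2."""
    return 0.5 * (sigmoid(z) - y) ** 2

def numeric_grad(loss, z, y, eps=1e-6):
    """Central finite difference of loss with respect to z."""
    return (loss(z + eps, y) - loss(z - eps, y)) / (2 * eps)

z, y = 0.3, 1.0
a = sigmoid(z)

# Differences between the numerical and the analytical gradients.
bce_err = abs(numeric_grad(bce, z, y) - (a - y))
mse_err = abs(numeric_grad(mse, z, y) - (a - y) * a * (1 - a))
print(bce_err, mse_err)  # both should be tiny
```

So if the from-scratch model is meant to minimise cross-entropy, as the question states, keeping `dz2 = a2 - y` is consistent with that loss.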

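Finally, initializing the weights with only positive values drawn from a uniform distribution is not the best idea. A good starting point is the standard normal distribution: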

def initialize_params(self, n_0, n_1, n_2):
    if len(self.model_w1) == 0:
        self.model_w1 = np.random.randn(n_0, n_1)
    if len(self.model_w2) == 0:
        self.model_w2 = np.random.randn(n_1, n_2)
    if len(self.model_b1) == 0:
        self.model_b1 = np.random.randn(1, n_1)
    if len(self.model_b2) == 0:
        self.model_b2 = np.random.randn(1, n_2)

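You can further experiment with reducing the standard deviation of the normal distribution (but do not initialize with zeros!). If you want to dig deeper, search for 'Kaiming He initialization' and 'Xavier initialization'.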
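To see why the scale of the initial weights matters here, consider the following sketch. It uses random values in [0, 1] as a stand-in for the scaled pixels (an assumption; the real images are not loaded) and compares the all-positive uniform initialization from the question with a Xavier/Glorot-style scaled normal. With 784 non-negative inputs and all-positive weights, the hidden pre-activations become very large, the sigmoid saturates at 1, and the factor `a1*(1-a1)` used in backpropagation vanishes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
n_0, n_1 = 28 * 28, 2
x = rng.random((100, n_0))   # stand-in for pixels scaled to [0, 1]

# Initialization from the question: uniform on [0, 1), all positive.
w_uniform = rng.random((n_0, n_1))
a_uniform = sigmoid(x @ w_uniform)

# Xavier/Glorot-style scaling: normal with std = sqrt(2 / (fan_in + fan_out)).
w_xavier = rng.standard_normal((n_0, n_1)) * np.sqrt(2.0 / (n_0 + n_1))
a_xavier = sigmoid(x @ w_xavier)

# a * (1 - a) is the sigmoid derivative that multiplies dz1 in backprop.
grad_uniform = np.max(a_uniform * (1 - a_uniform))
grad_xavier = np.mean(a_xavier * (1 - a_xavier))
print(grad_uniform, grad_xavier)
```

Under these assumptions the first value is essentially zero, so the first layer receives no useful gradient and the hidden activations are the same for every input. That would be consistent with a model that predicts a single class and stays at 0.5 accuracy.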

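Finally, you could take a look at the code of open-source deep learning libraries (e.g. TensorFlow or PyTorch). One can really learn a lot from it! Here is an example of how the linear layer is implemented in torch: torch.nn.modules.linear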
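As a rough idea of what such a layer does, here is a NumPy imitation. It is a sketch, not PyTorch's actual code. It follows what the torch.nn.Linear documentation describes: the layer computes `x @ W.T + b`, stores the weight with shape `(out_features, in_features)`, and draws initial values uniformly from `[-1/sqrt(in_features), 1/sqrt(in_features)]`. The class below and its constructor arguments are illustrative.

```python
import numpy as np

class Linear:
    """NumPy imitation of a fully connected layer: y = x @ W.T + b."""

    def __init__(self, in_features, out_features, rng):
        bound = 1.0 / np.sqrt(in_features)
        # Small, zero-centred initial values scaled by the fan-in.
        self.weight = rng.uniform(-bound, bound,
                                  size=(out_features, in_features))
        self.bias = rng.uniform(-bound, bound, size=(out_features,))

    def __call__(self, x):
        return x @ self.weight.T + self.bias

rng = np.random.default_rng(0)
layer = Linear(784, 2, rng)
out = layer(np.zeros((5, 784)))   # batch of 5 all-zero inputs
print(out.shape)
```

Note how the initial weights are both zero-centred and scaled down with the number of inputs, in contrast to `np.random.random`.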

python

neural-network