How do learning rate and layer count affect my neural network

Hi, I have created a simple neural network in Java: https://github.com/wutzebaer/Neuronal

It has only three neurons: input, hidden, output.

When the input is > 0.7 the output should be 1, otherwise 0.

Question 1: When I set the rate to 1 it seems to diverge fast; when I choose 0.1 it does not come to a result. Why is this? I thought a smaller rate would just take longer.

Question 2: Why do I only get a 99% hit rate for such a simple problem? Shouldn't it be completely solvable by a neural network?

Question 3: The number of neurons per layer does not seem to have much effect, but when I choose 2 or more layers the results get worse, even after learning for a long time. Why? Aren't more layers better?

Question 4: Are my calculations correct? I compared my values with http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/comment-page-1/#comment-17063 and they are correct for 1 hidden layer. But I don't know whether I abstracted them correctly to n layers.

You can have a look at my small project here: https://github.com/wutzebaer/Neuronal

Code

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class NeuronNetwork {
    public static void main(String[] args) {
        new NeuronNetwork();
    }

    final Integer layers = 2;
    final Integer hiddenNeuronPerLayer = 10;
    List<InputNeuron> inputNeurons = new ArrayList<InputNeuron>();
    List<OutputNeuron> outputNeurons = new ArrayList<OutputNeuron>();
    List<List<Neuron>> hiddenLayers = new ArrayList<List<Neuron>>();

    Random r = new Random(System.currentTimeMillis());

    public void train(int count) {
        for (int i = 0; i < count; i++) {
            fillTraining();
            updateOutput();
            learn();
        }
    }

    public int test(int count) {
        int correct = 0;
        Double error = 0d;
        for (int i = 0; i < count; i++) {
            fillTraining();
            updateOutput();
            Double calcTotalError = calcTotalError();
            error += Math.abs(calcTotalError);
            if (Math.abs(outputNeurons.get(0).desiredOutput - outputNeurons.get(0).output) < 0.1d) {
                correct++;
            }
        }
        System.out.println((error / (double) count) + " " + correct + " / " + count);
        return correct;
    }

    public NeuronNetwork() {

        System.out.println("start");
        // add input neurons
        for (int i = 0; i < 1; i++) {
            inputNeurons.add(new InputNeuron(1d));
        }

        // add output neurons
        for (int i = 0; i < 1; i++) {
            outputNeurons.add(new OutputNeuron());
        }

        for (Integer layerIndex = 0; layerIndex < layers; layerIndex++) {
            ArrayList<Neuron> currentHiddenlayer = new ArrayList<Neuron>();
            // add input connections
            for (Integer hiddenNeuronInLayerIndex = 0; hiddenNeuronInLayerIndex < hiddenNeuronPerLayer; hiddenNeuronInLayerIndex++) {
                Neuron hiddenNeuron = new Neuron();
                // add first layer
                if (layerIndex == 0) {
                    for (Neuron input : inputNeurons) {
                        hiddenNeuron.addInput(input);
                    }
                }
                // add inner layers
                else {
                    for (Neuron input : hiddenLayers.get(layerIndex - 1)) {
                        hiddenNeuron.addInput(input);
                    }
                }
                currentHiddenlayer.add(hiddenNeuron);
            }
            hiddenLayers.add(currentHiddenlayer);
        }

        // connect each output neuron to all neurons of the last hidden layer
        for (Neuron out : outputNeurons) {
            for (Neuron hidden : hiddenLayers.get(hiddenLayers.size() - 1)) {
                out.addInput(hidden);
            }
        }

        for (int i = 0; i < 10000; i++) {
            train(100000);
            if (test(1000) == 1000) {
                test(10000);
                break;
            }
        }

        inputNeurons.get(0).output = 0.0d;

        updateOutput();

        System.out.println("OUTPUT " + outputNeurons.get(0).output);

    }

    public void fillTraining() {
        for (InputNeuron input : inputNeurons) {
            input.output = r.nextDouble();
        }

        if (inputNeurons.get(0).output > 0.7d) {
            outputNeurons.get(0).desiredOutput = 1d;
        } else {
            outputNeurons.get(0).desiredOutput = 0d;
        }

    }

    public Double calcTotalError() {
        Double error = 0d;
        for (OutputNeuron out : outputNeurons) {
            error += Math.pow(out.desiredOutput - out.output, 2);
        }
        error *= 0.5d;
        return error;
    }

    public void updateOutput() {
        for (List<Neuron> layer : hiddenLayers) {
            for (Neuron n : layer) {
                n.updateOutput();
            }
        }
        for (OutputNeuron n : outputNeurons) {
            n.updateOutput();
        }
    }

    public void learn() {
        for (List<Neuron> layer : hiddenLayers) {
            for (Neuron n : layer) {
                n.calcNewW();
            }
        }
        for (OutputNeuron n : outputNeurons) {
            n.calcNewW();
        }
        for (List<Neuron> layer : hiddenLayers) {
            for (Neuron n : layer) {
                n.applyNewW();
            }
        }
        for (OutputNeuron n : outputNeurons) {
            n.applyNewW();
        }
    }

}

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map.Entry;
import java.util.Random;

public class Neuron {
    public static Double rate = 1d;
    public HashSet<Neuron> inputs = new HashSet<Neuron>();
    public HashSet<Neuron> outputs = new HashSet<Neuron>();
    public HashMap<Neuron, Double> weights = new HashMap<Neuron, Double>();
    public HashMap<Neuron, Double> newWeights = new HashMap<Neuron, Double>();
    public Double bias = 0d;
    public Double output = 0d;
    static Random r = new Random(System.currentTimeMillis());

    /**
     * add input with a small random default weight in [0, 0.1)
     * 
     * @param i the source neuron to connect
     */
    public void addInput(Neuron i) {
        addInput(i, r.nextDouble() * 0.1d);
    }

    public void addInput(Neuron i, Double weight) {
        inputs.add(i);
        weights.put(i, weight);
        i.outputs.add(this);
    }

    public void updateOutput() {
        Double sum = bias;
        for (Neuron input : inputs) {
            sum += input.output * weights.get(input);
        }
        output = logistic(sum);
    }

    public static Double logistic(Double sum) {
        return 1d / (1d + Math.exp(-sum));
    }

    protected void calcNewW() {

        // how does the neuron's total input affect its output?
        // => output of the neuron * (1 - output of the neuron)
        Double wert2 = wieWirktSichDerInputAufDenOutputAus();

        // how does the neuron's output affect the total error?
        Double wert3 = wieWirktSichDerOutputAufDenTotalErrorAus();

        for (Entry<Neuron, Double> connection : weights.entrySet()) {
            Neuron input = connection.getKey();
            Double weight = connection.getValue();

            // how does the weight affect the neuron's input?
            // => output of the source neuron
            Double wert1 = input.output;

            Double result = wert1 * wert2 * wert3;
            newWeights.put(input, weight - rate * result);
        }

        bias -= wert3 * rate;

    }

    protected Double wieWirktSichDerOutputAufDenTotalErrorAus() {
        // => initially the sum over all target neurons => how does the output affect the error of each target neuron
        Double wert3 = 0d;
        for (Neuron out : outputs) {
            // ==> how does the net input of the next neuron affect that neuron's error value
            // 1. how does the input affect the output => output of the neuron * (1 - output of the neuron)
            Double wert3_a_a = out.wieWirktSichDerInputAufDenOutputAus();
            // 2. how does the output affect the error
            Double wert3_a_b = out.wieWirktSichDerOutputAufDenTotalErrorAus();
            // => recursion down to the output neuron, where it is
            // -(EXPECTED-OUTPUT)

            // == times

            // ==> how does this neuron's output affect the input of the next neuron => the current weight
            Double wert3_b = out.weights.get(this);

            wert3 += wert3_a_a * wert3_a_b * wert3_b;

        }
        return wert3;
    }

    private double wieWirktSichDerInputAufDenOutputAus() {
        return output * (1d - output);
    }

    public void applyNewW() {
        // copy instead of assigning the reference, otherwise weights and
        // newWeights alias the same map after the first update and
        // calcNewW would modify the live weights mid-pass
        weights = new HashMap<Neuron, Double>(newWeights);
    }

}

public class InputNeuron extends Neuron {
    public InputNeuron(Double output) {
        this.output = output;
    }
}

public class OutputNeuron extends Neuron {
    public Double desiredOutput = 1d;

    protected Double wieWirktSichDerOutputAufDenTotalErrorAus() {
        return -(desiredOutput - output);
    }

}

Question 1: When I set my rate to 1 it seems to diverge fast; when I choose 0.1 it does not come to a result. Why is this? I thought a smaller rate would just take longer.

A smaller rate takes longer, but it is hard to say how much longer. Maybe you are not running enough iterations.

If 1 is too large and 0.1 is too small, try 0.2, 0.3, ... until you find a value that works, and only then experiment with the iteration count.

You could also try adding momentum to your learning.
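Momentum keeps a per-weight velocity that accumulates past gradients, so updates are smoothed and even a small rate makes steady progress. A minimal one-dimensional sketch (class and method names are made up for illustration, not part of your project):

```java
public class MomentumDemo {
    // Gradient descent with momentum on f(w) = (w - 3)^2.
    // The velocity is an exponentially decaying sum of past gradients;
    // in a network you would keep one velocity per weight.
    public static double minimize(double rate, double momentum, int steps) {
        double w = 0d;
        double velocity = 0d;
        for (int i = 0; i < steps; i++) {
            double gradient = 2d * (w - 3d); // derivative of (w - 3)^2
            velocity = momentum * velocity - rate * gradient;
            w += velocity;
        }
        return w;
    }

    public static void main(String[] args) {
        // approaches the minimum at w = 3
        System.out.println(minimize(0.1d, 0.9d, 200));
    }
}
```

In your code this would roughly mean storing a `HashMap<Neuron, Double>` of velocities next to `newWeights` and updating each weight by its velocity instead of by `rate * result` directly.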

Question 2: Why do I only get a 99% hit rate for such a simple problem? Is it not totally solvable by a neural network?

You may be suffering from overfitting. What data do you train your network with, and what do you test with?

99% is not bad, but you can probably improve it with regularization (e.g. weight decay), a smaller network (fewer hidden units), or more training data. In your case training data should be easy to generate.
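Weight decay just adds a term to each update that pulls every weight toward zero. A minimal sketch of a single update step (names hypothetical):

```java
public class WeightDecayDemo {
    // One gradient step with L2 weight decay: besides the error gradient,
    // each weight is shrunk toward zero by decay * weight.
    public static double step(double weight, double gradient,
                              double rate, double decay) {
        return weight - rate * (gradient + decay * weight);
    }

    public static void main(String[] args) {
        // with a zero error gradient, decay alone shrinks the weight
        double w = 1.0d;
        for (int i = 0; i < 100; i++) {
            w = step(w, 0d, 0.1d, 0.01d);
        }
        System.out.println(w); // smaller than the initial 1.0
    }
}
```

In your `calcNewW` this would amount to subtracting an extra `rate * decay * weight` alongside `rate * result`.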

Question 3: The amount of neurons per layer does not seem to have much effect, but when I choose 2 or more layers the results are worse, even when learning for a long time. Why? Aren't more layers better?

As you say, your problem is simple. More layers make a more complex network, which will overfit your simple data. A larger, more powerful network will simply memorize your training data and then perform poorly on the test data.

Deeper networks can also run into other problems, such as vanishing gradients and exploding weights. Don't use a deep network for a problem this simple; bigger is not always better.
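One way to see the vanishing-gradient effect with your logistic activation: its derivative `output * (1 - output)` is at most 0.25, and backpropagation multiplies one such factor per layer, so the gradient shrinks at least geometrically with depth. A sketch of that upper bound, ignoring the weights (class name made up):

```java
public class VanishingGradientDemo {
    // Upper bound on the gradient scaling through n logistic layers:
    // the logistic derivative o * (1 - o) peaks at 0.25 (when o = 0.5),
    // and backprop multiplies one such factor per layer.
    public static double maxGradientFactor(int layers) {
        double factor = 1d;
        for (int i = 0; i < layers; i++) {
            factor *= 0.25d;
        }
        return factor;
    }

    public static void main(String[] args) {
        System.out.println(maxGradientFactor(2));  // 0.0625
        System.out.println(maxGradientFactor(10)); // roughly 9.5e-7
    }
}
```

So with 10 hidden layers the early layers receive gradients about a million times smaller than the output layer at best, which is why a long training time alone doesn't help.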