How do the learning rate and layer count affect my neural network?
Hi, I created a simple neural network in Java:
https://github.com/wutzebaer/Neuronal
It has only 3 neurons: input, hidden, output.
When the input is > 0.7 the output should be 1, otherwise 0.
Question 1:
When I set the rate to 1 it seems to diverge quickly; when I choose 0.1 it never comes to a result. Why is this? I thought a smaller rate would just take longer.
Question 2:
Why do I only get a 99% hit rate for such a simple problem? Isn't it totally solvable by a neural network?
Question 3:
The number of neurons per layer does not seem to have much effect, but when I choose 2 or more layers the results get worse, even when learning for a long time. Why? Aren't more layers better?
Question 4:
Are my calculations correct? I compared my values with http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/comment-page-1/#comment-17063 and they are correct for 1 hidden layer. But I don't know whether I abstracted them correctly to n layers.
You can see my small project here:
https://github.com/wutzebaer/Neuronal
Code
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
public class NeuronNetwork {
public static void main(String[] args) {
new NeuronNetwork();
}
final Integer layers = 2;
final Integer hiddenNeuronPerLayer = 10;
List<InputNeuron> inputNeurons = new ArrayList<InputNeuron>();
List<OutputNeuron> outputNeurons = new ArrayList<OutputNeuron>();
List<List<Neuron>> hiddenLayers = new ArrayList<List<Neuron>>();
Random r = new Random(System.currentTimeMillis());
public void train(int count) {
for (int i = 0; i < count; i++) {
fillTraining();
updateOutput();
learn();
}
}
public int test(int count) {
int correct = 0;
Double error = 0d;
for (int i = 0; i < count; i++) {
fillTraining();
updateOutput();
Double calcTotalError = calcTotalError();
error += Math.abs(calcTotalError);
if (Math.abs(outputNeurons.get(0).desiredOutput - outputNeurons.get(0).output) < 0.1d) {
correct++;
}
}
System.out.println((error / (double) count) + " " + correct + " / " + count);
return correct;
}
public NeuronNetwork() {
System.out.println("start");
// add input neurons
for (int i = 0; i < 1; i++) {
inputNeurons.add(new InputNeuron(1d));
}
// add output neurons
for (int i = 0; i < 1; i++) {
outputNeurons.add(new OutputNeuron());
}
for (Integer layerIndex = 0; layerIndex < layers; layerIndex++) {
ArrayList<Neuron> currentHiddenlayer = new ArrayList<Neuron>();
// add input connections
for (Integer hiddenNeuronInLayerIndex = 0; hiddenNeuronInLayerIndex < hiddenNeuronPerLayer; hiddenNeuronInLayerIndex++) {
Neuron hiddenNeuron = new Neuron();
// add first layer
if (layerIndex == 0) {
for (Neuron input : inputNeurons) {
hiddenNeuron.addInput(input);
}
}
// add inner layers
else {
for (Neuron input : hiddenLayers.get(layerIndex - 1)) {
hiddenNeuron.addInput(input);
}
}
currentHiddenlayer.add(hiddenNeuron);
}
hiddenLayers.add(currentHiddenlayer);
}
// connect all neurons of the last hidden layer to each output neuron
for (Neuron out : outputNeurons) {
for (Neuron hidden : hiddenLayers.get(hiddenLayers.size() - 1)) {
out.addInput(hidden);
}
}
for (int i = 0; i < 10000; i++) {
train(100000);
if (test(1000) == 1000) {
test(10000);
break;
}
}
inputNeurons.get(0).output = 0.0d;
updateOutput();
System.out.println("OUTPUT " + outputNeurons.get(0).output);
}
public void fillTraining() {
for (InputNeuron input : inputNeurons) {
input.output = r.nextDouble();
}
if (inputNeurons.get(0).output > 0.7d) {
outputNeurons.get(0).desiredOutput = 1d;
} else {
outputNeurons.get(0).desiredOutput = 0d;
}
}
public Double calcTotalError() {
Double error = 0d;
for (OutputNeuron out : outputNeurons) {
error += Math.pow(out.desiredOutput - out.output, 2);
}
error *= 0.5d;
return error;
}
public void updateOutput() {
for (List<Neuron> layer : hiddenLayers) {
for (Neuron n : layer) {
n.updateOutput();
}
}
for (OutputNeuron n : outputNeurons) {
n.updateOutput();
}
}
public void learn() {
for (List<Neuron> layer : hiddenLayers) {
for (Neuron n : layer) {
n.calcNewW();
}
}
for (OutputNeuron n : outputNeurons) {
n.calcNewW();
}
for (List<Neuron> layer : hiddenLayers) {
for (Neuron n : layer) {
n.applyNewW();
}
}
for (OutputNeuron n : outputNeurons) {
n.applyNewW();
}
}
}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map.Entry;
import java.util.Random;
public class Neuron {
public static Double rate = 1d;
public HashSet<Neuron> inputs = new HashSet<Neuron>();
public HashSet<Neuron> outputs = new HashSet<Neuron>();
public HashMap<Neuron, Double> weights = new HashMap<Neuron, Double>();
public HashMap<Neuron, Double> newWeights = new HashMap<Neuron, Double>();
public Double bias = 0d;
public Double output = 0d;
static Random r = new Random(System.currentTimeMillis());
/**
* add input with a small random initial weight (between 0 and 0.1)
*
* @param i the source neuron
*/
public void addInput(Neuron i) {
addInput(i, r.nextDouble() * 0.1d);
}
public void addInput(Neuron i, Double weight) {
inputs.add(i);
weights.put(i, weight);
i.outputs.add(this);
}
public void updateOutput() {
Double sum = bias;
for (Neuron input : inputs) {
sum += input.output * weights.get(input);
}
output = logistic(sum);
}
public static Double logistic(Double sum) {
return 1d / (1d + Math.exp(-sum));
}
protected void calcNewW() {
// how does the total input of the neuron affect the output of the neuron?
// => output of the neuron * (1 - output of the neuron)
Double wert2 = wieWirktSichDerInputAufDenOutputAus();
// how does the output of the neuron affect the total error?
Double wert3 = wieWirktSichDerOutputAufDenTotalErrorAus();
for (Entry<Neuron, Double> connection : weights.entrySet()) {
Neuron input = connection.getKey();
Double weight = connection.getValue();
// how does the weight affect the input of the neuron?
// => output of the source neuron
Double wert1 = input.output;
Double result = wert1 * wert2 * wert3;
newWeights.put(input, weight - rate * result);
}
// the bias gradient is the same chain rule without the input factor:
// dE/dbias = wert2 * wert3 (the original code was missing wert2 here)
bias -= wert2 * wert3 * rate;
}
protected Double wieWirktSichDerOutputAufDenTotalErrorAus() {
// => the sum over all target neurons => how does this output affect the error of each target neuron?
Double wert3 = 0d;
for (Neuron out : outputs) {
// ==> how does the net input of the next neuron affect the error value of the next neuron?
// 1. how does the input affect the output? => output of the neuron * (1 - output of the neuron)
Double wert3_a_a = out.wieWirktSichDerInputAufDenOutputAus();
// 2. how does the output affect the error?
Double wert3_a_b = out.wieWirktSichDerOutputAufDenTotalErrorAus();
// => recursion down to the output neuron, which there returns
// -(EXPECTED - OUTPUT)
// == times
// ==> how does the output of this neuron affect the input of the next neuron? => the current weight
Double wert3_b = out.weights.get(this);
wert3 += wert3_a_a * wert3_a_b * wert3_b;
}
return wert3;
}
private double wieWirktSichDerInputAufDenOutputAus() {
return output * (1d - output);
}
public void applyNewW() {
// copy the map so weights and newWeights stay separate;
// assigning the reference directly would alias the two maps after the
// first update, making later updates take effect immediately instead of in a batch
weights = new HashMap<Neuron, Double>(newWeights);
}
}
public class InputNeuron extends Neuron {
public InputNeuron(Double output) {
this.output = output;
}
}
public class OutputNeuron extends Neuron {
public Double desiredOutput = 1d;
protected Double wieWirktSichDerOutputAufDenTotalErrorAus() {
return -(desiredOutput - output);
}
}
Question 1: When I set my rate to 1 it seems to diverge fast; when I choose 0.1 it does not come to a result. Why is this? I thought a smaller rate would just take longer.
A smaller rate takes longer, but it is hard to say how much longer; perhaps you are simply not running enough iterations. If 1 is too large and 0.1 is too small, try 0.2, 0.3, ... until you find a value that works, and then experiment with the number of iterations.
You could also try adding momentum to your learning.
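For instance, a momentum term keeps a fraction of the previous weight change and adds it to the current one, which smooths updates and lets later steps grow when the gradient keeps pointing the same way. A minimal sketch of the idea; the `momentum` coefficient and the `step` helper are illustrative, not part of the posted code:

```java
public class MomentumUpdate {
    static double rate = 0.3;     // learning rate, an assumed mid-range value
    static double momentum = 0.9; // fraction of the previous step to carry over

    // one gradient-descent step with momentum; returns {new weight, new velocity}
    static double[] step(double weight, double velocity, double gradient) {
        double v = momentum * velocity - rate * gradient; // reuse part of the last update
        return new double[] { weight + v, v };
    }

    public static void main(String[] args) {
        double[] state = step(0.5, 0.0, 0.2);  // first step: plain gradient descent
        state = step(state[0], state[1], 0.2); // second step is larger thanks to momentum
        System.out.println("weight after two steps: " + state[0]);
    }
}
```

With momentum at 0 this reduces exactly to the `weight - rate * result` update already used in `calcNewW`.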
Question 2: Why do I only get a 99% hit rate for such a simple problem? Is it not totally solvable by a neural network?
You may be suffering from overfitting. What data do you train your network on, and what do you test with?
99% is not bad, but you can probably improve it with regularization (e.g. weight decay), a smaller network (fewer hidden units), or more training data. In your case the training data should be easy to generate.
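Weight decay, for example, just adds a small pull toward zero to every weight update. A sketch of the idea; the `lambda` value here is an assumption, not something from the posted code:

```java
public class WeightDecay {
    static double rate = 0.3;
    static double lambda = 0.01; // decay strength; small values like 0.001-0.01 are typical

    // gradient step with L2 weight decay: the extra lambda * weight term
    // shrinks large weights, which discourages memorizing the training data
    static double step(double weight, double gradient) {
        return weight - rate * (gradient + lambda * weight);
    }

    public static void main(String[] args) {
        // even with zero gradient the weight still shrinks slightly toward zero
        System.out.println(step(0.5, 0.0));
    }
}
```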
Question 3: The amount of neurons per layer does not seem to have much effect, but when I choose 2 or more layers the results are worse, even when learning for a long time. Why? Aren't more layers better?
As you said, your problem is simple. More layers make a more complex network, which will overfit your simple data. A bigger, more powerful network will just memorize your training data and then perform badly on the test data.
Deeper networks can also run into other problems, such as vanishing gradients and exploding weights. Don't use a deep network for a simple problem like this; bigger is not always better.
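Regarding Question 4: a standard way to check whether a backpropagation implementation is correct for any number of layers is a numerical gradient check: nudge one weight by a tiny epsilon and compare the finite-difference slope of the error with the analytic gradient. A minimal standalone sketch for a single logistic neuron (not wired into the classes above), using the same chain rule as `calcNewW`:

```java
public class GradientCheck {
    static double logistic(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // error of a single neuron: 0.5 * (desired - logistic(w * input))^2,
    // the same squared-error form as calcTotalError
    static double error(double w, double input, double desired) {
        double out = logistic(w * input);
        return 0.5 * (desired - out) * (desired - out);
    }

    public static void main(String[] args) {
        double w = 0.4, input = 0.8, desired = 1.0;

        // analytic gradient, same chain rule as the Neuron class:
        // dE/dw = -(desired - out) * out * (1 - out) * input
        double out = logistic(w * input);
        double analytic = -(desired - out) * out * (1.0 - out) * input;

        // numerical gradient via central difference: (E(w+eps) - E(w-eps)) / (2*eps)
        double eps = 1e-6;
        double numeric = (error(w + eps, input, desired)
                - error(w - eps, input, desired)) / (2.0 * eps);

        // the two values should agree to several decimal places if backprop is correct
        System.out.println("analytic " + analytic + " numeric " + numeric);
    }
}
```

The same trick scales to n layers: perturb one weight anywhere in the network, recompute the total error twice, and compare the slope against what the recursive backpropagation reports for that weight.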