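Vectorized formula for output layer in a neural network
I have a neural network and want to use the trained network to compute outputs for a set of test data. What I am struggling with is writing the formulas for the hidden layer and the output layer. My goal is a vectorized formula, but I would also be happy with a loop-based implementation.
I believe I now have the correct formula for the hidden layer and only need one for the output layer, but I would appreciate it if someone could confirm that the hidden-layer formula is properly vectorized.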
% Variables
% Xtest - test data
% thetah - trained weights for inputs to hidden layer
% thetao - trained weights for hidden layer to outputs
% ytest - output
htest = (1 ./ (1 + exp(-(thetah * Xtest'))))' ; % FORMULA FOR HIDDEN LAYER
ytest = ones(mtest, num_outputs) ; % FORMULA FOR OUTPUT LAYER
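Assuming your Xtest has dimensions N by M, where N is the number of examples and M is the number of features, thetah is an M by H1 matrix, where H1 is the number of hidden units in the first layer, and thetao is an H1 by O matrix, where O is the number of output classes, you do the following: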
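a1 = Xtest * thetah;        % N by H1 inputs to the hidden units
z1 = 1 ./ (1 + exp(-a1));   % element-wise division; assuming you are using sigmoid units
a2 = z1 * thetao;           % N by O inputs to the output units
z2 = softmax(a2);           % softmax is assumed to normalize each row (one row per example)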
Read more about softmax here.
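The snippet above calls softmax without defining it. As a rough sketch only, and assuming it is meant to normalize each row of the N by O score matrix, a row-wise version could look like the following. The toy values of a2 are made up for illustration, and subtracting the row maximum before exp is a common trick to avoid overflow, not something the answer prescribes.

```matlab
% Toy scores (made up): N = 2 examples, O = 3 output classes
a2 = [1 2 3; 1000 1000 1000];

% Subtract each row's maximum so exp() cannot overflow
shifted = bsxfun(@minus, a2, max(a2, [], 2));
e = exp(shifted);

% Divide each row by its sum, so every row of z2 sums to 1
z2 = bsxfun(@rdivide, e, sum(e, 2));

% Predicted class for each example is the column with the largest probability
[~, p] = max(z2, [], 2);
```

bsxfun is used here instead of implicit expansion so that the sketch should also run on older MATLAB releases and on Octave.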
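Below you can find both a vectorized and a loop-based implementation of forward propagation. Because of differences in notation and in how you store your data in matrices, you may have to adapt your input data to the code below.
You need to add bias units to both the input layer and the hidden layer.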
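To make the effect of the bias units on the matrix shapes concrete, here is a small sketch that uses the sizes quoted for the wine example (178 examples, 13 features, 15 hidden units, 3 outputs). The data and weights are random placeholders, not the trained values, so only the dimensions are meaningful.

```matlab
% Sizes as quoted for the wine example; the values below are random placeholders
mtest = 178; n = 13; hl_size = 15; num_outputs = 3;
Xtest  = rand(mtest, n);                  % NOT the real dataset
thetah = rand(hl_size, n + 1);            % 15 x 14: extra column multiplies the input bias
thetao = rand(num_outputs, hl_size + 1);  % 3 x 16: extra column multiplies the hidden bias

% Prepend a column of ones (the bias unit) before each matrix product
a1 = [ones(mtest, 1) Xtest];                            % 178 x 14
a2 = [ones(mtest, 1) 1 ./ (1 + exp(-(a1 * thetah')))];  % 178 x 16
h  = 1 ./ (1 + exp(-(a2 * thetao')));                   % 178 x 3
```

Each weight matrix has one more column than the layer feeding it has units, which is why thetah is 15x14 rather than 15x13.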
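To simplify implementation and debugging, I took some data from the open source machine learning repository and trained the network for the wine classification task.
The network classifies the input data with a success rate of 97.7%.
The code is as follows: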
function [] = nn_fp()
    load('Xtest.mat');  %input data 178x13
    load('y.mat');      %output data 178x1
    load('thetah.mat'); %Parameters of the hidden layer 15x14
    load('thetao.mat'); %Parameters of the output layer 3x16
    predict_simple(Xtest, y, thetah, thetao);
    predict_vectorized(Xtest, y, thetah, thetao);
end
function predict_simple(Xtest, y, thetah, thetao)
    mtest = size(Xtest, 1);        %number of input examples
    n = size(Xtest, 2);            %number of features
    hl_size = size(thetah, 1);     %size of the hidden layer (without the bias unit)
    num_outputs = size(thetao, 1); %size of the output layer
    %add a bias unit to the input layer
    a1 = [ones(mtest, 1) Xtest]; %[mtest x (n+1)]
    %compute activations of the hidden layer
    z2 = zeros(mtest, hl_size); %[mtest x hl_size]
    a2 = zeros(mtest, hl_size); %[mtest x hl_size]
    for i=1:mtest
        for j=1:hl_size
            for k=1:n+1
                z2(i, j) = z2(i, j) + a1(i, k)*thetah(j, k);
            end
            a2(i, j) = sigmoid_simple(z2(i, j));
        end
    end
    %add a bias unit to the hidden layer
    a2 = [ones(mtest, 1) a2]; %[mtest x (hl_size+1)]
    %compute activations of the output layer
    z3 = zeros(mtest, num_outputs); %[mtest x num_outputs]
    h = zeros(mtest, num_outputs);  %[mtest x num_outputs]
    for i=1:mtest
        for j=1:num_outputs
            for k=1:hl_size+1
                z3(i, j) = z3(i, j) + a2(i, k)*thetao(j, k);
            end
            h(i, j) = sigmoid_simple(z3(i, j)); %the hypothesis
        end
    end
    %calculate predictions for each input example based on the maximum term
    %of the hypothesis h
    p = zeros(size(y));
    for i=1:mtest
        max_ind = 1;
        max_value = h(i, 1);
        for j=2:num_outputs
            if (h(i, j) > max_value)
                max_ind = j;
                max_value = h(i, j);
            end
        end
        p(i) = max_ind;
    end
    %calculate the success rate of the prediction
    correct_count = 0;
    for i=1:mtest
        if (p(i) == y(i))
            correct_count = correct_count + 1;
        end
    end
    rate = correct_count/mtest*100;
    display(['simple version rate:', num2str(rate)]);
end
function predict_vectorized(Xtest, y, thetah, thetao)
    mtest = size(Xtest, 1); %number of input examples
    %add a bias unit to the input layer
    a1 = [ones(mtest, 1) Xtest];
    %compute activations of the hidden layer
    z2 = a1*thetah';
    a2 = sigmoid_universal(z2);
    %add a bias unit to the hidden layer
    a2 = [ones(mtest, 1) a2];
    %compute activations of the output layer
    z3 = a2*thetao';
    h = sigmoid_universal(z3); %the hypothesis
    %calculate predictions for each input example based on the maximum term
    %of the hypothesis h
    [~,p] = max(h, [], 2);
    %calculate the success rate of the prediction
    rate = mean(double((p == y))) * 100;
    display(['vectorized version rate:', num2str(rate)]);
end
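function [ s ] = sigmoid_simple( z )
    %scalar z only: "/" is matrix right division for non-scalar z
    s = 1/(1+exp(-z));
end
function [ s ] = sigmoid_universal( z )
    %element-wise: works for scalars, vectors and matrices
    s = 1./(1+exp(-z));
end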
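The only difference between sigmoid_simple and sigmoid_universal is "/" versus "./". As a small sanity check, with made-up input values, the sketch below applies the scalar form one element at a time and compares it with the element-wise form applied to the whole vector. For a non-scalar z the "/" form is matrix right division, which either raises a dimension error or returns something other than the element-wise sigmoid, so it is deliberately only called on scalars here.

```matlab
z = [-1 0 2];                   % made-up pre-activations

% Element-wise form, as in sigmoid_universal: handles the whole vector at once
s_vec = 1 ./ (1 + exp(-z));

% Scalar form, as in sigmoid_simple: must be applied one element at a time
s_loop = zeros(size(z));
for i = 1:numel(z)
    s_loop(i) = 1 / (1 + exp(-z(i)));
end

% Largest absolute difference between the two results
max_diff = max(abs(s_vec - s_loop));
```

If the two forms agree, max_diff should be zero up to floating-point rounding.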