Theano: How to give training data to a neural network
I am trying to create a simple multi-layer perceptron (MLP) for "logical and" in Theano. There is one layer between input and output. The structure is:

2-value input -> multiply with weights, add bias -> softmax -> 1-value output

The change in dimensionality is caused by the weight matrix.

The implementation is based on this tutorial: http://deeplearning.net/tutorial/logreg.html

This is my Layer class:
class Layer():
    """
    this is a layer in the mlp
    it's not meant to predict the outcome hence it does not compute a loss.
    apply the functions for negative log likelihood = cost on the output of the last layer
    """

    def __init__(self, input, n_in, n_out):
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name="W",
            borrow=True
        )
        self.b = theano.shared(
            value=numpy.zeros((n_in, n_out),
                              dtype=theano.config.floatX),
            name="b",
            borrow=True
        )
        self.output = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.params = (self.W, self.b)
        self.input = input
The class is modular: I want to be able to stack several layers, not just one. Therefore the functions for prediction, cost and errors live outside the class (contrary to the tutorial):
def y_pred(output):
    return T.argmax(output, axis=1)

def negative_log_likelihood(output, y):
    return -T.mean(T.log(output)[T.arange(y.shape[0]), y])

def errors(output, y):
    # check if y has same dimension of y_pred
    if y.ndim != y_pred(output).ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', y_pred(output).type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(y_pred(output), y))
    else:
        raise NotImplementedError()
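To make the cost explicit: the fancy indexing in negative_log_likelihood picks, for each row of output, the column whose index equals the label. A plain numpy sketch of the same selection (with made-up probabilities):

import numpy

# toy softmax outputs: 3 samples, 2 classes (made-up values)
output = numpy.array([[0.9, 0.1],
                      [0.2, 0.8],
                      [0.6, 0.4]])
y = numpy.array([0, 1, 0])  # one integer label per row

# same selection as T.log(output)[T.arange(y.shape[0]), y]:
# row i contributes log(output[i, y[i]])
picked = numpy.log(output)[numpy.arange(y.shape[0]), y]
print(-picked.mean())  # the negative log likelihood

Note that this only works when every label is a valid column index into output, i.e. y < n_out.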
Logical and has 4 training cases:

- [0,0] -> 0
- [1,0] -> 0
- [0,1] -> 0
- [1,1] -> 1
Here is the setup of the classifier and the functions for training and evaluation:
data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])

data_y = numpy.array([0,
                      0,
                      0,
                      1])

train_set_x = theano.shared(numpy.asarray(data_x,
                                          dtype=theano.config.floatX),
                            borrow=True)

train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                                                 dtype=theano.config.floatX),
                                   borrow=True), "int32")

x = T.vector("x", theano.config.floatX)  # data
y = T.ivector("y")  # labels

classifier = Layer(input=x, n_in=2, n_out=1)

cost = negative_log_likelihood(classifier.output, y)

g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)

index = T.lscalar()
learning_rate = 0.15

updates = [
    (classifier.W, classifier.W - learning_rate * g_W),
    (classifier.b, classifier.b - learning_rate * g_b)
]

train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index],
        y: train_set_y[index]
    }
)

validate_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: train_set_x[index],
        y: train_set_y[index]
    }
)
I tried to follow the conventions: each row of the data matrix is one training sample, and each training sample is matched to the correct output. Unfortunately the code breaks, and I can't interpret the error message. What am I doing wrong?

The error:
TypeError: Cannot convert Type TensorType(int32, scalar) (of Variable Subtensor{int64}.0) into Type TensorType(int32, vector). You can try to manually convert Subtensor{int64}.0 into a TensorType(int32, vector).
This error occurs deep inside the Theano code. The conflicting line in my program is:
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index],
        y: train_set_y[index]  # <---------------HERE
    }
)
Apparently the dimension of y does not match the training data.

My complete code on pastebin: http://pastebin.com/U5jYitk2
The complete error message on pastebin: http://pastebin.com/hUQJhfNM

Concise question:
What is the correct way to feed training data to an MLP in Theano?
Where is my mistake?
I copied most of the code from the tutorial. Notable changes (possible causes of the error) are:

- The training data for y is not a matrix. I think this is correct, because the output of my network is just a scalar value
- The input of the first layer is a vector. This variable is named x.
- The access to the training data does not use slices. In the tutorial the training data handling is very sophisticated and I found the data-access code hard to read. I believe x should be one row of the data matrix, and that is how I implemented it.
Update:

I used Amir's code. It looks very good, thank you.

But it produces an error too. The last loop goes out of bounds:
/usr/bin/python3.4 /home/lhk/programming/sk/mlp/mlp/Layer.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 113, in <module>
    train_model(i)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 206, in raise_with_op
    raise exc_type(exc_value).with_traceback(exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Elemwise{Cast{int32}}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Line 113 is this one:
# train the model
for i in range(train_set_x.shape[0].eval()):
    train_model(i)  # <-----------------HERE
I believe this happens because the indexing of the training data uses index and index + 1. Why is that necessary? One row should be one training sample, and one row is train_set_x[index].

Edit: I debugged the code. Indexing without a slice returns a 1d array, slicing returns a 2d array. A 1d array should indeed be incompatible with the matrix x.
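A quick numpy check of that difference (a sketch; Theano's shared variables index the same way):

import numpy

a = numpy.zeros((4, 2))
print(a[1].shape)    # (2,)   -> indexing drops the row axis, 1d
print(a[1:2].shape)  # (1, 2) -> slicing keeps a one-row 2d matrix

So train_set_x[index:index + 1] still has the matrix shape that x expects, while train_set_x[index] does not.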
But while doing this I found another strange problem. I added this code to look at the effect of training:
print("before")
print(classifier.W.get_value())
print(classifier.b.get_value())
for i in range(3):
train_model(i)
print("after")
print(classifier.W.get_value())
print(classifier.b.get_value())
before
[[ 0.]
 [ 0.]]
[ 0.]
after
[[ 0.]
 [ 0.]]
[ 0.]
That makes sense, since the correct output for the first three samples is 0.
If I change the order and move the training sample (1,1),1 to the front, the program crashes:
before
[[ 0.]
 [ 0.]]
[ 0.]
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 121, in <module>
    train_model(i)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 206, in raise_with_op
    raise exc_type(exc_value).with_traceback(exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Elemwise{Cast{int32}}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Update

I installed Python 2.7 with Theano and tried to run the code again. The same error occurs. I added verbose exception handling. This is the output:
/usr/bin/python2.7 /home/lhk/programming/sk/mlp/mlp/Layer.py
Traceback (most recent call last):
File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 113, in <module>
train_model(i)
File "/home/lhk/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 595, in __call__
outputs = self.fn()
File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/link.py", line 485, in streamline_default_f
raise_with_op(node, thunk)
File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/link.py", line 481, in streamline_default_f
thunk()
File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/op.py", line 768, in rval
r = p(n, [x[0] for x in i], o)
File "/home/lhk/.local/lib/python2.7/site-packages/theano/tensor/nnet/nnet.py", line 896, in perform
nll[i] = -row[y_idx[i]] + m + numpy.log(sum_j)
IndexError: index 1 is out of bounds for axis 0 with size 1
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Subtensor{int32:int32:}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]
Debugprint of the apply node:
CrossentropySoftmaxArgmax1HotWithBias.0 [@A] <TensorType(float64, vector)> ''
|Dot22 [@B] <TensorType(float64, matrix)> ''
| |Subtensor{int32:int32:} [@C] <TensorType(float64, matrix)> ''
| | |<TensorType(float64, matrix)> [@D] <TensorType(float64, matrix)>
| | |ScalarFromTensor [@E] <int32> ''
| | | |<TensorType(int32, scalar)> [@F] <TensorType(int32, scalar)>
| | |ScalarFromTensor [@G] <int32> ''
| | |Elemwise{add,no_inplace} [@H] <TensorType(int32, scalar)> ''
| | |<TensorType(int32, scalar)> [@F] <TensorType(int32, scalar)>
| | |TensorConstant{1} [@I] <TensorType(int8, scalar)>
| |W [@J] <TensorType(float64, matrix)>
|b [@K] <TensorType(float64, vector)>
|Subtensor{int32:int32:} [@L] <TensorType(int32, vector)> ''
|Elemwise{Cast{int32}} [@M] <TensorType(int32, vector)> ''
| |<TensorType(float64, vector)> [@N] <TensorType(float64, vector)>
|ScalarFromTensor [@E] <int32> ''
|ScalarFromTensor [@G] <int32> ''
CrossentropySoftmaxArgmax1HotWithBias.1 [@A] <TensorType(float64, matrix)> ''
CrossentropySoftmaxArgmax1HotWithBias.2 [@A] <TensorType(int32, vector)> ''
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
Process finished with exit code 1
Update:

I looked at the training data again. Any sample with 1 as its label produces the above error:
data_y = numpy.array([1,
                      1,
                      1,
                      1])
With the labels above, every train_model(i) for i in (0,1,2,3) crashes.
Apparently there is interference between the index of the sample and the content of the sample.
Update:

As Amir's contact pointed out, the problem was indeed the dimension of the output layer. I had the misconception that I could train the network to encode the output of the function "logical and" directly in the single output neuron. While that is certainly possible, this training method uses the y value as an index to select the output node that should have the highest value. After changing the output size to two, the code works. And with enough training, the errors for all cases did indeed become zero.
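The minimal change to my original setup, for reference (a sketch of the fix, not the full code):

# two output neurons: column 0 scores label 0, column 1 scores label 1,
# so the label y is always a valid column index into the softmax output
classifier = Layer(input=x, n_in=2, n_out=2)

With n_out=1, softmax over a single unit always returns 1, so the log-probability is constantly 0 and its gradient vanishes. That is also why the weights never changed when training on the 0-labelled samples.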
Here is the working code for your problem. There were many small errors in your code. The one causing the error you saw was defining b as an n_in-by-n_out matrix rather than simply as an n_out vector. The updates section was also defined in brackets [] rather than parentheses ().

Additionally, the index was defined as an int32 symbolic scalar (this is not very important). Another important change is defining the functions with the correct indexing: for some reason, the way you compiled the functions with index would not compile. You had also declared the input as a vector; that way you cannot train the model with mini-batches or with the full batch, so it is safer to declare it as a symbolic matrix. To use a vector, you would have to store the input as a vector rather than a matrix in the shared variable to make the program run, so declaring it as a vector brings that kind of headache. Finally, you compiled the validation function using classifier.errors(y) even though you had removed the errors function from the Layer class.
import theano
import theano.tensor as T
import numpy


class Layer(object):
    """
    this is a layer in the mlp
    it's not meant to predict the outcome hence it does not compute a loss.
    apply the functions for negative log likelihood = cost on the output of the last layer
    """

    def __init__(self, input, n_in, n_out):
        self.x = input
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name="W",
            borrow=True
        )
        self.b = theano.shared(
            value=numpy.zeros(n_out,
                              dtype=theano.config.floatX),
            name="b",
            borrow=True
        )
        self.output = T.nnet.softmax(T.dot(self.x, self.W) + self.b)
        self.params = [self.W, self.b]
        self.input = input


def y_pred(output):
    return T.argmax(output, axis=1)


def negative_log_likelihood(output, y):
    return -T.mean(T.log(output)[T.arange(y.shape[0]), y])


def errors(output, y):
    # check if y has same dimension of y_pred
    if y.ndim != y_pred(output).ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', y_pred(output).type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(y_pred(output), y))
    else:
        raise NotImplementedError()


data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])

data_y = numpy.array([0,
                      0,
                      0,
                      1])

train_set_x = theano.shared(numpy.asarray(data_x,
                                          dtype=theano.config.floatX),
                            borrow=True)

train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                                                 dtype=theano.config.floatX),
                                   borrow=True), "int32")

x = T.matrix("x")  # data
y = T.ivector("y")  # labels

classifier = Layer(input=x, n_in=2, n_out=1)

cost = negative_log_likelihood(classifier.output, y)

g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)

index = T.iscalar()
learning_rate = 0.15

updates = (
    (classifier.W, classifier.W - learning_rate * g_W),
    (classifier.b, classifier.b - learning_rate * g_b)
)

train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index:index + 1],
        y: train_set_y[index:index + 1]
    }
)

validate_model = theano.function(
    inputs=[index],
    outputs=errors(classifier.output, y),
    givens={
        x: train_set_x[index:index + 1],
        y: train_set_y[index:index + 1]
    }
)

# train the model
for i in range(train_set_x.shape[0].eval()):
    train_model(i)
Here is the updated code. Note that the main difference between the code above and the one below is that the latter works for binary problems, while the one above only works for multi-class problems, which is not the case here. The reason I keep both code snippets is educational: please read the comments to find out what the problem with the code above was and how I resolved it.
import theano
import theano.tensor as T
import numpy


class Layer(object):
    """
    this is a layer in the mlp
    it's not meant to predict the outcome hence it does not compute a loss.
    apply the functions for negative log likelihood = cost on the output of the last layer
    """

    def __init__(self, input, n_in, n_out):
        self.x = input
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name="W",
            borrow=True
        )
        self.b = theano.shared(
            value=numpy.zeros(n_out,
                              dtype=theano.config.floatX),
            name="b",
            borrow=True
        )
        self.output = T.reshape(T.nnet.sigmoid(T.dot(self.x, self.W) + self.b), (input.shape[0],))
        self.params = [self.W, self.b]
        self.input = input


def y_pred(output):
    return output


def negative_log_likelihood(output, y):
    return T.mean(T.nnet.binary_crossentropy(output, y))


def errors(output, y):
    # check if y has same dimension of y_pred
    if y.ndim != y_pred(output).ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', y_pred(output).type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(y_pred(output), y))
    else:
        raise NotImplementedError()


data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])

data_y = numpy.array([0,
                      0,
                      0,
                      1])

train_set_x = theano.shared(numpy.asarray(data_x,
                                          dtype=theano.config.floatX),
                            borrow=True)

train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                                                 dtype=theano.config.floatX),
                                   borrow=True), "int32")

x = T.matrix("x")  # data
y = T.ivector("y")  # labels

classifier = Layer(input=x, n_in=2, n_out=1)

cost = negative_log_likelihood(classifier.output, y)

g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)

index = T.iscalar()
learning_rate = 0.15

updates = (
    (classifier.W, classifier.W - learning_rate * g_W),
    (classifier.b, classifier.b - learning_rate * g_b)
)

train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index:index + 1],
        y: train_set_y[index:index + 1]
    }
)

validate_model = theano.function(
    inputs=[index],
    outputs=errors(classifier.output, y),
    givens={
        x: train_set_x[index:index + 1],
        y: train_set_y[index:index + 1]
    }
)

# train the model
for i in range(train_set_x.shape[0].eval()):
    train_model(i)
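A possible way to drive this code (my sketch, not part of the original answer; the epoch count is a guess that happens to work for this toy problem):

# compile a small predict function and repeat the four samples for many
# epochs; the sigmoid outputs should move towards 0, 0, 0, 1
predict = theano.function(
    inputs=[index],
    outputs=classifier.output,
    givens={x: train_set_x[index:index + 1]}
)

n_samples = train_set_x.shape[0].eval()
for epoch in range(2000):
    for i in range(n_samples):
        train_model(i)

print([float(predict(i)) for i in range(n_samples)])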
You can try my MLP class:

A multi-layer perceptron (MLP) based on Lasagne/Theano. It accepts sparse and dense input matrices and is very easy to use thanks to its similarity to the scikit-learn API.

It features configurable drop-out, sparse input, can be changed into a logistic regression, and offers easy-to-change cost functions and l1/l2/elasticnet regularization.

The code is here