Tensorflow saver: Kernel appears to have died
I'm having a lot of trouble saving/restoring TensorFlow models: either my kernel "appears to have died", or I get the error "Variable ... already exists".
When my kernel dies, I get the following error log in the console:
[I 21:13:41.505 NotebookApp] Saving file at /Nanodegree_MachineLearning/06_Capstone/capstone.ipynb
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
[I 21:17:05.416 NotebookApp] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel 81679b46-ec9b-4ce6-b5be-ae2d9cf01210 restarted
[I 21:17:41.778 NotebookApp] Saving file at /Nanodegree_MachineLearning/06_Capstone/capstone.ipynb
[19324:20881:1229/212110:ERROR:object_proxy.cc(583)] Failed to call method: org.freedesktop.UPower.GetDisplayDevice: object_path= /org/freedesktop/UPower: org.freedesktop.DBus.Error.UnknownMethod: Method "GetDisplayDevice" with signature "" on interface "org.freedesktop.UPower" doesn't exist
My code looks like this:
if __name__ == '__main__':
    if LEARN_MODUS:
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            steps_per_epoch = len(X_train) // BATCH_SIZE
            num_examples = steps_per_epoch * BATCH_SIZE
            # Train model
            for i in range(EPOCHS):
                for step in range(steps_per_epoch):
                    # Calculate next batch
                    batch_start = step * BATCH_SIZE
                    batch_end = (step + 1) * BATCH_SIZE
                    batch_x = X_train[batch_start:batch_end]
                    batch_y = y_train[batch_start:batch_end]
                    # Run training
                    loss = sess.run(train_op, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.5})
            try:
                saver
            except NameError:
                saver = tf.train.Saver()
            saver.save(sess, 'foo')
            print("Model saved")
To restore the model, I use:
predicions = tf.argmax(fc2, 1)
predicted_classes = []
try:
    saver
except NameError:
    saver = tf.train.Saver()
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('foo.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    predicted_classes = sess.run(predicions, feed_dict={x: X_test, keep_prob: 1.0})
I've tried many different approaches. Sometimes it works (but not always!?), sometimes it crashes, and sometimes I get the variable error. Do I need to handle saving/restoring differently?
I'm using:
Ubuntu 14.04
Anaconda 3
Python 3.5.2
TensorFlow 0.12
inside a Jupyter notebook
Thanks!
This can happen when you run out of memory, and the solution is to try a smaller batch size. I see that you feed the whole test set into a single run call, which requires enough memory to process all the examples at once. You could do something like eval_in_batches to aggregate the results over several smaller run calls.
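For illustration, here is a minimal sketch of such batched evaluation. It reuses the placeholders x and keep_prob and the predicions op from the question's code; the eval_in_batches helper and the EVAL_BATCH_SIZE constant are assumptions for this sketch, not part of the original post:

import numpy as np

EVAL_BATCH_SIZE = 256  # assumed value; pick whatever fits in memory

def eval_in_batches(sess, prediction_op, X, batch_size=EVAL_BATCH_SIZE):
    # Run the prediction op over X in small chunks so that no single
    # sess.run call has to hold the whole test set in memory at once.
    results = []
    for start in range(0, len(X), batch_size):
        batch = X[start:start + batch_size]
        results.append(sess.run(prediction_op,
                                feed_dict={x: batch, keep_prob: 1.0}))
    return np.concatenate(results)

# Replaces the single large call:
# predicted_classes = eval_in_batches(sess, predicions, X_test)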