Issues saving and restoring a TensorFlow model (LSTM)
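I am working on an LSTM that generates text, but I am having trouble reusing a previously trained model. I have broken the code down below; I used the TensorFlow website as a resource.
Here I set up all my variables: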
graph = tf.Graph()
with graph.as_default():
    global_step = tf.Variable(0)
    data = tf.placeholder(tf.float32, [batch_size, len_section, char_size])
    labels = tf.placeholder(tf.float32, [batch_size, char_size])
    .....
    #Reset at the beginning of each test
    reset_test_state = tf.group(test_output.assign(tf.zeros([1, hidden_nodes])),
                                test_state.assign(tf.zeros([1, hidden_nodes])))
    #LSTM
    test_output, test_state = lstm(test_data, test_output, test_state)
    test_prediction = tf.nn.softmax(tf.matmul(test_output, w) + b)
    saver = tf.train.Saver()
Here I train my model and save a checkpoint at 30 iterations:
with tf.Session(graph = graph) as sess:
    tf.global_variables_initializer().run()
    offset = 0
    for step in range(10000):
        offset = offset % len(X)
        if offset <= (len(X) - batch_size):
            batch_data = X[offset: offset + batch_size]
            batch_labels = y[offset:offset+batch_size]
            offset += batch_size
        else:
            to_add = batch_size - (len(X) - offset)
            batch_data = np.concatenate((X[offset: len(X)], X[0: to_add]))
            batch_labels = np.concatenate((y[offset: len(X)], y[0: to_add]))
            offset = to_add
        _, training_loss = sess.run([optimizer, loss], feed_dict = {data : batch_data, labels : batch_labels})
        if step % 10 == 0:
            print('training loss at step %d: %.2f (%s)' % (step, training_loss, datetime.datetime.now()))
        if step % save_every == 0:
            saver.save(sess, checkpoint_directory + '/model.ckpt', global_step=step)
        if step == 30:
            break
I looked in that directory and the following files had been created:
Here I am supposed to restore my trained model and test it:
with tf.Session(graph=graph) as sess:
    #standard init step
    offset = 0
    saver = tf.train.Saver()
    saver.restore(sess, "/ckpt/model-150.meta")
    tf.global_variables_initializer().run()
    test_start = "I plan to make this world a better place "
    test_generated = test_start
    ....
After running this I get the following error:
DataLossError (see above for traceback): Unable to open table file /ckpt/model.ckpt-30.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I am not sure what I am doing wrong. The tutorial looks very simple, but I am clearly missing something. Any feedback would be greatly appreciated.
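First, note that if you initialize all variables after restoring from the checkpoint, you will get their random initial values rather than the trained ones. In your restore code, tf.global_variables_initializer().run() comes after saver.restore, so it overwrites whatever was restored.
Second, it is easier to get saving/restoring right if you use tf.estimator.Estimator instead of implementing it yourself.
Third, I don't understand how you restore from model-150.meta yet see an error about model-30.meta. Either way, I believe you should pass just model-30 (without the .meta suffix).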
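To illustrate the third point, here is a minimal sketch that turns the path of any single checkpoint file into the prefix that saver.restore expects. It assumes the usual layout written by tf.train.Saver, where one prefix is shared by the .meta, .index and .data-XXXXX-of-XXXXX files. The helper name checkpoint_prefix is hypothetical and is not part of TensorFlow; it only does string handling and never touches the files.

```python
import re

# Suffixes that follow the shared prefix in a Saver checkpoint
# (assumed layout: <prefix>.meta, <prefix>.index, <prefix>.data-00000-of-00001).
_CKPT_SUFFIX = re.compile(r"\.(meta|index|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Hypothetical helper: strip a checkpoint file suffix, leaving the prefix.

    A path that is already a prefix is returned unchanged.
    """
    return _CKPT_SUFFIX.sub("", path)


# The path from the error message, reduced to the prefix.
prefix = checkpoint_prefix("/ckpt/model.ckpt-30.meta")
print(prefix)

# With a real session this prefix would be used as
#     saver.restore(sess, prefix)
# and, following the first point, without calling
# tf.global_variables_initializer() afterwards.
```

If the exact step number is not known in advance, tf.train.latest_checkpoint(checkpoint_directory) returns the most recent prefix directly, which avoids building the path by hand.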