TensorFlow RNN text generation example tutorial
Looking at this tutorial here, they use a starting sequence of "ROMEO: ":
print(generate_text(model, start_string=u"ROMEO: "))
However, looking at the actual generation step, is it fair to say it's only using the last character " "? So is it the same whether we use "ROMEO: " or just " "? It's hard to test, as it samples from the output distribution...
Relatedly, it's unclear how it would predict from such a short string, since the original training sequences are much longer. I understand that if we train on a history of 100 characters we predict the 101st, and then use characters 2-101 to predict 102... but how does it start with just 7 characters?
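To make the sliding-window scheme described above concrete, here is a rough sketch; predict_next is a hypothetical stand-in for the model call, not a function from the tutorial:

import random

def predict_next(window):
    # placeholder for the model: just sample a random character id
    return random.randrange(65)

seed_ids = list(range(100))              # e.g. the 100 character ids of the prompt
generated = []
window = list(seed_ids)
for _ in range(50):
    next_id = predict_next(window)       # predict char 101 from chars 1-100
    generated.append(next_id)
    window = window[1:] + [next_id]      # slide: use chars 2-101 to predict 102, etc.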
EDIT
As a concrete example, I modified the model to the following:
model = tf.keras.Sequential()
model.add(tf.keras.layers.SimpleRNN(units=512, input_shape = (seq_len, 1), activation="tanh"))
model.add(tf.keras.layers.Dense(len(vocab)))
model.compile(loss=loss, optimizer='adam')
model.summary()
Note that I use a SimpleRNN instead of a GRU and drop the embedding step. Both of those changes are to simplify the model, but that shouldn't matter here.
My training inputs and outputs are as follows:
>>> input_array_reshaped
array([[46., 47., 53., ..., 39., 58.,  1.],
       [ 8.,  0., 20., ..., 33., 31., 10.],
       [63.,  1., 44., ..., 58., 46., 43.],
       ...,
       [47., 41., 47., ...,  0., 21., 57.],
       [59., 58.,  1., ...,  1., 61., 43.],
       [52., 57., 43., ...,  1., 63., 53.]])
>>> input_array_reshaped.shape
(5000, 100)
>>> output_array_reshaped.shape
(5000, 1, 1)
>>> output_array_reshaped
array([[[40.]],
       [[ 0.]],
       [[56.]],
       ...,
       [[ 1.]],
       [[56.]],
       [[59.]]])
However, if I try to predict on a string of fewer than 100 characters, I get:
ValueError: Error when checking input: expected simple_rnn_1_input to have shape (100, 1) but got array with shape (50, 1)
In case it's needed, below is my prediction function. If I change required_training_length to anything other than 100, it crashes; it specifically requires time_steps of length 100.
Can someone tell me how to adjust the model to make it more flexible, like the example? What subtlety am I missing?
def generateText(starting_corpus, num_char_to_generate = 1000, required_training_length = 100):
    # Pick a random window of the training text as the seed string
    random_starting_int = random.sample(range(len(text)), 1)[0]
    ending_position = random_starting_int + required_training_length
    starting_string = text[random_starting_int:ending_position]
    print("Starting string is: " + starting_string)
    numeric_starting_string = [char2idx[x] for x in starting_string]
    reshaped_numeric_string = np.reshape(numeric_starting_string, (1, len(numeric_starting_string), 1)).astype('float32')

    output_numeric_vector = []
    for i in range(num_char_to_generate):
        if i % 50 == 0:
            print("Processing character index: " + str(i))
        predicted_values = model.predict(reshaped_numeric_string)
        # Sample the next character index from the predicted values
        selected_predicted_value = tf.random.categorical(predicted_values, num_samples = 1)[0][0].numpy().astype('float32')
        output_numeric_vector.append(selected_predicted_value)
        # Slide the window: drop the oldest character and append the newly sampled one
        reshaped_numeric_string = np.append(reshaped_numeric_string[:, 1:, :], np.reshape(selected_predicted_value, (1, 1, 1)), axis = 1)

    predicted_chars = [idx2char[x] for x in output_numeric_vector]
    final_text = ''.join(predicted_chars)
    return final_text
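For reference, one possible adjustment (a sketch under my own assumptions, not part of the original post) is to leave the time dimension unspecified when building the network, so the SimpleRNN accepts input sequences of any length; vocab is assumed to hold 65 characters here:

import numpy as np
import tensorflow as tf

vocab = range(65)  # placeholder vocabulary

model = tf.keras.Sequential()
model.add(tf.keras.layers.SimpleRNN(units=512, input_shape=(None, 1), activation="tanh"))
model.add(tf.keras.layers.Dense(len(vocab)))
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer='adam')

# Both of these now pass the input-shape check:
model.predict(np.zeros((1, 100, 1), dtype='float32'))
model.predict(np.zeros((1, 50, 1), dtype='float32'))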
However, looking at the actual generation step, is it fair to say it’s only using the last character “ “? So it’s the same whether we use “ROMEO: “ or just “ “? It’s hard to test as it samples from the output distribution ...
No, it is taking all of the characters into account. You can easily verify this with a fixed random seed: if the model really only used the last character, the two calls below would produce identical output once the seeds are reset.
from numpy.random import seed
from tensorflow.random import set_seed
seed(1)
set_seed(1)
print('======')
print(generate_text(m, 'ROMEO: '))
seed(1)
set_seed(1)
print('======')
print(generate_text(m, ' '))
Relatedly, it’s unclear how it would predict from such a short string as the original training sequence is much longer. I understand if we trained on a history of 100 chars we predict the 101st and then use 2-101 to predict 102... but how does it start with just 7 characters?
It runs the sequence through internally in a loop. It takes the first character and predicts the second, then uses the second to predict the third, and so on. While doing so it updates its hidden state, so its predictions get better and better. Eventually it plateaus, because it cannot remember arbitrarily long sequences.
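Here is a minimal sketch of that warm-up loop with the hidden state made explicit. All sizes and ids below are assumptions for illustration; this is not the tutorial's code:

import tensorflow as tf

vocab_size = 65        # assumed vocabulary size
embedding_dim = 8
rnn_units = 16

embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
gru = tf.keras.layers.GRU(rnn_units, return_state=True)
dense = tf.keras.layers.Dense(vocab_size)

def step(char_id, state):
    """Feed one character id; return logits for the next character and the new state."""
    x = embedding(tf.constant([[char_id]]))        # shape (1, 1, embedding_dim)
    out, new_state = gru(x, initial_state=state)   # hidden state carried forward
    return dense(out), new_state

prompt = [30, 27, 25, 17, 27, 10, 1]   # hypothetical ids for a 7-character prompt
state = None
for cid in prompt:                     # warm up on the whole prompt, however short
    logits, state = step(cid, state)

# After the prompt is consumed, sample the next character and keep looping,
# feeding each sampled id back through step() with the updated state.
next_id = int(tf.random.categorical(logits, num_samples=1)[0, 0])

However short the prompt is, the same per-character step runs; the prompt length only determines how much the hidden state has been warmed up before sampling starts.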