Keras w/ Tensorflow intermediate layer extraction in batches
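I am currently trying to use an intermediate layer of my already-trained DL model as an embedding for a given input. The code below already gets the layer I want, but running it iteratively over a large number of inputs is very slow.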
import numpy as np
from keras import backend as K
from keras.models import load_model
from keras.preprocessing.sequence import pad_sequences
model = load_model('model.h5')
inp = model.input
outputs = [layer.output for layer in model.layers]
# Build one backend function per layer, once, up front
functors = [K.function([inp] + [K.learning_phase()], [out]) for out in outputs]
def text2tensor(text):
    """Convert string to tensor"""
    # `tokenizer` is the Tokenizer fitted at training time (not shown here)
    tensor = tokenizer.texts_to_sequences([text])
    tensor = pad_sequences(tensor, maxlen=10, padding='pre')
    return tensor
def get_embedding(tensor, at_layer):
    """Get output at particular layer in network"""
    # Reuse the prebuilt function instead of rebuilding every function on each call.
    # Learning phase 0 = inference mode; 1 would keep training-only behavior such as dropout active.
    layer_outs = functors[at_layer - 1]([tensor, 0])
    return layer_outs[0]
texts = ['this is my first text',
         'this is my second text',
         'this is my third text',
         # .....nth text
         ]
embeddings = np.empty((0, 256))
for t in texts:
    tensor = text2tensor(t)
    embedding = get_embedding(tensor, at_layer=4)
    embeddings = np.append(embeddings, [embedding[0]], axis=0)
How can I use batching instead of processing the inputs one at a time? The implementation above works, but it is very slow.
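One source of slowness that is independent of batching is the accumulation step. np.append returns a new array on every call, so growing embeddings one row at a time copies everything accumulated so far, which makes the loop quadratic in the number of texts. Here is a minimal numpy-only sketch. embed_one is a hypothetical stand-in for get_embedding(...)[0] that just returns a 256-dimensional vector.

```python
import numpy as np

def embed_one(x):
    # Hypothetical stand-in for get_embedding(...)[0]: one input -> one 256-dim vector
    return np.full(256, float(x))

inputs = range(200)

# Quadratic: np.append allocates and copies the whole accumulated array on every call
slow = np.empty((0, 256))
for x in inputs:
    slow = np.append(slow, [embed_one(x)], axis=0)

# Linear: collect the rows in a Python list and stack them once at the end
fast = np.vstack([embed_one(x) for x in inputs])
```

Both produce the same (200, 256) array. Only the second avoids re-copying earlier rows.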
Apart from the point I mentioned in the comments, I suggest you create a model instead of a backend function:
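from keras.models import Model
# Reuse the trained model's own input tensor: a freshly created Input(shape=(10,))
# is not connected to the existing layers, so Keras reports the graph as disconnected
my_desired_layer = model.layers[3]  # e.g. the 4th layer, i.e. at_layer=4 in the question
new_model = Model(inputs=model.input, outputs=my_desired_layer.output)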
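Conceptually, the sub-model is just the composition of the trained layers up to the chosen one. It reuses the existing weights and returns the same values that the question's K.function approach reads out. Below is a toy numpy sketch of that equivalence. The three ReLU "layers" and their random weights are made-up stand-ins for the real model, not Keras objects.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy "trained model": three dense ReLU layers with fixed weights
weights = [rng.standard_normal((10, 16)),
           rng.standard_normal((16, 8)),
           rng.standard_normal((8, 4))]

def make_layer(w):
    # Dense layer without bias, followed by ReLU
    return lambda x: np.maximum(x @ w, 0.0)

layers = [make_layer(w) for w in weights]

def all_activations(x):
    """Like the question's functors: record the output of every layer."""
    outs = []
    for layer in layers:
        x = layer(x)
        outs.append(x)
    return outs

def sub_model(k):
    """Like Model(model.input, layers[k - 1].output): run only the first k layers."""
    def run(x):
        for layer in layers[:k]:
            x = layer(x)
        return x
    return run

batch = rng.standard_normal((5, 10))  # 5 inputs processed in one call
embed = sub_model(2)
```

Here embed(batch) matches all_activations(batch)[1] row for row. The sub-model simply stops computing after the layer of interest and handles the whole batch at once.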
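Then first preprocess your text data into a single input array (my_data below; both texts_to_sequences and pad_sequences accept a whole list at once), and call the predict method with a batch_size argument to take advantage of batching: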
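my_data = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=10, padding='pre')
out = new_model.predict(my_data, batch_size=32)  # 32 is also the default batch size
# out has one row per text, e.g. shape (len(texts), 256) for the layer used in the question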
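Roughly, the two steps do the following. First, pad every tokenized text into one (n, maxlen) array. Then run the sub-model over consecutive slices of batch_size rows and concatenate the results. The sketch below imitates both steps with numpy. pad_pre and predict_in_batches are hypothetical helpers, not Keras functions, and the token ids and the single matrix "layer" are made up.

```python
import numpy as np

def pad_pre(seqs, maxlen, value=0):
    """Hypothetical stand-in for pad_sequences(seqs, maxlen=maxlen, padding='pre'):
    left-pad short sequences and keep the last `maxlen` ids of long ones."""
    out = np.full((len(seqs), maxlen), value, dtype=np.int64)
    for i, s in enumerate(seqs):
        s = list(s)[-maxlen:]
        if s:
            out[i, maxlen - len(s):] = s
    return out

def predict_in_batches(fn, data, batch_size=32):
    """Hypothetical helper mirroring predict(..., batch_size=...):
    apply fn to consecutive chunks of rows and concatenate the results."""
    chunks = [fn(data[i:i + batch_size]) for i in range(0, len(data), batch_size)]
    return np.concatenate(chunks, axis=0)

# Token-id lists of the kind texts_to_sequences returns (ids are made up)
seqs = [[4, 7, 2], [5, 1, 9, 3, 8, 6, 2, 7, 4, 1, 3, 5], []]
my_data = pad_pre(seqs, maxlen=10)
# my_data[0] -> [0 0 0 0 0 0 0 4 7 2]; my_data[1] keeps only the last 10 ids

w = np.arange(10 * 4, dtype=float).reshape(10, 4)

def layer(x):
    # Stand-in for the intermediate-layer sub-model
    return x @ w

batched = predict_in_batches(layer, my_data, batch_size=2)
one_by_one = np.vstack([layer(row[None, :]) for row in my_data])
```

The batched and one-by-one results are identical. The batched version pays the per-call overhead once per chunk instead of once per text, and that overhead is what dominates the question's loop.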