How to get intermediate layers' output of pre-trained BERT model in HuggingFace Transformers library?

Question

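(I am following this PyTorch tutorial on BERT word embeddings, in which the author accesses the intermediate layers of the BERT model.)

What I want is to access the last 4 layers of the BERT model for a single input token, in TensorFlow 2, using HuggingFace's Transformers library. Because each layer outputs a vector of length 768, concatenating the last 4 layers gives a vector of length 4*768=3072 (for each token).

How can I implement this in TF/Keras/TF2, to get the intermediate layers of the pretrained model for an input token? (Later I will try to get these vectors for each token in a sentence, but for now one token is enough.)

I'm using HuggingFace's BERT model: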

!pip install transformers
from transformers import (TFBertModel, BertTokenizer)

bert_model = TFBertModel.from_pretrained("bert-base-uncased")  # Automatically loads the config
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentence_marked = "hello"
tokenized_text = bert_tokenizer.tokenize(sentence_marked)
indexed_tokens = bert_tokenizer.convert_tokens_to_ids(tokenized_text)

print(indexed_tokens)
# prints [7592]

The output is a single token ID ([7592]), which should be the input to the BERT model.

Answer 1

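The third element of the BERT model's output is a tuple that consists of the output of the embedding layer as well as the hidden states of the intermediate layers. From the documentation: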

hidden_states (tuple(tf.Tensor), optional, returned when config.output_hidden_states=True): tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

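Note that for the bert-base-uncased model, config.output_hidden_states is False by default in current versions of the library, so the hidden states are returned only when you ask for them, for example by passing output_hidden_states=True in the call. To access the hidden states of the 12 intermediate layers, you can do the following, where input_ids and attention_mask are your batched input tensors:

outputs = bert_model(input_ids, attention_mask=attention_mask, output_hidden_states=True)
hidden_states = outputs[2][1:]  # skip the embedding output at index 0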

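The hidden_states tuple has 12 elements, one for each layer from the first to the last, and each of them is an array of shape (batch_size, sequence_length, hidden_size). So, for example, to access the hidden state of the third layer for the fifth token of all the samples in the batch, you can do: hidden_states[2][:,4].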
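The indexing above, and the 4*768=3072 concatenation asked about in the question, can be sketched with NumPy stand-ins of the same shapes. The random arrays below are placeholders for the real hidden states (an assumption, so that the snippet runs without downloading a model), and token_index is a hypothetical position chosen for illustration.

```python
import numpy as np

# Stand-in for outputs[2]: 13 arrays (embedding output + 12 layers),
# each of shape (batch_size, sequence_length, hidden_size).
# Random values are placeholders, not real BERT activations.
batch_size, sequence_length, hidden_size = 2, 6, 768
rng = np.random.default_rng(0)
all_states = tuple(
    rng.standard_normal((batch_size, sequence_length, hidden_size)).astype(np.float32)
    for _ in range(13)
)

# Drop the embedding output at index 0, keep the 12 layer outputs.
hidden_states = all_states[1:]

# Third layer, fifth token, every sample in the batch.
third_layer_fifth_token = hidden_states[2][:, 4]
print(third_layer_fifth_token.shape)  # (2, 768)

# Concatenate the last 4 layers along the feature axis: 4 * 768 = 3072.
last_four = np.concatenate(hidden_states[-4:], axis=-1)
print(last_four.shape)  # (2, 6, 3072)

# Vector for one token (hypothetical position 1, e.g. the first token after [CLS]).
token_index = 1
token_vector = last_four[:, token_index]
print(token_vector.shape)  # (2, 3072)
```

With the real tf.Tensor hidden states the slicing is the same, and tf.concat(hidden_states[-4:], axis=-1) plays the role of np.concatenate.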

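Note that if the model you are loading does not return the hidden states by default, you can load its configuration with the BertConfig class and pass the output_hidden_states=True argument, like this:

from transformers import BertConfig, TFBertModel

config = BertConfig.from_pretrained("name_or_path_of_model",
                                    output_hidden_states=True)

bert_model = TFBertModel.from_pretrained("name_or_path_of_model",
                                         config=config)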


Tags: keras, tensorflow, tensorflow2.0, bert-language-model, huggingface-transformers