Fine-tuning model's classifier layer with new label
I want to further fine-tune an already fine-tuned BertForSequenceClassification model with a new dataset that contains only one additional label that the model has not seen before.
In other words, I want to add one new label to the set of labels the model can currently classify correctly.
Moreover, I don't want the classifier weights to be randomly initialized; I want to keep them intact and only update them accordingly based on the dataset examples, while increasing the size of the classifier layer by one.
The dataset used for further fine-tuning might look like this:
sentence,label
intent example 1,new_label
intent example 2,new_label
...
intent example 10,new_label
The current classifier layer of my model looks like this:
Linear(in_features=768, out_features=135, bias=True)
How can this be achieved?
Is this even a good approach?
You can extend the model's weights and biases with new values. Have a look at the commented example below:
#This is the section that loads your model
#I will just use a pretrained model for this example
import torch
from torch import nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("jpcorb20/toxic-detector-distilroberta")
model = AutoModelForSequenceClassification.from_pretrained("jpcorb20/toxic-detector-distilroberta")
#we check the output of one sample to compare it later with the extended layer
#to verify that we kept the previously learned "knowledge"
f = tokenizer.encode_plus("This is an example", return_tensors='pt')
print(model(**f).logits)
#Now we need to find out the name of the linear layer you want to extend
#The layers on top of distilroberta are wrapped inside a classifier section
#The name of this layer can differ for your model because it depends on the architecture
#if so, use model.named_parameters() to find the classification layer
print(model.classifier)
#The output shows us that the classification layer is called `out_proj`
#We can now extend the weights by creating a new tensor that consists of the
#old weights and a randomly initialized tensor for the new label
model.classifier.out_proj.weight = nn.Parameter(torch.cat((model.classifier.out_proj.weight, torch.randn(1,768)),0))
#We do the same for the bias:
model.classifier.out_proj.bias = nn.Parameter(torch.cat((model.classifier.out_proj.bias, torch.randn(1)),0))
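#If you plan to continue training with labels, the model should also know about
#the new class count, otherwise the loss computation still assumes the old size
#(assumption: the standard transformers sequence-classification loss path reads
#num_labels from the config). You may also want to extend
#model.config.id2label / label2id accordingly.
model.num_labels = model.classifier.out_proj.weight.shape[0]
model.config.num_labels = model.num_labels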
#and be happy when we compare the output with our expectation
print(model(**f).logits)
Output:
tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895]],
grad_fn=<AddmmBackward>)
RobertaClassificationHead(
(dense): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(out_proj): Linear(in_features=768, out_features=6, bias=True)
)
tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895, 2.2124]],
grad_fn=<AddmmBackward>)
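To then fine-tune on the new dataset from the question, a minimal sketch could look like the following. It assumes the CSV format sentence,label from the question, a placeholder file name, that every row uses the newly added label, and that num_labels was updated as in the code above:
#Minimal further fine-tuning sketch (new_label_examples.csv is a placeholder name)
import csv
from torch.optim import AdamW
#the newly added row is the last one, so its label id is the last index
new_label_id = model.classifier.out_proj.weight.shape[0] - 1
texts, labels = [], []
with open("new_label_examples.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        texts.append(row["sentence"])
        labels.append(new_label_id)
#tokenize the whole (small) dataset as a single batch
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
targets = torch.tensor(labels)
model.train()
optimizer = AdamW(model.parameters(), lr=2e-5)
for epoch in range(3):
    optimizer.zero_grad()
    loss = model(**enc, labels=targets).loss
    loss.backward()
    optimizer.step()
    print(epoch, loss.item())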