Fine-tuning a Model with the Trainer API
Transformers provides a Trainer class that handles fine-tuning any of its pretrained models on your dataset.
```python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_dataset = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(sample):
    return tokenizer(sample["sentence1"], sample["sentence2"], truncation=True)

tokenized_datasets = raw_dataset.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```
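DataCollatorWithPadding performs dynamic padding: each batch is padded only to the length of its longest sequence, not to a global maximum. A minimal sketch of that idea using plain Python lists (`pad_batch` and the pad id `0` are illustrative assumptions, not part of the Transformers API):

```python
def pad_batch(sequences, pad_id=0):
    # Pad every sequence to the length of the longest one in this batch.
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

# Two token-id sequences of different lengths.
batch = [[101, 2023, 102], [101, 2023, 2003, 1037, 102]]
padded = pad_batch(batch)
# Both rows now have length 5; the shorter one is filled with the pad id.
```

Because padding length is decided per batch, short batches waste far less compute than padding everything to the model's maximum length.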
Training
```python
from transformers import TrainingArguments

training_args = TrainingArguments("trainer")
```
Defining the model
```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
```
Now we define the Trainer, passing it all the objects constructed so far.
```python
from transformers import Trainer

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()
```
| Step | Training Loss |
|---|---|
| 500 | 0.506700 |
| 1000 | 0.249000 |
TrainOutput(global_step=1377, training_loss=0.30210025984044514, metrics={'train_runtime': 236.2031, 'train_samples_per_second': 46.587, 'train_steps_per_second': 5.83, 'total_flos': 405114969714960.0, 'train_loss': 0.30210025984044514, 'epoch': 3.0})
Evaluation
Now we'll generate predictions from the fine-tuned model using the Trainer.predict() method.
```python
predictions = trainer.predict(tokenized_datasets["validation"])
print(predictions.predictions.shape, predictions.label_ids.shape)
```
(408, 2) (408,)
The output of the predict() method is a named tuple containing three fields:
- predictions
- label_ids, and
- metrics
The predictions field contains raw logits, so we take the argmax over the last axis to convert them into class predictions.
```python
import numpy as np

preds = np.argmax(predictions.predictions, axis=-1)
```
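If you also want class probabilities rather than hard predictions, you can apply a softmax to the logits first. A minimal NumPy sketch (the example logits are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability, then normalize.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[1.0, 3.0], [2.0, 0.5]])
probs = softmax(logits)
preds = probs.argmax(axis=-1)
# Each row of probs sums to 1, and argmax over probabilities
# gives the same classes as argmax over the raw logits.
```

Since softmax is monotonic per row, the argmax is unchanged; the probabilities are only needed when you care about confidence, not just the predicted class.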
Now it's time to evaluate the model:
```python
import evaluate

metric = evaluate.load("glue", "mrpc")
metric.compute(
    predictions=preds,
    references=predictions.label_ids,
)
```
{'accuracy': 0.8480392156862745, 'f1': 0.8949152542372881}
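Under the hood, accuracy is the fraction of correct predictions, and F1 is the harmonic mean of precision and recall. A pure-Python sketch of the binary case (`accuracy_f1` is a hypothetical helper for illustration, not part of the evaluate library):

```python
def accuracy_f1(preds, labels):
    # Accuracy: share of predictions that match the labels.
    correct = sum(p == l for p, l in zip(preds, labels))
    accuracy = correct / len(labels)
    # Binary F1 with class 1 treated as the positive class.
    tp = sum(p == 1 and l == 1 for p, l in zip(preds, labels))
    fp = sum(p == 1 and l == 0 for p, l in zip(preds, labels))
    fn = sum(p == 0 and l == 1 for p, l in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1}

result = accuracy_f1([1, 0, 1, 1], [1, 0, 0, 1])
# accuracy = 3/4 = 0.75; precision = 2/3, recall = 1.0, so f1 = 0.8
```

For MRPC both metrics are reported because the classes are imbalanced: F1 is less flattering than accuracy when a model over-predicts the majority class.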
Finally, let's wrap everything into a compute_metrics function that the Trainer can call during evaluation.
```python
def compute_metrics(eval_predictions):
    metric = evaluate.load("glue", "mrpc")
    logits, labels = eval_predictions
    preds = np.argmax(logits, axis=-1)
    return metric.compute(
        predictions=preds,
        references=labels,
    )
```
Now we'll define a Trainer that reports metrics at the end of each epoch.
```python
training_args = TrainingArguments("trainer", evaluation_strategy="epoch")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
```
With the evaluation strategy and compute_metrics arguments in place, we launch a new training run.
```python
trainer.train()
```
| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | No log | 0.417862 | 0.821078 | 0.875639 |
| 2 | 0.542100 | 0.450163 | 0.845588 | 0.891566 |
| 3 | 0.340800 | 0.662934 | 0.843137 | 0.889273 |
TrainOutput(global_step=1377, training_loss=0.3811595143835529, metrics={'train_runtime': 230.8587, 'train_samples_per_second': 47.666, 'train_steps_per_second': 5.965, 'total_flos': 405114969714960.0, 'train_loss': 0.3811595143835529, 'epoch': 3.0})