Fine-tuning a Model with the Trainer API

Transformers provides a Trainer class to fine-tune any of its pretrained models on your dataset.

from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_dataset = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
def tokenize_function(sample):
    # Tokenize both sentences of each pair; truncate to the model's max length
    return tokenizer(sample["sentence1"], sample["sentence2"], truncation=True)
tokenized_datasets = raw_dataset.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
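
DataCollatorWithPadding pads each batch dynamically, only up to the length of the longest sequence in that batch, rather than padding the whole dataset to one global length. A toy sketch of that idea (pad_batch is a hypothetical helper, not the real collator internals):

```python
# Toy illustration of dynamic padding: each batch is padded only to
# the length of its own longest sequence, saving compute on short batches.
def pad_batch(batch, pad_id=0):
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]

# Two token-id sequences of different lengths in one batch
batch = [[101, 2023, 102], [101, 2023, 2003, 1037, 102]]
padded = pad_batch(batch)
# padded: [[101, 2023, 102, 0, 0], [101, 2023, 2003, 1037, 102]]
```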

Training

from transformers import TrainingArguments
training_args = TrainingArguments("trainer")

Defining the model

from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

Now, we define the Trainer by passing in all the objects constructed so far.

from transformers import Trainer

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer
)
trainer.train()
[1377/1377 03:53, Epoch 3/3]
Step Training Loss
500 0.506700
1000 0.249000
TrainOutput(global_step=1377, training_loss=0.30210025984044514, metrics={'train_runtime': 236.2031, 'train_samples_per_second': 46.587, 'train_steps_per_second': 5.83, 'total_flos': 405114969714960.0, 'train_loss': 0.30210025984044514, 'epoch': 3.0})

Evaluation

Now, we’ll generate predictions with the model we just trained, using the Trainer.predict() method.

predictions = trainer.predict(tokenized_datasets["validation"])
print(predictions.predictions.shape, predictions.label_ids.shape)
(408, 2) (408,)

The output of the predict() method is a named tuple containing three fields:

  • predictions
  • label_ids, and
  • metrics

We can see from the shape above that predictions contains raw logits (one pair of scores per example), so we need to take the argmax over the last axis to turn them into class predictions.

import numpy as np
preds = np.argmax(predictions.predictions, axis=-1)
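
To see what this argmax does, here is a small self-contained example with hypothetical logits mirroring the (408, 2) shape above (the values are made up for illustration):

```python
import numpy as np

# Three hypothetical examples, each with a logit per class (shape (3, 2));
# argmax over the last axis picks the higher-scoring class for each row.
logits = np.array([[-1.2, 0.8],
                   [ 2.1, -0.3],
                   [ 0.1,  0.4]])
preds = np.argmax(logits, axis=-1)
# preds: array([1, 0, 1])
```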

Now it’s time to evaluate the model.

import evaluate

metric = evaluate.load("glue", "mrpc")
metric.compute(
    predictions=preds,
    references=predictions.label_ids
)
{'accuracy': 0.8480392156862745, 'f1': 0.8949152542372881}

Now it’s time to wrap everything together into a single function.

def compute_metrics(eval_predictions):
    metric = evaluate.load("glue", "mrpc")
    logits, labels = eval_predictions
    preds = np.argmax(logits, axis=-1)

    return metric.compute(
        predictions=preds,
        references=labels
    )
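
For MRPC, metric.compute returns accuracy and binary F1. A hand-rolled sketch of those two formulas (an illustration of what the metric computes, not the evaluate library's internals):

```python
import numpy as np

# Accuracy: fraction of predictions matching labels.
# F1: harmonic mean of precision and recall on the positive class,
# written here as 2*TP / (2*TP + FP + FN).
def accuracy_and_f1(preds, labels):
    preds, labels = np.asarray(preds), np.asarray(labels)
    acc = float((preds == labels).mean())
    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels == 0)).sum())
    fn = int(((preds == 0) & (labels == 1)).sum())
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"accuracy": acc, "f1": f1}

accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1])
# {'accuracy': 0.75, 'f1': 0.8}
```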

Now, we’ll define a Trainer that reports metrics at the end of each epoch.

training_args = TrainingArguments("trainer", evaluation_strategy="epoch")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Now that we have defined the Trainer with the additional evaluation_strategy and compute_metrics arguments, we launch a new training run.

trainer.train()
[1377/1377 03:50, Epoch 3/3]
Epoch Training Loss Validation Loss Accuracy F1
1 No log 0.417862 0.821078 0.875639
2 0.542100 0.450163 0.845588 0.891566
3 0.340800 0.662934 0.843137 0.889273
TrainOutput(global_step=1377, training_loss=0.3811595143835529, metrics={'train_runtime': 230.8587, 'train_samples_per_second': 47.666, 'train_steps_per_second': 5.965, 'total_flos': 405114969714960.0, 'train_loss': 0.3811595143835529, 'epoch': 3.0})