
Fine-tuning


In one sentence

Fine-tuning is the process of taking a pre-trained large language model and continuing its training on a smaller, specific dataset so the model permanently learns new behaviour, style, or domain knowledge that lives inside its own weights rather than being passed in at query time.

Why fine-tuning exists

A pre-trained language model is a generalist. It has read a great deal but knows nothing about the specifics of your firm, your tone of voice, or the quirks of your domain. There are three ways to address that:

  1. Prompt engineering — write better instructions in the prompt. Fast, free, but limited.
  2. RAG — fetch relevant documents at query time and paste them into the prompt. Fast to set up, stays current, no model retraining required. (See rag.md.)
  3. Fine-tuning — actually retrain the model on your data so it learns the patterns directly. Slower, more expensive, but produces a model that behaves differently, not just one that knows differently.

Fine-tuning is the heaviest of the three. Most projects should start with the lighter options and only reach for fine-tuning when those genuinely run out of road.

What it actually does — concretely

Pre-training a frontier model from scratch is a months-long, billion-token, millions-of-dollars effort. Fine-tuning is the much cheaper sibling: take an already-trained model and continue its training for a relatively small number of steps on a focused dataset.
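
The "continue its training" idea can be shown at toy scale. Below is a minimal sketch, using a one-parameter linear model as a stand-in for a language model: it first "pre-trains" on broad data, then runs a handful of further gradient steps on a small, focused dataset and the weight shifts toward the new domain. Everything here is illustrative, not a real training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-training": fit a tiny linear model y = w*x on broad, generic data.
x_general = rng.uniform(-5, 5, 1000)
y_general = 2.0 * x_general + rng.normal(0, 0.1, 1000)

w = 0.0
lr = 0.01
for _ in range(200):                      # many steps, large dataset
    grad = np.mean((w * x_general - y_general) * x_general)
    w -= lr * grad

# "Fine-tuning": continue training the SAME weight for a few steps on a
# small, focused dataset whose target relationship is slightly different.
x_domain = rng.uniform(-5, 5, 50)
y_domain = 2.5 * x_domain + rng.normal(0, 0.1, 50)

for _ in range(20):                       # few steps, small dataset
    grad = np.mean((w * x_domain - y_domain) * x_domain)
    w -= lr * grad

print(round(w, 2))  # weight drifts from ~2.0 toward the domain value 2.5
```

The point of the toy: fine-tuning does not start from scratch, it nudges already-learned weights, which is why a short run on a small dataset is enough to change behaviour.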

Three common varieties:

  1. Full fine-tuning — update all of the model's weights. The most powerful option, and the most expensive in compute and memory.
  2. Parameter-efficient fine-tuning (e.g. LoRA) — freeze the base weights and train small adapter matrices alongside them, at a fraction of the cost.
  3. Instruction tuning — train on curated prompt–response pairs so the model learns to follow a particular format, style, or conversational pattern.
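
One widely used parameter-efficient variety, LoRA (low-rank adaptation), freezes the pre-trained weight matrix W and trains a low-rank update B @ A in its place. A minimal numpy sketch of the idea, with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)

d = 1024          # hypothetical hidden size of one frozen weight matrix
r = 8             # LoRA rank -- the knob that controls adapter size

W = rng.normal(size=(d, d))          # frozen pre-trained weight (not updated)
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init so
                                     # the adapter starts as a no-op

def forward(x):
    # Effective weight is W + B @ A; during training only A and B
    # would receive gradients, W stays frozen.
    return x @ (W + B @ A).T

full_params = W.size
lora_params = A.size + B.size
print(full_params, lora_params)  # 1048576 vs 16384 -- a 64x reduction
```

The zero-initialised B means the adapted model starts out behaving exactly like the base model, and the tiny parameter count is why adapter files are so cheap to train, store, and share.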

Where fine-tuning genuinely beats RAG

RAG is brilliant for “the model needs to know things from my corpus.” Fine-tuning is the right move when one of these is true:

  1. You need a consistent style, tone, or output format that prompting alone cannot hold reliably across thousands of responses.
  2. You need the model to speak a domain language — its vocabulary, conventions, and reasoning patterns — rather than just quote retrieved documents.
  3. You want a smaller, faster, cheaper model that matches a bigger one on your narrow task.

If none of these apply, RAG plus good prompting is almost always the right choice.

Working example — a hypothetical for an Isenberg context

Imagine the Management Department wanted an AI tutor for case-method discussion that talks like Isenberg faculty do — same level of rigour, same vocabulary, same Socratic style. The pieces:

  1. A capable open-weights base model (say, a 70B-parameter one).
  2. A curated dataset — on the order of 5,000 transcripts of strong class discussions, labeled for quality.
  3. A LoRA fine-tuning run that teaches the base model the faculty's vocabulary and Socratic style.
  4. An honest evaluation showing the tuned model actually tutors better than the base model on held-out cases.

The cost of doing this in 2026 is no longer prohibitive — a one-time LoRA fine-tune on a 70B model can be run for a few thousand dollars, and the resulting adapter weights are tiny (megabytes) and easy to share among colleagues.
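
To make the "tiny adapter" claim concrete, here is a back-of-envelope calculation. The layer count, hidden size, rank, and choice of adapted matrices are all illustrative assumptions, not the specs of any particular model.

```python
# Back-of-envelope size of a LoRA adapter for a 70B-class model.
# Assumed (hypothetical) configuration: 80 transformer layers, hidden
# size 8192, adapters on the query and value projections only, rank 8,
# weights stored in fp16 (2 bytes per parameter).
layers = 80
d_model = 8192
adapted_matrices_per_layer = 2   # q_proj and v_proj
rank = 8
bytes_per_param = 2              # fp16

# Each adapted d x d matrix gets two low-rank factors: (d x r) and (r x d).
params_per_matrix = 2 * rank * d_model
total_params = layers * adapted_matrices_per_layer * params_per_matrix
size_mb = total_params * bytes_per_param / 1e6

print(total_params, round(size_mb, 1))  # → 20971520 41.9
```

Roughly 21 million trainable parameters and about 42 MB on disk — versus ~140 GB for the full 70B model in fp16 — which is why adapters are easy to email to a colleague.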

The pre-conditions, though, are non-trivial: someone has to curate those 5,000 transcripts, with quality labels, with permissions, with FERPA-clean handling. The data work, again, is the hard part.

Why this matters in a teaching context

For BBA and MBA students, fine-tuning is interesting because of where the cost has moved over the past three years:

  1. Compute cost has collapsed — parameter-efficient methods like LoRA turned fine-tuning from a data-centre project into something a small team can budget for.
  2. Data cost has not — curating, labeling, and clearing permissions on training examples is as expensive as ever.
  3. Evaluation cost is the frequent surprise — proving the tuned model beats the base model on real tasks takes sustained effort.

The strategic point worth surfacing in class: the bottleneck has moved from compute to data quality and evaluation. Most organizations that fail at fine-tuning fail because they had garbage training data or no honest way to measure whether the fine-tuned model actually does better than the base model on real tasks. Both problems are organizational, not technical.

Fine-tuning vs. RAG — when to use which

                      Fine-tuning                                    RAG
  Speed to deploy     Days to weeks                                  Hours to days
  Cost per change     Expensive (re-train)                           Cheap (re-index)
  Stays current       No (frozen at training time)                   Yes (always reads latest corpus)
  Privacy             Documents permanently in weights               Documents leave at query time only
  Best for            Style, tone, domain language, output formats   Private knowledge bases, current data
  Model size impact   Can let you use a smaller, faster model        Generally needs a capable base model

The two are not mutually exclusive. Many production systems use both: fine-tune the model for tone and behaviour; layer RAG on top for current knowledge. They solve different problems.
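
That layering can be sketched end-to-end with stand-in stubs. Nothing below is a real API: `retrieve` is a toy word-overlap retriever, and `fine_tuned_model` is a hypothetical placeholder for a model already tuned for house style.

```python
def retrieve(query, corpus, k=2):
    # Toy retriever: rank documents by word overlap with the query.
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda doc: -len(q_words & set(doc.lower().split())))
    return scored[:k]

def fine_tuned_model(prompt):
    # Stand-in for a style-tuned model; here it just tags its answer.
    return f"[answered in house style] {prompt[:60]}..."

corpus = [
    "Enrollment for the spring case-method seminar opens in November.",
    "The grading rubric weights class participation at forty percent.",
    "Office hours are held on Tuesdays.",
]

query = "When does enrollment for the seminar open?"
context = "\n".join(retrieve(query, corpus))   # RAG supplies current facts
prompt = f"Context:\n{context}\n\nQuestion: {query}"
answer = fine_tuned_model(prompt)              # fine-tune supplies the tone
print(answer)
```

The division of labour is the point: retrieval keeps the facts current without retraining, while the fine-tune carries the tone and format that retrieval cannot provide.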

Trade-offs

  1. Knowledge is frozen at training time — anything the model learned in the fine-tune can only be updated by training again.
  2. Every change is expensive: a re-train, not a re-index.
  3. Training data is baked permanently into the weights, which raises the privacy stakes for sensitive documents.
  4. Without honest evaluation, it is easy to ship a tuned model that is no better than the base model on real tasks.

Related entries: rag.md, embedding.md (planned).
