Fine-Tuning
Training a model on your specific data — Hermione studying twelve targeted textbooks versus winging it from general knowledge.
Fine-tuning takes a pre-trained model — which has learned general language understanding from massive training datasets — and continues training it on a curated, task-specific dataset to shift its behavior. During fine-tuning, the model sees examples of the target behavior (usually formatted as input-output pairs) and adjusts its weights to reproduce those patterns. The result is a model that, for the target task or domain, behaves more consistently with the desired style, terminology, and output format than the base model could achieve through prompting alone. Common fine-tuning applications include: adapting a model to a company's specific writing style, training it to produce outputs in a specific structured format without requiring format instructions in every prompt, teaching it domain-specific terminology and conventions, and shifting its default behavior for a specific persona or role.
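As a rough illustration of what "input-output pairs" can look like on disk, the sketch below writes two made-up examples as JSON Lines (one JSON object per line). The field names `input` and `output` and the example texts are assumptions for illustration; each training tool defines its own required schema.

```python
import json

# Hypothetical training examples. The "input"/"output" field names are an
# assumption, not a schema required by any particular fine-tuning tool.
examples = [
    {"input": "Summarize: Q3 revenue rose 12% on strong renewals.",
     "output": "Revenue up 12% in Q3, driven by renewals."},
    {"input": "Summarize: Churn fell to 2% after the onboarding redesign.",
     "output": "Churn down to 2% following the onboarding redesign."},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

Keeping every example in the same shape matters more than the exact field names: the model learns the pattern it is shown, including any inconsistencies.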
The practical landscape of fine-tuning has been transformed by parameter-efficient methods, particularly Low-Rank Adaptation (LoRA). Traditional full fine-tuning updates all of a model's billions of parameters, which is computationally expensive and requires significant GPU resources. LoRA instead freezes the existing weights and trains a small number of additional parameters, in the form of low-rank matrices whose product is added to selected weight matrices, achieving comparable task adaptation on many tasks at a fraction of the compute cost. LoRA fine-tunes can run on consumer hardware for smaller models and are the basis of most custom fine-tuned models in the open-source ecosystem. Full fine-tuning still has advantages for deep behavioral changes, but LoRA is the practical choice for most customization tasks.
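To make the "small number of additional parameters" concrete, here is a minimal pure-Python sketch of the low-rank idea for a single weight matrix: the frozen weight `W` is left alone, and two small trainable matrices `B` and `A` contribute an update `B @ A`. The matrix sizes, the rank, the `scale` argument, and all function names are illustrative assumptions; real LoRA libraries apply this per layer with their own APIs.

```python
import random


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]


def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable parameter counts for one weight matrix."""
    full = d_out * d_in
    lora = rank * d_in + d_out * rank
    return full, lora


def effective_weight(W, A, B, scale=1.0):
    """Return W + scale * (B @ A), the weight the adapted layer acts as."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


random.seed(0)
d_out, d_in, rank = 6, 6, 2  # tiny illustrative sizes

# Frozen pre-trained weight (random stand-in for a real layer).
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: A starts small and random, B starts at zero,
# so the adapted layer initially behaves exactly like the original.
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]

W_eff = effective_weight(W, A, B)

# Parameter counts for a hypothetical 4096 x 4096 layer at rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, lora / full)
```

For the hypothetical 4096 x 4096 layer, the low-rank factors hold 65,536 trainable values against 16,777,216 in the full matrix, which is where the compute and memory savings come from.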
For B2B teams, the decision between fine-tuning, RAG, and prompt engineering involves genuine tradeoffs. Fine-tuning excels when the behavior change is stylistic rather than factual (tone, format, persona), when the target task is very well-defined with many consistent examples, or when the goal is to reduce token usage by baking instructions into the model weights rather than repeating them in every prompt. Fine-tuning underperforms when the goal is to update the model with current or changing facts (facts should go in retrieval, not weights), when high-quality training data is unavailable, or when the customization need changes frequently. For most enterprise use cases, RAG with well-designed prompts solves the immediate problem faster and more flexibly than fine-tuning. Fine-tuning becomes compelling once the application is stable and the cost of prompt tokens at scale justifies the investment in training.
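The point about prompt-token cost at scale can be turned into a back-of-the-envelope break-even estimate. The sketch below is only that: the function name and every number in the usage example are hypothetical, and it ignores any difference in per-token price between a base model and a fine-tuned one.

```python
def breakeven_requests(training_cost, tokens_saved_per_request,
                       price_per_million_tokens):
    """Requests needed before prompt-token savings repay the training cost.

    All inputs are assumptions supplied by the caller; costs are in the
    same currency unit.
    """
    saved_cost_per_million_requests = (
        tokens_saved_per_request * price_per_million_tokens
    )
    if saved_cost_per_million_requests <= 0:
        raise ValueError("fine-tuning must save tokens at a positive price")
    return training_cost * 1_000_000 / saved_cost_per_million_requests


# Hypothetical figures: a 500-unit training run, 800 instruction tokens
# removed from each prompt, and a price of 2 units per million tokens.
n = breakeven_requests(training_cost=500.0,
                       tokens_saved_per_request=800,
                       price_per_million_tokens=2.0)
print(f"Break-even after about {n:,.0f} requests")
```

If expected traffic over the life of the application is well below the break-even figure, repeating the instructions in each prompt is likely the cheaper option.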
Related terms
- Large Language Model (LLM): The Sorting Hat of language models, probabilistic, trained on everything, and occasionally wrong about which house you belong in.
- Retrieval-Augmented Generation (RAG): Giving your LLM access to the Restricted Section, so it answers from real knowledge instead of confident hallucination.
- Prompt Engineering: Asking the Mirror of Erised exactly what you need, not what you want; the difference between useful AI and a wish gone wrong.
- LoRA: Like the One Ring, small and lightweight, but it changes everything about how the model behaves once you put it on.
- Inference: Running the trained model to generate output, like activating the Sorting Hat after all the training: it just decides.