Fine-Tuning

Training a model on your specific data — Hermione studying twelve targeted textbooks versus winging it from general knowledge.

Fine-tuning takes a pre-trained model, which has learned general language understanding from massive training datasets, and continues training it on a curated, task-specific dataset to shift its behavior. During fine-tuning, the model sees examples of the target behavior (usually formatted as input-output pairs) and adjusts its weights to reproduce those patterns. The result is a model that, for the target task or domain, follows the desired style, terminology, and output format more consistently than the base model does through prompting alone. Common fine-tuning applications include adapting a model to a company's writing style, training it to produce a specific structured format without format instructions in every prompt, teaching it domain-specific terminology and conventions, and shifting its default behavior toward a specific persona or role.
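As a rough illustration of what "input-output pairs" can look like on disk, here is a minimal sketch that writes examples as JSON Lines, one training example per line. The field names `prompt` and `completion` and the helper `to_jsonl` are assumptions made for this example; each training framework or provider defines its own schema, so check the one you actually use.

```python
import json

def to_jsonl(pairs):
    """Serialize (input, output) pairs as one JSON object per line.

    The "prompt"/"completion" keys are illustrative, not a required schema.
    """
    lines = []
    for prompt, completion in pairs:
        # Drop examples with a missing side; they teach the model nothing useful.
        if not prompt.strip() or not completion.strip():
            continue
        record = {"prompt": prompt, "completion": completion}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)

# Hypothetical examples of a target summarization style.
examples = [
    ("Summarize: Q3 revenue rose 12%.", "Revenue grew 12% in Q3."),
    ("Summarize: Churn fell to 3%.", "Churn dropped to 3%."),
    ("", "orphaned output with no input"),
]

jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

Consistency across examples matters more than the exact container format: the model learns whatever pattern the pairs share, including any inconsistencies.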

The practical landscape of fine-tuning has been transformed by parameter-efficient methods, particularly Low-Rank Adaptation (LoRA). Traditional (full) fine-tuning updates all of a model's parameters, often billions of them, which is computationally expensive and requires significant GPU resources. LoRA instead freezes the original weights and trains a small set of additional low-rank matrices whose product is added to selected weight matrices, achieving comparable task adaptation on many tasks at a fraction of the compute and memory cost. LoRA fine-tunes can run on consumer hardware for smaller models and are the basis of most custom fine-tuned models in the open-source ecosystem. Full fine-tuning still has advantages for deep behavioral changes, but LoRA is the practical choice for most customization tasks.
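To make the savings concrete, the sketch below shows the low-rank update in plain Python: a frozen weight matrix `W` plus a trainable product `B·A`, scaled by `alpha / rank`. The layer size (4096 × 4096), the rank (8), and all function names are assumptions chosen for illustration; real implementations use tensor libraries and apply the update to selected layers only.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full update of W versus the LoRA factors A and B."""
    full = d_out * d_in            # every entry of W is trainable
    lora = rank * (d_in + d_out)   # A is rank x d_in, B is d_out x rank
    return full, lora

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

# Illustrative layer size and rank (assumed values).
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, lora / full)  # 16777216 65536 0.00390625

# Tiny rank-1 example. B starts at zero, so the adapted layer initially
# behaves exactly like the frozen base layer.
W = [[1, 0], [0, 1]]
A = [[1, 1]]
x = [2, 3]
y_initial = lora_forward(W, A, [[0], [0]], x, alpha=1, rank=1)
y_trained = lora_forward(W, A, [[1], [2]], x, alpha=1, rank=1)
print(y_initial, y_trained)  # [2.0, 3.0] [7.0, 13.0]
```

With these assumed numbers, the LoRA factors hold under 0.4% of the parameters a full update of that one matrix would train, which is why the optimizer state and gradients fit on much smaller hardware.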

For B2B teams, the decision between fine-tuning, RAG, and prompt engineering involves genuine tradeoffs. Fine-tuning excels when the behavior change is stylistic rather than factual (tone, format, persona), when the target task is well-defined with many consistent examples, or when the goal is to reduce token usage by baking instructions into the model weights rather than repeating them in every prompt. Fine-tuning underperforms when the goal is to update the model with current or changing facts (facts should go in retrieval, not weights), when high-quality training data is unavailable, or when the customization need changes frequently. For most enterprise use cases, RAG with well-designed prompts solves the immediate problem faster and more flexibly than fine-tuning. Fine-tuning becomes compelling once the application is stable and the cost of prompt tokens at scale justifies the investment in training.
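The "cost of prompt tokens at scale" argument can be checked with simple arithmetic. The sketch below estimates how many requests it takes for the tokens saved per request to pay back a one-time training cost. Every number here is hypothetical, and the model is deliberately simplified: it assumes the fine-tuned model is billed at the same per-token price as the base model and ignores data preparation, evaluation, and retraining costs.

```python
import math

def break_even_requests(training_cost, tokens_saved_per_request,
                        price_per_million_tokens):
    """Requests needed before saved prompt tokens repay the training cost.

    Assumes identical per-token pricing before and after fine-tuning.
    """
    saved_per_million_requests = tokens_saved_per_request * price_per_million_tokens
    if saved_per_million_requests <= 0:
        raise ValueError("fine-tuning must save tokens at a positive price")
    return math.ceil(training_cost * 1_000_000 / saved_per_million_requests)

# Hypothetical inputs: $500 to train, 800 instruction tokens removed from
# each prompt, $2 per million input tokens.
n = break_even_requests(training_cost=500.0,
                        tokens_saved_per_request=800,
                        price_per_million_tokens=2.0)
print(n)  # 312500
```

Under these assumed figures the investment only pays off after a few hundred thousand requests, which is consistent with treating fine-tuning as a step for stable, high-volume applications.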

fine-tuning, LLM, model training, custom AI, machine learning, domain adaptation
