Temperature (AI)
The randomness dial — Vulcan logic at 0.0, hobbit improvisation at 1.5, Gollum at anything above 2.0.
Temperature is a sampling parameter applied during LLM inference that controls how the model selects each generated token from the probability distribution over its vocabulary. At each generation step, the model produces a raw score (a logit) for every possible next token; the logits are divided by the temperature before the softmax function converts them into probabilities, and the next token is sampled from the result. At temperature 0, which implementations treat as greedy decoding, the model always picks the highest-probability token, so output is close to deterministic and reproducible (some hosted APIs can still show small run-to-run differences). At temperature 1.0, the model samples from its unmodified distribution. As temperature increases from 0 toward 1.0, lower-probability tokens become more likely to be selected, introducing controlled randomness that produces more varied output. At temperatures above 1.0, the distribution becomes increasingly flat, and at high enough values tokens that the model considers unlikely become nearly as probable as highly likely ones, generating output that is more surprising but also more prone to incoherence and "hallucinated" content.
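The mechanism can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular model's implementation: the function names and the three example logits are made up, and treating temperature 0 as greedy decoding is a common convention rather than a universal rule.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into sampling probabilities at a given temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding (assumed convention): all mass on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Sample one token index from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Illustrative logits for a three-token vocabulary (made-up values).
example_logits = [2.0, 1.0, 0.1]
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running the loop shows the top token's share shrinking as the temperature rises, which is the flattening effect described above.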
Different tasks benefit from different temperature settings. Factual question answering, data extraction, and code generation benefit from low temperature (0.0-0.3), because determinism and accuracy matter more than variety. Classification and structured output tasks also typically use low temperature to keep formatting consistent. Creative writing, brainstorming, generating multiple diverse options, and playful conversational applications benefit from higher temperature (0.7-1.2), where variety and unexpectedness add value. Most AI APIs default to a temperature around 0.7-1.0, a reasonable middle ground for general-purpose use, though the allowed range differs between providers. The temperature parameter is usually the first knob to adjust when an AI application's outputs are too repetitive and generic (increase temperature) or too erratic and unreliable (decrease temperature).
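One way to encode these rules of thumb is a small preset table plus a helper that nudges the value in response to a symptom. The sketch below is illustrative only: the task names, the specific preset values within the ranges above, the 0.2 step, and the 0.0-2.0 clamp are all assumptions, and the valid range should be checked against the provider being used.

```python
# Hypothetical presets drawn from the ranges discussed above.
TEMPERATURE_PRESETS = {
    "factual_qa": 0.2,
    "data_extraction": 0.0,
    "code_generation": 0.2,
    "classification": 0.0,
    "structured_output": 0.0,
    "creative_writing": 0.9,
    "brainstorming": 1.0,
}

DEFAULT_TEMPERATURE = 0.7  # assumed general-purpose middle ground

def pick_temperature(task):
    """Return a starting temperature for a task, falling back to the default."""
    return TEMPERATURE_PRESETS.get(task, DEFAULT_TEMPERATURE)

def adjust_temperature(current, symptom, step=0.2, low=0.0, high=2.0):
    """Nudge the temperature up or down based on an observed symptom."""
    if symptom == "too_repetitive":
        candidate = current + step
    elif symptom == "too_erratic":
        candidate = current - step
    else:
        raise ValueError(f"unknown symptom: {symptom!r}")
    # Round to avoid float noise, then clamp to the assumed valid range.
    return min(high, max(low, round(candidate, 2)))

print(pick_temperature("code_generation"))
print(adjust_temperature(0.7, "too_repetitive"))
```

The presets are starting points; the adjustment helper mirrors the "first knob to turn" advice rather than replacing evaluation on real outputs.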
For B2B teams building AI applications, temperature is a tunable parameter that strongly affects user experience and output quality. A product description generator set to low temperature will produce accurate but formulaic descriptions; set to high temperature, it might produce vivid, varied descriptions that occasionally wander off-brand or make unsupported claims. A code assistant set to high temperature might generate creative solutions but introduce bugs; at low temperature, it produces predictable, conventional code. Exposing temperature as a configurable parameter, and testing to find the right value for each specific use case, is a straightforward optimization that often improves output quality significantly without any model change or fine-tuning.
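Testing a value per use case can start with a simple sweep that measures how varied the outputs are at each setting. The sketch below assumes a stand-in generator (`fake_generate`, with made-up taglines and scores) in place of a real model call. In practice that function would call whichever model API the application uses, and the variety metric would sit alongside checks for accuracy and brand fit.

```python
import math
import random
from collections import Counter

# Hypothetical stand-in for a model call: canned taglines with made-up scores.
CANDIDATES = [
    "Built to last.",
    "Durable by design.",
    "Tough gear, fair price.",
    "Adventure-ready.",
]
SCORES = [3.0, 2.0, 1.0, 0.5]

def fake_generate(temperature, rng):
    """Pick a tagline using temperature-scaled sampling over SCORES."""
    top = max(SCORES)
    if temperature == 0:
        return CANDIDATES[SCORES.index(top)]
    # Subtract the top score before exp() to keep the numbers stable.
    weights = [math.exp((s - top) / temperature) for s in SCORES]
    return rng.choices(CANDIDATES, weights=weights, k=1)[0]

def top_share(temperature, runs=200, seed=0):
    """Fraction of runs that produced the single most common output."""
    rng = random.Random(seed)
    outputs = [fake_generate(temperature, rng) for _ in range(runs)]
    return Counter(outputs).most_common(1)[0][1] / runs

def sweep(temperatures, runs=200):
    """Map each temperature to its top-output share (1.0 means no variety)."""
    return {t: top_share(t, runs) for t in temperatures}

for temperature, share in sweep([0.0, 0.3, 0.7, 1.0, 1.5]).items():
    print(f"temperature={temperature}: most common output share={share:.2f}")
```

A share of 1.0 means every run returned the same text; lower shares indicate more variety, which is useful for a tagline generator and usually unwanted for extraction or classification.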
Related terms
- Large Language Model (LLM) — The Sorting Hat of language models — probabilistic, trained on everything, occasionally wrong about which house you belong in.
- Inference — Running the trained model to generate output — activating the Sorting Hat after all the training: it just decides.
- Prompt Engineering — Asking the Mirror of Erised exactly what you need, not what you want — the difference between useful AI and a wish gone wrong.
- Structured Output — When the AI returns JSON instead of prose — Spock filing a report in regulation format rather than speaking freely.