AIMenta
foundational · Generative AI

Temperature (Sampling)

A hyperparameter that controls how random LLM outputs are — low temperature gives deterministic, focused outputs; high temperature gives diverse, creative outputs.

Temperature is the most commonly tuned inference-time hyperparameter in LLM deployments and the one most often misunderstood. Mechanically, it divides the model's raw logits by the temperature value before they pass through softmax to produce a probability distribution over next tokens. **Temperature = 0** (often implemented as greedy decoding) picks the single highest-probability token every time — deterministic, focused, brittle on edge cases. **Temperature = 1** samples from the unchanged distribution — balanced, the pretraining default. **Temperature = 2** flattens the distribution — much higher chance of unusual tokens, more diverse but more error-prone.
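The mechanism can be sketched in a few lines of pure Python. This is illustrative, not any vendor's implementation — the three-element logits list stands in for a real vocabulary:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then softmax into a probability distribution."""
    if temperature == 0:
        # Greedy decoding: all probability mass on the argmax token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)  # sharpened: top token dominates
warm = softmax_with_temperature(logits, 2.0)  # flattened: closer to uniform
```

Dividing by a small temperature widens the gaps between logits, so softmax concentrates mass on the top token; dividing by a large temperature shrinks the gaps, pushing the distribution toward uniform.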

The decision heuristic for production: use low temperature (0 to 0.3) for **extractive / structured work** — classification, function-calling argument selection, JSON generation, SQL generation, factual question-answering over grounded context. Use medium temperature (0.5 to 0.8) for **writing and summarisation**, where some variety reads as natural. Reserve higher temperature (0.9+) for **explicitly creative** tasks — brainstorming, naming, fiction — and even then many production teams find that 0.7 produces output human reviewers prefer.
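One low-ceremony way to enforce the heuristic above is a per-task defaults table checked into the codebase. The task names and values here are illustrative, not a vendor API:

```python
# Per-task sampling defaults following the production heuristic:
# low for extraction, medium for writing, high only for creative work.
TASK_TEMPERATURE = {
    "classification": 0.0,
    "json_extraction": 0.2,
    "sql_generation": 0.1,
    "summarisation": 0.6,
    "marketing_copy": 0.7,
    "brainstorming": 0.9,
}

def temperature_for(task: str, default: float = 0.7) -> float:
    """Look up the pinned temperature for a task, falling back to a safe default."""
    return TASK_TEMPERATURE.get(task, default)
```

Centralising the table means a temperature change is a reviewable one-line diff rather than a value scattered across call sites.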

The common failure modes: running generation-style creative tasks at temperature 0 (the outputs are stiff and repetitive across calls), running extraction tasks at temperature 1 (the JSON is occasionally malformed or the function call occasionally picks the wrong tool), and forgetting that **top-p / nucleus sampling** interacts with temperature. Most production stacks fix top-p at 0.95 and tune temperature alone; tuning both without a principled eval harness is an easy way to waste a week.
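The temperature / top-p interaction comes from the order of operations: temperature reshapes the distribution first, then nucleus sampling truncates it. A minimal sketch, assuming list-of-floats logits rather than tensors:

```python
import math
import random

def sample_top_p(logits, temperature=1.0, top_p=0.95, rng=random):
    """Temperature-scale the logits, keep the smallest set of tokens whose
    cumulative probability reaches top_p, then sample from that set."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk tokens in descending probability, truncating at top_p mass.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalise over the kept set and draw one token index.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

Note that at low temperature the sharpened distribution often crosses the top-p threshold with a single token, so the nucleus collapses to greedy decoding — which is why tuning both knobs at once confounds any comparison without an eval harness.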

For APAC mid-market, the non-obvious operational note is that different foundation-model vendors calibrate temperature differently — Claude at 0.5 does not match GPT at 0.5 or Gemini at 0.5. If you run the same prompt across vendors, tune per-vendor and record the setting alongside the prompt version. Otherwise a seemingly harmless vendor switch will silently change output style in ways a rubric will detect but a human reviewer may not.
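Recording the setting alongside the prompt version can be as simple as a frozen config record keyed by (prompt, vendor). The record shape and vendor names below are hypothetical, not a specific SDK:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptConfig:
    """Pin sampling settings per vendor next to the prompt version, so a
    vendor switch cannot silently change output style."""
    prompt_version: str
    vendor: str
    temperature: float
    top_p: float = 0.95

# Same prompt version, different tuned temperatures per vendor.
CONFIGS = {
    ("summarise_ticket", "vendor_a"): PromptConfig("v3", "vendor_a", 0.5),
    ("summarise_ticket", "vendor_b"): PromptConfig("v3", "vendor_b", 0.35),
}
```

Because the record is frozen and versioned, a vendor migration forces an explicit new entry rather than inheriting another vendor's tuned value.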

Where AIMenta applies this

Service lines where this concept becomes a deliverable for clients.

Beyond this term

Where this concept ships in practice.

Encyclopedia entries name the moving parts. The links below show where AIMenta turns these concepts into engagements — across service pillars, industry verticals, and Asian markets.
