Reasoning models represent the biggest shift in how LLMs are trained and served since RLHF. Instead of predicting the final answer directly, these models produce a long chain of intermediate thought (chain-of-thought reasoning) that is often hidden from the user but scored during training. OpenAI's **o1** and **o3** series opened the paradigm; **DeepSeek-R1**, **Claude Opus 4 with extended thinking**, and **Google Gemini 2.5 Deep Think** now compete at the frontier. Open-weight reasoning models (DeepSeek-R1, QwQ) brought frontier-level performance to self-hosted deployments for the first time.
The trade-off is **latency and cost**: reasoning models think for seconds to minutes before the first output token, consuming 10–100× more compute per response than equivalent non-reasoning models. The payoff is higher accuracy on problems where one-shot prediction fails — competitive math (AIME, IMO), code-generation under complex specs, multi-hop retrieval questions, and agentic tool-use sequences.
Production heuristic for APAC enterprises: use a reasoning model when the cost of a wrong answer is large (medical triage, legal analysis, finance reconciliation, code that will ship to production). Use a faster non-reasoning model for high-volume, low-stakes workflows (support drafts, summarisation, tagging). The cost gap will compress — as of 2026, reasoning-model tokens cost 3–6× non-reasoning tokens — but the **latency gap** (5–60 second first-token delay) may remain structural.
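The heuristic above can be sketched as a simple routing function. This is a minimal illustration, not a definitive implementation: the model identifiers, the `error_cost` labels, and the 60-second threshold are all assumptions you would replace with your own model names and SLA numbers.

```python
from dataclasses import dataclass

# Hypothetical model identifiers -- substitute your provider's real model names.
REASONING_MODEL = "reasoning-model"  # 5-60 s first-token delay, 3-6x token cost
FAST_MODEL = "fast-model"            # low latency, cheap, good for high volume


@dataclass
class Task:
    description: str
    error_cost: str        # "high" or "low": business impact of a wrong answer
    latency_budget_s: float  # how long the caller can wait for the first token


def pick_model(task: Task) -> str:
    """Route to a reasoning model only when the cost of a wrong answer is
    large AND the caller can absorb the first-token delay; otherwise use
    the faster, cheaper non-reasoning model."""
    if task.error_cost == "high" and task.latency_budget_s >= 60:
        return REASONING_MODEL
    return FAST_MODEL


# High-stakes, patient caller -> reasoning model
print(pick_model(Task("finance reconciliation", "high", 120.0)))
# Low-stakes, interactive workflow -> fast model
print(pick_model(Task("support reply draft", "low", 2.0)))
```

In production the `error_cost` label would come from the workflow definition rather than the caller, so the routing policy stays auditable in one place.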