
Large Language Model (LLM)

A neural network trained on vast text corpora to predict tokens — the technology behind ChatGPT, Claude, Gemini, and the modern generative-AI wave.

A Large Language Model (LLM) is a neural network — nearly always a decoder-only Transformer — trained on vast text corpora to predict the next token given the preceding ones. The "large" in the name is a qualitative label rather than a precise threshold; by 2026, models ranging from a few billion to a trillion-plus parameters are all commonly called LLMs. What makes them transformative is not the architecture itself (which is mostly unchanged since 2017) but the scale at which pretraining happens — trillions of tokens of text, vision, audio, and code — which produces emergent capabilities unavailable at smaller scale: instruction following, in-context learning, tool use, multi-step reasoning.
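Next-token prediction can be sketched in a few lines. This is a toy illustration only: a real LLM's Transformer produces one logit per entry of a vocabulary of tens of thousands of tokens, whereas here the vocabulary and logits are hard-coded so the final step — turning scores into a probability distribution and picking the next token — stands alone.

```python
import math

# Hypothetical 4-token vocabulary and logits; in a real model these
# logits come from a forward pass over the preceding tokens.
vocab = ["Paris", "London", "cat", "the"]
logits = [4.0, 2.0, -1.0, 0.5]

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)

# Greedy decoding: choose the highest-probability token. Sampling
# strategies (temperature, top-p) replace this single line.
next_token = vocab[max(range(len(vocab)), key=lambda i: probs[i])]
```

Generation is just this step repeated: the chosen token is appended to the context and the model predicts again.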

The LLM landscape has three tiers in 2026. **Frontier closed-weight models** — GPT-4-family, Claude Opus / Sonnet, Gemini Ultra / Pro — sit at the capability top, accessed via API. **Strong open-weight models** — Llama 4, Mistral, Qwen 2.5 / 3, DeepSeek V3, Yi — run on self-hosted or cloud-managed infrastructure and closed most of the capability gap over 2024-25. **Small task-specialised models** — Phi, Gemma, Qwen-small, distilled or fine-tuned open models — handle specific workloads at a fraction of the cost and latency. Choice depends on quality requirements, latency / cost budgets, data-residency constraints, and willingness to carry MLOps overhead.

For APAC mid-market enterprises, the core strategic decision is **what tier to build on for each workload**, not whether to use LLMs at all. High-stakes customer-facing reasoning usually justifies frontier models; high-volume classification or extraction often runs cheaper and faster on small specialised models; anything involving data-residency-sensitive content may require open-weight models on local infrastructure. Most production AI systems end up with a portfolio — frontier for quality-critical paths, small models for bulk paths — routed by an orchestration layer.
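The portfolio-plus-router pattern described above can be sketched as a simple dispatch function. All names here (`FRONTIER_MODEL`, the task labels, the `route` signature) are illustrative assumptions, not a real orchestration API; a production router would also weigh cost budgets, latency SLOs, and fallbacks.

```python
# Illustrative model-tier identifiers — placeholders, not real endpoints.
FRONTIER_MODEL = "frontier-large"      # quality-critical reasoning paths
SMALL_MODEL = "small-specialised"      # high-volume classification/extraction
LOCAL_MODEL = "open-weight-onprem"     # data-residency-sensitive content

def route(task_type: str, data_resident: bool) -> str:
    """Pick a model tier for a workload, following the portfolio pattern."""
    if data_resident:
        # Residency constraints trump everything: stay on local infra.
        return LOCAL_MODEL
    if task_type in {"classification", "extraction"}:
        # Bulk paths: cheaper and faster on a small specialised model.
        return SMALL_MODEL
    # Default: high-stakes customer-facing reasoning on a frontier model.
    return FRONTIER_MODEL
```

The value of keeping routing in one place is that tier assignments can change as model capabilities shift, without touching the workloads themselves.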

The non-obvious operational note: **LLM capability ceilings move every 3-6 months**. A workload that required GPT-4 in 2023 often runs acceptably on a small open-weight model by 2026. Architect so the backing model is swappable — thin adapter layer, model-agnostic evaluation harness, no hard-coded vendor assumptions in business logic. Teams that did this early have shipped multiple model upgrades cheaply; teams that embedded vendor specifics throughout their code have paid for each upgrade in weeks of rework.
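The "thin adapter layer" idea can be made concrete with a minimal sketch. The interface and adapter classes below are hypothetical; real adapters would wrap vendor SDK calls, but the point is that business logic depends only on the model-agnostic interface, so swapping the backing model is a one-line change.

```python
from typing import Protocol

class ChatModel(Protocol):
    """Model-agnostic interface: business logic depends only on this."""
    def complete(self, prompt: str) -> str: ...

class VendorAAdapter:
    # Hypothetical adapter; a real one would call a vendor SDK here.
    def complete(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"

class OpenWeightAdapter:
    # Hypothetical adapter for a self-hosted open-weight model.
    def complete(self, prompt: str) -> str:
        return f"[open-weight] {prompt}"

def summarise(model: ChatModel, text: str) -> str:
    # No vendor assumptions in business logic: any ChatModel works,
    # so a model-agnostic evaluation harness can exercise them all.
    return model.complete(f"Summarise: {text}")
```

Upgrading the backing model then means constructing a different adapter at the call site, with the evaluation harness run against both before and after.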

Where AIMenta applies this

Service lines where this concept becomes a deliverable for clients.

Beyond this term

Where this concept ships in practice.

Encyclopedia entries name the moving parts. The links below show where AIMenta turns these concepts into engagements — across service pillars, industry verticals, and Asian markets.
