foundational · Generative AI

Foundation Model

A large model pretrained on broad data at scale that can be adapted to many downstream tasks — the base layer under modern generative AI.

A foundation model is a large model pretrained on broad data at scale (text, images, audio, code, or mixtures of these) that can be adapted to a wide range of downstream tasks via prompting, fine-tuning, retrieval augmentation, or adapter layers. The term was coined by Stanford's CRFM in 2021 to name a pattern that was already dominating AI: rather than train task-specific models from scratch, train one very large model on a very broad corpus, then specialise it cheaply for each application. GPT, Claude, Gemini, Llama, Mistral, Qwen, DeepSeek, and their vision / multimodal cousins are all foundation models.

The architecture is almost always a Transformer variant. The training is predominantly self-supervised: next-token prediction for text, contrastive objectives for vision-language, masked prediction for some modalities. The downstream adaptation is whatever matches the task — zero-shot prompting for the simplest cases, few-shot prompting or RAG for fact-sensitive tasks, supervised fine-tuning or DPO for custom behaviours, LoRA or QLoRA for parameter-efficient specialisation. The unifying trait is the **pretraining-then-adaptation** split, which shifted the economic centre of ML from custom-model training to prompt and adapter engineering.
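The pretraining-then-adaptation split can be sketched numerically. The snippet below shows the LoRA idea in miniature: the large pretrained weight stays frozen, and only a low-rank update is trained per task. All shapes and values are illustrative, not taken from any real model.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8                      # hidden size, adapter rank (r << d)
W = rng.standard_normal((d, d))    # frozen pretrained weight (the "foundation")

# Trainable adapter: only 2*d*r parameters instead of d*d.
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))               # zero-init, so adaptation starts as a no-op

def adapted_forward(x):
    # Base path uses the frozen foundation-model weight; the low-rank
    # B @ A term carries the cheap task-specific specialisation.
    return x @ W.T + x @ (B @ A).T

x = rng.standard_normal((1, d))
y = adapted_forward(x)

full = d * d
lora = 2 * d * r
print(y.shape)                                              # (1, 512)
print(f"trainable: {lora} vs {full} ({lora / full:.1%})")   # 8192 vs 262144 (3.1%)
```

This is why "specialise it cheaply for each application" holds: the per-task artefact is the tiny `(A, B)` pair, while the expensive pretrained `W` is shared across every downstream use.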

For APAC mid-market enterprises, the strategic implication is that the question is no longer *should we train a model* but *which foundation model do we build on and how do we adapt it safely*. Picking the wrong base model (outdated, over-priced, poorly instruction-tuned, weak in your required languages) locks in months of downstream rework; picking the right one collapses quarters of effort into weeks. Evaluate on your actual workload, in your actual language mix, with your actual latency and cost constraints, before choosing.

The non-obvious operational note: **foundation models commoditise quickly**. A model that was state-of-the-art in 2024 is table-stakes in 2026. Architect every integration so the underlying model can be swapped without rewriting your product — stable prompt interfaces, model-agnostic evaluation harnesses, no hard-coded vendor assumptions outside a thin adapter layer. Teams that did this early spent weeks on model upgrades; teams that embedded vendor specifics throughout their code spent quarters.
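The "thin adapter layer" above can be made concrete with a small structural-typing sketch. The provider classes here are hypothetical placeholders, not real SDKs; the point is that product code depends only on the stable interface:

```python
from typing import Protocol

class ChatModel(Protocol):
    """Stable prompt interface the product codes against."""
    def complete(self, prompt: str) -> str: ...

class VendorAClient:
    """Hypothetical wrapper around vendor A's SDK."""
    def complete(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"   # real API call would go here

class VendorBClient:
    """Hypothetical wrapper around vendor B's SDK."""
    def complete(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"

def summarise(model: ChatModel, text: str) -> str:
    # Product code never imports a vendor SDK directly, so swapping the
    # underlying foundation model is a one-line change at the call site.
    return model.complete(f"Summarise: {text}")

print(summarise(VendorAClient(), "quarterly report"))
print(summarise(VendorBClient(), "quarterly report"))   # model swapped, product untouched
```

Pair this with the model-agnostic evaluation harness the paragraph calls for, and a model upgrade becomes: run the harness against the new adapter, compare scores, flip the composition root.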

Where AIMenta applies this

Service lines where this concept becomes a deliverable for clients.

Beyond this term

Where this concept ships in practice.

Encyclopedia entries name the moving parts. The links below show where AIMenta turns these concepts into engagements — across service pillars, industry verticals, and Asian markets.
