AIMenta
Foundational acronym · Generative AI

GPT (Generative Pre-trained Transformer)

OpenAI's family of autoregressive language models — the architecture and product line that defined the modern LLM era.

GPT (Generative Pre-trained Transformer) is OpenAI's family of autoregressive language models and the architecture that, more than any other, defined the modern LLM era. GPT-1 (2018) demonstrated that a decoder-only Transformer pretrained on unlabelled text, then fine-tuned on labelled tasks, could outperform task-specific models. GPT-2 (2019) showed that scaling produced emergent capabilities without task-specific supervision. GPT-3 (2020) crossed the threshold at which in-context few-shot learning became reliable, making the base model directly useful without fine-tuning. GPT-3.5 (late 2022), paired with instruction tuning and RLHF, produced ChatGPT, which moved LLMs from research curiosity to consumer product overnight.

The architecture is consistent across the family: a decoder-only Transformer trained on next-token prediction over a very large text corpus, with later members adding multimodality (GPT-4, GPT-4o), longer context (GPT-4 Turbo; GPT-4.1 with a 1M-token context window), and reasoning-specialised variants (o1, o3, and their -mini siblings). OpenAI's API exposes these behind `gpt-4o`, `gpt-4o-mini`, `o3-mini`, `o1` and successor model names that churn every few months as capabilities advance. The competitive landscape — Claude (Anthropic), Gemini (Google), Llama (Meta, open weights), Qwen (Alibaba), DeepSeek, Mistral — has largely converged on the same decoder-only Transformer recipe, with vendor-specific differences in pretraining data, safety tuning, and product surface.
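The "autoregressive" part is the whole training and inference loop: the model scores every vocabulary entry given the tokens so far, one token is chosen and appended, and the extended sequence is fed back in. A minimal sketch of that decoding loop, with the trained Transformer replaced by a stub scoring function (all names here are illustrative, not OpenAI's API):

```python
from typing import Callable, List

def generate(next_token_logits: Callable[[List[int]], List[float]],
             prompt: List[int], max_new_tokens: int, eos_id: int) -> List[int]:
    """Greedy autoregressive decoding: append one token at a time,
    feeding the growing sequence back into the model each step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)        # score every vocabulary entry
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        tokens.append(next_id)
        if next_id == eos_id:                     # stop at end-of-sequence
            break
    return tokens

# Stub "model" over a 5-token vocabulary: always prefers (last token + 1) mod 5.
# A real GPT replaces this with a decoder-only Transformer forward pass.
def stub_model(tokens: List[int]) -> List[float]:
    want = (tokens[-1] + 1) % 5
    return [1.0 if i == want else 0.0 for i in range(5)]

print(generate(stub_model, [0], max_new_tokens=10, eos_id=4))  # → [0, 1, 2, 3, 4]
```

Real deployments sample from the softmax of the logits (temperature, top-p) rather than taking the argmax, but the feed-back loop is identical.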

For APAC mid-market teams, GPT is usually one of several foundation-model options evaluated on price, latency, quality on the specific workload, region availability (data residency for Japan, Korea, ASEAN), and non-English performance. The right answer varies per workload — GPT often excels at code and general reasoning; Claude often leads on long-form writing and constitutional safety behaviours; Gemini has advantages in multimodal and search-grounded tasks; open-weight models (Qwen, Llama) win when self-hosting is mandatory.

The non-obvious operational note: **never hard-code a GPT model name in production paths**. OpenAI deprecates and renames models on a tight cadence; teams that pinned to `gpt-4-0314` or `text-davinci-003` have spent real effort migrating. Route all model calls through a thin adapter that takes a logical model identifier ("primary", "cheap", "reasoning") and maps to whatever the current best concrete model is — this lets upgrades happen in one place.
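One way to implement that thin adapter is a single registry mapping logical roles to concrete model IDs, so a deprecation is a one-line change. The role names and the mapping below are illustrative, not a standard:

```python
# Logical role -> current concrete model ID. When OpenAI deprecates a model,
# this dict is the only place that changes; call sites never name models directly.
MODEL_REGISTRY = {
    "primary": "gpt-4o",
    "cheap": "gpt-4o-mini",
    "reasoning": "o3-mini",
}

def resolve_model(role: str) -> str:
    """Map a logical model role to whatever the current best concrete model is."""
    try:
        return MODEL_REGISTRY[role]
    except KeyError:
        raise ValueError(f"unknown model role {role!r}; known: {sorted(MODEL_REGISTRY)}")

# Call sites pass roles, never literal model names, e.g.:
# client.chat.completions.create(model=resolve_model("cheap"), messages=...)
```

The same indirection also makes A/B testing and per-region routing (e.g. a data-residency-compliant endpoint for Japan or Korea) a registry concern rather than a codebase-wide search-and-replace.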

Where AIMenta applies this

Service lines where this concept becomes a deliverable for clients.

Beyond this term

Where this concept ships in practice.

Encyclopedia entries name the moving parts. The links below show where AIMenta turns these concepts into engagements — across service pillars, industry verticals, and Asian markets.
