foundational · Machine Learning

Overfitting

When a model memorises the training set instead of learning generalisable patterns — low training error, high test error.

Overfitting is the condition where a model has learned the training data too well — including its noise and idiosyncrasies — and consequently fails to generalise to unseen data. The signature is low training error paired with substantially higher validation and test error. The gap between the two is the most direct empirical measurement of how much overfitting has occurred. Every supervised-learning project balances two failure modes: the model must be rich enough to capture real patterns (avoiding underfitting) yet constrained enough that it does not memorise noise (avoiding overfitting).
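The train/test gap is easy to observe directly. A minimal sketch, assuming scikit-learn and a synthetic noisy-sine dataset (both are illustrative choices, not from the entry above): an unconstrained decision tree drives training error to near zero by memorising noise, and the gap in scores measures the overfitting.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Noisy sine data: the real pattern is sin(x); the noise is memorisable.
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# A depth-unlimited tree can fit every training point, noise included.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_r2 = tree.score(X_train, y_train)  # near 1.0: training data memorised
test_r2 = tree.score(X_test, y_test)     # noticeably lower on unseen data
gap = train_r2 - test_r2                 # the overfitting gap described above
```

Constraining the tree (e.g. `max_depth` or `min_samples_leaf`) shrinks the gap at the cost of some training fit.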

The toolbox for fighting overfitting is mature and multi-layered. **Regularisation** — L1, L2, elastic net on classical models; weight decay, dropout, stochastic depth on neural networks — constrains the hypothesis space. **Data augmentation** expands the effective training set without collecting more labels (random crops and flips for images, token masking and back-translation for text, mixup and cutmix for harder cases). **Early stopping** halts training when validation loss stops improving. **Cross-validation** reveals overfitting before production ever sees it. Modern large-model practice adds **parameter-efficient fine-tuning** (LoRA, adapters) as a form of implicit regularisation — updating only a small fraction of weights constrains what the model can memorise.
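One of those tools in miniature: a sketch of L2 regularisation using scikit-learn, with a degree-15 polynomial fit as an illustrative (assumed, not from the entry) high-capacity model. Both pipelines have the same capacity; only the ridge penalty constrains the hypothesis space.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=40)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

# Degree-15 polynomial features give both models the capacity to memorise noise.
unregularised = make_pipeline(
    PolynomialFeatures(15), StandardScaler(), LinearRegression()
).fit(X_tr, y_tr)

# Ridge adds an L2 penalty on the coefficients, shrinking them toward zero:
# slightly worse training fit, substantially better test fit.
regularised = make_pipeline(
    PolynomialFeatures(15), StandardScaler(), Ridge(alpha=1.0)
).fit(X_tr, y_tr)
```

By construction, plain least squares always scores at least as well on the training set; the regularised model is the one expected to score better on held-out data.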

For APAC mid-market teams, the highest-frequency form of overfitting is not the classical kind — it's **overfitting to the evaluation set**. Teams iterate, watch one leaderboard metric go up, and ship. The model has learned to do well on that specific eval, not on the actual task. The defence is a **held-out test set you never look at during development**, plus periodic reality checks via user-facing evaluation. Also beware **temporal leakage** — training on data that postdates your test set, which silently leaks future signal into the model. TimeSeriesSplit-style chronological folding prevents this.
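The chronological folding mentioned above can be sketched with scikit-learn's `TimeSeriesSplit` (the library and the toy 10-row dataset are assumptions for illustration). Every fold trains strictly on the past and tests strictly on the future, so no future signal can leak into training.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 10 observations indexed in chronological order (row index = time step).
X = np.arange(10).reshape(-1, 1)

# Each fold's training window ends before its test window begins.
folds = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in folds:
    assert train_idx.max() < test_idx.min()  # train strictly precedes test
    print(f"train={train_idx.tolist()}  test={test_idx.tolist()}")
# train=[0, 1, 2, 3]  test=[4, 5]
# train=[0, 1, 2, 3, 4, 5]  test=[6, 7]
# train=[0, 1, 2, 3, 4, 5, 6, 7]  test=[8, 9]
```

Contrast with ordinary `KFold`, which shuffles or interleaves indices and would happily train on rows that come after the test window.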

The non-obvious diagnostic: very large foundation models can appear to overfit in misleading ways. The classical U-shaped curve (validation loss rising once training loss flattens) is replaced by phenomena like **double descent** — test error falls, rises, then falls again as model capacity increases. The practical advice is to trust out-of-distribution generalisation over in-distribution validation loss, especially at the scale of modern transformer fine-tuning.

Where AIMenta applies this

Service lines where this concept becomes a deliverable for clients.

Beyond this term

Where this concept ships in practice.

Encyclopedia entries name the moving parts. The links below show where AIMenta turns these concepts into engagements — across service pillars, industry verticals, and Asian markets.
