intermediate · Machine Learning

Bias-Variance Tradeoff

The fundamental ML tradeoff: simpler models have high bias (underfit), complex models have high variance (overfit) — total error minimises somewhere between.

The bias-variance tradeoff is the fundamental decomposition of a supervised-learning model's expected error. **Bias** is the error from the model's structural inability to represent the true function — a linear model on curved data has high bias because no amount of data will let a line fit the curve. **Variance** is the error from sensitivity to the specific training set — a very flexible model may fit one sample of data well but produce a completely different function given a different sample. **Irreducible error** is the noise in the labels themselves, which no model can remove. The classical result decomposes expected squared error as `bias² + variance + irreducible error`, and practitioners spend careers learning to locate the sweet spot.

The curve is U-shaped along model complexity: very simple models have high bias and low variance, very complex models have low bias and high variance, and the total error minimises somewhere between. Every ML technique that fights overfitting (regularisation, early stopping, data augmentation) is implicitly trading bias for variance or vice versa, and cross-validation is how you measure where on the curve you sit. Bagging (the mechanism behind random forests) reduces variance by averaging many de-correlated high-variance models; boosting (XGBoost, LightGBM) reduces bias by iteratively fitting the residuals of the current ensemble.

For APAC mid-market ML teams, the practical advice is to reason explicitly about where on the bias-variance curve your current model sits. **Symptom: training error high, validation error high** — you are on the high-bias side; try more model capacity, more features, less regularisation. **Symptom: training error low, validation error high** — you are on the high-variance side; try more data, more regularisation, earlier stopping. The worst failure mode is blindly changing hyperparameters without knowing which direction you are moving in.
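The two symptom rules can be captured in a toy triage helper. The function name `diagnose` and the `tol` threshold are invented here for illustration; a real team would calibrate the threshold to its own acceptable-error target.

```python
def diagnose(train_err, val_err, tol=0.05):
    """Rough bias/variance triage from train and validation error.

    `tol` is a hypothetical acceptable-error (and gap) threshold,
    chosen purely for the demo.
    """
    gap = val_err - train_err
    if train_err > tol:
        # both errors high -> the model cannot even fit the training set
        return "high bias: add capacity or features, reduce regularisation"
    if gap > tol:
        # fits training data but not held-out data -> sampling sensitivity
        return "high variance: add data or regularisation, stop earlier"
    return "near the sweet spot: both errors within tolerance"

print(diagnose(0.30, 0.32))   # train high, val high  -> high bias
print(diagnose(0.02, 0.20))   # train low, val high   -> high variance
```

The point is not the helper itself but the discipline: look at both numbers before touching a knob, so every change moves you in a known direction along the curve.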

The non-obvious modern wrinkle is **double descent** — the phenomenon where very over-parameterised models (modern deep networks, pretrained foundation models) can have lower test error than less over-parameterised ones, violating the neat U-shape. This does not repeal the tradeoff; it shows that at sufficiently large scale the interpolation regime behaves differently from the classical regime. For practical work, the classical intuition still guides everything up to the point where you are fine-tuning a foundation model, at which point empirical evaluation beats theoretical intuition.

Where AIMenta applies this

Service lines where this concept becomes a deliverable for clients.

Beyond this term

Where this concept ships in practice.

Encyclopedia entries name the moving parts. The links below show where AIMenta turns these concepts into engagements — across service pillars, industry verticals, and Asian markets.
