AIMenta
foundational · Machine Learning

Hyperparameter

A setting you choose before training begins (learning rate, batch size, number of layers) — distinct from the parameters the model learns.

Hyperparameters are the settings you choose before training; parameters are the values the model learns from data. Learning rate, batch size, number of layers, layer width, dropout rate, weight decay, number of training epochs, optimiser choice, data augmentation policy — all hyperparameters. The weights of a neural network, the split points of a decision tree, the coefficients of a linear model — all parameters. The distinction matters because the two are tuned by completely different processes: parameters by gradient descent on training data, hyperparameters by search over validation performance.
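The distinction can be made concrete with a minimal sketch (a toy one-weight linear fit, not any particular library's API): the learning rate and epoch count are fixed before training starts, while the weight is the parameter that gradient descent updates.

```python
def fit_slope(xs, ys, learning_rate=0.1, epochs=200):
    """Fit y = w * x by gradient descent on mean squared error.

    learning_rate and epochs are hyperparameters: chosen up front, never
    touched by the training loop. w is a parameter: learned from data.
    """
    w = 0.0  # parameter, updated by gradient descent
    n = len(xs)
    for _ in range(epochs):
        # gradient of MSE = (1/n) * sum((w*x - y)^2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad  # hyperparameter controls the step size
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated with true slope 2
w = fit_slope(xs, ys)
```

Tuning `w` means running this loop; tuning `learning_rate` means running the whole loop many times with different values and comparing validation results, which is exactly the outer search the next section describes.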

Hyperparameter search is where empirical ML actually happens. **Grid search** enumerates every combination — exhaustive but exponential in dimensions. **Random search** samples combinations — often dominates grid search because most dimensions do not matter equally. **Bayesian optimisation** (Optuna, Hyperopt, Ax) builds a probabilistic model of the validation-metric surface and picks each next trial where expected improvement is highest — the default for serious tuning budgets. **Population-based training** evolves hyperparameters during training — powerful for long-running large-scale jobs.
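Random search is simple enough to sketch in a few lines. The sketch below uses a stand-in objective with a known optimum in place of a real train-and-validate run (`val_score`, its peak location, and the candidate batch sizes are all illustrative assumptions, not from any library); note the learning rate is sampled log-uniformly, since its order of magnitude is what matters.

```python
import math
import random

def val_score(lr, batch_size):
    # Stand-in for "train a model, return validation accuracy".
    # This toy surrogate peaks near lr=1e-2, batch_size=64.
    return -((math.log10(lr) + 2) ** 2) - ((math.log2(batch_size) - 6) ** 2) / 10

def random_search(n_trials=50, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        # Sample lr log-uniformly over four orders of magnitude.
        lr = 10 ** rng.uniform(-5, -1)
        batch_size = rng.choice([16, 32, 64, 128, 256])
        score = val_score(lr, batch_size)
        if best is None or score > best[0]:
            best = (score, {"lr": lr, "batch_size": batch_size})
    return best

score, config = random_search()
```

Grid search over the same space would spend most of its budget re-testing values along dimensions that barely move the metric; random search covers each dimension with every trial, which is why it tends to win at equal budget. Bayesian optimisers replace the `rng.uniform` draws with proposals from a model fitted to the trials so far.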

For APAC mid-market teams, the practical advice is not to over-invest in hyperparameter search before you have established that the problem framing and data are right. Spend the first week making a weak baseline better through data and feature work; spend week two on hyperparameter search, once the ceiling of the default configuration is actually the limiting factor. Teams that invert this order discover, months in, that a well-tuned model of the wrong shape still does not solve the problem.

The high-leverage hyperparameters for most deep-learning workloads are **learning rate** (the single most consequential setting — order of magnitude matters), **batch size** (interacts with learning rate; scale together), and **weight decay / regularisation strength** (controls generalisation). Everything else is usually second-order.
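"Scale together" usually means the linear-scaling heuristic: when the batch size grows by a factor of k, multiply the learning rate by k as a starting point. A minimal sketch (the function name and base values here are illustrative, and the rule is an empirical heuristic rather than a guarantee, so large batches typically also need warmup):

```python
def scaled_lr(base_lr, base_batch_size, new_batch_size):
    """Linear-scaling heuristic: lr grows proportionally with batch size.

    A rule of thumb for a starting point, not a law; validate the result,
    and add learning-rate warmup when new_batch_size is large.
    """
    return base_lr * (new_batch_size / base_batch_size)

# A config tuned at batch size 256 with lr 0.1, moved to batch size 1024:
lr = scaled_lr(0.1, 256, 1024)
```

Treating learning rate and batch size as a coupled pair keeps a tuned configuration roughly valid as you scale hardware, instead of forcing a full re-search.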

Where AIMenta applies this

Service lines where this concept becomes a deliverable for clients.

Beyond this term

Where this concept ships in practice.

Encyclopedia entries name the moving parts. The links below show where AIMenta turns these concepts into engagements — across service pillars, industry verticals, and Asian markets.
