
ReLU (Rectified Linear Unit)

The most common activation function in deep learning: outputs the input if positive, zero otherwise. Simple, fast, and effective.

ReLU (Rectified Linear Unit) is defined as `f(x) = max(0, x)` — output the input when positive, zero otherwise. The function looks almost trivial but its introduction (Nair & Hinton, 2010) unlocked deep-learning training at scale. Sigmoid and tanh, the activations that preceded it, saturate to near-zero gradients for large inputs; backpropagating through many such layers collapsed gradients to the point where early-layer weights barely updated. ReLU's gradient is exactly 1 in its active region, which keeps signal flowing through arbitrarily deep stacks.
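The definition and its gradient can be sketched in a few lines of NumPy (a toy illustration, not a framework implementation; note the kink at zero is conventionally assigned gradient 0):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative: exactly 1 where x > 0, exactly 0 where x <= 0
    # (the subgradient at the kink is conventionally taken as 0).
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # zero for non-positive entries, identity above zero
print(relu_grad(x))  # 0 in the off region, 1 in the active region
```

The constant gradient of 1 in the active region is the whole story: unlike sigmoid or tanh, nothing shrinks as the signal backpropagates through many layers.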

The modern activation landscape is a ReLU family tree: **Leaky ReLU** (small negative slope, avoids dead neurons), **ELU / SELU** (self-normalising variants), **GELU** (Gaussian error linear unit, the default in transformers such as BERT, GPT, and Llama), **Swish / SiLU** (smooth variant used in newer vision networks), **Mish** (a GELU-adjacent smooth function that appears in modern CV backbones). GELU has displaced ReLU as the default in language models; ReLU itself remains ubiquitous in classical CNNs and tabular MLPs for its combination of simplicity, speed, and sparsity.
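The family members above are all one-liners; a minimal NumPy sketch of each (using the tanh approximation of GELU common in transformer codebases, and default slopes chosen for illustration):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small negative slope keeps a gradient path open below zero
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth exponential saturation on the negative side
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def gelu(x):
    # tanh approximation of the Gaussian error linear unit
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def silu(x):
    # x * sigmoid(x); a.k.a. Swish with beta = 1
    return x / (1.0 + np.exp(-x))

def mish(x):
    # x * tanh(softplus(x)); log1p(exp(x)) is softplus
    return x * np.tanh(np.log1p(np.exp(x)))
```

For large positive inputs every variant approaches the identity, like ReLU; they differ only in how much signal (and gradient) they let through on the negative side.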

For APAC mid-market practitioners, the activation function is almost never the place to spend engineering effort. Stick with whatever the base architecture specifies — ReLU for CNNs, GELU for transformers, whatever recipe your pretrained checkpoint used — and focus on the decisions that actually move metrics: data quality, task framing, and evaluation design.

The one failure mode worth knowing is **dead ReLU neurons** — units pushed into always-negative territory by a bad initialisation or an overly high learning rate, which can never recover because their gradient is zero in the off region. Leaky ReLU and ELU exist in large part to patch this corner. Modern frameworks largely avoid it with sensible defaults; it mostly shows up in hand-rolled architectures.
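The failure mode is easy to demonstrate numerically. In this contrived sketch (hypothetical weights; the large negative bias stands in for a bad initialisation), every pre-activation in the batch is negative, so plain ReLU passes zero gradient and the unit can never update, while Leaky ReLU keeps a small path open:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))           # a batch of inputs

# A "dead" unit: a large negative bias pushes every pre-activation below zero
w, b = rng.normal(size=4), -25.0
z = X @ w + b                          # pre-activations, all negative here

relu_grad = (z > 0).astype(float)          # 0 everywhere: no weight update possible
leaky_grad = np.where(z > 0, 1.0, 0.01)    # still leaks 1% of the gradient

print(relu_grad.sum())    # 0.0 — the unit cannot recover under plain ReLU
print(leaky_grad.min())   # 0.01 — Leaky ReLU keeps a gradient path open
```

Because the ReLU gradient mask is identically zero across the batch, no optimiser step ever changes `w` or `b` through this unit; the 0.01 slope is exactly the patch Leaky ReLU applies.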

Where AIMenta applies this

Service lines where this concept becomes a deliverable for clients.

Beyond this term

Where this concept ships in practice.

Encyclopedia entries name the moving parts. The links below show where AIMenta turns these concepts into engagements — across service pillars, industry verticals, and Asian markets.
