AIMenta

Category · 14 terms

Deep Learning, defined clearly.

Neural network architectures — CNNs, RNNs, Transformers — that power today's frontier models.

foundational

Activation Function

The nonlinear function applied at each neuron — what gives neural networks their expressive power. Common choices: ReLU, GELU, Sigmoid, Tanh.
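A minimal NumPy sketch of the four activations named above, applied elementwise (the GELU here uses the common tanh approximation; names and inputs are illustrative):

```python
import numpy as np

def relu(x):
    # zero out negatives, pass positives through
    return np.maximum(0.0, x)

def gelu(x):
    # tanh approximation of GELU, widely used in Transformer implementations
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def sigmoid(x):
    # squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes any real number into (-1, 1)
    return np.tanh(x)

x = np.array([-2.0, 0.0, 3.0])
```

Without a nonlinearity like these, stacked linear layers collapse into a single linear map.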

intermediate

Attention Mechanism

A neural-network mechanism that lets a model selectively focus on relevant parts of its input — the operation at the heart of the Transformer.

intermediate

Backpropagation

The algorithm that computes gradients through a neural network by the chain rule — the engine that makes deep learning possible.
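A hypothetical worked example of that chain rule on the smallest possible network, y = w2 · ReLU(w1 · x) with squared-error loss (all values chosen only for illustration):

```python
# forward pass, caching intermediates for the backward pass
x, t = 2.0, 1.0          # input and target
w1, w2 = 0.5, -1.0       # weights

h = w1 * x               # pre-activation
a = max(0.0, h)          # ReLU
y = w2 * a               # output
L = (y - t) ** 2         # squared-error loss

# backward pass: apply the chain rule layer by layer, output to input
dL_dy = 2.0 * (y - t)
dL_dw2 = dL_dy * a
dL_da = dL_dy * w2
dL_dh = dL_da * (1.0 if h > 0 else 0.0)   # ReLU derivative
dL_dw1 = dL_dh * x
```

Each local derivative is cheap; backpropagation's insight is reusing the cached forward values so the whole gradient costs about as much as one extra forward pass.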

Acronym intermediate

Convolutional Neural Network (CNN)

A neural-network architecture specialised for grid-structured data (especially images) via learned convolutional filters that exploit spatial locality.
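The core operation can be sketched as a single 'valid' 2-D convolution in NumPy (shapes and the example kernel are assumptions, not a library API):

```python
import numpy as np

def conv2d(image, kernel):
    # slide the kernel over the image; each output is a local weighted sum
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0]])   # a tiny horizontal edge detector
out = conv2d(image, edge)
```

Because the same small kernel is reused at every position, a CNN has far fewer parameters than a fully connected layer over the same image, and responds the same way to a pattern wherever it appears.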

foundational

Deep Learning

A branch of machine learning using multi-layer neural networks — the dominant paradigm behind modern AI breakthroughs.

intermediate

Encoder-Decoder Architecture

A two-stage architecture where an encoder compresses input into a latent representation and a decoder generates output from it — the foundation of seq2seq models.

Acronym advanced

Long Short-Term Memory (LSTM)

A type of RNN with explicit gates that learn what to remember and what to forget — the workhorse of pre-transformer sequence modelling.
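One timestep of an LSTM cell can be sketched as follows (weight layout and names are illustrative assumptions, not a library API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # all four gate pre-activations in one stacked matmul
    z = W @ x + U @ h_prev + b
    H = h_prev.size
    f = sigmoid(z[0:H])        # forget gate: what to keep of the cell state
    i = sigmoid(z[H:2*H])      # input gate: what new information to write
    o = sigmoid(z[2*H:3*H])    # output gate: what to expose as hidden state
    g = np.tanh(z[3*H:4*H])    # candidate values
    c = f * c_prev + i * g     # updated cell state (the long-term memory)
    h = o * np.tanh(c)         # updated hidden state (the short-term output)
    return h, c

rng = np.random.default_rng(0)
H, D = 3, 2                    # hidden size, input size (illustrative)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H),
                 rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)),
                 np.zeros(4 * H))
```

The additive cell-state update `c = f * c_prev + i * g` is what lets gradients flow across many timesteps without vanishing the way they do in a plain RNN.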

advanced

Multi-Head Attention

Parallel attention computations that let a model attend to different relationship types simultaneously — the workhorse layer of every Transformer.
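A compact sketch of the head-splitting (with the learned input/output projections omitted, and shapes chosen only for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Q, K, V, n_heads):
    T, d_model = Q.shape
    d_head = d_model // n_heads
    # split the model dimension into heads: (T, d_model) -> (n_heads, T, d_head)
    split = lambda X: X.reshape(T, n_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(Q), split(K), split(V)
    # each head runs scaled dot-product attention independently
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (n_heads, T, T)
    out = softmax(scores) @ v                              # (n_heads, T, d_head)
    # concatenate heads back: (n_heads, T, d_head) -> (T, d_model)
    return out.transpose(1, 0, 2).reshape(T, d_model)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                # 5 positions, d_model = 8
Y = multi_head_attention(X, X, X, n_heads=2)
```

Splitting costs nothing extra (the total dimension is unchanged), but each head learns its own attention pattern, e.g. one tracking syntax while another tracks coreference.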

foundational

Neural Network

A machine-learning model loosely inspired by biological neurons — layers of interconnected nodes that learn representations through backpropagation.

Acronym intermediate

Recurrent Neural Network (RNN)

A neural-network architecture for sequential data that processes one timestep at a time, carrying hidden state forward — displaced by Transformers for most tasks.

Acronym foundational

ReLU (Rectified Linear Unit)

The most common activation function in deep learning: outputs the input if positive, zero otherwise. Simple, fast, and effective.

advanced

Self-Attention

The mechanism that lets every position in a sequence attend to every other — the core operation of Transformers.
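A minimal sketch of that all-pairs operation, with the learned query/key/value projections omitted for brevity:

```python
import numpy as np

def self_attention(X):
    d = X.shape[-1]
    # similarity of every position with every other, scaled by sqrt(d)
    scores = X @ X.T / np.sqrt(d)                        # (T, T)
    # softmax over positions turns each row into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each output is a weighted mix of the whole sequence
    return weights @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))   # 4 positions, 6-dim embeddings
out = self_attention(X)
```

Because every position sees every other in one step, self-attention has no recurrence to unroll, which is what makes Transformers so parallelisable on modern hardware.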

intermediate

Softmax

A function that converts a vector of arbitrary real numbers into a probability distribution that sums to 1.
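In code, the standard numerically stable form subtracts the maximum before exponentiating (this leaves the result unchanged but prevents overflow):

```python
import numpy as np

def softmax(x):
    # shift by the max for numerical stability, then exponentiate and normalise
    e = np.exp(x - np.max(x))
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))
```

The outputs preserve the ordering of the inputs, so the largest logit always gets the largest probability.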

intermediate

Transformer

The neural-network architecture that replaced RNNs for sequence modelling and became the backbone of modern AI — introduced in "Attention Is All You Need" (2017).