Deep Learning
14 terms, defined clearly.
Neural network architectures — CNNs, RNNs, Transformers — that power today's frontier models.
Activation Function
The nonlinear function applied at each neuron — what gives neural networks their expressive power. Common choices: ReLU, GELU, Sigmoid, Tanh.
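A minimal NumPy sketch of three of the activations named above, applied elementwise (the GELU here is the tanh approximation used in many Transformer implementations, not the exact Gaussian form):

```python
import numpy as np

def relu(x):
    # outputs the input where positive, zero otherwise
    return np.maximum(0.0, x)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def sigmoid(x):
    # squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))     # negative inputs clipped to zero
print(np.tanh(x))  # squashed into (-1, 1)
```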
Attention Mechanism
A neural-network mechanism that lets a model selectively focus on relevant parts of its input — the operation at the heart of the Transformer.
Backpropagation
The algorithm that computes gradients through a neural network by the chain rule — the engine that makes deep learning possible.
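A hedged sketch of the chain rule at work: backpropagation through a toy one-hidden-unit network (weights and inputs here are arbitrary illustrative values), with the analytic gradient checked against a finite-difference estimate:

```python
import numpy as np

def forward_backward(x, t, w1, w2):
    # forward pass: y = w2 * tanh(w1 * x), squared-error loss
    h = np.tanh(w1 * x)
    y = w2 * h
    L = 0.5 * (y - t) ** 2
    # backward pass: one chain-rule factor per layer
    dL_dy = y - t
    dL_dw2 = dL_dy * h
    dL_dh = dL_dy * w2
    dL_dw1 = dL_dh * (1 - h ** 2) * x   # d tanh(u)/du = 1 - tanh(u)^2
    return L, dL_dw1, dL_dw2

L, g1, g2 = forward_backward(x=1.5, t=1.0, w1=0.3, w2=-0.8)

# numerical check: nudge w1 and compare the loss change to the gradient
eps = 1e-6
L_plus, _, _ = forward_backward(1.5, 1.0, 0.3 + eps, -0.8)
print(g1, (L_plus - L) / eps)  # analytic vs. finite-difference gradient
```

Deep-learning frameworks automate exactly this backward pass (autodiff) over arbitrarily large computation graphs.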
Convolutional Neural Network (CNN)
A neural-network architecture specialised for grid-structured data (especially images) via learned convolutional filters that exploit spatial locality.
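An illustrative sketch of the core operation: a single 1-D filter slid across a signal as a repeated local dot product. Real CNNs stack many learned 2-D filters, but the locality-exploiting mechanism is the same; the edge-detecting filter below is a hand-picked example, not a learned one.

```python
import numpy as np

signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
edge_filter = np.array([1.0, -1.0])  # responds to change between neighbours

# slide the filter across the signal: one dot product per position
response = np.array([
    signal[i:i + 2] @ edge_filter
    for i in range(len(signal) - 1)
])
print(response)  # non-zero only at the two edges of the plateau
```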
Deep Learning
A branch of machine learning using multi-layer neural networks — the dominant paradigm behind modern AI breakthroughs.
Encoder-Decoder Architecture
A two-stage architecture where an encoder compresses input into a latent representation and a decoder generates output from it — the foundation of seq2seq models.
Long Short-Term Memory (LSTM)
A type of RNN with explicit gates that learn what to remember and what to forget — the workhorse of pre-transformer sequence modelling.
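A hedged sketch of a single LSTM timestep showing the gating described above; the weight matrices are random placeholders, not trained values, and biases and batching are omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    z = W @ x + U @ h + b                         # all four pre-activations at once
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    c_new = f * c + i * np.tanh(g)                # forget old memory, write new
    h_new = o * np.tanh(c_new)                    # expose gated memory as output
    return h_new, c_new

rng = np.random.default_rng(1)
d = 3
W = rng.normal(size=(4 * d, d))
U = rng.normal(size=(4 * d, d))
b = np.zeros(4 * d)

h, c = np.zeros(d), np.zeros(d)
for x in rng.normal(size=(5, d)):  # carry state across a 5-step sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (3,)
```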
Multi-Head Attention
Parallel attention computations that let a model attend to different relationship types simultaneously — the signature layer of the Transformer.
Neural Network
A machine-learning model loosely inspired by biological neurons — layers of interconnected nodes that learn representations through backpropagation.
Recurrent Neural Network (RNN)
A neural-network architecture for sequential data that processes one timestep at a time, carrying hidden state forward — displaced by Transformers for most tasks.
ReLU (Rectified Linear Unit)
The most common activation function in deep learning: outputs the input if positive, zero otherwise. Simple, fast, and effective.
Self-Attention
The mechanism that lets every position in a sequence attend to every other — the core operation of Transformers.
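A hedged sketch of single-head scaled dot-product self-attention, following the formulation in "Attention Is All You Need"; the projection matrices here are random placeholders standing in for learned parameters:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # every position scored against every other
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                               # each output is a weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated representation per position
```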
Softmax
A function that converts a vector of arbitrary real numbers into a probability distribution: each element is exponentiated, then the results are normalised so they are positive and sum to 1.
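A minimal, numerically stable sketch in NumPy (subtracting the maximum before exponentiating leaves the result unchanged but prevents overflow):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)   # shift for numerical stability; output is unaffected
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p.sum())  # 1.0 — larger inputs get larger probabilities
```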
Transformer
The neural-network architecture that replaced RNNs for sequence modelling and became the backbone of modern AI — introduced in "Attention Is All You Need" (2017).