AIMenta

Category · 9 terms

Hardware & Infrastructure, defined clearly.

GPUs, TPUs, accelerators, inference engines, and the silicon under it all.

Acronym intermediate

CUDA

NVIDIA's parallel computing platform and programming model that exposes GPU compute to general-purpose code — the de facto standard for GPU programming in AI research.
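CUDA's core abstraction is a grid of thread blocks, where each thread computes one piece of the output. A pure-Python sketch of that indexing model (illustrative names only — a real kernel would be written in CUDA C++ and launched on a GPU):

```python
# Pure-Python emulation of CUDA's thread-indexing model (not real CUDA).

def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out, n):
    """One 'thread' of a vector-add kernel: each thread handles one element."""
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA kernels
    if i < n:                               # guard threads past the end of the data
        out[i] = a[i] + b[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a kernel launch: run every (block, thread) pair sequentially."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

n = 10
a = list(range(n))
b = [10] * n
out = [0] * n
grid_dim = (n + 3) // 4  # enough 4-thread blocks to cover all n elements
launch(vector_add_kernel, grid_dim, 4, a, b, out, n)
# out is now [10, 11, ..., 19]
```

On a GPU every (block, thread) pair runs in parallel rather than in this sequential loop; the bounds check matters because the grid usually overshoots the data size.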

advanced

Distributed Training

Splitting model training across multiple GPUs or nodes — required when a model is too large to fit on a single accelerator, or a training run would take too long on one.
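The simplest form is data parallelism: each worker holds a full model copy, computes gradients on its own shard of the batch, and the gradients are averaged. A minimal sketch with a one-parameter model (all names illustrative; real systems use collectives like NCCL's all-reduce):

```python
# Pure-Python sketch of data-parallel training: y = w * x, mean squared error.

def grad(w, batch):
    """d/dw of mean((w*x - y)^2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def all_reduce_mean(values):
    """Stand-in for the collective that averages gradients across workers."""
    return sum(values) / len(values)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x
shards = [data[0:2], data[2:4]]  # each 'worker' holds an equal-size shard
w = 0.0

for _ in range(100):
    local_grads = [grad(w, s) for s in shards]  # computed in parallel on real hardware
    g = all_reduce_mean(local_grads)  # equals the full-batch gradient for equal shards
    w -= 0.01 * g                     # every worker applies the identical update

# w converges toward 2.0, the true slope
```

Because the shards are equal-sized, the averaged gradient is exactly the full-batch gradient, so all workers stay in sync without ever exchanging weights.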

Acronym advanced

FP8

8-bit floating-point number formats (E4M3, E5M2) that enable faster training and inference with minimal accuracy loss on modern AI accelerators.
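The two formats trade range for precision: E5M2 spends an extra exponent bit on dynamic range (useful for gradients), while E4M3 keeps an extra mantissa bit for precision (weights and activations). A sketch deriving their maximum finite values from the bit layouts, following the common OCP FP8 convention in which E4M3 reserves only a single code for NaN:

```python
# Derive FP8 headline numbers from bit layout (OCP FP8 convention assumed).

def fp8_max(exp_bits, man_bits, bias, ieee_like):
    # IEEE-like formats (E5M2) reserve the all-ones exponent for inf/NaN;
    # E4M3 keeps it for normals and reserves only the all-ones mantissa code.
    max_exp_field = (2**exp_bits - 1) - (1 if ieee_like else 0)
    max_mantissa = 1 + (2**man_bits - 1 - (0 if ieee_like else 1)) / 2**man_bits
    return max_mantissa * 2 ** (max_exp_field - bias)

print(fp8_max(4, 3, 7, ieee_like=False))   # E4M3 max finite: 448.0
print(fp8_max(5, 2, 15, ieee_like=True))   # E5M2 max finite: 57344.0
```

E5M2 reaches 57344 but with only 2 mantissa bits; E4M3 tops out at 448 but resolves values roughly twice as finely.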

Acronym foundational

GPU

Graphics Processing Unit — massively parallel hardware that powers virtually all modern AI training and most inference workloads.

foundational

Inference (Serving)

Running a trained model to produce predictions on new data — the production workload that dominates AI cost and latency after training completes.
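A key lever in serving cost is batching: grouping concurrent requests into one model call amortizes per-launch overhead at some latency cost. A toy sketch (the `model` stand-in and all names here are illustrative, not any particular serving framework):

```python
# Pure-Python sketch of batched inference serving.

def model(batch):
    """Toy 'model': doubles each input. One call = one accelerator launch."""
    return [2 * x for x in batch]

def serve(requests, max_batch_size=8):
    """Group pending requests into batches to amortize per-call overhead."""
    results, calls = [], 0
    for start in range(0, len(requests), max_batch_size):
        batch = requests[start:start + max_batch_size]
        results.extend(model(batch))
        calls += 1
    return results, calls

outputs, calls = serve(list(range(20)), max_batch_size=8)
# 20 requests served in 3 model calls instead of 20
```

Production servers do this dynamically, trading a small wait (to fill a batch) for much higher throughput per accelerator.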

advanced

Mixed-Precision Training

Training a neural network using a combination of higher precision (FP32 master weights) and lower precision (FP16/BF16 compute) to gain speed without sacrificing convergence.
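Why the FP32 master copy matters: near 1.0, adjacent FP16 values are about 1e-3 apart, so a smaller gradient step rounds away entirely. Python's `struct` module supports IEEE half precision, which lets a pure-stdlib sketch show the effect:

```python
import struct

def fp16(x):
    """Round a Python float to the nearest FP16 value via a round-trip."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

w32 = 1.0          # FP32 master weight
update = 1e-4      # a gradient step below FP16's ~1e-3 spacing near 1.0

# Pure-FP16 weight: the update is smaller than half the gap between
# adjacent FP16 values around 1.0, so rounding discards it completely.
w16 = fp16(fp16(1.0) + update)
print(w16)         # 1.0 — the step was lost

# Mixed precision: accumulate in FP32, cast to FP16 only for compute.
w32 += update      # ≈ 1.0001 — the step survives in the master weight
```

This is why training loops keep FP32 master weights and cast down for the forward/backward passes (often alongside loss scaling to keep small gradients representable).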

advanced

Quantization

Compressing a neural network by representing weights and activations in lower-precision integer formats (INT8, INT4) — typically applied at inference time to reduce memory and latency.
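The simplest variant is symmetric per-tensor quantization: one scale maps the largest-magnitude weight to ±127 and every weight to the nearest int8. A minimal sketch (function names are illustrative):

```python
# Sketch of symmetric per-tensor INT8 quantization.

def quantize_int8(weights):
    """Map floats to int8 with a single scale; returns (ints, scale)."""
    scale = max(abs(w) for w in weights) / 127  # largest weight maps to +/-127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.4, -0.9, 0.05, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# q fits in 8 bits per weight; approx differs from weights by at most scale/2
```

Real pipelines refine this with per-channel scales, zero points for asymmetric ranges, and calibration data for activations, but the core trade — 4x less memory for bounded rounding error — is already visible here.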

advanced

Tensor Core

A specialized matrix-multiplication unit inside NVIDIA GPUs (Volta and later) that delivers order-of-magnitude speedups for AI workloads at lower precisions.
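The characteristic tensor-core arithmetic pattern is D = A x B + C with low-precision inputs and a higher-precision accumulator. A pure-Python stand-in for one output element (real tensor cores perform an entire small-matrix multiply-accumulate per instruction):

```python
import struct

def fp16(x):
    """Round a Python float to the nearest FP16 value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def mma_dot(a_row, b_col, acc=0.0):
    """One output element: FP16 inputs, full-precision accumulation."""
    for a, b in zip(a_row, b_col):
        acc += fp16(a) * fp16(b)  # inputs rounded to FP16; sum kept in full precision
    return acc

result = mma_dot([0.1] * 4, [0.2] * 4)
# close to 0.08: only the input rounding costs accuracy, not the accumulation
```

Accumulating in higher precision is what keeps long dot products (thousands of terms in a large matmul) from drifting, even though the inputs are stored in 16 bits.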

Acronym intermediate

TPU

Tensor Processing Unit — Google's custom AI accelerator chip, used in Google Cloud and to train Google's own models, including Gemini.