intermediate · Foundations & History

Markov Chain

A stochastic process where the next state depends only on the current state — the mathematical foundation of many ML techniques and the original language-model idea.

A Markov chain is a stochastic process in which the probability of transitioning to the next state depends only on the current state, not on the full history. This **Markov property** — "the future is conditionally independent of the past, given the present" — is a dramatic simplification that makes many otherwise-intractable probabilistic computations tractable. The chain is defined by a set of states and a transition-probability matrix; from any state, a random walk through the chain can be simulated by sampling transitions according to their probabilities.
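A minimal sketch of this definition, using a hypothetical three-state weather chain (the states and probabilities are illustrative, not from any real dataset):

```python
import numpy as np

states = ["sunny", "cloudy", "rainy"]
# Transition matrix P: P[i][j] = probability of moving from state i to state j.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.4, 0.4],
])
assert np.allclose(P.sum(axis=1), 1.0)  # each row must be a probability distribution

def simulate(P, start, steps, rng):
    """Random walk: sample each transition from the current state's row.

    The Markov property is visible in the code: the next state depends only
    on path[-1], never on earlier history.
    """
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(P.shape[0], p=P[path[-1]]))
    return path

rng = np.random.default_rng(0)
walk = simulate(P, start=0, steps=10, rng=rng)
print([states[s] for s in walk])
```

The entire model fits in the matrix `P`; anything the process "remembers" must be encoded in the current state index.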

Markov chains have been foundational across machine learning, statistics, and AI. **Hidden Markov Models** (HMMs) dominated speech recognition and sequence modelling from the 1980s through the early 2010s. **Markov Chain Monte Carlo** (MCMC) remains the standard for Bayesian inference at modest scale. **PageRank**, Google's original ranking algorithm, is a Markov chain over the web graph. **Language models** themselves began as Markov chains — the n-gram models of the 1980s-2000s were explicit k-th-order Markov chains over words. Modern Transformers relax the Markov property (they can attend to arbitrarily distant history within the context window), but much of the statistical framing of language modelling still has Markovian roots.
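The PageRank case makes the connection concrete: a "random surfer" follows links with probability d and teleports to a uniformly random page otherwise, and the ranking is the stationary distribution of that chain. A sketch on a tiny hypothetical three-page link graph (the graph and damping factor are illustrative):

```python
import numpy as np

d = 0.85  # damping factor: probability of following a link vs. teleporting
# Row-stochastic link matrix: page 0 links to 1 and 2, page 1 to 2, page 2 to 0.
L = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])
n = L.shape[0]
G = d * L + (1 - d) / n  # damped transition matrix over the web graph

# Power iteration: repeatedly push a distribution through the chain until it
# converges to the stationary distribution, i.e. the PageRank scores.
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = rank @ G
print(rank)
```

Page 1, which only page 0 half-links to, ends up with the lowest score; the ranking falls out of the chain's long-run behaviour rather than any per-page heuristic.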

For APAC mid-market teams, Markov chains appear directly in a relatively narrow set of applications — queue modelling, customer-journey analysis, A/B-test posterior estimation, reinforcement-learning problem formulations. The broader value is conceptual: **when a system's next behaviour depends on many things but most of them are captured by current state**, Markov chains are the right modelling lens. When history genuinely matters, that is a signal to use a richer model (RNN, Transformer, higher-order Markov chain, or full dynamic Bayesian network).

The non-obvious operational note: **the Markov property is often a working approximation rather than a truth**. Many real-world processes have history effects that the current state does not fully capture. Assuming Markov when reality is non-Markov produces models that look reasonable on average but fail in edge cases where history genuinely drives the outcome. Check the assumption before you rely on it — compare a fitted Markov model's predictions with and without longer history context.
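One way to run that check, sketched on a synthetic symbol sequence (the data and add-one smoothing are illustrative assumptions): fit a first-order and a second-order count-based model on training data and compare held-out log-likelihood. If the longer context wins clearly, the first-order Markov assumption is leaving signal on the table.

```python
from collections import Counter, defaultdict
from math import log

def fit_ngram(seq, order):
    """Count-based k-th order Markov model: context tuple -> next-symbol counts."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[tuple(seq[i - order:i])][seq[i]] += 1
    return counts

def log_likelihood(seq, counts, order, vocab):
    """Held-out log-likelihood with add-one smoothing over the vocabulary."""
    ll = 0.0
    for i in range(order, len(seq)):
        ctx, nxt = tuple(seq[i - order:i]), seq[i]
        c = counts.get(ctx, Counter())
        ll += log((c[nxt] + 1) / (sum(c.values()) + len(vocab)))
    return ll

# Synthetic sequence with a genuine second-order effect: what follows "b"
# depends on what came before the "b", which a first-order chain cannot see.
train = list("ababcbabcbababcbab") * 20
held_out = list("abcbababcbabab") * 5
vocab = set(train)

ll1 = log_likelihood(held_out, fit_ngram(train, 1), 1, vocab)
ll2 = log_likelihood(held_out, fit_ngram(train, 2), 2, vocab)
print(f"first-order: {ll1:.1f}, second-order: {ll2:.1f}")
```

On this sequence the second-order model scores the held-out data noticeably higher, which is exactly the symptom of a non-Markov (at first order) process; on genuinely first-order data the two scores converge and the simpler model is safe to keep.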

Where AIMenta applies this

Service lines where this concept becomes a deliverable for clients.

Beyond this term

Where this concept ships in practice.

Encyclopedia entries name the moving parts. The links below show where AIMenta turns these concepts into engagements — across service pillars, industry verticals, and Asian markets.
