AIMenta

Guidance AI

by Microsoft

LLM constrained generation framework — interleaving prompt templates and code to precisely control LLM output structure, enabling APAC teams to guarantee valid JSON, decision branches, and typed responses from local and API-based LLMs.

AIMenta verdict
Decent fit
4/5

"Constrained LLM generation — APAC AI engineers use Guidance to write interleaved prompts and code that constrain LLM outputs to valid structures, enabling reliable JSON extraction, decision trees, and structured data generation in APAC LLM applications."

What it does

Key features

  • Token-level constraints: structurally valid JSON/regex output at generation time, not via post-processing
  • Interleaved programs: prompt, code, and generation in a unified template syntax
  • Local LLM support: llama.cpp/vLLM/Transformers backends for APAC data sovereignty
  • Select constraints: predefined option sets for LLM choice points
  • Decision trees: conditional generation branching on LLM outputs
  • Open-source: MIT licensed for commercial deployment and modification
When to reach for it

Best for

  • AI engineers building structured data extraction and document processing pipelines who need guaranteed output-format compliance — particularly APAC financial services, legal, and healthcare teams extracting specific fields from unstructured documents where JSON parsing failures are unacceptable.
Don't get burned

Limitations to know

  • ! Token-level constraints only work with supported local backends — constraints over the OpenAI API are approximate
  • ! Steeper learning curve than plain prompt engineering for developers new to constrained generation
  • ! Program complexity grows for multi-step generation with many interleaved constraints
Context

About Guidance AI

Guidance AI is a Microsoft open-source framework for constrained LLM generation — allowing APAC developers to write programs that interleave natural language prompts with Python code, branching logic, and output constraints that the LLM must satisfy. Unlike prompt engineering (which asks the LLM to produce valid JSON), Guidance constrains the LLM's token sampling to only produce tokens that remain valid according to the specified structure.

Guidance's generation constraints work at the token level — when generating JSON, Guidance ensures the LLM only produces tokens that are valid continuations of a JSON string, making malformed JSON structurally impossible rather than just unlikely. This token-level constraint is the key difference from output parsers that try to fix LLM JSON after generation: Guidance produces valid structure on the first pass, eliminating retry logic and error handling for malformed outputs.
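The token-masking idea can be sketched without the library. In this illustrative toy (none of the names below — `VOCAB`, `ALLOWED`, `mock_scores` — come from the Guidance API), a mock scoring function stands in for LLM logits, and at each step any token that would break the target structure is masked out before selection:

```python
import json

# Toy token vocabulary; 'neutral' and 'maybe' are not allowed by the schema.
VOCAB = ['{"sentiment": "', 'positive', 'negative', 'neutral', '"}', 'maybe']

# The target structure, expressed here as the set of acceptable full outputs.
ALLOWED = ['{"sentiment": "positive"}', '{"sentiment": "negative"}']

def mock_scores(prefix):
    # Stand-in for LLM logits: naively prefers 'neutral', which the schema forbids.
    return {tok: (2.0 if tok == 'neutral' else 1.0) for tok in VOCAB}

def constrained_generate():
    out = ''
    while out not in ALLOWED:
        scores = mock_scores(out)
        # Token-level mask: keep only tokens that leave `out` a prefix of an allowed string.
        valid = [t for t in VOCAB if any(a.startswith(out + t) for a in ALLOWED)]
        # Greedy pick among the surviving tokens.
        out += max(valid, key=lambda t: scores[t])
    return out

result = constrained_generate()
json.loads(result)  # always parses: malformed output is structurally impossible
```

Even though the mock model scores the forbidden token highest, the mask guarantees the final string is one of the allowed JSON objects — the same property Guidance enforces against a real model's logits.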

Guidance programs use a template syntax that mixes static text, `{{gen}}` LLM generation blocks, and `{{select}}` constrained choice blocks — teams write decision trees where the LLM fills in choices from a predefined option set, generates text within validated format constraints, or branches based on LLM outputs. This enables APAC structured data extraction pipelines where the LLM populates a specific schema without post-processing.
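A `{{select}}`-style decision tree can be mimicked in plain Python: each step restricts the "model" to a predefined option set, and control flow branches on the constrained result. `keyword_pick` is a deterministic stand-in for the LLM, and all function names here are illustrative, not Guidance APIs:

```python
def select(text, options, pick):
    # {{select}}-style constraint: the answer must be one of `options`.
    choice = pick(text, options)
    assert choice in options
    return choice

def keyword_pick(text, options):
    # Mock LLM: returns the first option mentioned in the text, else the first option.
    for opt in options:
        if opt.replace("_", " ") in text.lower():
            return opt
    return options[0]

def route_ticket(ticket, pick=keyword_pick):
    # Decision tree: the first constrained choice decides which option set comes next.
    category = select(ticket, ["billing", "technical"], pick)
    if category == "billing":
        action = select(ticket, ["refund", "invoice_copy"], pick)
    else:
        action = select(ticket, ["restart", "escalate"], pick)
    return {"category": category, "action": action}

route_ticket("Please send an invoice copy for my billing account")
# → {"category": "billing", "action": "invoice_copy"}
```

Because every step is a choice from a closed set, downstream code can branch on the result without defensive parsing — the same guarantee `{{select}}` provides when the picker is a real LLM.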

Guidance supports local LLM backends (llama.cpp, vLLM, Transformers) and API providers (OpenAI, Anthropic) — APAC teams running on-premise LLMs for data sovereignty use Guidance with local models for fully constrained generation without cloud API calls. For APAC financial-services teams extracting structured data from regulatory documents, Guidance eliminates the JSON parsing failure mode that affects prompt-based extraction approaches.
