
Guidance

by Microsoft

Microsoft's open-source LLM generation control library that enforces output structure through token-level probability masking, enabling APAC engineering teams to guarantee JSON, regex-matching, and grammar-constrained LLM outputs for Japanese, Korean, and Chinese structured data extraction without post-processing parsing failures.

AIMenta verdict
Recommended
5/5

"Microsoft Guidance steers LLM generation at the token level, enforcing regex, JSON schema, and grammar constraints on output and eliminating the post-processing parsing failures that plague APAC JSON extraction and structured data pipelines."

What it does

Key features

  • Token masking: zero JSON parse failures via logit-level constraint enforcement
  • JSON schema: guaranteed schema-valid output from LLM extraction on APAC-language text
  • Regex constraint: regex-pattern enforcement on LLM output at the character level
  • Prompt programs: control flow (conditionals, loops, variables) inside generation templates
  • Local models: full constraint enforcement on HuggingFace and llama.cpp models, including APAC-language models
  • Multi-step: extraction strategies that branch on intermediate generation results
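The token-masking idea behind these features can be sketched without Guidance itself. The function below is illustrative (not the Guidance API): it greedily decodes over a toy vocabulary, masking out any token that would take the output off every allowed option, the way a select-style constraint does.

```python
# Minimal sketch of option-constrained decoding. All names here are
# illustrative, not the Guidance API.

def constrained_select(token_scores, options):
    """Greedily decode one of `options` from a toy vocabulary.

    token_scores: dict mapping token -> model score (a stand-in for logits).
    options: the only strings the output is allowed to become.
    """
    out = ""
    while out not in options:
        # A token is feasible only if some option still starts with out + token.
        feasible = {
            tok: score for tok, score in token_scores.items()
            if any(opt.startswith(out + tok) for opt in options)
        }
        # Infeasible tokens are effectively masked to zero probability;
        # pick the best-scoring remaining token.
        out += max(feasible, key=feasible.get)
    return out

# Toy vocabulary: the raw model "prefers" the invalid token "maybe",
# but masking makes it impossible to emit.
scores = {"ye": 1.0, "s": 0.5, "no": 2.0, "maybe": 3.0}
print(constrained_select(scores, options=["yes", "no"]))  # -> "no"
```

Note that the highest-scoring token ("maybe") never appears in the output: the constraint removes it before sampling, rather than fixing the output after the fact.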
When to reach for it

Best for

  • APAC engineering teams building structured data extraction pipelines from Japanese, Korean, and Chinese text, particularly organizations that need guaranteed parseable JSON output from LLM extraction with zero tolerance for parsing failures. Best suited to local-model deployments, where full logit access enables token-level constraint enforcement rather than prompt-based output structuring.
Don't get burned

Limitations to know

  • ! Full token masking requires logit access, which not all cloud LLM APIs expose
  • ! Complex generation templates are harder to debug than simple structured-output prompts
  • ! Constraint enforcement adds inference latency compared to unconstrained generation
Context

About Guidance

Guidance is an open-source LLM generation control library from Microsoft that provides APAC engineering teams with token-level output constraint enforcement — masking the probability distribution of LLM token generation to make it structurally impossible for the model to produce output that violates the specified JSON schema, regex pattern, or context-free grammar. APAC data extraction pipelines that require structured output from Japanese, Korean, and Chinese text use Guidance to guarantee parseable outputs without the retry loops and fallback logic required by unconstrained prompt-based extraction.
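To make the contrast concrete, here is a sketch of the retry-and-fallback logic that unconstrained prompt-based extraction typically needs, and that constraint enforcement makes unnecessary. `call_llm` is a stub standing in for a real model call, assumed here to emit invalid JSON on its first attempt.

```python
# Sketch of the retry loop an unconstrained extraction pipeline needs.
# call_llm is a hypothetical stub, not a real API.
import json

def call_llm(prompt, attempt):
    # Stub: fails to emit valid JSON on the first attempt.
    return "Sure! Here is the JSON: {...}" if attempt == 0 else '{"name": "Acme"}'

def extract_unconstrained(prompt, max_retries=3):
    for attempt in range(max_retries):
        raw = call_llm(prompt, attempt)
        try:
            return json.loads(raw)      # may fail: output is free-form text
        except json.JSONDecodeError:
            continue                    # the retry loop constrained generation avoids
    raise RuntimeError("no parseable output after retries")

print(extract_unconstrained("Extract the company name."))  # {'name': 'Acme'}
```

With token-level constraints, the first generation is guaranteed parseable, so the `try`/`except` and retry budget disappear entirely.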

Guidance's generation constraint model works by intercepting the LLM's logit distribution at each token generation step and masking to zero the probability of any token that would make the output unable to satisfy the declared constraint — if the output schema requires a JSON integer field and the LLM begins generating a non-numeric token, Guidance masks that token to zero probability before sampling, forcing the model to generate a valid token. APAC teams using Guidance measure zero JSON parsing failures in production extraction pipelines, compared to 2-8% parsing failure rates with unconstrained prompt-based extraction on the same APAC-language inputs.
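The masking step described above can be sketched in a few lines, assuming a toy vocabulary in place of a real tokenizer. When the schema requires a JSON integer field, every non-digit token has its logit forced to negative infinity (zero probability after softmax) before sampling.

```python
# Sketch of logit masking for a JSON integer field. Toy vocabulary;
# not the Guidance internals.
import math

def softmax(logits):
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def mask_for_integer_field(logits):
    # Mask to -inf every token that would break the integer constraint;
    # after softmax the model can only sample a valid digit token.
    return {t: (v if t.isdigit() else float("-inf")) for t, v in logits.items()}

logits = {"4": 1.2, "2": 0.8, "approximately": 3.0, '"': 0.1}
probs = softmax(mask_for_integer_field(logits))
best = max(probs, key=probs.get)  # "4": the top unconstrained token
                                  # ("approximately") was masked out
```

Even though "approximately" has the highest raw logit, its post-mask probability is exactly zero, which is what makes a constraint violation structurally impossible rather than merely unlikely.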

Guidance's template language enables APAC teams to interleave LLM generation with programmatic control flow — inserting conditional branches, loop constructs, and variable reuse directly in the generation template — creating prompt programs that adapt their generation strategy based on LLM outputs at intermediate steps. APAC information extraction systems use Guidance templates to implement multi-step extraction logic: extract entity type first, then branch to the appropriate schema for that entity type, then validate the extracted values against business rules within the same generation pass.
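The multi-step pattern above can be sketched with a stub in place of the LLM (the names `stub_llm` and `extract` are hypothetical, not the Guidance API): classify the entity type under a select-style constraint first, then branch to a type-specific schema, then validate in the same pass.

```python
# Sketch of branching multi-step extraction. stub_llm stands in for a
# constrained generation call; schemas and values are illustrative.

COMPANY_SCHEMA = {"name": str, "ticker": str}
PERSON_SCHEMA = {"name": str, "age": int}

def stub_llm(prompt, allowed=None):
    """Stand-in for a constrained LLM call."""
    if allowed is not None:            # select-style constraint: output must
        return allowed[0]              # be one of the allowed strings
    return {"name": "Acme KK", "ticker": "1234"}  # canned structured output

def extract(text):
    # Step 1: constrained choice of entity type (only two outputs possible).
    entity_type = stub_llm(f"Classify: {text}", allowed=["company", "person"])
    # Step 2: branch to the schema for that entity type.
    schema = COMPANY_SCHEMA if entity_type == "company" else PERSON_SCHEMA
    record = stub_llm(f"Extract {sorted(schema)} from: {text}")
    # Step 3: validate extracted values against the schema in the same pass.
    assert all(isinstance(record[k], t) for k, t in schema.items())
    return entity_type, record

print(extract("Acme KK (TYO: 1234) reported earnings."))
```

In a real Guidance template the branch and the validation live inside one prompt program, so intermediate results never leave the generation pass.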

Guidance integrates with local models (HuggingFace transformers, llama.cpp) and cloud APIs supporting logit access — APAC teams deploying on Llama or Mistral models locally use Guidance's full token-masking capability, while APAC teams using OpenAI APIs use Guidance's JSON schema mode which leverages OpenAI's structured output feature for schema-constrained generation without direct logit access.
