What it does

Key features

Token masking: logit-level constraint enforcement that rules out JSON parse failures
JSON schema: guaranteed schema-valid output from LLM extraction over APAC-language text
Regex constraint: regex-pattern enforcement on LLM output at the character level
Prompt programs: control flow (conditionals, loops, variables) inside generation templates
Local model: full constraint support for HuggingFace and llama.cpp models
Multi-step: branch the extraction strategy on intermediate generation results

When to reach for it

Best for

APAC engineering teams building structured data extraction pipelines over Japanese, Korean, and Chinese text. It fits best when the pipeline cannot tolerate JSON parse failures and runs on local models, where full logit access allows token-level constraint enforcement instead of prompt-based output structuring.

Don't get burned

Limitations to know

! Full token masking requires logit access, which not all cloud LLM APIs provide
! Complex generation templates are harder to debug than simple structured-output prompts
! Constraint enforcement adds inference latency compared to unconstrained generation

Context

About Guidance

Guidance is an open-source LLM generation control library from Microsoft that enforces output constraints at the token level. It masks the probability distribution over next tokens so that the model cannot produce output that violates the specified JSON schema, regex pattern, or context-free grammar. APAC data extraction pipelines that need structured output from Japanese, Korean, and Chinese text use Guidance to guarantee parseable results without the retry loops and fallback logic that unconstrained prompt-based extraction requires.

Guidance's constraint model works by intercepting the LLM's logit distribution at each generation step and setting to zero the probability of any token that would make the declared constraint unsatisfiable. For example, if the output schema requires a JSON integer field and the model's most likely next token is non-numeric, Guidance masks that token before sampling, so only tokens that keep the output valid can be chosen. APAC teams using Guidance report zero JSON parsing failures in production extraction pipelines, compared to 2-8% parsing failure rates with unconstrained prompt-based extraction on the same APAC-language inputs.
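The masking step described above can be illustrated with a minimal sketch in plain Python. This is not Guidance's implementation: the five-token vocabulary, the logit values, and the helper names `mask_logits` and `softmax` are all assumptions made up for illustration, and a real engine derives the allowed set from the schema or grammar state rather than hard-coding it.

```python
import math

def mask_logits(logits, allowed):
    """Set every disallowed token's logit to -inf so its probability becomes 0."""
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}

def softmax(logits):
    """Convert logits to probabilities; -inf logits map to exactly 0.0."""
    peak = max(s for s in logits.values() if s != -math.inf)
    exps = {tok: (0.0 if s == -math.inf else math.exp(s - peak))
            for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical next-token logits at the point where the schema expects an integer.
# Unconstrained, the model would prefer a quote character or a letter.
logits = {'"': 3.1, "n": 2.4, "4": 1.9, "7": 0.5, "}": -1.0}

# Assumed constraint state: only digit tokens keep the integer field valid.
allowed = set("0123456789")

probs = softmax(mask_logits(logits, allowed))
best = max(probs, key=probs.get)  # greedy choice among the surviving tokens
```

After masking, the remaining probability mass is renormalized over the digit tokens only, so any sampling strategy applied afterwards can pick nothing but a valid token.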

Guidance's template language lets teams interleave LLM generation with programmatic control flow: conditional branches, loops, and variable reuse sit directly in the generation template, producing prompt programs that adapt their strategy to the LLM's intermediate outputs. Information extraction systems use these templates for multi-step logic: extract the entity type first, branch to the schema for that type, then validate the extracted values against business rules within the same generation pass.
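The extract-then-branch pattern can be sketched without any model at all. In the sketch below, `stub_select` and `stub_fill` are hypothetical stand-ins for constrained generation calls, and `SCHEMAS` is an invented two-entry schema table; none of these names come from Guidance. The point is only the control flow: the first result decides which fields are generated next.

```python
import json

# Hypothetical per-entity-type field lists standing in for real JSON schemas.
SCHEMAS = {
    "person": ["name", "title"],
    "company": ["name", "headquarters"],
}

def stub_select(text, options):
    """Stand-in for a constrained choice: the result is always one of `options`."""
    for option in options:
        if option in text.lower():
            return option
    return options[0]

def stub_fill(text, field):
    """Stand-in for a constrained generation call that fills one field."""
    return f"<{field} from model>"

def extract(text):
    # Step 1: decide the entity type from a closed set of options.
    entity_type = stub_select(text, list(SCHEMAS))
    record = {"type": entity_type}
    # Step 2: branch to the field list for that type and fill each field.
    for field in SCHEMAS[entity_type]:
        record[field] = stub_fill(text, field)
    return record

result = extract("Acme is a company based in Osaka.")
serialized = json.dumps(result, ensure_ascii=False)
```

Because the entity type is drawn from a closed set and the fields are driven by the chosen schema, the record always has a known shape; in a real prompt program the two stubs would be replaced by constrained generation calls against the model.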

Guidance integrates with local models (HuggingFace transformers, llama.cpp) and with cloud APIs that support logit access. Teams running Llama or Mistral models locally get Guidance's full token-masking capability, while teams on OpenAI APIs use Guidance's JSON schema mode, which relies on OpenAI's structured output feature for schema-constrained generation without direct logit access.
