Key features
- Token masking: logit-level constraint enforcement that prevents JSON parse failures
- JSON schema: schema-valid output guaranteed for LLM extraction from APAC-language text
- Regex constraint: regex patterns enforced on LLM output at the character level
- Prompt programs: control flow (conditionals, loops, variables) inside generation templates
- Local model: full constraint support on HuggingFace and llama.cpp models, including APAC-language models
- Multi-step: extraction strategy that branches on intermediate generation results
Best for
- APAC engineering teams building structured data extraction pipelines from Japanese, Korean, and Chinese text. It suits organizations that cannot tolerate parsing failures and need guaranteed parseable JSON from LLM extraction, and that run local models, where full logit access allows token-level constraint enforcement instead of prompt-based output structuring.
Limitations to know
- Full token masking requires logit access, which not all cloud LLM APIs provide
- Complex generation templates are harder to debug than simple structured-output prompts
- Constraint enforcement adds inference latency compared with unconstrained generation
About Guidance
Guidance is an open-source LLM generation control library from Microsoft that enforces output constraints at the token level. It masks the model's token probability distribution so that the model cannot produce output that violates the specified JSON schema, regex pattern, or context-free grammar. APAC data extraction pipelines that need structured output from Japanese, Korean, and Chinese text use Guidance to guarantee parseable output without the retry loops and fallback logic that unconstrained prompt-based extraction requires.
Guidance's generation constraint model works by intercepting the LLM's logit distribution at each token generation step and assigning zero probability to any token that would make the output unable to satisfy the declared constraint. For example, if the output schema requires a JSON integer field and the model would otherwise emit a non-numeric token, Guidance masks that token before sampling, so only valid tokens can be sampled. APAC teams using Guidance report zero JSON parsing failures in production extraction pipelines, compared with 2-8% parsing failure rates for unconstrained prompt-based extraction on the same APAC-language inputs.
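The masking step described above can be illustrated with a minimal, self-contained sketch. This is not Guidance's implementation or API: the five-token vocabulary, the raw scores, and the helper names `mask_logits` and `softmax` are all assumptions made up for illustration, and a real library works over a full tokenizer vocabulary and a grammar state rather than a fixed set of digits.

```python
import math


def mask_logits(logits, allowed):
    """Set the logit of every disallowed token to -inf so its probability becomes 0."""
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}


def softmax(logits):
    """Convert logits to probabilities, treating -inf as probability 0."""
    peak = max(s for s in logits.values() if s != -math.inf)
    exps = {tok: (math.exp(s - peak) if s != -math.inf else 0.0)
            for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical raw scores for the next token; the model "prefers" a quote mark.
raw_logits = {'"': 2.5, "a": 1.0, "7": 0.5, "4": 0.2, "}": -1.0}

# The schema says the current field is an integer, so only digits are allowed.
DIGITS = set("0123456789")

masked = mask_logits(raw_logits, DIGITS)
probs = softmax(masked)

# Greedy decoding over the masked distribution can only pick a digit.
chosen = max(probs, key=probs.get)
print(chosen, probs)
```

In this toy setup the quote mark has the highest raw score, but after masking only the two digit tokens carry probability mass, so any sampling strategy applied afterwards stays inside the constraint.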
Guidance's template language lets APAC teams interleave LLM generation with programmatic control flow. Conditional branches, loop constructs, and variable reuse sit directly in the generation template, producing prompt programs that adapt their generation strategy to LLM outputs at intermediate steps. APAC information extraction systems use Guidance templates to implement multi-step extraction logic: extract the entity type first, then branch to the appropriate schema for that entity type, then validate the extracted values against business rules within the same generation pass.
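The three-step flow above (pick the entity type, branch to its schema, validate values) can be sketched in plain Python. This is an illustration of the control flow only, not Guidance's template syntax: `SCHEMAS`, `StubModel`, `score_types`, `fill`, and `extract` are hypothetical names, and the stub returns canned answers so the example runs without a model. A real constrained generator would enforce each field's pattern during decoding rather than checking it afterwards.

```python
import re

# Hypothetical per-entity schemas: field name -> regex the value must match.
SCHEMAS = {
    "person": {"name": r".+", "birth_year": r"\d{4}"},
    "company": {"name": r".+", "headquarters": r".+"},
}


class StubModel:
    """Stand-in for an LLM that returns canned answers so the sketch runs offline."""

    def score_types(self, text):
        # "poem" scores highest but is not an allowed entity type.
        if "Inc" in text:
            return {"poem": 1.5, "company": 0.9, "person": 0.2}
        return {"poem": 1.5, "person": 0.9, "company": 0.2}

    def fill(self, text, entity_type, field):
        canned = {
            ("company", "name"): "Acme Inc",
            ("company", "headquarters"): "Osaka",
            ("person", "name"): "Hana Sato",
            ("person", "birth_year"): "1990",
        }
        return canned[(entity_type, field)]


def extract(text, model):
    # Step 1: choose the entity type, restricted to the types that have a schema.
    scores = model.score_types(text)
    entity_type = max(SCHEMAS, key=lambda t: scores.get(t, float("-inf")))

    # Step 2: branch to that type's schema and fill one field at a time.
    record = {"type": entity_type}
    for field, pattern in SCHEMAS[entity_type].items():
        value = model.fill(text, entity_type, field)
        # Step 3: check each value against its rule before accepting it.
        if not re.fullmatch(pattern, value):
            raise ValueError(f"{field}={value!r} violates {pattern!r}")
        record[field] = value
    return record


record = extract("Acme Inc is headquartered in Osaka.", StubModel())
print(record)
```

The point of the sketch is the branching: the result of the first step decides which fields the later steps ask for, and an option outside the allowed set is never selected even when the stub scores it highest.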
Guidance integrates with local models (HuggingFace transformers, llama.cpp) and with cloud APIs that expose logit access. APAC teams running Llama or Mistral models locally get Guidance's full token-masking capability, while APAC teams using OpenAI APIs use Guidance's JSON schema mode, which relies on OpenAI's structured output feature for schema-constrained generation without direct logit access.