Building APAC RAG Infrastructure That Meets Compliance Requirements
Retrieval-augmented generation — the pattern where a system retrieves relevant documents and passes them as context to an LLM before generating a response — is the dominant APAC enterprise AI architecture in 2026. But the implementation details that determine whether a RAG system works well in regulated APAC environments go well beyond what the basic tutorials that make RAG look simple ever cover.
APAC financial services institutions, healthcare organizations, and enterprise software companies building RAG face three infrastructure decisions that US-centric tutorials typically underspecify:
Where do embeddings live? APAC organizations with existing PostgreSQL infrastructure often don't want to operate a separate vector database alongside their application database. The choice of embedding store determines operational complexity, query flexibility, and scale limits for APAC RAG systems.
How is the RAG pipeline orchestrated? Simple chain-based RAG (embed query → retrieve documents → stuff into prompt → generate) works for toy applications but fails on APAC production workloads where document quality varies, APAC languages mix within documents, and retrieval precision matters for regulated output accuracy.
How does structured output get extracted from LLM generation? Many APAC RAG applications need to return structured data (classification labels, extracted entities, validated APAC schema objects) rather than free-form text — and raw LLM output is unreliable enough that production APAC systems need validation and retry logic.
pgvector, Haystack, and Instructor address these three decisions for APAC engineering teams building data-sovereign RAG on self-hosted or APAC-region-hosted infrastructure.
pgvector: Embedding Storage in APAC PostgreSQL Infrastructure
Why APAC teams reach for pgvector first
The most common APAC RAG architecture question from engineering teams with existing PostgreSQL is: "Do we need a separate vector database, or can we use Postgres?"
For most APAC RAG applications at small-to-medium scale (under 10 million document chunks), the answer is pgvector — add the extension to your existing APAC Postgres instance and store embeddings as a column alongside your document metadata. The operational benefit is significant: one APAC database to back up, one to monitor, one set of credentials to manage, and SQL joins between vector similarity results and relational APAC data filters (by document type, by access permission, by APAC language).
Installing pgvector for APAC Postgres
```sql
-- Enable pgvector extension (requires Postgres 13+ with pgvector installed)
CREATE EXTENSION vector;

-- APAC document embedding table
CREATE TABLE apac_documents (
    id          BIGSERIAL PRIMARY KEY,
    content     TEXT NOT NULL,
    language    VARCHAR(10) NOT NULL,   -- 'ja', 'ko', 'zh-CN', 'en', etc.
    source      VARCHAR(255) NOT NULL,  -- APAC document source identifier
    category    VARCHAR(100),           -- APAC document category for filtering
    embedding   vector(1536),           -- OpenAI text-embedding-3-small dimension
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- HNSW index for APAC approximate nearest neighbor search (faster queries, slightly less accurate)
CREATE INDEX ON apac_documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- IVFFlat index alternative (faster to build, different accuracy/speed trade-off)
-- CREATE INDEX ON apac_documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```
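With the table in place, ingestion is an `INSERT` with the embedding serialized as a pgvector literal. A minimal sketch, assuming the schema above (the `to_pgvector` helper and the `APAC_DATABASE_URL` variable are illustrative, not part of pgvector itself):

```python
import os

def to_pgvector(embedding: list[float]) -> str:
    """Serialize a Python float list into pgvector's '[x,y,z]' literal form."""
    return "[" + ",".join(str(x) for x in embedding) + "]"

def insert_apac_document(content: str, language: str, source: str,
                         category: str, embedding: list[float]) -> None:
    import psycopg2  # lazy import keeps the serialization helper dependency-free
    with psycopg2.connect(os.environ["APAC_DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            cur.execute(
                """
                INSERT INTO apac_documents (content, language, source, category, embedding)
                VALUES (%s, %s, %s, %s, %s::vector)
                """,
                (content, language, source, category, to_pgvector(embedding)),
            )
```

The pgvector Python adapter can register the vector type with psycopg2 and handle this serialization automatically; the manual literal keeps the example dependency-light.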
APAC RAG retrieval with pgvector
```python
import os

import psycopg2
from openai import OpenAI

client = OpenAI()

def retrieve_apac_context(query: str, language: str | None = None, top_k: int = 5) -> list[dict]:
    """Retrieve APAC documents most similar to the query."""
    # Generate query embedding (or use self-hosted Ollama for data sovereignty)
    query_embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    ).data[0].embedding

    # pgvector expects a '[...]' literal; format the Python list accordingly
    embedding_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

    # pgvector similarity search with optional APAC language filter
    with psycopg2.connect(os.environ["APAC_DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT content, source, category,
                       1 - (embedding <=> %s::vector) AS similarity
                FROM apac_documents
                WHERE (%s IS NULL OR language = %s)  -- APAC language filter
                ORDER BY embedding <=> %s::vector
                LIMIT %s
            """, (
                embedding_literal, language, language,
                embedding_literal, top_k,
            ))
            return [
                {"content": row[0], "source": row[1], "category": row[2], "similarity": row[3]}
                for row in cur.fetchall()
            ]

# APAC RAG query: retrieve Japanese context, generate response
context_docs = retrieve_apac_context(
    "MAS規制の顧客デューデリジェンス要件は何ですか?",  # "What are the MAS customer due diligence requirements?"
    language="ja",
    top_k=3,
)
```
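Retrieval is only half the loop: the chunks then need to reach the LLM as grounded context. A minimal sketch of the prompt-assembly step (the prompt wording and the sample chunk are illustrative):

```python
def build_rag_prompt(question: str, context_docs: list[dict]) -> str:
    """Concatenate retrieved APAC chunks into a grounded generation prompt."""
    context = "\n\n".join(
        f"[{doc['source']}] {doc['content']}" for doc in context_docs
    )
    return (
        "Answer using only the documents below. "
        "If the answer is not present, say so explicitly.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "What are the CDD requirements?",
    [{"content": "CDD applies to all new account openings.",
      "source": "mas-notice-655", "category": "regulatory", "similarity": 0.91}],
)
```

Production pipelines typically template this step; the Haystack section below shows a fuller version with source attribution and refusal instructions.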
Hybrid APAC search with pgvector + full-text search
For APAC RAG applications where queries combine semantic understanding (cosine similarity) with exact keyword matching (product codes, regulatory article numbers, APAC organization names), pgvector combines with Postgres's built-in full-text search:
```sql
-- Hybrid APAC search: combine vector similarity with full-text rank
SELECT content, source,
       (1 - (embedding <=> params.query_embedding)) * 0.7
       + ts_rank(to_tsvector('simple', content), params.query_tsquery) * 0.3
       AS hybrid_score
FROM apac_documents,
     (SELECT '[query_embedding_json]'::vector AS query_embedding,
             -- tsquery: 'MAS & customer & due-diligence' (Japanese terms)
             to_tsquery('simple', 'MAS & 顧客 & デューデリジェンス') AS query_tsquery) params
ORDER BY hybrid_score DESC
LIMIT 5;
```
The 70/30 vector/keyword weighting is a starting point for APAC teams: adjust based on whether your APAC queries are primarily semantic (higher vector weight) or keyword-dependent (higher full-text weight). Note that Postgres's built-in `ts_rank` is a simpler relevance measure than BM25, so scores will differ from BM25-based search engines.
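An alternative to hand-tuned weights is reciprocal rank fusion (RRF), which merges the two result lists by rank position rather than by raw score, sidestepping weight calibration entirely. A minimal sketch in application code (k = 60 is the conventional smoothing constant):

```python
def reciprocal_rank_fusion(vector_results: list[str],
                           keyword_results: list[str],
                           k: int = 60) -> list[str]:
    """Merge two best-first ranked lists of document ids by RRF score.

    Each document scores 1 / (k + rank) per list it appears in, so documents
    ranked well in BOTH lists float to the top regardless of score scales.
    """
    scores: dict[str, float] = {}
    for results in (vector_results, keyword_results):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# 'a' ranks first in both lists, so it fuses to the top;
# 'c' appears in both lists and outranks 'b', which appears in only one
merged = reciprocal_rank_fusion(["a", "b", "c"], ["a", "c", "d"])
# merged == ["a", "c", "b", "d"]
```

RRF trades per-corpus tuning for robustness, which makes it a reasonable default when APAC query mixes shift over time.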
Haystack: APAC RAG Pipeline Orchestration
When pgvector + simple prompting isn't enough
Simple RAG — retrieve top-k chunks by cosine similarity, concatenate, pass to LLM — works for homogeneous APAC document collections where chunk quality is consistent. It breaks when:
- APAC document quality varies: Some APAC retrieved chunks are highly relevant; others are noise that confuses the LLM. A re-ranker filters these.
- APAC queries are ambiguous: "What is our policy?" doesn't specify which APAC policy. A query decomposition step clarifies before retrieval.
- APAC multi-hop reasoning is required: The answer requires information from two APAC documents that must be combined. A multi-step pipeline handles this.
- APAC retrieval quality must be measured: The APAC team needs to know if changing from cosine to hybrid retrieval improved answer quality. An evaluation framework measures this.
Haystack provides the pipeline architecture for these cases.
Haystack RAG pipeline for APAC
```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.utils import Secret
from haystack_integrations.components.retrievers.pgvector import PgvectorEmbeddingRetriever
from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore

# APAC pgvector document store (connection string read from the environment)
apac_store = PgvectorDocumentStore(
    connection_string=Secret.from_env_var("APAC_DATABASE_URL"),
    table_name="apac_haystack_documents",
    embedding_dimension=1536,
    vector_function="cosine_similarity",
)

# APAC RAG pipeline: embed → retrieve → rerank → generate
apac_rag = Pipeline()
apac_rag.add_component("embedder", OpenAITextEmbedder(model="text-embedding-3-small"))
apac_rag.add_component("retriever", PgvectorEmbeddingRetriever(
    document_store=apac_store,
    top_k=10,
    filters={"field": "meta.language", "operator": "in", "value": ["ja", "en"]},  # APAC languages
))
apac_rag.add_component("ranker", TransformersSimilarityRanker(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    top_k=3,  # keep only the top 3 APAC chunks after reranking
))
apac_rag.add_component("prompt_builder", PromptBuilder(template="""
Given these APAC regulatory documents:
{% for doc in documents %}
Source: {{ doc.meta.source }}
{{ doc.content }}
{% endfor %}

Answer the following question about APAC financial regulations. If the answer cannot be found
in the provided documents, state that clearly rather than generating an unsupported answer.

Question: {{ question }}
"""))
apac_rag.add_component("generator", OpenAIGenerator(model="gpt-4o"))

# Connect pipeline components
apac_rag.connect("embedder.embedding", "retriever.query_embedding")
apac_rag.connect("retriever.documents", "ranker.documents")
apac_rag.connect("ranker.documents", "prompt_builder.documents")
apac_rag.connect("prompt_builder.prompt", "generator.prompt")

# Run an APAC RAG query
# ("What are the key requirements of MAS digital banking regulation?")
question = "MASのデジタルバンキング規制の主要要件は何ですか?"
result = apac_rag.run({
    "embedder": {"text": question},
    "ranker": {"query": question},
    "prompt_builder": {"question": question},
})
print(result["generator"]["replies"][0])
```
The re-ranker step — using a cross-encoder model to score retrieved APAC chunks against the query more accurately than cosine similarity alone — is the single highest-impact improvement for APAC RAG quality on dense regulatory document collections.
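The same cross-encoder pattern works outside Haystack as well: a thin post-processing step that drops low-scoring chunks often helps, since even the top 3 reranked chunks can include noise. A sketch of that filter (the 0.2 threshold is an assumption to tune on your own eval set):

```python
def filter_reranked(scored_chunks: list[tuple[str, float]],
                    top_k: int = 3, min_score: float = 0.2) -> list[str]:
    """Keep at most top_k chunks by cross-encoder score, dropping weak ones.

    scored_chunks: (chunk_text, score) pairs from a cross-encoder reranker.
    """
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, score in ranked[:top_k] if score >= min_score]

kept = filter_reranked([("relevant", 0.92), ("partial", 0.41), ("noise", 0.05)])
# "noise" falls below the threshold, so only two chunks survive
```

Passing fewer but cleaner chunks to the generator reduces both token cost and the chance the LLM anchors on an irrelevant passage.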
Haystack RAG evaluation for APAC teams
```python
from haystack.components.evaluators import ContextRelevanceEvaluator, FaithfulnessEvaluator

# Haystack 2.x ships RAG metrics as evaluator components that run over the
# pipeline's questions, retrieved contexts, and generated answers.
apac_questions = [
    # "What are the IT security requirements of MAS TRM Guideline 2021?"
    "MAS TRM Guideline 2021のITセキュリティ要件は何ですか?",
]
apac_contexts = [retrieved_contexts]  # list of retrieved chunk texts per question
apac_answers = [generated_answer]     # generated answer per question

faithfulness = FaithfulnessEvaluator().run(
    questions=apac_questions,
    contexts=apac_contexts,
    predicted_answers=apac_answers,
)
context_relevance = ContextRelevanceEvaluator().run(
    questions=apac_questions,
    contexts=apac_contexts,
)
# With labeled APAC ground-truth answers, SASEvaluator adds answer-accuracy scoring
print(f"APAC Faithfulness: {faithfulness['score']:.2f}")
print(f"APAC Context Relevance: {context_relevance['score']:.2f}")
```
APAC ML engineering teams that run evaluation before and after changing retrieval strategy (BM25 → vector → hybrid) get quantitative evidence of whether the change improved APAC answer quality — replacing guesswork with measurement.
Instructor: Structured Output for APAC Data Extraction Pipelines
Why raw LLM output is unreliable for APAC production pipelines
APAC RAG applications that need structured output — extract entities, classify documents, validate compliance flags — cannot rely on raw LLM text output. The LLM might return the right information in the wrong format, miss a required field, or use an unexpected APAC value that downstream systems cannot parse.
Instructor patches the LLM SDK so responses are parsed and validated against a Pydantic schema, with automatic retries when validation fails.
Instructor for APAC document extraction
```python
import instructor
from anthropic import Anthropic
from enum import Enum
from pydantic import BaseModel, Field
from typing import Literal, Optional

# APAC regulatory compliance extraction schema
class APACComplianceFlag(str, Enum):
    COMPLIANT = "compliant"
    NON_COMPLIANT = "non_compliant"
    REQUIRES_REVIEW = "requires_review"

class APACDocumentExtraction(BaseModel):
    document_type: Literal["contract", "policy", "regulatory_filing", "internal_memo"]
    jurisdiction: Literal["SG", "HK", "JP", "KR", "TW", "MY", "ID", "VN", "TH"]
    compliance_status: APACComplianceFlag
    key_obligations: list[str] = Field(
        min_length=1, max_length=10,
        description="Key APAC compliance obligations from the document")
    deadline: Optional[str] = Field(
        None, description="APAC compliance deadline in ISO 8601 format, if present")
    risk_level: Literal["low", "medium", "high"] = Field(
        description="Assessed APAC compliance risk level")

# Patch the Anthropic SDK for structured APAC output
apac_client = instructor.from_anthropic(Anthropic())

def extract_apac_compliance(document_text: str) -> APACDocumentExtraction:
    return apac_client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Extract APAC compliance information from this document.
Return the required structured data.

Document:
{document_text}""",
        }],
        response_model=APACDocumentExtraction,
        max_retries=3,  # retry up to 3 times if APAC schema validation fails
    )

# Process an APAC regulatory document
apac_doc = """
MAS Notice 655 requires financial institutions to implement customer due diligence
measures for all new account openings. Singapore-incorporated institutions must...
"""
result = extract_apac_compliance(apac_doc)
print(f"Jurisdiction: {result.jurisdiction}")     # e.g. "SG"
print(f"Compliance: {result.compliance_status}")  # e.g. "requires_review"
print(f"Risk: {result.risk_level}")               # e.g. "high"
print(f"Obligations: {result.key_obligations}")   # e.g. ["Implement CDD measures", ...]
```
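Validation is what makes the retry loop useful: a failing validator raises a `ValidationError`, and Instructor feeds the error text back to the model on the next attempt. A sketch of tightening a deadline field with a custom validator (the `DeadlineCheck` model is illustrative, not part of the schema above):

```python
from datetime import date
from typing import Optional
from pydantic import BaseModel, ValidationError, field_validator

class DeadlineCheck(BaseModel):
    deadline: Optional[str] = None

    @field_validator("deadline")
    @classmethod
    def must_be_iso_date(cls, v: Optional[str]) -> Optional[str]:
        if v is not None:
            date.fromisoformat(v)  # raises ValueError on non-ISO input
        return v

DeadlineCheck(deadline="2026-06-30")  # passes validation
try:
    # Non-ISO format: under Instructor, this error would trigger a retry
    DeadlineCheck(deadline="30 June 2026")
except ValidationError as err:
    print("rejected:", err.error_count(), "validation error")
```

Strict validators are what turn "the LLM usually returns the right format" into a contract downstream APAC systems can rely on.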
Instructor with self-hosted APAC LLMs
For APAC financial services teams routing sensitive document text through self-hosted vLLM:
```python
import instructor
from openai import OpenAI

# Connect Instructor to a self-hosted APAC vLLM cluster
apac_vllm_client = instructor.from_openai(
    OpenAI(
        base_url="http://vllm.internal.apac.example.com:8000/v1",
        api_key="none",  # vLLM does not check the API key by default
    ),
    mode=instructor.Mode.JSON,  # use JSON mode for structured output with vLLM
)

# Same extraction code; document text never leaves APAC infrastructure
result = apac_vllm_client.chat.completions.create(
    model="qwen2.5-72b-instruct",
    messages=[{"role": "user", "content": f"Extract: {apac_doc}"}],
    response_model=APACDocumentExtraction,
    max_retries=3,
)
```
APAC RAG Architecture Patterns Summary
```text
APAC RAG Stack Decision Tree:

Scale < 10M vectors + existing Postgres?
  → pgvector: add the vector extension, avoid new infrastructure

Need retrieval quality evaluation?
  → Haystack: modular pipelines with built-in RAG evaluators

Need structured LLM output?
  → Instructor: Pydantic models + auto-retry for validated extraction

APAC data sovereignty required?
  → All three: pgvector stores on APAC Postgres, the Haystack pipeline
    calls self-hosted vLLM/Ollama, Instructor patches a self-hosted endpoint

Production APAC RAG full stack:
  pgvector (embedding store)
  + Haystack (pipeline + reranker + evaluation)
  + Instructor (structured extraction from generation)
  + vLLM/Ollama (self-hosted APAC LLM inference)
```
For APAC regulated industries, the combination of pgvector (data in existing APAC Postgres), Haystack with vLLM/Ollama generators (APAC-hosted generation), and Instructor with self-hosted client (no extraction data to external APIs) delivers a complete APAC RAG stack where document content, embeddings, and LLM interactions remain within APAC infrastructure boundaries.
Related APAC AI Infrastructure Resources
For the self-hosted LLM inference servers that power APAC-sovereign RAG generation, see the APAC self-hosted LLM deployment guide covering vLLM, Ollama, and LiteLLM.
For the AI developer tools that APAC engineers use alongside these RAG components, see the APAC AI developer tools guide covering Aider, Continue, and Open WebUI.
For the data platform that prepares APAC documents for ingestion into RAG pipelines, see the APAC data engineering platform guide covering ClickHouse, DuckDB, and Apache Flink.