TL;DR
- A useful internal AI platform for a 200-1,000 person enterprise has six components, not 20.
- Build them in sequence over 9-12 months. Skipping the early components produces a platform no one uses.
- The total annual operating cost for a working platform is US$280,000-US$700,000 including team.
Why now
Mid-market enterprises that want to ship AI use cases reliably are converging on the "internal AI platform" pattern: a small team that owns shared infrastructure, enabling business teams to deploy AI use cases without rebuilding plumbing each time.
Done well, the platform is a force multiplier. Done badly, it is a tax on the rest of the organisation. McKinsey's Generative AI in the Enterprise report found that companies with a centralised AI platform team shipped AI use cases 2.7x faster than companies with distributed AI delivery.[^1] But the same report noted that 40% of platform teams produced infrastructure their business units found unusable.
This article describes a reference architecture sized for 200-1,000 person enterprises and the build sequence that avoids the unusable-platform trap.
The six components
A working platform has six components. Build them in this order.
Component 1: Model gateway. A unified API that fronts multiple model providers (Anthropic, OpenAI, Google, regional providers, self-hosted). Centralises authentication, rate limiting, observability, and cost tracking.
Component 2: Vector store and embedding service. Shared vector storage (typically pgvector for mid-market scale) and an embedding API. Use cases consume embeddings without each team picking and operating their own store.
Component 3: Evaluation harness. Shared infrastructure for running model and prompt evaluations against curated test sets. Used by every team launching a new use case.
Component 4: Observability stack. Tracing of LLM calls, cost attribution per use case, prompt version tracking, output sampling for review.
Component 5: Prompt and template registry. Versioned storage of prompts, with deployment and rollback capabilities. The application code does not contain prompts; it references them.
Component 6: Governance toolkit. PII detection, content moderation, audit logging, automated compliance checks. Plumbed into the gateway so use cases inherit the controls.
Six components, not 20. The temptation is always to add: a feature store, an experiment tracker, a model registry, an annotation tool, a feedback collection UI, a fine-tuning pipeline. Most of these are not needed in year one at mid-market scale. Add them when a real use case demands them, not on speculation.
Build sequence: months 1-3
The first three months build component 1 (model gateway) and the foundations of component 4 (observability).
The model gateway is the highest-payoff early build. It centralises:
- Authentication and authorisation across model providers
- Rate limiting (essential for cost control)
- Per-use-case cost attribution
- Provider failover (when one model provider has an outage)
- Audit logging of every LLM call
A working gateway can be built in 8-12 weeks by 1-2 engineers, using open-source LiteLLM or a similar foundation. The same engineers can lay the observability foundations in parallel.
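To make the gateway's job concrete, here is a minimal sketch of the three behaviours that pay off first: routing to a provider, per-use-case rate limiting, and per-use-case cost attribution. The class and field names are illustrative, not a real LiteLLM API; a production gateway would add auth, failover, and durable audit logging.

```python
import time
from collections import defaultdict

class RateLimitError(Exception):
    pass

class ModelGateway:
    """Sketch of a model gateway: routing, rate limiting, cost attribution."""

    def __init__(self, providers, max_calls_per_minute=60):
        self.providers = providers          # name -> callable(prompt) -> (text, cost_usd)
        self.max_calls = max_calls_per_minute
        self.call_log = defaultdict(list)   # use_case -> recent call timestamps
        self.cost_by_use_case = defaultdict(float)

    def complete(self, use_case, provider, prompt):
        # Sliding-window rate limit per use case (essential for cost control).
        now = time.time()
        recent = [t for t in self.call_log[use_case] if now - t < 60]
        if len(recent) >= self.max_calls:
            raise RateLimitError(f"{use_case} exceeded {self.max_calls} calls/min")
        self.call_log[use_case] = recent + [now]

        # Route to the provider and attribute the spend to the calling use case.
        text, cost = self.providers[provider](prompt)
        self.cost_by_use_case[use_case] += cost
        return text
```

A use-case team would call `gw.complete("support-bot", "anthropic", prompt)` and never hold provider credentials directly; the gateway owns keys, limits, and the cost ledger.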
Anti-pattern in months 1-3: building elaborate features (fine-tuning pipelines, agent frameworks) before the gateway is operational. Without the gateway every use case rebuilds the basics.
Build sequence: months 4-6
Months 4-6 add components 2 (vector store and embedding service) and 5 (prompt registry).
Vector store and embedding: standardise on a default (pgvector for most mid-market) and provide a thin embedding API. Teams can ask for alternative stores once they exceed 50M vectors; default to the boring choice for everyone else.
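The shape of the thin embedding API can be sketched as below. This is an in-memory stand-in (pgvector would back it in production, and the `embed_fn` would call the shared embedding service); the point is that use-case teams get `add` and `search` without choosing or operating a store.

```python
import math

class EmbeddingStore:
    """In-memory stand-in for the shared vector store (pgvector in production)."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn    # text -> list[float]; the shared embedding API
        self.rows = []              # (doc_id, vector, text)

    def add(self, doc_id, text):
        self.rows.append((doc_id, self.embed_fn(text), text))

    def search(self, query, k=3):
        # Rank stored documents by cosine similarity to the query embedding.
        q = self.embed_fn(query)
        scored = [(self._cosine(q, v), doc_id, text) for doc_id, v, text in self.rows]
        scored.sort(reverse=True)
        return [(doc_id, text) for _, doc_id, text in scored[:k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
```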
Prompt registry: versioned storage with simple deployment. Even a basic registry (a Git repo with a thin loading library) is better than prompts hardcoded in application code. Mature prompt management lets teams iterate prompts without redeploying applications.
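Even the "Git repo plus thin loading library" version buys a lot. A sketch, assuming a hypothetical directory layout of `prompts/<name>/<version>.txt` with a `current` file marking the deployed version (deploy and rollback are then just pointer updates, versioned in Git):

```python
from pathlib import Path
from string import Template

class PromptRegistry:
    """Thin loader over a Git-versioned prompt directory.

    Layout assumption (hypothetical): <root>/<name>/<version>.txt,
    plus a 'current' file per prompt naming the deployed version.
    """

    def __init__(self, root):
        self.root = Path(root)

    def deploy(self, name, version):
        # Deploy (or roll back) by repointing 'current'; no app redeploy needed.
        (self.root / name / "current").write_text(version)

    def load(self, name, version=None, **variables):
        version = version or (self.root / name / "current").read_text().strip()
        text = (self.root / name / f"{version}.txt").read_text()
        return Template(text).substitute(variables)
```

Application code calls `registry.load("greet", user=name)` and never embeds the prompt text itself, which is the property the component exists to enforce.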
By end of month 6 a new AI use case can be shipped with about 30% of the engineering effort it would have required at the start of month 1. This is when the platform's existence starts to feel justified to business teams.
Build sequence: months 7-9
Months 7-9 add component 3 (evaluation harness) and the bulk of component 6 (governance toolkit).
The evaluation harness is the most undervalued component. Without it, prompt and model changes are deployed without regression testing, and quality drift is discovered through customer complaints. With it, every change runs through a curated test set and quality regressions are caught in CI.
Build the evaluation harness on open-source frameworks (deepeval, OpenAI Evals, lm-evaluation-harness as starting points). Curate per-use-case test sets with the help of the use-case team. The harness is shared infrastructure; the test sets are use-case specific.
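The core loop the harness runs is simple enough to sketch. This is a minimal illustration, not any of the frameworks named above: a curated test set, a generation function under test, and a pass-rate gate that CI can enforce before a prompt or model change ships.

```python
def run_eval(generate, test_set, threshold=0.9):
    """Run a curated test set against a generation function and gate on pass rate.

    generate: callable(prompt) -> str (the system under test)
    test_set: list of (prompt, check) pairs where check(output) -> bool
    Returns (pass_rate, failures); CI blocks the deploy if pass_rate < threshold.
    """
    failures = []
    for prompt, check in test_set:
        output = generate(prompt)
        if not check(output):
            failures.append((prompt, output))
    pass_rate = 1 - len(failures) / len(test_set)
    return pass_rate, failures
```

The harness (the loop and the gate) is shared; the `test_set` is curated per use case with the owning team, exactly as described above.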
Governance toolkit: PII detection (often a small fine-tuned model or a pattern-matching service), content moderation (provider APIs or self-hosted), audit logging built on top of the gateway logs.
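The pattern-matching flavour of PII detection can be as small as the sketch below, plumbed into the gateway path so text is redacted before any provider call. The patterns here are deliberately minimal illustrations; production detection would use a tuned model or a dedicated service with far broader coverage.

```python
import re

# Illustrative minimal patterns; production coverage would be much broader.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact_pii(text):
    """Replace detected PII spans with typed placeholders before the LLM call."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[{label.upper()}]", text)
    return text, found
```

Because this runs inside the gateway, every use case inherits the control and the `found` labels feed the audit log for free.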
Build sequence: months 10-12
Months 10-12 mature the components based on real-world use. Common additions:
- Self-hosted small open-weights model behind the gateway, for cost optimisation on high-volume use cases
- Per-team cost dashboards
- Approval workflow for new prompt deployments across teams
- Integration with the company's identity provider for fine-grained authorisation
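Of these, the per-team cost dashboard is mostly aggregation over data the gateway already collects. A sketch, assuming a hypothetical log record shape of `{"team", "month", "cost_usd"}` derived from the gateway's audit trail:

```python
from collections import defaultdict

def cost_by_team(gateway_log):
    """Aggregate gateway call records into a per-team monthly cost summary.

    gateway_log: iterable of dicts with 'team', 'month', 'cost_usd' keys
    (an assumed record shape; real records come from the gateway audit trail).
    """
    summary = defaultdict(float)
    for record in gateway_log:
        summary[(record["team"], record["month"])] += record["cost_usd"]
    return dict(summary)
```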
Resist the temptation to build a full MLOps platform. Most mid-market AI use cases do not need feature stores, experiment trackers, or model registries in year one. They need a gateway, a vector store, evals, observability, prompts, and governance. Build the six. Add more only on real-use-case demand.
Team shape
A working internal AI platform team for a 200-1,000 person enterprise has 4-6 people:
- 1 platform lead (engineering manager or staff engineer)
- 2-3 platform engineers (build and operate the components)
- 1 ML engineer (model evaluation, embedding pipelines)
- 1 governance and security engineer (often dotted-line into security team)
Reporting line: typically into the CTO, CDO, or head of platform engineering. Not into a business unit. Not into the CIO unless the CIO's organisation has real engineering depth. The platform team makes architectural decisions that span the company.
Cost shape
Annual all-in operating cost for a working platform team and the platform itself, sized for a 200-1,000 person enterprise:
- Team: US$180,000-US$420,000 (5 FTE at US$36K-US$84K loaded cost, varies by market)
- Infrastructure: US$60,000-US$160,000 (compute, storage, vector store, embedding pipelines, observability tools)
- External model API spend: US$40,000-US$120,000 (varies dramatically by use case volume; this is the gateway's pass-through to use cases)
Total: US$280,000-US$700,000 per year, with the team being the biggest line. Use case teams consume gateway capacity at their own cost; the platform itself is shared infrastructure.
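The line items above reconcile with the stated total, which is worth checking in your own budget model:

```python
# Sanity-check the cost ranges above (all figures in US$ thousands per year).
ranges = {
    "team": (180, 420),
    "infrastructure": (60, 160),
    "model_api": (40, 120),
}
low = sum(lo for lo, _ in ranges.values())
high = sum(hi for _, hi in ranges.values())
print(low, high)  # 280 700
```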
Implementation playbook
Run this sequence to stand up the platform.
- Month 0: Get explicit sponsorship from CTO or CDO. Confirm reporting line.
- Month 0-1: Hire or assemble the team. Recruiting platform engineers in Asia is competitive; budget for time.
- Month 1-3: Ship the model gateway. Deploy with three real use cases (pick willing partners).
- Month 4-6: Add vector store and prompt registry. Migrate existing use cases.
- Month 7-9: Add evaluation harness and governance toolkit. Make eval mandatory for new production deployments.
- Month 10-12: Iterate based on use-case team feedback. Resist the urge to over-build.
- Quarterly: Survey use-case teams. The platform succeeds when business teams ask for more, not when they avoid it.
Anti-patterns to avoid
Building a custom framework on day one. Use established open-source tools where possible. Customisation comes later, on demand.
Centralising too much. The platform should provide infrastructure, not own use cases. Use cases live with the business teams.
Skipping the evaluation harness. Without it, model changes are gambles. With it, model changes are managed.
Underestimating governance. The toolkit is component 6 in build order, not component 16. Without it the platform fails security review.
Treating the platform as a destination rather than a service. A platform that requires use-case teams to use it because of mandate fails. A platform they choose to use because it makes them faster succeeds.
Counter-arguments
"We are too small for a platform." At 200 people, possibly. At 400 people with three or more AI use cases in production, the platform pays back. Run the math: how many use cases, how much per use case is plumbing rebuild, what would shared plumbing save.
"Our cloud provider's AI services give us all this." They give you components. They do not give you the integration and policy layer that makes the components a platform. The integration is your work.
"We should build a multi-tenant platform that scales to enterprise." Maybe in year three. In year one, build for the 200-1,000 person scale you actually have. Premature scale is the most expensive form of premature optimisation.
Bottom line
A useful internal AI platform for a mid-market enterprise has six components built in sequence over 9-12 months. Done well, it accelerates use-case delivery by 2-3x and centralises cost and risk control. Done badly, it produces unused infrastructure and an alienated business.
The six components are: model gateway, vector store and embedding service, evaluation harness, observability stack, prompt registry, and governance toolkit. Build them in order. Resist additions until real use cases demand them. Measure success by use-case team adoption, not by feature count.
Next read
- Choosing a Vector Database in 2026: 7 Options Compared
- From Pilot to Production: An MLOps Maturity Model for Mid-Market Teams
By Daniel Chen, Director, AI Advisory.
[^1]: McKinsey & Company, Generative AI in the Enterprise, June 2025.