AIMenta
Internal Productized · Fixed scope

Knowledge Base / RAG Stack

The retrieval foundation that powers every other AI feature in your company.

The problem

You have 14 SharePoint sites, three Confluence spaces, a Notion workspace, a thousand Google Docs, the original SOP PDFs from 2017, and a quarter-century of email threads. Your team cannot find anything. The "AI assistant" your IT department spun up answers from the public internet because it has no path into your real knowledge.

A McKinsey survey of 1,491 executives found that 73% of mid-market enterprises cite "data quality and accessibility" as the top blocker to AI value capture, ahead of model cost, talent, or compliance.[^1] The fix is not better models. The fix is making your knowledge retrievable.

Our approach

Source connectors (50+): SharePoint, Confluence, Notion, Google Drive,
  Slack, MS Teams, Salesforce, ServiceNow, GitHub, Jira, custom DB / API
          │
          ▼
Ingestion pipeline
   - file extraction (Apache Tika / Unstructured.io)
   - structure preservation (headings, tables, lists)
   - PII scrub (configurable)
          │
          ▼
Chunking layer
   - semantic chunking (heading-aware)
   - 800-1,500 token chunks with 200-token overlap
   - per-source metadata enrichment
          │
          ▼
Embedding layer
   - OpenAI text-embedding-3-large (default)
   - or BGE-M3 / Cohere Embed v3 multilingual
          │
          ▼
Vector store
   - pgvector (default, < 50M chunks)
   - Qdrant (50M-500M chunks)
   - Pinecone (managed, global multi-region)
          │
          ▼
Retrieval API
   - hybrid (BM25 + dense) with cross-encoder reranking (Cohere Rerank 3)
   - per-user permission filtering (mirrors source ACLs)
   - source citation in response
          │
          ▼
Consumed by: Customer Service Assistant, Sales Copilot, internal AI tools, etc.
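
The hybrid step in the Retrieval API can be pictured as rank fusion: BM25 and the dense retriever each produce their own best-first ranking, and the two are merged before the cross-encoder reranker sees the fused top-N. Below is a minimal sketch using reciprocal rank fusion, one common way to merge the two rankings; the function and chunk IDs are illustrative, not the production API, and score-level fusion is an equally valid choice.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several best-first ranked lists of chunk IDs into one.

    Each list contributes 1 / (k + rank) per chunk; the constant k
    damps the influence of any single ranker's top position.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 and the dense retriever each return their own ranking;
# a chunk near the top of both ("chunk-b") wins after fusion.
bm25_hits = ["chunk-a", "chunk-b", "chunk-c"]
dense_hits = ["chunk-b", "chunk-d", "chunk-a"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

In production the fused top-N would then pass through the reranker (Cohere Rerank 3 by default), and only the reranked top handful is returned with source citations.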

Who it is for

  • Any company building 2+ internal AI features that need access to the same knowledge corpus.
  • A 700-person professional-services firm in Hong Kong with 14 years of client deliverables locked inside individual partner email accounts.
  • A 400-person engineering organisation in Vietnam trying to onboard 80 new hires per year against a documentation graveyard.

Tech stack

  • Source connectors: 50+ pre-built connectors. Custom connectors built for proprietary systems on a fixed-fee basis (typical: US$4,000-US$11,000 per connector)
  • Chunking + extraction: Unstructured.io for complex PDFs, Apache Tika for general files, custom parsers for scanned documents (paired with the Document Intelligence Suite)
  • Embeddings: OpenAI text-embedding-3-large (default), BGE-M3 for multilingual deployments, Cohere Embed v3 multilingual for sovereign deployments
  • Vector store: pgvector on Postgres 16 (default), Qdrant for scale, Pinecone for managed multi-region
  • Reranking: Cohere Rerank 3 (default), open-weights cross-encoders for air-gapped deployments
  • Backend: Laravel 12 with queue workers; FastAPI sidecar for Python ML components
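
The chunking parameters above (800-1,500 tokens with a 200-token overlap) reduce to a sliding window over each heading-delimited section. In this sketch tokens are approximated by whitespace words purely for illustration; the real pipeline counts tokens with the embedding model's tokenizer and splits on document structure first.

```python
def chunk_section(text, max_tokens=1500, overlap=200):
    """Split one heading-delimited section into overlapping chunks.

    Tokens are approximated by whitespace-separated words here;
    production code would use the embedding model's tokenizer.
    """
    words = text.split()
    if len(words) <= max_tokens:
        return [" ".join(words)]
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_tokens, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # each chunk shares 200 tokens with the last
    return chunks
```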

Integration list

Microsoft SharePoint and OneDrive, Atlassian Confluence and Jira, Notion, Google Drive and Workspace, Slack, Microsoft Teams, Salesforce (Knowledge), ServiceNow, GitHub and GitLab, Box, Dropbox, Egnyte, custom databases via JDBC / REST, custom file shares via SMB / SFTP.

Deployment timeline

Week    Activity
1       Knowledge audit; pick 3-5 priority sources; permission model designed
2       Source connectors deployed; ingestion pipeline live
3       Chunking and embedding tuned; first 100K chunks indexed
4       Retrieval API live; permission enforcement tested
5-6     Consumer integration (first downstream AI feature)
7-8     Tuning loop based on first month of real queries

Mini-ROI

The ROI is downstream. Every AI feature that consumes the RAG Stack inherits the quality of retrieval. McKinsey's 2024 productivity research finds that AI features grounded in proprietary knowledge deliver 3-7x the perceived utility of public-internet-grounded features for knowledge workers.[^2]

A 700-person professional-services firm in Hong Kong indexed 280,000 deliverables, briefs, and client memos in week 3 of 2025. The deployed search reduced average research time per new client engagement from 9 hours to 2.4 hours. Across 1,400 engagements per year, the saving was equivalent to 4.7 FTE — redeployed into senior consultant capacity.
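
The 4.7 FTE figure follows directly from the quoted numbers (the roughly 1,966 working hours per FTE is implied by those figures, not a stated assumption):

```python
hours_before, hours_after = 9.0, 2.4
engagements_per_year = 1_400

# Hours saved per year across all engagements.
hours_saved = (hours_before - hours_after) * engagements_per_year

# Implied annual working hours per FTE behind the 4.7 figure.
hours_per_fte = hours_saved / 4.7
```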

Pricing tiers

Tier        Setup (one-time)          Monthly run cost    Best for
Starter     US$18,000-US$32,000       From US$900/mo      3-5 sources, < 100K documents, single language
Scale       US$45,000-US$85,000       From US$2,800/mo    8-15 sources, < 5M documents, multi-language, permission enforcement
Strategic   US$110,000-US$240,000     From US$6,800/mo    Enterprise-wide indexing, multi-region, custom connectors, dedicated FinOps

All tiers include the quarterly retrieval-quality audit and re-indexing as needed.

Frequently asked questions

How do you respect document permissions? The retrieval layer mirrors source ACLs. A user querying the assistant can only retrieve content they have access to in the source system. Permission changes propagate within 15 minutes (configurable). We have shipped this model under SOC 2 and ISO 27001 audits.
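
One way to picture the ACL mirror is as a filter on chunk metadata: each indexed chunk carries the set of groups allowed to see its source document, and retrieval drops anything the querying user's groups do not intersect. The field names below are illustrative; in practice the filter is usually pushed down into the vector-store query rather than applied after retrieval.

```python
def filter_by_acl(chunks, user_groups):
    """Keep only chunks whose mirrored source ACL admits the user."""
    user_groups = set(user_groups)
    return [c for c in chunks if c["allowed_groups"] & user_groups]

hits = [
    {"id": "finance-memo", "allowed_groups": {"finance"}},
    {"id": "hr-policy", "allowed_groups": {"all-staff"}},
]
visible = filter_by_acl(hits, ["engineering", "all-staff"])
```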

What about confidential documents that should never be searchable? Configurable exclusion rules at the source level (folder, label, classification tag). A redaction layer can also strip PII or commercially sensitive sections before embedding. Excluded content is never indexed, never embedded, never returned.

How do you handle Mandarin, Japanese, Korean, Vietnamese, Thai content? BGE-M3 and Cohere Embed v3 multilingual handle all five well. We benchmark per-language retrieval quality against a golden set during setup and tune chunking parameters per language (Japanese needs different chunk sizes than English due to lack of whitespace).
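
Per-language tuning can be as simple as a parameter table keyed by detected language. The token budgets below are illustrative placeholders, not our production values; the point is that CJK and Thai text carries more meaning per token and lacks whitespace word boundaries, so chunks are kept smaller.

```python
# Illustrative chunking parameters by language (token counts).
CHUNK_PARAMS = {
    "en": {"max_tokens": 1500, "overlap": 200},
    "zh": {"max_tokens": 900, "overlap": 120},
    "ja": {"max_tokens": 900, "overlap": 120},
    "ko": {"max_tokens": 1000, "overlap": 140},
    "vi": {"max_tokens": 1300, "overlap": 180},
    "th": {"max_tokens": 1000, "overlap": 140},
}

def params_for(lang):
    """Fall back to English defaults for unlisted languages."""
    return CHUNK_PARAMS.get(lang, CHUNK_PARAMS["en"])
```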

Will this work with our scanned PDFs from 2003? Paired with the Document Intelligence Suite, yes. Scanned PDFs go through OCR before chunking. We have indexed scanned client files going back 20+ years for two professional-services clients.

Can we update content in real time? Most sources sync within 5-15 minutes of update. For real-time-critical sources (e.g., live ticket data), we deploy webhook-driven incremental indexing. For batch sources (legacy file shares), nightly is typical.
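
Webhook-driven incremental indexing reduces to applying each change event to the index instead of re-crawling the whole source. A minimal sketch, with a plain dict standing in for the vector store; a real deployment would enqueue re-chunking and re-embedding of the changed document rather than storing raw content.

```python
def apply_event(event, index):
    """Apply one source-system change event to an id -> content index."""
    if event["type"] in ("created", "updated"):
        index[event["doc_id"]] = event["content"]
    elif event["type"] == "deleted":
        # Deletions must propagate too, or stale chunks keep surfacing.
        index.pop(event["doc_id"], None)
    return index
```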

What happens when an employee leaves? Their access is revoked immediately at the source level, which propagates to the retrieval layer within minutes. Documents they authored remain searchable for users with permission. Their chat history with assistants is retained per your policy.

Can we run this fully on-premise? Yes. Air-gapped deployments use BGE-M3 embeddings (open-weights), Qdrant for vector storage, and open-weights LLMs (Llama 3.1, Qwen 2.5) for any consuming features. Performance and tuning trade-offs are explicit at architecture stage.

How does the retrieval quality stay high over time? Three mechanisms: a feedback loop on every retrieved answer (thumbs-up/down with optional comment), a monthly retrieval-quality dashboard, and a quarterly retrieval-tuning sprint included in Scale and Strategic tiers.


Common questions

What data sources can be connected to the knowledge base?

The stack ingests from Confluence, SharePoint, Notion, Google Drive, Jira, GitHub, internal wikis, PDF repositories, and relational databases via JDBC. New connector types can be added in 3–5 days for sources with a REST API. Change-detection crawlers keep the index current as documents are updated.
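
A change-detection crawl can be sketched as content hashing: re-extract each document, hash it, and re-index only those whose hash moved since the last crawl. This is a sketch of the idea only; production crawlers also track deletions and source-side modified timestamps.

```python
import hashlib

def changed_docs(current, seen_hashes):
    """Return IDs whose content hash differs from the previous crawl,
    updating seen_hashes in place."""
    changed = []
    for doc_id, text in current.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if seen_hashes.get(doc_id) != digest:
            changed.append(doc_id)
            seen_hashes[doc_id] = digest
    return changed
```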

How does the RAG stack prevent the AI from hallucinating answers?

Responses are generated only from passages retrieved from your indexed corpus — the model cannot draw on general training knowledge for factual claims. Each answer includes source citations (document name, page, and section) so users can verify. Passages with retrieval confidence below a configurable threshold trigger a 'source not found' response rather than an approximate answer.
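
The threshold behaviour described above amounts to a gate on the best retrieval score: below the cut-off, refuse rather than approximate. The 0.35 value and field names below are illustrative placeholders; the real threshold is configured per deployment.

```python
def ground_or_refuse(passages, threshold=0.35):
    """Return passages to ground the answer, or an explicit
    'source not found' signal instead of an approximate answer."""
    if not passages or max(p["score"] for p in passages) < threshold:
        return {"status": "source_not_found", "passages": []}
    return {
        "status": "ok",
        "passages": [p for p in passages if p["score"] >= threshold],
    }
```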

How is access control handled across different departments?

The stack inherits permissions from your identity provider (Entra ID, Okta, Google Workspace). A user can only retrieve passages from documents their SSO role can access. Permission changes propagate to the index within 15 minutes via a sync job. This means Finance can query its own documents without Finance data appearing in results for other departments.


Don't see exactly what you need?

Most engagements start as custom scopes. Send us your problem; we'll tell you whether one of our productized solutions fits — or what a custom build looks like.