AIMenta
Internal Productized · Fixed scope

Knowledge Base / RAG Stack

The retrieval foundation that powers every other AI feature in your company.

The problem

You have 14 SharePoint sites, three Confluence spaces, a Notion workspace, a thousand Google Docs, the original SOP PDFs from 2017, and a quarter-century of email threads. Your team cannot find anything. The "AI assistant" your IT department spun up answers from the public internet because it has no path into your real knowledge.

A McKinsey survey of 1,491 executives found that 73% of mid-market enterprises cite "data quality and accessibility" as the top blocker to AI value capture, ahead of model cost, talent, or compliance.[^1] The fix is not better models. The fix is making your knowledge retrievable.

Our approach

Source connectors (50+): SharePoint, Confluence, Notion, Google Drive,
  Slack, MS Teams, Salesforce, ServiceNow, GitHub, Jira, custom DB / API
          │
          ▼
Ingestion pipeline
   - file extraction (Apache Tika / Unstructured.io)
   - structure preservation (headings, tables, lists)
   - PII scrub (configurable)
          │
          ▼
Chunking layer
   - semantic chunking (heading-aware)
   - 800-1,500 token chunks with 200-token overlap
   - per-source metadata enrichment
          │
          ▼
Embedding layer
   - OpenAI text-embedding-3-large (default)
   - or BGE-M3 / Cohere Embed v3 multilingual
          │
          ▼
Vector store
   - pgvector (default, < 50M chunks)
   - Qdrant (50M-500M chunks)
   - Pinecone (managed, global multi-region)
          │
          ▼
Retrieval API
   - hybrid (BM25 + dense) with cross-encoder reranking (Cohere Rerank 3)
   - per-user permission filtering (mirrors source ACLs)
   - source citation in response
          │
          ▼
Consumed by: Customer Service Assistant, Sales Copilot, internal AI tools, etc.
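
The hybrid step in the Retrieval API can be pictured as rank fusion: BM25 and the dense retriever each produce their own best-first ranking, and the two are merged before the cross-encoder reranker sees the fused top-N. Below is a minimal sketch using reciprocal rank fusion, one common way to merge the two rankings; the function and chunk IDs are illustrative, not the production API, and score-level fusion is an equally valid choice.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several best-first ranked lists of chunk IDs into one.

    Each list contributes 1 / (k + rank) per chunk; the constant k
    damps the influence of any single ranker's top position.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 and the dense retriever each return their own ranking;
# a chunk near the top of both ("chunk-b") wins after fusion.
bm25_hits = ["chunk-a", "chunk-b", "chunk-c"]
dense_hits = ["chunk-b", "chunk-d", "chunk-a"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

In production the fused top-N would then pass through the reranker (Cohere Rerank 3 by default), and only the reranked top handful is returned with source citations.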

Who it is for

  • Any company building 2+ internal AI features that need access to the same knowledge corpus.
  • A 700-person professional-services firm in Hong Kong with 14 years of client deliverables locked inside individual partner email accounts.
  • A 400-person engineering organisation in Vietnam trying to onboard 80 new hires per year against a documentation graveyard.

Tech stack

  • Source connectors: 50+ pre-built connectors. Custom connectors built for proprietary systems on a fixed-fee basis (typical: US$4,000-US$11,000 per connector)
  • Chunking + extraction: Unstructured.io for complex PDFs, Apache Tika for general files, custom parsers for scanned documents (paired with the Document Intelligence Suite)
  • Embeddings: OpenAI text-embedding-3-large (default), BGE-M3 for multilingual deployments, Cohere Embed v3 multilingual for sovereign deployments
  • Vector store: pgvector on Postgres 16 (default), Qdrant for scale, Pinecone for managed multi-region
  • Reranking: Cohere Rerank 3 (default), open-weights cross-encoders for air-gapped deployments
  • Backend: Laravel 12 with queue workers; FastAPI sidecar for Python ML components
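
The chunking parameters above (800-1,500 tokens with a 200-token overlap) reduce to a sliding window over each heading-delimited section. In this sketch tokens are approximated by whitespace words purely for illustration; the real pipeline counts tokens with the embedding model's tokenizer and splits on document structure first.

```python
def chunk_section(text, max_tokens=1500, overlap=200):
    """Split one heading-delimited section into overlapping chunks.

    Tokens are approximated by whitespace-separated words here;
    production code would use the embedding model's tokenizer.
    """
    words = text.split()
    if len(words) <= max_tokens:
        return [" ".join(words)]
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_tokens, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # each chunk shares 200 tokens with the last
    return chunks
```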

Integration list

Microsoft SharePoint and OneDrive, Atlassian Confluence and Jira, Notion, Google Drive and Workspace, Slack, Microsoft Teams, Salesforce (Knowledge), ServiceNow, GitHub and GitLab, Box, Dropbox, Egnyte, custom databases via JDBC / REST, custom file shares via SMB / SFTP.

Deployment timeline

Week    Activity
1       Knowledge audit; pick 3-5 priority sources; permission model designed
2       Source connectors deployed; ingestion pipeline live
3       Chunking and embedding tuned; first 100K chunks indexed
4       Retrieval API live; permission enforcement tested
5-6     Consumer integration (first downstream AI feature)
7-8     Tuning loop based on first month of real queries

Mini-ROI

The ROI is downstream. Every AI feature that consumes the RAG Stack inherits the quality of retrieval. McKinsey's 2024 productivity research finds that AI features grounded in proprietary knowledge deliver 3-7x the perceived utility of public-internet-grounded features for knowledge workers.[^2]

A 700-person professional-services firm in Hong Kong indexed 280,000 deliverables, briefs, and client memos in week 3 of 2025. The deployed search reduced average research time per new client engagement from 9 hours to 2.4 hours. Across 1,400 engagements per year, the saving was equivalent to 4.7 FTE — redeployed into senior consultant capacity.
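
The 4.7 FTE figure follows directly from the quoted numbers (the roughly 1,966 working hours per FTE is implied by those figures, not a stated assumption):

```python
hours_before, hours_after = 9.0, 2.4
engagements_per_year = 1_400

# Hours saved per year across all engagements.
hours_saved = (hours_before - hours_after) * engagements_per_year

# Implied annual working hours per FTE behind the 4.7 figure.
hours_per_fte = hours_saved / 4.7
```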

Pricing tiers

Tier        Setup (one-time)          Monthly run cost    Best for
Starter     US$18,000-US$32,000       From US$900/mo      3-5 sources, < 100K documents, single language
Scale       US$45,000-US$85,000       From US$2,800/mo    8-15 sources, < 5M documents, multi-language, permission enforcement
Strategic   US$110,000-US$240,000     From US$6,800/mo    Enterprise-wide indexing, multi-region, custom connectors, dedicated FinOps

All tiers include the quarterly retrieval-quality audit and re-indexing as needed.

Frequently asked questions

How do you respect document permissions? The retrieval layer mirrors source ACLs. A user querying the assistant can only retrieve content they have access to in the source system. Permission changes propagate within 15 minutes (configurable). We have shipped this model under SOC 2 and ISO 27001 audits.
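
One way to picture the ACL mirror is as a filter on chunk metadata: each indexed chunk carries the set of groups allowed to see its source document, and retrieval drops anything the querying user's groups do not intersect. The field names below are illustrative; in practice the filter is usually pushed down into the vector-store query rather than applied after retrieval.

```python
def filter_by_acl(chunks, user_groups):
    """Keep only chunks whose mirrored source ACL admits the user."""
    user_groups = set(user_groups)
    return [c for c in chunks if c["allowed_groups"] & user_groups]

hits = [
    {"id": "finance-memo", "allowed_groups": {"finance"}},
    {"id": "hr-policy", "allowed_groups": {"all-staff"}},
]
visible = filter_by_acl(hits, ["engineering", "all-staff"])
```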

What about confidential documents that should never be searchable? Configurable exclusion rules at the source level (folder, label, classification tag). A redaction layer can also strip PII or commercially sensitive sections before embedding. Excluded content is never indexed, never embedded, never returned.

How do you handle Mandarin, Japanese, Korean, Vietnamese, Thai content? BGE-M3 and Cohere Embed v3 multilingual handle all five well. We benchmark per-language retrieval quality against a golden set during setup and tune chunking parameters per language (Japanese needs different chunk sizes than English due to lack of whitespace).
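
Per-language tuning can be as simple as a parameter table keyed by detected language. The token budgets below are illustrative placeholders, not our production values; the point is that CJK and Thai text carries more meaning per token and lacks whitespace word boundaries, so chunks are kept smaller.

```python
# Illustrative chunking parameters by language (token counts).
CHUNK_PARAMS = {
    "en": {"max_tokens": 1500, "overlap": 200},
    "zh": {"max_tokens": 900, "overlap": 120},
    "ja": {"max_tokens": 900, "overlap": 120},
    "ko": {"max_tokens": 1000, "overlap": 140},
    "vi": {"max_tokens": 1300, "overlap": 180},
    "th": {"max_tokens": 1000, "overlap": 140},
}

def params_for(lang):
    """Fall back to English defaults for unlisted languages."""
    return CHUNK_PARAMS.get(lang, CHUNK_PARAMS["en"])
```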

Will this work with our scanned PDFs from 2003? Paired with the Document Intelligence Suite, yes. Scanned PDFs go through OCR before chunking. We have indexed scanned client files going back 20+ years for two professional-services clients.

Can we update content in real time? Most sources sync within 5-15 minutes of update. For real-time-critical sources (e.g., live ticket data), we deploy webhook-driven incremental indexing. For batch sources (legacy file shares), nightly is typical.
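
Webhook-driven incremental indexing reduces to applying each change event to the index instead of re-crawling the whole source. A minimal sketch, with a plain dict standing in for the vector store; a real deployment would enqueue re-chunking and re-embedding of the changed document rather than storing raw content.

```python
def apply_event(event, index):
    """Apply one source-system change event to an id -> content index."""
    if event["type"] in ("created", "updated"):
        index[event["doc_id"]] = event["content"]
    elif event["type"] == "deleted":
        # Deletions must propagate too, or stale chunks keep surfacing.
        index.pop(event["doc_id"], None)
    return index
```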

What happens when an employee leaves? Their access is revoked immediately at the source level, which propagates to the retrieval layer within minutes. Documents they authored remain searchable for users with permission. Their chat history with assistants is retained per your policy.

Can we run this fully on-premise? Yes. Air-gapped deployments use BGE-M3 embeddings (open-weights), Qdrant for vector storage, and open-weights LLMs (Llama 3.1, Qwen 2.5) for any consuming features. Performance and tuning trade-offs are explicit at architecture stage.

How does the retrieval quality stay high over time? Three mechanisms: a feedback loop on every retrieved answer (thumbs-up/down with optional comment), a monthly retrieval-quality dashboard, and a quarterly retrieval-tuning sprint included in Scale and Strategic tiers.


Common questions

What data sources can be connected to the knowledge base?

The stack ingests from Confluence, SharePoint, Notion, Google Drive, Jira, GitHub, internal wikis, PDF repositories, and relational databases via JDBC. New connector types can be added in 3–5 days for sources with a REST API. Change-detection crawlers keep the index current as documents are updated.
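
A change-detection crawl can be sketched as content hashing: re-extract each document, hash it, and re-index only those whose hash moved since the last crawl. This is a sketch of the idea only; production crawlers also track deletions and source-side modified timestamps.

```python
import hashlib

def changed_docs(current, seen_hashes):
    """Return IDs whose content hash differs from the previous crawl,
    updating seen_hashes in place."""
    changed = []
    for doc_id, text in current.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if seen_hashes.get(doc_id) != digest:
            changed.append(doc_id)
            seen_hashes[doc_id] = digest
    return changed
```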

How does the RAG stack prevent the AI from hallucinating answers?

Responses are generated only from passages retrieved from your indexed corpus — the model cannot draw on general training knowledge for factual claims. Each answer includes source citations (document name, page, and section) so users can verify. Passages with retrieval confidence below a configurable threshold trigger a 'source not found' response rather than an approximate answer.
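
The threshold behaviour described above amounts to a gate on the best retrieval score: below the cut-off, refuse rather than approximate. The 0.35 value and field names below are illustrative placeholders; the real threshold is configured per deployment.

```python
def ground_or_refuse(passages, threshold=0.35):
    """Return passages to ground the answer, or an explicit
    'source not found' signal instead of an approximate answer."""
    if not passages or max(p["score"] for p in passages) < threshold:
        return {"status": "source_not_found", "passages": []}
    return {
        "status": "ok",
        "passages": [p for p in passages if p["score"] >= threshold],
    }
```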

How is access control handled across different departments?

The stack inherits permissions from your identity provider (Entra ID, Okta, Google Workspace). A user can only retrieve passages from documents their SSO role can access. Permission changes propagate to the index within 15 minutes via a sync job. This means Finance can query its own documents without Finance data appearing in results for other departments.


Don't see exactly what you need?

Most engagements start as custom scopes. Send us your problem; we'll tell you whether one of our productized solutions fits — or what a custom build looks like.