Hong Kong
AIMenta
Customer service · Productized · Fixed scope

Customer Service AI Assistant

Deflect 60%+ of tickets across nine Asian languages, with confident escalation your CX team controls.

38-62% ticket deflection
±0.1pt CSAT impact
US$120K-160K typical engagement
6 weeks to shadow-mode launch

The problem

Your support team handles 14,000 tickets a month. 62% are repeats of the same 30 questions: order status, refund policy, integration help, password reset. Your agents burn out on tier-1 work while VIP customers wait. You tried a rule-based chatbot in 2022; customers learned to type "human" in three turns and gave up on self-service forever.

McKinsey estimates that generative AI can deflect 30-50% of customer-service contacts in mid-market B2B and B2C without measurable CSAT degradation, when paired with confident escalation and correct grounding.[^1] Most rule-based deployments deflect under 10%. The gap is the model and the retrieval design, not the channel.

Our approach

End-user channel (web chat / WhatsApp / LINE / WeChat)
          │
          ▼
Channel adapter (Laravel webhook handlers)
          │
          ▼
Intent + safety layer  ← Llama Guard 3 (prompt-injection + PII scrub)
          │
          ▼
Retrieval (pgvector, hybrid BM25 + dense, 1,500-token chunks)
          │
          ▼
Generation: Claude Sonnet 4.6 (default) → Claude Haiku 4 (high volume)
          │
          ▼
Confidence scorer (logprobs + retrieval score + heuristic rules)
          │
          ▼
Auto-reply (≥ 0.85 confidence) │ Human escalation (< 0.85)
          │
          ▼
Observability + analytics: Langfuse, internal CX dashboard
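The confidence gate at the bottom of the pipeline blends the three signal families the diagram names. The production scorer lives in the Laravel backend; this Python sketch, with illustrative weights and the 0.85 threshold from the diagram, shows the shape of the logic:

```python
def confidence_score(mean_logprob: float, retrieval_score: float,
                     heuristic_penalty: float = 0.0) -> float:
    """Blend generation and retrieval signals into one score in [0, 1].

    mean_logprob: average token log-probability of the draft reply
                  (values near 0 mean a confident generation).
    retrieval_score: top reranker score for the supporting chunk, in [0, 1].
    heuristic_penalty: deductions from rules, e.g. a draft that quotes a
                       refund amount absent from every retrieved chunk.
    The 50/50 weighting and the logprob mapping are illustrative.
    """
    # Map mean logprob (typically in [-3, 0]) onto [0, 1].
    generation_conf = max(0.0, 1.0 + mean_logprob / 3.0)
    score = 0.5 * generation_conf + 0.5 * retrieval_score - heuristic_penalty
    return max(0.0, min(1.0, score))

def route(score: float, threshold: float = 0.85) -> str:
    """Auto-reply at or above the threshold, otherwise escalate to a human."""
    return "auto_reply" if score >= threshold else "human_escalation"
```

A confident generation backed by a strong chunk auto-replies; a hesitant generation over weak retrieval escalates, no matter how fluent the draft reads.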

Who it is for

  • A 1,200-agent contact centre at a Korean retailer with seasonal volume swings of 4x and ticket SLA pressure.
  • A 60-agent SaaS support team in Singapore where 70% of tickets are documented in the help centre but customers do not find the answer.
  • A regulated financial-services support function in Hong Kong where every reply needs an audit trail and a human reviewer for material questions.

Tech stack

  • LLMs: Claude Sonnet 4.6 (default reasoning), Claude Haiku 4 (high-volume cheap inference), GPT-4o (multimodal screenshots)
  • Vector store: pgvector on Postgres 16 (default), Qdrant when scale exceeds 50M chunks
  • Retrieval: Hybrid BM25 + dense, with cross-encoder reranking via Cohere Rerank 3
  • Safety: Llama Guard 3 for prompt-injection and PII scrubbing
  • Observability: Langfuse for LLM traces, Grafana for system metrics
  • Backend: Laravel 12 (PHP 8.5) with queue workers (Redis-backed), running on AWS ECS or Azure Container Apps
  • Channels: Web chat (custom widget), WhatsApp Business API, LINE, WeChat Work, Zendesk, Intercom
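The hybrid-retrieval bullet above hides one concrete step: merging the BM25 and dense rankings before the Cohere reranker sees them. A common way to do this is reciprocal rank fusion; the sketch below is illustrative Python, not the production code:

```python
def reciprocal_rank_fusion(bm25_ranking, dense_ranking, k=60):
    """Merge two ranked lists of chunk IDs with reciprocal rank fusion.

    Each document scores 1 / (k + rank) per list it appears in; k=60 is
    the conventional damping constant. Documents found by both retrievers
    rise to the top, and the fused list feeds the cross-encoder reranker.
    """
    scores = {}
    for ranking in (bm25_ranking, dense_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```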

Integration list

Zendesk, Intercom, Freshdesk, Salesforce Service Cloud, HubSpot Service Hub, ServiceNow, Jira Service Management, Slack, Microsoft Teams, WhatsApp Business API, LINE Official Account, WeChat Work, Telegram, custom in-house ticketing systems via REST or webhook.
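For custom in-house systems, the webhook path usually means verifying a payload signature and mapping vendor fields onto an internal ticket shape. Header names, field names, and the ticket schema below are assumed examples in Python, not the shipped Laravel adapter:

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of an HMAC-SHA256 webhook signature.

    Many ticketing systems sign payloads this way; the exact header that
    carries the signature differs per vendor.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

def normalise_ticket(payload: dict) -> dict:
    """Map a vendor payload onto an internal ticket shape (fields assumed)."""
    return {
        "external_id": payload["id"],
        "channel": payload.get("channel", "unknown"),
        "subject": payload.get("subject", ""),
        "body": payload["description"],
        "language": payload.get("lang", "en"),
    }
```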

Deployment timeline

Week 1: Knowledge-base audit; channel inventory; success metrics agreed
Weeks 2-3: Knowledge ingestion, chunking, embedding; safety policy configured
Weeks 4-5: Channel adapters built and tested in staging
Week 6: Shadow-mode launch (AI proposes, agent decides) on 100% of tickets
Weeks 7-8: Cutover to live deflection on top 10 intents; monitoring on
Weeks 9-10: Confidence thresholds tuned; additional intents and channels added
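Shadow mode in week 6 is what makes the week 7-8 cutover a data decision rather than a leap of faith: each AI suggestion is logged against whether the agent accepted it, and intents clear the bar one by one. A minimal sketch of that bookkeeping (the 0.9 cutover bar is an assumption, tuned per client):

```python
def shadow_mode_report(events):
    """Summarise shadow-mode events into per-intent agreement rates.

    Each event is (intent, ai_suggestion_accepted: bool), recorded while
    the AI proposes and the agent decides.
    """
    counts = {}
    for intent, accepted in events:
        total, hits = counts.get(intent, (0, 0))
        counts[intent] = (total + 1, hits + (1 if accepted else 0))
    return {intent: hits / total for intent, (total, hits) in counts.items()}

def cutover_candidates(report, bar=0.9):
    """Intents whose agreement rate clears the bar go live first."""
    return sorted(i for i, rate in report.items() if rate >= bar)
```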

Mini-ROI

A 1,200-agent contact centre at a Korean retailer deflected 38% of inbound tickets in the first 90 days post-launch. Average handle time on remaining tickets dropped 22% (agents spent less time on repetitive questions). Annualised labour saving: US$1.4M. CSAT held at 4.3/5 (pre: 4.4/5).

McKinsey's 2024 customer operations research benchmarks fully-deployed AI support deflection at US$80,000-US$160,000 per 100 agents annually after factoring in model cost, integration cost, and tooling.[^2] Our deployments sit at the upper end of that range due to aggressive caching and model-tier routing.
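The arithmetic behind figures like these is simple enough to sanity-check yourself. The sketch below is a generic formula with every input an assumption to plug in (the fully loaded cost per ticket varies widely by market); it does not reproduce the retailer case above, which ran at far larger volume:

```python
def annual_labour_saving(tickets_per_month: int,
                         deflection_rate: float,
                         cost_per_ticket: float,
                         aht_reduction: float = 0.0) -> float:
    """Annualised saving from deflected tickets plus faster handling.

    cost_per_ticket is the fully loaded agent cost of resolving one
    ticket; aht_reduction is the fractional drop in handle time on the
    tickets that still reach an agent.
    """
    deflected = tickets_per_month * deflection_rate * cost_per_ticket
    remaining = (tickets_per_month * (1 - deflection_rate)
                 * cost_per_ticket * aht_reduction)
    return 12 * (deflected + remaining)
```

For the 14,000-ticket/month team from the problem statement, an assumed 38% deflection, US$5 fully loaded cost per ticket, and a 22% handle-time drop would annualise to roughly US$434K.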

Pricing tiers

  • Starter: US$28,000-48,000 setup; from US$1,800/mo (model + infrastructure). Single channel, single language, up to 5,000 tickets/month deflection capacity.
  • Scale: US$60,000-120,000 setup; from US$4,200/mo. 3-5 channels, 4-6 languages, up to 30,000 tickets/month deflection.
  • Strategic: US$140,000-280,000 setup; from US$9,500/mo. 6+ channels, 9 languages, custom workflows, dedicated SRE backstop.

All tiers include 90-day post-launch hypercare and a re-tuning sprint at month six.

Frequently asked questions

Will the agent hallucinate refund or policy answers? The agent only answers from your retrieved knowledge base. If the retrieved content does not cover the question with a confidence above the threshold, the request escalates to a human. We measure hallucination rate weekly during shadow mode and set the threshold accordingly.

How do you handle languages outside the nine supported? Claude Sonnet 4.6 handles 95+ languages well. Adding a new language requires translating intent labels and confidence-test sets. Typical cost per additional language: US$8,000-US$14,000 setup, no recurring uplift.

What happens when a model we depend on is deprecated? The evaluation harness runs the replacement model against a golden set of 1,200+ representative tickets. If accuracy drops more than 2 points, we adjust the prompt template or fall back to the previous model. Last 18 months: zero customer-visible regressions.
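The deprecation gate described above reduces to a single comparison on the golden set. A Python sketch, illustrative only (the real harness also tracks per-intent breakdowns):

```python
def regression_gate(incumbent_results, candidate_results, max_drop_pts=2.0):
    """Compare a candidate model to the incumbent on the golden set.

    Inputs are per-ticket pass/fail booleans from the evaluation harness.
    Returns "promote" if candidate accuracy is within max_drop_pts
    percentage points of the incumbent, else "hold" for prompt work or
    fallback.
    """
    incumbent = 100.0 * sum(incumbent_results) / len(incumbent_results)
    candidate = 100.0 * sum(candidate_results) / len(candidate_results)
    return "promote" if incumbent - candidate <= max_drop_pts else "hold"
```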

Can the agent take actions, not just answer? Yes. Tier-2 deployments use tool-calling to look up order status, issue refunds within policy limits, reset passwords, or open tickets. Every action that changes state writes to an audit log and routes to a human for amounts above a configured threshold.
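The policy-limit behaviour for state-changing tools fits in a few lines. The 50-unit limit, function name, and log schema below are illustrative, not the production audit format:

```python
import datetime

AUDIT_LOG = []  # stand-in for the append-only audit store

def issue_refund(ticket_id: str, amount: float, policy_limit: float = 50.0):
    """Issue a refund within the configured limit, else route to a human.

    Every call, approved or not, appends an audit record before returning.
    """
    decision = "auto_refund" if amount <= policy_limit else "human_approval"
    AUDIT_LOG.append({
        "ticket_id": ticket_id,
        "action": "refund_request",
        "amount": amount,
        "decision": decision,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return decision
```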

What is the data residency posture? Default deployment in the AWS region of your choice (Tokyo, Singapore, Seoul, Mumbai, or Jakarta). Conversation data stays in region. For Mainland China, we deploy on Alibaba Cloud Beijing with PIPL-compliant logging. For air-gapped requirements, we self-host on open-weights models.

How do we measure success? Three numbers: deflection rate (% tickets fully resolved by AI), CSAT delta (post-launch vs pre), and average handle time on escalated tickets. Reported weekly through a dashboard your CX manager owns.
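Two of those three numbers fall out of a one-pass aggregation over ticket records; CSAT delta comes from survey data against the pre-launch baseline. An illustrative Python sketch with an assumed record shape:

```python
def weekly_metrics(tickets):
    """Compute deflection rate and escalated-ticket handle time.

    Each record is assumed to look like
    {"resolved_by": "ai" | "agent", "handle_seconds": int}.
    """
    total = len(tickets)
    ai_resolved = sum(1 for t in tickets if t["resolved_by"] == "ai")
    escalated = [t for t in tickets if t["resolved_by"] == "agent"]
    aht = (sum(t["handle_seconds"] for t in escalated) / len(escalated)
           if escalated else 0.0)
    return {"deflection_rate": ai_resolved / total,
            "escalated_aht_seconds": aht}
```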

Do agents resist this rollout? Less than you might expect when agent-assist quality is high. We deploy in shadow mode first so agents see the suggestions, vote on them, and shape the deployment. Net Promoter Score among agents in our last six rollouts averaged +22.

Can we extend the agent ourselves? Yes. Code lives in your repository. Most clients add 4-9 new intents and 1-2 new channels in the six months post-handover with no AIMenta involvement.


Common questions

How quickly can a Customer Service AI Assistant be deployed?

Most deployments reach live production in 6–10 weeks. The first two weeks cover knowledge-base ingestion and intent modelling; weeks 3–6 handle system integration (CRM, ticketing, telephony); weeks 7–10 complete UAT and go-live. A phased rollout—email and chat first, voice later—is typical for teams with active SLA obligations.

Will the AI handle Cantonese, Mandarin, and Bahasa as well as English?

Yes. The solution is built on multilingual foundation models fine-tuned on APAC service corpora, and supports Cantonese (spoken and written), Mandarin (Simplified and Traditional), Bahasa Malaysia, Bahasa Indonesia, Vietnamese, Korean, and Japanese out of the box. Terminology dictionaries for your product catalogue are loaded at configuration time.

What happens when the AI cannot resolve a query?

The AI applies a confidence threshold (configurable; 0.85 by default, matching the routing diagram above). Below that threshold it transfers to a human agent with a full context summary: conversation transcript, detected intent, suggested resolution, and customer tier, so the agent never asks the customer to repeat themselves. Fallback routing can be configured by issue type, language, VIP status, or time of day.
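Fallback routing by issue type, language, VIP status, or time of day amounts to an ordered rule list evaluated top-down, first match wins. In production this is configuration rather than code; the Python sketch below just shows the precedence semantics, with queue names and rule shapes assumed:

```python
def route_escalation(ticket, rules):
    """Return the queue of the first matching rule, else the default.

    Each rule is (predicate, queue_name); order encodes priority.
    """
    for predicate, queue in rules:
        if predicate(ticket):
            return queue
    return "default_queue"

# Example priority: VIP status beats language, which beats issue type.
RULES = [
    (lambda t: t.get("vip"), "vip_queue"),
    (lambda t: t.get("language") == "ko", "korean_queue"),
    (lambda t: t.get("issue") == "billing", "billing_queue"),
]
```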

How is customer data protected under PDPO, PDPA, and PIPL?

The stack is designed for APAC data-residency requirements: data processed in HK stays on HK infrastructure; Singapore deployments use AWS Singapore or GCP asia-southeast1 regions. No customer PII is sent to offshore LLM APIs without explicit consent. All conversation logs are encrypted at rest (AES-256) and purged on a configurable retention schedule.
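The configurable retention schedule boils down to a recurring purge job. The sketch below shows only the selection logic; a real job deletes rows rather than list entries and writes a deletion audit record (record shape assumed):

```python
import datetime

def purge_expired(logs, retention_days, now=None):
    """Keep only conversation logs inside the retention window.

    Each log is assumed to look like {"id": ..., "created": datetime}.
    Returns the surviving logs; everything older than the cutoff is
    dropped.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    cutoff = now - datetime.timedelta(days=retention_days)
    return [log for log in logs if log["created"] >= cutoff]
```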


Don't see exactly what you need?

Most engagements start as custom scopes. Send us your problem; we'll tell you whether one of our productized solutions fits — or what a custom build looks like.