Curated weekly · 36 tools · 30 categories
The AI tool landscape,
curated & ranked.
Each entry includes pricing, use cases, limitations, and an AIMenta editorial verdict — so you can spend less time evaluating and more time deploying.
36 matching tools for "voice"
Descript
· Descript
Edit video and podcasts by editing the transcript. Industry-defining tool for podcasters and content creators; AI features include voice cloning, eye contact, and studio sound.
ElevenLabs
· ElevenLabs
The category-defining voice AI. Highest-quality TTS, voice cloning from 30 seconds of audio, and an expanding library of conversational voice models. The default for production voice.
ABBYY Vantage
· ABBYY
ABBYY Vantage is an enterprise intelligent document processing (IDP) platform combining OCR, machine learning document classification, and data extraction into a low-code platform. Unlike cloud-native services (AWS Textract, Azure Document Intelligence), ABBYY Vantage supports on-premises deployment and provides 150+ pre-built skills for common document types: invoices, purchase orders, contracts, ID documents, bank statements, and customs forms. For APAC enterprises in regulated sectors (financial services, healthcare, government, logistics), where data sovereignty requires on-premises deployment or where document complexity exceeds cloud API capabilities, ABBYY Vantage is the enterprise IDP recommendation.
Anyword
· Anyword
Performance-driven AI copywriting platform with predictive performance scoring, A/B copy variants, and brand voice enforcement for APAC marketing teams optimising conversion rates.
AWS Textract
· Amazon Web Services
AWS Textract is a fully managed machine learning document processing service that automatically extracts text, handwriting, tables, and form data from scanned documents and images. Unlike simple OCR, Textract understands document structure: it can identify form fields, table cells, and key-value pairs without requiring custom templates. For APAC enterprises on AWS running high-volume document processing workflows (KYC document extraction for passports and identity documents, invoice and purchase order processing, contract data extraction, and insurance claims processing), Textract provides a scalable, API-accessible intelligent document processing (IDP) layer that integrates natively with AWS storage, Lambda, and downstream business applications.
Azure Document Intelligence
· Microsoft
Azure Document Intelligence (formerly Form Recognizer) is Microsoft's AI document processing service, offering pre-built extraction models for common document types (invoices, receipts, ID documents, contracts) and a custom model builder for organisation-specific document types. For APAC enterprises on Azure or Microsoft 365, which include the majority of large APAC financial institutions, professional services firms, and multinationals, Document Intelligence is the natural document AI choice: it integrates natively with Power Automate for workflow automation, Logic Apps for process orchestration, and Copilot Studio for document-driven conversational AI.
Bland AI
· Bland AI
AI phone calling infrastructure for high-volume APAC outbound and inbound campaigns, enabling APAC enterprises to deploy voice AI agents for appointment reminders, lead qualification, payment follow-up, and customer surveys at scale with per-minute pricing and CRM integration.
Cartesia
· Cartesia AI
Low-latency text-to-speech API optimized for real-time voice AI applications, delivering sub-50ms streaming speech synthesis for APAC AI phone agents, live voice assistants, and interactive applications where TTS latency is a primary user experience constraint.
Cleanvoice AI
· Cleanvoice
Specialized AI audio cleaning service that removes filler sounds, mouth noises, dead air, and stutters from podcast and voice recordings. APAC podcasters and content creators upload raw audio and receive professionally cleaned files without manual editing.
Coqui TTS
· Open Source (Coqui)
Open-source TTS toolkit with XTTS-v2 zero-shot voice cloning and multilingual synthesis, enabling APAC engineering teams to create custom branded voices for Japanese, Korean, and Chinese virtual assistants by cloning a voice from a short audio sample or fine-tuning on APAC speaker data without training from scratch.
Coupa
· Coupa Software Inc.
Coupa is the leading AI-powered business spend management (BSM) platform that unifies procurement, supplier management, invoicing, contract management, and expense management in a single cloud platform, with AI capabilities that surface savings opportunities, automate risk monitoring, and provide predictive spend analytics across the enterprise. Coupa is widely deployed at large APAC enterprises in financial services, technology, manufacturing, and retail: organisations that manage hundreds of millions of dollars in indirect spend across multiple Asian markets and supplier networks. Coupa's Community.ai leverages anonymised spend data from its entire customer network to provide benchmarking and savings recommendations specific to spend category, industry, and geography, including APAC market-specific insights on supplier pricing and category benchmarks. For APAC finance and procurement leaders, Coupa provides the spend visibility and AI-driven control needed to reduce maverick spend, accelerate invoice processing, and manage supplier risk across complex Asian supply chains.
Deepgram
· Deepgram
Speech-to-text API focused on accuracy, latency, and customization. Nova-3 leads on real-time streaming for voice agents and call analytics.
ERNIE
· Baidu
ERNIE (Enhanced Representation through kNowledge IntEgration) is Baidu's large language model family, powering the Wenxin Yiyan (文心一言) consumer AI product. As China's dominant search engine operator, Baidu has embedded ERNIE across its ecosystem: Maps, the DuerOS voice assistant, cloud services, and enterprise AI products. ERNIE 4.5 (2026) demonstrates competitive Chinese-language performance and is the preferred model for enterprises with established Baidu Cloud relationships or state-sector compliance requirements.
Genesys Cloud CX
· Genesys
Genesys Cloud CX is an enterprise contact centre as a service (CCaaS) platform that integrates AI across the entire contact centre operation: intelligent routing, IVR, real-time agent assistance, workforce engagement management, and analytics. Genesys has deep APAC deployments in telecommunications (Telstra, Singtel, SoftBank), financial services (major APAC banks and insurers), and retail enterprises that run contact centres of 500–10,000+ agents. Genesys AI capabilities include: AI-powered routing that matches each interaction to the best-fit agent based on skills, customer history, and predicted outcomes; real-time agent copilot that provides live suggestions and knowledge articles during calls; automatic speech recognition and NLP in major APAC languages; sentiment analysis for real-time coaching triggers; and predictive engagement that identifies and intervenes with website visitors likely to need support. For APAC enterprises with large contact centre operations, Genesys Cloud represents the consolidation of voice, chat, email, social, and messaging channels on a single AI-powered platform.
Gladia
· Gladia
Speech-to-text API with real-time transcription and speaker diarization, providing APAC developers with audio transcription, speaker identification, live captioning, and automatic translation for meeting intelligence, call analytics, and voice application backends.
Jasper
· Jasper
Marketing-focused AI writing platform with brand voice training, campaign workflows, and a library of marketing-specific templates.
Jasper
· Jasper AI Inc.
Jasper is an AI content generation platform targeting marketing teams, with strength in long-form marketing content: blog posts, ad copy, email campaigns, landing pages, and social media content. Jasper's brand voice feature allows teams to define and enforce a consistent writing style across all AI-generated content, a key differentiator versus using ChatGPT or Claude directly. For APAC content marketing teams managing high volumes of blog, email, and social content production, Jasper provides structured AI workflows above the raw capability of general-purpose LLMs.
Kokoro TTS
· Open Source (hexgrad)
Lightweight 82M-parameter neural text-to-speech model producing high-quality multilingual speech, enabling APAC engineering teams to run natural-sounding TTS for Japanese, Korean, Chinese, and English locally on CPU without cloud API dependency, with inference fast enough for real-time APAC voice agent and call center applications.
LOVO AI
· LOVO Inc.
AI voiceover and video creation platform with 500+ voices across 100 languages, enabling APAC content teams to produce localized narration and AI-generated video in a single workflow, covering APAC languages from Mandarin to Bahasa Indonesia for marketing and training content.
Medallia AI
· Medallia Inc.
Medallia AI is the artificial intelligence and machine learning capability layer embedded across the Medallia Experience Cloud platform, covering customer experience (CX), employee experience (EX), and contact centre analytics. The AI capabilities include text analytics on open-ended survey responses, social feedback, and contact centre recordings; sentiment scoring and topic classification; predictive NPS and attrition modelling; and AI-generated action recommendations. For APAC enterprises already on Medallia for their Voice of Customer or employee listening programmes (common in large financial services, telecommunications, retail, and hospitality companies in Singapore, Hong Kong, Australia, and Japan), Medallia AI represents an incremental capability upgrade that improves the signal quality from existing survey investments.
Murf
· Murf AI
Studio-style voice generator with 120+ voices in 20+ languages. Strong UX for non-technical users producing e-learning, IVR, and explainer audio.
Murf AI
· Murf Inc.
AI voiceover platform with 120+ voices across 20+ languages, enabling APAC content teams to produce studio-quality narration from text scripts for e-learning, corporate video, product demos, and marketing content without voice recording studios.
OpenAI Voice
· OpenAI
OpenAI's TTS and Realtime voice models. Realtime API enables genuine voice agents with sub-second latency; TTS HD is a strong, less-expensive alternative to ElevenLabs for narration.
Piper TTS
· Open Source (Rhasspy)
Fast local neural TTS system optimized for on-device inference, providing APAC engineering teams with real-time Japanese, Korean, and Chinese speech synthesis on CPU-only hardware including Raspberry Pi and embedded systems, enabling APAC IoT voice interfaces, kiosk assistants, and on-premises call center agents without cloud dependency.
PlayHT
· PlayHT
AI voice cloning and text-to-speech platform with 800+ voices and support for 100+ languages, enabling APAC content creators and enterprises to generate realistic voiceovers, clone brand voices, and produce multilingual APAC audio content without recording studios.
pyannote.audio
· Hervé Bredin (CNRS)
State-of-the-art speaker diarization and voice activity detection toolkit, providing APAC data science teams with neural models for identifying "who spoke when" in multilingual multi-speaker audio recordings, enabling automated attribution of Japanese, Korean, and Chinese meeting transcripts, call recordings, and interview audio without manual speaker labeling.
Resemble AI
· Resemble AI
Enterprise AI voice cloning and dubbing platform, enabling APAC enterprises to create high-fidelity voice clones from existing recordings, produce AI-dubbed multilingual APAC video content, and deploy consistent branded voice identities across customer-facing AI applications.
Retell AI
· Retell AI
Conversational voice AI platform for APAC customer service automation, deploying LLM-powered phone agents with sub-800ms latency, natural conversation interruption handling, human escalation routing, and APAC multilingual voice support for inbound and outbound call center workflows.
Traydstream
· Traydstream
Traydstream is an AI-powered trade finance document digitisation and compliance checking platform that addresses one of APAC's most costly operational problems: Letter of Credit discrepancies. The platform uses optical character recognition and AI to extract data from trade documents (Bills of Lading, Commercial Invoices, Certificates of Origin, Packing Lists), cross-checks documents against LC terms and UCP 600 rules, and flags discrepancies before bank submission. Processing 8M+ trade finance documents per month across APAC, Europe, and the Middle East, Traydstream is deployed by DBS, HSBC, Standard Chartered, and hundreds of corporates across the Singapore-Hong Kong trade finance corridor.
Twilio
· Twilio
Cloud communications platform with programmable voice, SMS, WhatsApp, and video APIs for APAC engineering teams building custom customer engagement workflows at any scale.
UiPath (AI and Document Understanding)
· UiPath Inc.
UiPath is the leading enterprise RPA platform globally, with a deep install base across APAC in financial services, shared services, manufacturing, and BPO. UiPath AI adds Document Understanding (intelligent document processing for invoices, purchase orders, contracts, and customs forms), AI Center (an MLOps platform for deploying ML models into UiPath workflows), Autopilot (AI-assisted bot creation), and Communications Mining. For APAC enterprises with existing UiPath automation programmes, these AI features represent the upgrade path from rule-based RPA to AI-augmented intelligent automation without platform migration.
Vapi
· Vapi AI
Voice AI platform for building AI-powered phone agents, enabling APAC developers to construct inbound and outbound call automation with custom LLM backends, TTS/STT provider selection, function calling, and conversation state management without building telephony infrastructure.
Vocode
· Vocode
Open-source Python library for building real-time conversational voice agents, orchestrating the speech recognition, LLM reasoning, and text-to-speech synthesis pipeline for APAC teams building automated call center agents, voice-controlled applications, and multilingual customer service automation for Japanese, Korean, and Chinese phone channels.
Voiceflow
· Voiceflow
No-code conversational AI platform enabling APAC enterprise teams to design and deploy AI chatbots and agents across web, WhatsApp, LINE, and messaging channels.
Writer
· Writer
Enterprise writing platform with proprietary Palmyra LLMs, brand-voice enforcement, and on-prem deployment options. Targets regulated industries.
Writer
· Writer
Enterprise AI writing platform with brand voice enforcement, style guide compliance, and team-wide content governance for APAC regulated organisations.
Need help choosing the right stack?
We help APAC enterprises design AI tool stacks that match their data, compliance, and budget realities — not vendor decks.