AIMenta

Curated weekly · 36 tools · 30 categories

The AI tool landscape,
curated & ranked.

Each entry includes pricing, use cases, limitations, and an AIMenta editorial verdict — so you can spend less time evaluating and more time deploying.

Browse

By category

💬 08 · Chat & assistants · General-purpose conversational AI
🧠 09 · Foundation model APIs · Programmatic LLM access
⌨️ 07 · Code assistants · IDE copilots & autocomplete
04 · Code generation platforms · Build apps from prompts
🤖 06 · Agent platforms · Autonomous LLM workflows
🗂️ 07 · RAG & vector databases · Retrieval infrastructure
🎨 08 · Image generation · Text-to-image models
🖼️ 05 · Image editing & enhancement · AI-powered photo workflows
🎬 06 · Video generation · Text-to-video models
📹 05 · Video editing & repurposing · Cuts, captions, clips
🎵 03 · Audio & music · Generative music
🗣️ 03 · Voice & TTS · Text-to-speech & voice cloning
📝 06 · Transcription & STT · Speech-to-text
🌐 02 · Translation · Multilingual AI
🔍 06 · Search & research · Answer engines
📋 05 · Meetings & note-taking · Recording, transcripts, summaries
✍️ 04 · Writing assistants · Marketing & long-form copy
📊 04 · Presentations · AI slide builders
🎯 05 · Design & UI · AI for designers
📈 05 · Data analysis · Talk to your data
📄 02 · PDF & document AI · Chat with documents
💼 05 · CRM & sales AI · Pipeline & call intelligence
📣 05 · Marketing AI · Campaigns, personalization, ops
🎧 04 · Customer support · AI agents for support
🔗 04 · Workflow automation · Connect AI to your stack
☁️ 06 · LLM hosting & inference · Serve open-weight models
⚙️ 05 · ML platforms & ops · Experiment tracking & MLOps
🛡️ 04 · AI safety & guardrails · Production controls
📚 04 · Knowledge management · AI-native notes & wikis
📡 06 · AI observability · LLM monitoring & evals

All tools

36 matching tools for "voice"


Descript

· Descript
Recommended

Edit video and podcasts by editing the transcript. Industry-defining tool for podcasters and content creators; AI features include voice cloning, eye-contact correction, and studio sound.

Freemium · Free; Hobbyist US$12/mo · Free tier · Video editing & repurposing · Transcription & STT

ElevenLabs

· ElevenLabs
Recommended

The category-defining voice AI. Highest-quality TTS, voice cloning from 30 seconds of audio, and an expanding library of conversational voice models. The default for production voice.

Freemium · Free; Starter US$5/mo · API · Free tier · Voice & TTS

ABBYY Vantage

· ABBYY
Decent fit

ABBYY Vantage is an enterprise intelligent document processing (IDP) platform that combines OCR, machine-learning document classification, and data extraction in a low-code environment. Unlike cloud-native services (AWS Textract, Azure Document Intelligence), ABBYY Vantage supports on-premises deployment and provides 150+ pre-built skills for common document types: invoices, purchase orders, contracts, ID documents, bank statements, and customs forms. For APAC enterprises in regulated sectors (financial services, healthcare, government, logistics) where data sovereignty requires on-premises deployment or where document complexity exceeds cloud API capabilities, ABBYY Vantage is the enterprise IDP recommendation.

Enterprise · API · Self-host

Anyword

· Anyword
Recommended

Performance-driven AI copywriting platform with predictive performance scoring, A/B copy variants, and brand voice enforcement for APAC marketing teams optimising conversion rates.

Freemium

AWS Textract

· Amazon Web Services
Recommended

AWS Textract is a fully managed machine learning document processing service that automatically extracts text, handwriting, tables, and form data from scanned documents and images. Unlike simple OCR, Textract understands document structure — it can identify form fields, table cells, and key-value pairs without requiring custom templates. For APAC enterprises on AWS running high-volume document processing workflows — KYC document extraction (passports, identity documents), invoice and purchase order processing, contract data extraction, and insurance claims processing — Textract provides a scalable, API-accessible intelligent document processing (IDP) layer that integrates natively with AWS storage, Lambda, and downstream business applications.

Usage-based · API · Free tier

Azure Document Intelligence

· Microsoft
Recommended

Azure Document Intelligence (formerly Form Recognizer) is Microsoft's AI document processing service, offering pre-built extraction models for common document types (invoices, receipts, ID documents, contracts) and a custom model builder for organisation-specific document types. For APAC enterprises on Azure or Microsoft 365 — the majority of large APAC financial institutions, professional services firms, and multinationals — Document Intelligence is the natural document AI choice: it integrates natively with Power Automate for workflow automation, Logic Apps for process orchestration, and Copilot Studio for document-driven conversational AI.

Usage-based · API · Free tier

Bland AI

· Bland AI
Decent fit

AI phone calling infrastructure for high-volume APAC outbound and inbound campaigns — enabling APAC enterprises to deploy voice AI agents for appointment reminders, lead qualification, payment follow-up, and customer surveys at scale with per-minute pricing and CRM integration.

Usage-based

Cartesia

· Cartesia AI
Decent fit

Low-latency text-to-speech API optimized for real-time voice AI applications — delivering sub-50ms streaming speech synthesis for APAC AI phone agents, live voice assistants, and interactive applications where TTS latency is a primary user experience constraint.

Usage-based

Cleanvoice AI

· Cleanvoice
Decent fit

Specialized AI audio cleaning service removing filler sounds, mouth noises, dead air, and stutters from podcast and voice recordings — enabling APAC podcasters and content creators to upload raw audio and receive professionally cleaned files without manual editing.

Usage-based

Coqui TTS

· Open Source (Coqui)
Decent fit

Open-source TTS toolkit with XTTS-v2 zero-shot voice cloning and multilingual synthesis — enabling APAC engineering teams to create custom branded voices for Japanese, Korean, and Chinese virtual assistants by cloning a voice from a short audio sample or fine-tuning on APAC speaker data without training from scratch.

Open source

Coupa

· Coupa Software Inc.
Recommended

Coupa is the leading AI-powered business spend management (BSM) platform that unifies procurement, supplier management, invoicing, contract management, and expense management in a single cloud platform — with AI capabilities that surface savings opportunities, automate risk monitoring, and provide predictive spend analytics across the enterprise. Coupa is widely deployed at large APAC enterprises in financial services, technology, manufacturing, and retail — organisations that manage hundreds of millions of dollars in indirect spend across multiple Asian markets and supplier networks. Coupa's Community.ai leverages anonymised spend data from its entire customer network to provide benchmarking and savings recommendations specific to spend category, industry, and geography — including APAC market-specific insights on supplier pricing and category benchmarks. For APAC finance and procurement leaders, Coupa provides the spend visibility and AI-driven control needed to reduce maverick spend, accelerate invoice processing, and manage supplier risk across complex Asian supply chains.

Enterprise · API

Deepgram

· Deepgram
Recommended

Speech-to-text API focused on accuracy, latency, and customization. Nova-3 leads on real-time streaming for voice agents and call analytics.

Usage-based · Nova-3 US$0.0043/min · API · Free tier · Transcription & STT

ERNIE

· Baidu
Niche

ERNIE (Enhanced Representation through kNowledge IntEgration) is Baidu's large language model family, powering the Wenxin Yiyan (文心一言) consumer AI product. As China's dominant search engine operator, Baidu has embedded ERNIE across its ecosystem — Maps, DuerOS voice assistant, cloud services, and enterprise AI products. ERNIE 4.5 (2026) demonstrates competitive Chinese-language performance and is the preferred model for enterprises with established Baidu Cloud relationships or state-sector compliance requirements.

Freemium · API · Free tier

Genesys Cloud CX

· Genesys
Recommended

Genesys Cloud CX is an enterprise contact centre as a service (CCaaS) platform that integrates AI across the entire contact centre operation — intelligent routing, IVR, real-time agent assistance, workforce engagement management, and analytics. Genesys has deep APAC deployments in telecommunications (Telstra, Singtel, SoftBank), financial services (major APAC banks and insurers), and retail enterprises that run contact centres of 500–10,000+ agents. Genesys AI capabilities include: AI-powered routing that matches each interaction to the best-fit agent based on skills, customer history, and predicted outcomes; real-time agent copilot that provides live suggestions and knowledge articles during calls; automatic speech recognition and NLP in major APAC languages; sentiment analysis for real-time coaching triggers; and predictive engagement that identifies and intervenes with website visitors likely to need support. For APAC enterprises with large contact centre operations, Genesys Cloud represents the consolidation of voice, chat, email, social, and messaging channels on a single AI-powered platform.

Enterprise · API

Gladia

· Gladia
Decent fit

Speech-to-text API with real-time transcription and speaker diarization — providing APAC developers with audio transcription, speaker identification, live captioning, and automatic translation for meeting intelligence, call analytics, and voice application backends.

Usage-based

Jasper

· Jasper
Decent fit

Marketing-focused AI writing platform with brand voice training, campaign workflows, and a library of marketing-specific templates.

Paid · Creator US$49/mo · API · Writing assistants · Marketing AI

Jasper

· Jasper AI Inc.
Recommended

Jasper is an AI content generation platform targeting marketing teams, with strength in long-form marketing content: blog posts, ad copy, email campaigns, landing pages, and social media content. Jasper's brand voice feature allows teams to define and enforce a consistent writing style across all AI-generated content, a key differentiator versus using ChatGPT or Claude directly. For APAC content marketing teams producing high volumes of blog, email, and social content, Jasper provides structured AI workflows on top of the raw capability of general-purpose LLMs.

Paid · API

Kokoro TTS

· Open Source (hexgrad)
Recommended

Lightweight 82M-parameter neural text-to-speech model producing high-quality multilingual speech — enabling APAC engineering teams to run natural-sounding TTS for Japanese, Korean, Chinese, and English locally on CPU without cloud API dependency, with inference fast enough for real-time APAC voice agent and call center applications.

Open source

LOVO AI

· LOVO Inc.
Decent fit

AI voiceover and video creation platform with 500+ voices across 100 languages — enabling APAC content teams to produce localized narration and AI-generated video in a single workflow, covering APAC languages from Mandarin to Bahasa Indonesia for marketing and training content.

Freemium

Medallia AI

· Medallia Inc.
Decent fit

Medallia AI is the artificial intelligence and machine learning capability layer embedded across the Medallia Experience Cloud platform — covering customer experience (CX), employee experience (EX), and contact centre analytics. The AI capabilities include text analytics on open-ended survey responses, social feedback, and contact centre recordings; sentiment scoring and topic classification; predictive NPS and attrition modelling; and AI-generated action recommendations. For APAC enterprises already on Medallia for their Voice of Customer or employee listening programmes — common in large financial services, telecommunications, retail, and hospitality companies in Singapore, Hong Kong, Australia, and Japan — Medallia AI represents an incremental capability upgrade that improves the signal quality from existing survey investments.

Enterprise · API

Murf

· Murf AI
Decent fit

Studio-style voice generator with 120+ voices in 20+ languages. Strong UX for non-technical users producing e-learning, IVR, and explainer audio.

Freemium · Free; Creator US$29/mo · API · Free tier · Voice & TTS

Murf AI

· Murf Inc.
Decent fit

AI voiceover platform with 120+ voices across 20+ languages — enabling APAC content teams to produce studio-quality narration from text scripts for e-learning, corporate video, product demos, and marketing content without voice recording studios.

Freemium

OpenAI Voice

· OpenAI
Recommended

OpenAI's TTS and Realtime voice models. The Realtime API enables genuine voice agents with sub-second latency; TTS HD is a strong, less expensive alternative to ElevenLabs for narration.

Usage-based · TTS US$15/M chars; Realtime US$200/M tokens · API · Voice & TTS

Piper TTS

· Open Source (Rhasspy)
Recommended

Fast local neural TTS system optimized for on-device inference — providing APAC engineering teams with real-time Japanese, Korean, and Chinese speech synthesis on CPU-only hardware including Raspberry Pi and embedded systems, enabling APAC IoT voice interfaces, kiosk assistants, and on-premises call center agents without cloud dependency.

Open source

PlayHT

· PlayHT
Decent fit

AI voice cloning and text-to-speech platform with 800+ voices and 100+ language support — enabling APAC content creators and enterprises to generate realistic voiceovers, clone brand voices, and produce multilingual APAC audio content without recording studios.

Freemium

pyannote.audio

· Hervé Bredin (CNRS)
Recommended

State-of-the-art speaker diarization and voice activity detection toolkit — providing APAC data science teams with neural models for identifying "who spoke when" in multilingual multi-speaker audio recordings, enabling automated attribution of Japanese, Korean, and Chinese meeting transcripts, call recordings, and interview audio without manual speaker labeling.

Open source

Resemble AI

· Resemble AI
Decent fit

Enterprise AI voice cloning and dubbing platform — enabling APAC enterprises to create high-fidelity voice clones from existing recordings, produce AI-dubbed multilingual APAC video content, and deploy consistent branded voice identities across customer-facing AI applications.

Usage-based

Retell AI

· Retell AI
Decent fit

Conversational voice AI platform for APAC customer service automation — deploying LLM-powered phone agents with sub-800ms latency, natural conversation interruption handling, human escalation routing, and APAC multilingual voice support for inbound and outbound call center workflows.

Usage-based

Traydstream

· Traydstream
Recommended

Traydstream is an AI-powered trade finance document digitisation and compliance checking platform that addresses one of APAC's most costly operational problems: Letter of Credit discrepancies. The platform uses optical character recognition and AI to extract data from trade documents (Bills of Lading, Commercial Invoices, Certificates of Origin, Packing Lists), cross-checks documents against LC terms and UCP 600 rules, and flags discrepancies before bank submission. Processing 8M+ trade finance documents per month across APAC, Europe, and the Middle East, Traydstream is deployed by DBS, HSBC, Standard Chartered, and hundreds of corporates across the Singapore-Hong Kong trade finance corridor.

Enterprise · API

Twilio

· Twilio
Recommended

Cloud communications platform with programmable voice, SMS, WhatsApp, and video APIs for APAC engineering teams building custom customer engagement workflows at any scale.

Usage-based

UiPath (AI and Document Understanding)

· UiPath Inc.
Recommended

UiPath is the leading enterprise RPA platform globally, with a deep install base across APAC in financial services, shared services, manufacturing, and BPO. UiPath AI adds Document Understanding (intelligent document processing for invoices, purchase orders, contracts, and customs forms), AI Center (an MLOps platform for deploying ML models into UiPath workflows), Autopilot (AI-assisted bot creation), and Communications Mining. For APAC enterprises with existing UiPath automation programmes, these AI features represent the upgrade path from rule-based RPA to AI-augmented intelligent automation without platform migration.

Enterprise · API · Self-host

Vapi

· Vapi AI
Decent fit

Voice AI platform for building AI-powered phone agents — enabling APAC developers to construct inbound and outbound call automation with custom LLM backends, TTS/STT provider selection, function calling, and conversation state management without building telephony infrastructure.

Usage-based

Vocode

· Vocode
Decent fit

Open-source Python library for building real-time conversational voice agents — orchestrating the speech recognition, LLM reasoning, and text-to-speech synthesis pipeline for APAC teams building automated call center agents, voice-controlled applications, and multilingual customer service automation for Japanese, Korean, and Chinese phone channels.

Open source

Voiceflow

· Voiceflow
Recommended

No-code conversational AI platform enabling APAC enterprise teams to design and deploy AI chatbots and agents across web, WhatsApp, LINE, and messaging channels.

Freemium

Writer

· Writer
Decent fit

Enterprise writing platform with proprietary Palmyra LLMs, brand-voice enforcement, and on-prem deployment options. Targets regulated industries.

Enterprise · Team US$18/user/mo; Enterprise custom · API · Writing assistants

Writer

· Writer
Recommended

Enterprise AI writing platform with brand voice enforcement, style guide compliance, and team-wide content governance for APAC regulated organisations.

Enterprise

Vendor-neutral by design

Need help choosing the right stack?

We help APAC enterprises design AI tool stacks that match their data, compliance, and budget realities — not vendor decks.