NUS and MIT Research Shows APAC-Language LLMs Outperform English-First Models on Legal and Financial Reasoning

NUS and MIT publish multilingual LLM reasoning research showing APAC-language models trained on Mandarin and Japanese outperform English-first models on APAC legal and financial benchmarks by 18-31 percentage points.

By AIMenta Editorial Team · Apr 23, 2026

Researchers from the National University of Singapore (NUS) and MIT have published findings showing that large language models trained primarily on APAC-language corpora, specifically models with Mandarin Chinese and Japanese as dominant training languages, outperform English-first LLMs on APAC legal and financial reasoning benchmarks by 18-31 percentage points across standardised evaluation tasks. The gap holds even when the APAC-language models are evaluated on English-language versions of those tasks.

The research introduces the APAC Legal-Finance Reasoning Benchmark (ALFR-Bench), a new evaluation dataset designed specifically for APAC-market legal and financial reasoning. It incorporates Singapore PDPA compliance scenarios, Japanese APPI interpretation tasks, Chinese commercial contract analysis, and APAC regulatory compliance question-answering. ALFR-Bench targets a research gap: existing LLM benchmarks (MMLU, HellaSwag, ARC) evaluate reasoning on Western-market legal and financial scenarios that do not reflect the regulatory frameworks, commercial practices, and cultural context of APAC markets.

On ALFR-Bench, the performance gap between APAC-language primary models and English-first models ranges from 18 percentage points (South Korean financial regulation interpretation) to 31 percentage points (Chinese commercial contract clause analysis). Differences of this size are practically significant for APAC enterprises evaluating LLMs for legal document review, regulatory compliance assessment, and financial analysis workflows. Qwen3-72B and DeepSeek-V3 achieve the top ALFR-Bench scores among the evaluated models; GPT-4o and Claude 3.5 Sonnet, despite strong overall benchmark performance, show systematic gaps on APAC-specific legal and financial reasoning tasks.

For APAC enterprises selecting LLMs for legal and financial AI applications, the NUS-MIT research provides empirical justification for evaluating APAC-language primary models (Qwen, DeepSeek) alongside US-developed models on APAC-specific tasks, rather than defaulting to US-developed models based on English-language benchmark rankings alone.
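For teams running such a side-by-side evaluation on their own tasks, the sketch below shows one way to express per-task results as percentage-point gaps, the unit used for the figures above. It is a minimal illustration under stated assumptions: the `TaskResult` type, the `gap_in_points` helper, the task labels, and all counts are hypothetical placeholders, not ALFR-Bench data or the researchers' methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    task: str      # hypothetical task label
    correct: int   # items answered correctly
    total: int     # items attempted

    @property
    def accuracy_pct(self) -> float:
        # Accuracy expressed as a percentage (0-100).
        return 100.0 * self.correct / self.total


def gap_in_points(candidate: list[TaskResult], baseline: list[TaskResult]) -> dict[str, float]:
    """Per-task accuracy difference (candidate minus baseline) in percentage points."""
    base = {r.task: r for r in baseline}
    gaps = {}
    for r in candidate:
        if r.task not in base:
            continue  # only compare tasks that both models were run on
        gaps[r.task] = round(r.accuracy_pct - base[r.task].accuracy_pct, 1)
    return gaps


# Placeholder counts for illustration only; these are NOT ALFR-Bench results.
model_a = [TaskResult("contract_clauses", 80, 100), TaskResult("privacy_compliance", 66, 100)]
model_b = [TaskResult("contract_clauses", 60, 100), TaskResult("privacy_compliance", 60, 100)]

gaps = gap_in_points(model_a, model_b)
for task, gap in gaps.items():
    print(f"{task}: {gap:+.1f} percentage points")
```

A percentage-point gap is an absolute difference in accuracy, not a relative improvement: moving from 60% to 80% is a 20-point gap but a relative gain of about 33%. Reporting gaps per task, as the sketch does, keeps task-level differences visible instead of averaging them away.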
