Playbook 6 min read

RAG vs Fine-Tuning vs Prompting: Which Pattern Fits Your Use Case?

Three deployment patterns, three sets of trade-offs. A decision tree that picks the right one for your AI use case in under five minutes.

By AIMenta Editorial Team

TL;DR

  • Most enterprise LLM use cases should start with prompting, move to RAG when context is large, and only fine-tune for narrow specialisation.
  • The decision tree turns on three questions: what knowledge does the model need, how often does that knowledge change, and what behaviour do you need to teach.
  • Mixing patterns (RAG + fine-tuning, prompting + RAG) is common in mature systems and often the right answer.

Why now

The pattern question dominates technical AI conversations in 2026. Vendor pitches recommend whichever pattern they sell. Engineering teams default to whichever pattern they tried first. Neither approach produces the best system.

Anthropic's Building with Claude guide and OpenAI's Production Best Practices both recommend a "start with prompting, then RAG, then fine-tune only when justified" sequence.[^1] The McKinsey Technology Trends Outlook 2025 tracks generative AI deployment patterns and finds that 71% of enterprise systems in production use RAG, 18% use prompting alone, and 11% involve fine-tuning, often in combination.[^2]

This article offers a decision tree, three pattern profiles, and a worked example for each.

The three patterns

Prompting. The model receives a task description and any context that fits in the prompt window. No extra infrastructure. Behaviour is shaped through prompt engineering and few-shot examples.

Retrieval-augmented generation (RAG). Authoritative content is stored in a vector database (and increasingly hybrid stores). At query time, relevant chunks are retrieved and inserted into the prompt. The model grounds its answer in retrieved content.

Fine-tuning. The model weights are updated using domain-specific examples. The fine-tuned model is hosted and served. Behaviour is taught into the model rather than provided in the prompt.

Each has distinct cost, latency, knowledge-freshness, and behaviour-change profiles.

The decision tree

Ask three questions in order.

Question 1: What knowledge does the model need?

  • Knowledge already in the foundation model's training data (general knowledge, common business concepts, popular programming languages). Prompting is enough.
  • Knowledge specific to your domain or organisation that the foundation model does not have (your policies, your products, your customer data). RAG.
  • Knowledge that is so specialised the model cannot reason about it without deep training (rare scientific domains, proprietary languages, niche legal regimes). Consider fine-tuning, but only after RAG has been tried.

Question 2: How often does the knowledge change?

  • Daily or weekly (prices, inventory, customer data, regulations under active change). RAG. Fine-tuning would require constant retraining.
  • Monthly to quarterly (product catalogues, organisational policies). RAG with periodic re-indexing.
  • Stable or annual (general legal frameworks, settled standards). Either RAG or fine-tuning works; pick on other criteria.

Question 3: What behaviour do you need to teach?

  • Tone, format, style, structured output. Prompting handles most of this. Few-shot examples in the prompt are usually enough.
  • Domain-specific reasoning patterns the model does not naturally produce (how a claims adjuster reasons, how a pathology report is structured). Fine-tuning becomes useful, often combined with RAG for facts.
  • Refusal patterns and safety behaviour. Prompting first; fine-tuning only when call volume justifies the engineering cost.
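The three questions can be folded into a rough rule-of-thumb function. This is a sketch with invented category labels, not a formal decision procedure; a real decision weighs these criteria with more nuance than a lookup can capture.

```python
def choose_pattern(knowledge: str, change_freq: str, behaviour: str) -> str:
    """Map the three decision-tree questions to a starting pattern.

    knowledge:   "general" | "org_specific" | "deep_specialist"
    change_freq: "daily" | "quarterly" | "stable"
    behaviour:   "style" | "domain_reasoning" | "refusals"
    """
    # Question 3 first: domain-specific reasoning is the main trigger for
    # fine-tuning, usually paired with RAG for current facts.
    if behaviour == "domain_reasoning":
        return "fine-tuning + RAG"
    # Questions 1 and 2: knowledge the foundation model lacks points to RAG,
    # and fast-changing knowledge rules out fine-tuning regardless.
    if knowledge in ("org_specific", "deep_specialist"):
        if knowledge == "deep_specialist" and change_freq == "stable":
            return "RAG, then consider fine-tuning"
        return "RAG"
    # General knowledge with style or format needs: prompting is enough.
    return "prompting"
```

Note the ordering: behaviour is checked first because it is the only question that can force fine-tuning, while the knowledge questions only ever escalate as far as RAG.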

Pattern profile: prompting

When to use. General-knowledge tasks. Drafting, summarisation, brainstorming, simple classification. Pilots and prototypes.

Strengths. Zero infrastructure. Fastest iteration. Lowest cost per change. Easiest to test.

Limitations. Context window limits. Performance degrades on highly specialised domains. Few-shot examples consume tokens.

Cost shape. Per-token inference cost. No training cost. No hosting cost beyond the model API.

Real example. A 380-person professional services firm in Singapore deploys a meeting-summary copilot using prompting alone, with the meeting transcript as input. Throughput: 1,200 summaries per week. Cost: US$0.04 per summary. No vector store, no fine-tuning. The pilot launched in three weeks.
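Mechanically, the prompting pattern is just string assembly: instructions, optional few-shot examples, then the input. A minimal sketch of a meeting-summary prompt builder in the spirit of the example above; the instruction text and sample transcripts are invented for illustration.

```python
def build_summary_prompt(transcript: str, examples: list[tuple[str, str]]) -> str:
    """Assemble a few-shot prompt: instructions, worked examples, then input."""
    parts = ["Summarise the meeting transcript in three bullet points."]
    # Each few-shot example is a (transcript, summary) pair shown in full.
    for sample_transcript, sample_summary in examples:
        parts.append(f"Transcript:\n{sample_transcript}\nSummary:\n{sample_summary}")
    # The real input ends with an open "Summary:" for the model to complete.
    parts.append(f"Transcript:\n{transcript}\nSummary:")
    return "\n\n".join(parts)

few_shot = [("Alice: ship Friday. Bob: agreed.", "- Team agreed to ship on Friday.")]
prompt = build_summary_prompt("Carol: budget approved. Dan: start hiring.", few_shot)
```

The entire "infrastructure" is this one function plus a model API call, which is why the pattern iterates so quickly: changing behaviour means editing a string, not re-indexing or retraining.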

Pattern profile: RAG

When to use. Domain-specific knowledge. Frequently changing knowledge. Use cases where source attribution matters. Most enterprise customer-facing and internal-knowledge applications.

Strengths. Knowledge can change without retraining. Source citations enable trust and verification. Easier to keep current. Composable with prompting.

Limitations. Retrieval quality is the bottleneck. Requires data engineering investment (chunking, indexing, evaluation). Prompt size grows with context. New failure modes (looks-grounded hallucinations, retrieval misses).

Cost shape. Per-token inference cost (typically larger prompts than pure prompting). Vector store hosting cost. Embedding cost at indexing and query time. Engineering cost for retrieval evaluation and tuning.

Real example. A 700-person specialty insurer in Tokyo runs a claims-policy assistant for adjusters. The policy corpus is 14,000 documents, updated weekly. RAG with hybrid vector and keyword retrieval, source citations in every answer, and escalation to a human reviewer when retrieval confidence is low. Adjuster-reported time savings: 38% on policy lookups. Year-one operational cost: US$240,000.
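At its core, RAG is retrieve-then-insert. A toy sketch using keyword overlap in place of embedding similarity (production systems use vector search, hybrid ranking, and often rerankers), including the low-confidence escalation path described in the insurer example; the corpus and threshold are invented for illustration.

```python
def score(query: str, doc: str) -> float:
    """Crude keyword-overlap score standing in for embedding similarity."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def retrieve(query: str, corpus: list[str], k: int = 2, threshold: float = 0.2):
    """Return the top-k chunks, or None to signal human escalation
    when no chunk clears the confidence threshold."""
    ranked = sorted(corpus, key=lambda doc: score(query, doc), reverse=True)
    top = [doc for doc in ranked[:k] if score(query, doc) >= threshold]
    return top or None  # None -> route to a human reviewer

corpus = [
    "Policy 12: water damage claims require photos within 48 hours.",
    "Policy 7: fire claims need an adjuster visit within 5 days.",
]
chunks = retrieve("what is the deadline for water damage photos", corpus)
# chunks now holds the policy text to insert into the prompt, with its
# source identifier available for citation in the answer.
```

The escalation branch matters as much as the happy path: returning "no confident retrieval" explicitly is what prevents the looks-grounded hallucinations listed under limitations.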

Pattern profile: fine-tuning

When to use. Narrow domain specialisation that prompting and RAG cannot reach. Stylistic or structural patterns the foundation model does not produce naturally. High-volume use cases where prompt-token cost dominates.

Strengths. Behaviour baked into the model. Smaller prompts at inference time. Can produce specialised reasoning patterns not achievable through prompting.

Limitations. Training cost, hosting cost, retraining cycle. Knowledge becomes stale unless re-fine-tuned. Harder to evaluate (the change is in the weights, not the prompt). Risk of degrading general capability.

Cost shape. One-time training cost (US$5,000-US$50,000 depending on model size and dataset). Ongoing hosted-model cost, typically higher than calling a hosted foundation model's API. Periodic retraining cost.

Real example. A 540-person specialty manufacturer in Penang fine-tunes a small open-weights model on its quality-control terminology and report formats, reducing per-report inference cost by 64% versus calling a frontier model. The fine-tuning is paired with RAG for current product data. The combined system runs at high volume (8,000 reports per day) where the inference cost savings justify the fine-tuning investment.
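Fine-tuning starts from a supervised dataset of input-output pairs, commonly serialised as chat-style JSONL. A sketch of one training example for a report-formatting task like the Penang manufacturer's; the schema varies by provider, and the field names here follow the common messages convention rather than any specific API, with the report strings invented for illustration.

```python
import json

def to_training_example(raw_report: str, formatted_report: str) -> str:
    """Serialise one (input, target) pair as a chat-style JSON line."""
    record = {
        "messages": [
            {"role": "system", "content": "Format QC findings as a standard report."},
            {"role": "user", "content": raw_report},
            {"role": "assistant", "content": formatted_report},
        ]
    }
    return json.dumps(record)

# One line per example; a training file is thousands of these.
line = to_training_example(
    "tolerance fail on lot 88", "LOT 88: OUT OF TOLERANCE - REWORK REQUIRED"
)
```

The dataset is the real investment: the training run is a commodity, but curating a few thousand high-quality target outputs is where the US$5,000-US$50,000 figure above mostly goes.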

When patterns combine

Mature production systems often combine patterns.

RAG + prompting. The default for most enterprise systems. Retrieved content in the prompt; behaviour shaped by prompt engineering.

RAG + fine-tuning. Common in regulated industries. Fine-tune for domain reasoning; RAG for current facts.

Prompting + few-shot + structured output. Often enough for simple use cases. Build out only when this pattern fails.

Fine-tuning alone. Rare in production enterprise systems. Almost always combined with RAG for fact freshness.

The Wardley Mapping framework (Wardley, 2018) helps think about this: prompting and frontier model APIs are now commodities, RAG infrastructure is becoming a product, and fine-tuning is a custom investment that should be reserved for genuine differentiation.

Implementation playbook

How to choose a pattern for a new use case.

  1. Frame the use case in three sentences. What does the user ask? What does the system answer? What knowledge does the answer require?
  2. Run the three questions (knowledge type, change frequency, behaviour to teach). Get a tentative pattern.
  3. Build the prompting version first. Even if you expect to need RAG. The prompting version validates the prompt design and surfaces failure modes early.
  4. Layer in RAG when prompting hits a knowledge ceiling. Specifically: when the model gives confidently wrong answers about your domain, or when the prompt cannot fit the necessary context.
  5. Consider fine-tuning only after RAG is mature. Fine-tuning to fix problems RAG should solve is expensive and slow. Fine-tuning to specialise a mature RAG system can be powerful.
  6. Evaluate against a fixed test set. Same set across patterns. Compare quality, latency, cost.
  7. Document the choice and the reasoning. When the pattern needs to change in 18 months, the next team will thank you.
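Step 6 can be as simple as a shared harness that runs each pattern's answer function over the same fixed test set. A minimal sketch with a contains-check grader (real evaluations use richer rubrics or model-graded scoring); the test questions and answer function are invented for illustration.

```python
def evaluate(answer_fn, test_set) -> float:
    """Score one pattern's answer function against a fixed test set.

    answer_fn: callable(question) -> answer string.
    test_set:  list of (question, expected_substring) pairs.
    """
    hits = sum(
        1 for question, expected in test_set
        if expected.lower() in answer_fn(question).lower()
    )
    return hits / len(test_set)

test_set = [("capital of France?", "Paris"), ("2 + 2?", "4")]

def prompting_fn(question: str) -> str:
    # Stand-in for a real model call behind the prompting pattern.
    canned = {"capital of France?": "Paris is the capital.", "2 + 2?": "4"}
    return canned[question]

accuracy = evaluate(prompting_fn, test_set)  # 1.0 on this toy set
```

Because `evaluate` takes the answer function as a parameter, the same harness and test set score the prompting, RAG, and fine-tuned variants, which is exactly what makes the quality-latency-cost comparison in step 6 apples-to-apples.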

Counter-arguments

"Fine-tuning is necessary for domain expertise." Sometimes. For most enterprise domains a well-built RAG system reaches 85-95% of the quality of a fine-tuned model at lower cost and with current knowledge. Start with RAG.

"RAG is just a transitional pattern; long context windows will replace it." Long context windows reduce some RAG complexity but not all. Source attribution, evaluation, and cost still favour retrieval. RAG is unlikely to disappear by 2027.

"Open-weights models with fine-tuning will always be cheaper." Per-token inference cost, often yes. Total cost of ownership including engineering, hosting, retraining, and evaluation, often no. Run the full TCO model before assuming.

Bottom line

For most enterprise LLM use cases the right starting pattern is prompting, the right scaling pattern is RAG, and fine-tuning is reserved for cases where prompting and RAG have been exhausted. Mixing patterns is common in production. The decision is rarely "one pattern wins" and usually "what is the right combination for this use case at this scale."

If your team is debating this question on a whiteboard right now, run the three questions and pick a pattern. Build it. Measure it. Layer in additional patterns when measurement shows the need.

By Daniel Chen, Director, AI Advisory.

[^1]: Anthropic, Building with Claude, 2024-2025; OpenAI, Production Best Practices, 2024.
[^2]: McKinsey & Company, Technology Trends Outlook 2025, July 2025.
