Understanding RAG: When to Use It and When Not To

RAG is probably the most over-hyped and under-understood pattern in AI right now.

The short version: RAG lets you give an LLM access to your own documents at query time, without fine-tuning. You retrieve relevant chunks, shove them into the prompt, and let the model answer.
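To make that loop concrete, here is a minimal sketch in plain Python. It makes some loud assumptions: the "embedding" is just a bag-of-words count, the documents in CHUNKS are invented for illustration, and the final call to a model is left out entirely. A real pipeline would swap in an embedding model and a vector store, but the shape (score, pick the top chunks, assemble the prompt) stays roughly the same.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Place the retrieved chunks ahead of the question."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical documents, for illustration only.
CHUNKS = [
    "Deployments run through the release pipeline every weekday at 10:00.",
    "The on-call rotation changes every Monday morning.",
    "Expense reports are due on the last Friday of each month.",
]

question = "When does the on-call rotation change?"
prompt = build_prompt(question, retrieve(question, CHUNKS, k=1))
print(prompt)  # this string is what you would send to the model
```

Only the on-call chunk ends up in the prompt; the other two never reach the model.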

Simple in theory. Surprisingly subtle in practice.

The three problems RAG actually solves

Problem 1: The context window limit
LLMs have a fixed context window. You can’t dump your entire knowledge base into every prompt. RAG lets you be selective — retrieve only the chunks that matter for this specific query.
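Being selective usually comes down to a budget. The sketch below assumes the chunks are already ranked best-first and packs as many as fit. The count_tokens helper is a crude stand-in (one token per word), not a real tokenizer, and the budget value is arbitrary.

```python
def count_tokens(text):
    """Crude proxy: one token per whitespace-separated word (an assumption)."""
    return len(text.split())


def pack_context(ranked_chunks, budget):
    """Keep chunks in rank order, skipping any that would exceed the budget."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            continue  # too big for the remaining space; try a smaller one
        selected.append(chunk)
        used += cost
    return selected, used


# Hypothetical ranked chunks, best match first.
ranked = [
    "alpha beta gamma delta",       # 4 words
    "one two three four five six",  # 6 words
    "red green",                    # 2 words
]

selected, used = pack_context(ranked, budget=7)
print(selected, used)
```

Skipping an oversized chunk and continuing is one design choice among several. Stopping at the first chunk that doesn't fit is simpler and keeps strict rank order, at the price of unused space.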

Problem 2: Stale training data
The model doesn’t know what happened after its training cutoff. RAG pulls from a live data source you control. Your docs update, and once the changed docs are re-indexed, the answers update too.
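Here is a small sketch of why this works: the index, not the model, holds the facts. DocIndex is a hypothetical in-memory stand-in for a real vector store, and its word-overlap search is a placeholder for embedding similarity. The point is only that replacing a document changes what gets retrieved, with no retraining.

```python
class DocIndex:
    """In-memory index keyed by document ID (a toy stand-in for a vector store)."""

    def __init__(self):
        self._docs = {}

    def upsert(self, doc_id, text):
        """Insert a document, or replace the previous version with the same ID."""
        self._docs[doc_id] = text

    def delete(self, doc_id):
        """Remove a document if it exists."""
        self._docs.pop(doc_id, None)

    def search(self, query):
        """Return (doc_id, text) pairs that share a word with the query, best first."""
        terms = set(query.lower().split())
        scored = []
        for doc_id, text in self._docs.items():
            overlap = len(terms & set(text.lower().split()))
            if overlap:
                scored.append((overlap, doc_id, text))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [(doc_id, text) for _, doc_id, text in scored]


# Hypothetical documents, for illustration only.
index = DocIndex()
index.upsert("vpn", "vpn access requires a hardware token")
index.upsert("wifi", "guest wifi password rotates weekly")
before = index.search("vpn access")

# The policy changes: re-index the document under the same ID.
index.upsert("vpn", "vpn access requires the authenticator app")
after = index.search("vpn access")
print(before, after)
```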

Problem 3: Hallucinations on proprietary data
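A model trained on the public web knows nothing about your internal systems. RAG grounds the model’s answers in your actual documentation. That reduces hallucination rather than eliminating it: the model can still misread or ignore the text you retrieved.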

When RAG is the wrong tool

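RAG adds latency. It adds cost. It adds a retrieval failure mode that’s hard to debug: when the retriever returns the wrong chunks, the model answers confidently from the wrong context, and the bug looks like a model problem.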
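One way to make the retrieval step less opaque is to keep the scores around and refuse to pass weak matches to the model. The sketch below is an illustration under stated assumptions: score is a toy word-overlap measure, and the min_score threshold of 0.5 is arbitrary. With real embeddings you would tune that cut-off against your own data.

```python
def score(query, chunk):
    """Fraction of query words that appear in the chunk (a toy relevance score)."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0


def retrieve_with_trace(query, chunks, min_score=0.5):
    """Return chunks at or above min_score, plus every (score, chunk) pair for debugging."""
    trace = sorted(
        ((score(query, c), c) for c in chunks),
        key=lambda pair: pair[0],
        reverse=True,
    )
    hits = [chunk for s, chunk in trace if s >= min_score]
    return hits, trace


# Hypothetical documents, for illustration only.
chunks = [
    "reset your password from the account page",
    "invoices are emailed monthly",
]

hits, trace = retrieve_with_trace("reset password", chunks)
misses, miss_trace = retrieve_with_trace("rotate api keys", chunks)

for s, chunk in miss_trace:
    print(f"{s:.2f}  {chunk}")  # every score is visible, even when nothing qualifies
```

An empty result is a signal in its own right: the application can say it found nothing relevant instead of letting the model improvise.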

If your use case is a well-defined task on input you already have in hand (classification, extraction, summarisation of a given document), there is nothing to retrieve and RAG is overkill. Use a targeted prompt with a smaller, faster model.

RAG shines when:

Your knowledge base is large and dynamic
Queries are unpredictable (you can’t pre-write prompts for every case)
You need citations or source attribution
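Source attribution is mostly bookkeeping. A sketch, assuming you ask the model to cite numbered sources as [n]: number the chunks in the prompt, then map the markers in the answer back to your own source IDs. The source IDs and the sample answer below are invented for illustration, and nothing guarantees a real model will follow the citation format every time.

```python
import re


def build_cited_prompt(query, sources):
    """sources is a list of (source_id, text); number each so it can be cited as [n]."""
    lines = [f"[{i}] {text}" for i, (_, text) in enumerate(sources, start=1)]
    return (
        "Answer using only the numbered sources and cite them like [1].\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {query}"
    )


def resolve_citations(answer, sources):
    """Map [n] markers in the answer to source IDs, ignoring out-of-range numbers."""
    ids = []
    for n in re.findall(r"\[(\d+)\]", answer):
        idx = int(n) - 1
        if 0 <= idx < len(sources) and sources[idx][0] not in ids:
            ids.append(sources[idx][0])
    return ids


# Hypothetical sources and model answer, for illustration only.
sources = [
    ("runbook.md#restart", "Restart the worker before clearing the queue."),
    ("faq.md#billing", "Invoices are emailed on the first of the month."),
]

cited_prompt = build_cited_prompt("How do I recover a stuck queue?", sources)
answer = "Restart the worker first [1]. See also [3]."
print(resolve_citations(answer, sources))
```

The out-of-range check matters: a marker that points at no source is worth logging instead of showing to the user.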

What you’ll build in Path 1

Project 3 in Path 1 is a full RAG pipeline on AWS — OpenSearch Serverless for the vector store, Bedrock for embeddings and generation. You’ll hit every tradeoff above with real infrastructure.

Start Path 1 →