RAG (Retrieval Augmented Generation)

Last updated on May 30, 2025

RAG (Retrieval Augmented Generation) is an AI architecture that allows large language models (LLMs) to retrieve and use external, domain-specific data in their responses or outputs. Rather than relying only on the model's trained knowledge, RAG retrieves relevant documents or information from external sources, such as a vector database, to help the model generate more accurate, context-aware responses.
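
To make the flow concrete, here is a minimal, self-contained sketch of the retrieve-then-generate loop. It assumes a toy bag-of-words embedding and an in-memory index purely for illustration; a production system would use a trained embedding model, a vector database, and a real LLM call in place of the final print.

```python
import re
import numpy as np

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available 24/7 via chat and email.",
    "Enterprise plans include single sign-on and audit logs.",
]

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

# Vocabulary for the toy bag-of-words "embedding model".
vocab = sorted({w for d in documents for w in tokenize(d)})

def embed(text: str) -> np.ndarray:
    words = tokenize(text)
    v = np.array([words.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

# "Indexing layer": one matrix serving as an in-memory vector index.
index = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list[str]:
    # Cosine similarity reduces to a dot product on unit-length vectors.
    scores = index @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# In a full RAG system this prompt goes to the LLM (the generator).
print(build_prompt("What is the refund policy?"))
```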

Why is RAG important in large language models and AI applications?

The fundamental reason to use a RAG architecture is to ground the LLM in real, enterprise-specific knowledge. This architecture helps mitigate limitations of LLMs, such as hallucinations, during the generation process.

A RAG pipeline involves more than retrieval; it can also include query transformation, context formatting, and generation, as sketched below.
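
The following sketch illustrates those additional stages. The function names (rewrite_query, format_context, generate) are hypothetical stand-ins, not a standard API; only the shape of the pipeline matters here.

```python
def rewrite_query(query: str) -> str:
    # Query transformation: expand shorthand so retrieval matches documents.
    return query.replace("SSO", "single sign-on")

def format_context(docs: list[str]) -> str:
    # Context formatting: number the snippets so the LLM can cite them.
    return "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))

def generate(prompt: str) -> str:
    # Generation: stand-in for the actual LLM API call.
    return f"<LLM answer grounded in a {len(prompt)}-character prompt>"

retrieved = ["Enterprise plans include single sign-on and audit logs."]
query = rewrite_query("How do I enable SSO?")
prompt = f"{format_context(retrieved)}\n\nQuestion: {query}"
print(generate(prompt))
```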

What are the components of a RAG? 

The RAG architecture has 4 core and 4 optional components:

  1. Retriever 
  2. Generator (LLM)
  3. Embedding model
  4. Indexing layer
  5. Query rewriter (optional) 
  6. Reranker (optional) 
  7. Post-processor (optional) 
  8. Memory (optional)

The following table summarizes the components and their purposes with examples.

| Component | Purpose | Examples |
| --- | --- | --- |
| 1. Retriever | Finds relevant documents/information | Vector DB search, keyword search, SQL query, etc. |
| 2. Generator (LLM) | Generates human-like, final answers using retrieved info | GPT-4, Claude 3, Llama-3 |
| 3. Embedding Model | Converts documents and queries into dense vectors for retrieval (if using vector search) | OpenAI ada-002, HuggingFace BGE, etc. |
| 4. Indexing Layer | Prepares the knowledge base for efficient retrieval | Vector index (e.g., FAISS, Milvus), BM25 index |
| 5. Query Rewriter (optional) | Reframes the user query to improve retrieval quality | Rephrasing unclear user questions |
| 6. Reranker (optional) | Ranks retrieved documents before passing them to the LLM | Cross-encoders, re-ranking models |
| 7. Post-Processor (optional) | Polishes or filters the final generated output | Adding citations, trimming length |
| 8. Memory (optional) | Maintains context or personalization across chats | Long-term memory, session memory |
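
As one concrete example of an optional component, the short sketch below reorders retrieved documents with a cross-encoder reranker. It assumes the open-source sentence-transformers library and its publicly available MS MARCO cross-encoder checkpoint; any cross-encoder reranker would follow the same pattern.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is the refund window?"
retrieved = [
    "Support is available 24/7 via chat and email.",
    "Our refund policy allows returns within 30 days of purchase.",
]

# Score each (query, document) pair and reorder before prompting the LLM.
scores = reranker.predict([(query, doc) for doc in retrieved])
reranked = [doc for _, doc in sorted(zip(scores, retrieved), reverse=True)]
print(reranked[0])  # the most relevant document now comes first
```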
