RAG (Retrieval Augmented Generation)

Last updated on May 30, 2025

RAG (Retrieval Augmented Generation) is an AI architecture that allows large language models (LLMs) to retrieve and use external, domain-specific data in their responses or outputs. Rather than relying only on the model's trained knowledge, RAG retrieves relevant documents or information from external sources, such as a vector database, to help the model generate more accurate, context-aware responses.
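
To make the flow concrete, here is a minimal, self-contained sketch of the retrieve-then-generate loop. It assumes a toy bag-of-words embedding and an in-memory index purely for illustration; a production system would use a trained embedding model, a vector database, and a real LLM call in place of the final print.

```python
import re
import numpy as np

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available 24/7 via chat and email.",
    "Enterprise plans include single sign-on and audit logs.",
]

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

# Vocabulary for the toy bag-of-words "embedding model".
vocab = sorted({w for d in documents for w in tokenize(d)})

def embed(text: str) -> np.ndarray:
    words = tokenize(text)
    v = np.array([words.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

# "Indexing layer": one matrix serving as an in-memory vector index.
index = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list[str]:
    # Cosine similarity reduces to a dot product on unit-length vectors.
    scores = index @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# In a full RAG system this prompt goes to the LLM (the generator).
print(build_prompt("What is the refund policy?"))
```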

Why is RAG important in large language models and AI applications?

The fundamental reason to use a RAG architecture is to ground the LLM in real, enterprise-specific knowledge. This architecture helps mitigate limitations of LLMs, such as hallucinations, during the generation process.

A RAG pipeline involves more than retrieval; it can also include query transformation, context formatting, and generation, as sketched below.
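
The following sketch illustrates those additional stages. The function names (rewrite_query, format_context, generate) are hypothetical stand-ins, not a standard API; only the shape of the pipeline matters here.

```python
def rewrite_query(query: str) -> str:
    # Query transformation: expand shorthand so retrieval matches documents.
    return query.replace("SSO", "single sign-on")

def format_context(docs: list[str]) -> str:
    # Context formatting: number the snippets so the LLM can cite them.
    return "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))

def generate(prompt: str) -> str:
    # Generation: stand-in for the actual LLM API call.
    return f"<LLM answer grounded in a {len(prompt)}-character prompt>"

retrieved = ["Enterprise plans include single sign-on and audit logs."]
query = rewrite_query("How do I enable SSO?")
prompt = f"{format_context(retrieved)}\n\nQuestion: {query}"
print(generate(prompt))
```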

What are the components of a RAG? 

The RAG architecture has 4 core and 4 optional components:

  1. Retriever 
  2. Generator (LLM)
  3. Embedding model
  4. Indexing layer
  5. Query rewriter (optional) 
  6. Reranker (optional) 
  7. Post-processor (optional) 
  8. Memory (optional)

The following table summarizes the components and their purposes with examples.

| Component | Purpose | Examples |
| --- | --- | --- |
| 1. Retriever | Finds relevant documents/information | Vector DB search, keyword search, SQL query, etc. |
| 2. Generator (LLM) | Generates human-like, final answers using retrieved info | GPT-4, Claude 3, Llama-3 |
| 3. Embedding Model | Converts documents and queries into dense vectors for retrieval (if using vector search) | OpenAI ada-002, HuggingFace BGE, etc. |
| 4. Indexing Layer | Prepares the knowledge base for efficient retrieval | Vector index (e.g., FAISS, Milvus), BM25 index |
| 5. Query Rewriter (optional) | Reframes the user query to improve retrieval quality | Rephrasing unclear user questions |
| 6. Reranker (optional) | Ranks retrieved documents before passing them to the LLM | Cross-encoders, re-ranking models |
| 7. Post-Processor (optional) | Polishes or filters the final generated output | Adding citations, trimming length |
| 8. Memory (optional) | Maintains context or personalization across chats | Long-term memory, session memory |
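
As one concrete example of an optional component, the short sketch below reorders retrieved documents with a cross-encoder reranker. It assumes the open-source sentence-transformers library and its publicly available MS MARCO cross-encoder checkpoint; any cross-encoder reranker would follow the same pattern.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is the refund window?"
retrieved = [
    "Support is available 24/7 via chat and email.",
    "Our refund policy allows returns within 30 days of purchase.",
]

# Score each (query, document) pair and reorder before prompting the LLM.
scores = reranker.predict([(query, doc) for doc in retrieved])
reranked = [doc for _, doc in sorted(zip(scores, retrieved), reverse=True)]
print(reranked[0])  # the most relevant document now comes first
```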
