RAG

Retrieval-Augmented Generation

RAG (Retrieval-Augmented Generation) is an AI architecture that combines a large language model with a retrieval system, allowing the model to search an external knowledge base before generating each response — reducing hallucination and enabling answers grounded in up-to-date sources.

A RAG pipeline works in three stages: source documents are converted into vector embeddings and stored in a vector database; at query time the user's input is embedded and matched against stored chunks using semantic similarity search; the retrieved chunks are injected into the LLM's context window so the model generates a grounded, citable answer.
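The three stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the `embed` function is a toy word-count stand-in for a real embedding model, the in-memory `index` list stands in for a vector database, and the sample chunks and the `retrieve` and `build_prompt` helpers are hypothetical names invented for this example. The final prompt would be sent to an LLM, which is left out here.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Stage 1: embed the source chunks and store them (hypothetical sample data).
chunks = [
    "The refund window is 30 days from the delivery date.",
    "Support is available by email on weekdays.",
    "Shipping to Canada takes five to seven business days.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(query: str, k: int = 2) -> list[str]:
    """Stage 2: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, retrieved: list[str]) -> str:
    """Stage 3: inject the retrieved chunks into the prompt as numbered sources."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(retrieved, start=1))
    return (
        "Answer using only the numbered sources and cite them.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


query = "How many days is the refund window?"
top_chunks = retrieve(query, k=2)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

Numbering the sources in the prompt is one simple way to let the model cite which chunk supports each claim. A real system would swap the word-count vectors for dense embeddings from a trained model, which is what makes the match semantic rather than lexical.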

RAG is widely preferred over full fine-tuning when knowledge changes frequently, because the vector database can be updated without retraining the model. This makes RAG a common architecture for enterprise knowledge assistants, AI search, and products built on top of foundation models.

RAG pipeline — full visual walkthrough

Frequently Asked Questions

What does RAG stand for in AI?

RAG stands for Retrieval-Augmented Generation. It is an AI technique that augments a language model's output by first retrieving relevant documents from an external knowledge base, then using those documents as context when generating a response. The term was coined by Patrick Lewis et al. in a 2020 paper written by researchers at Facebook AI Research (now Meta AI), University College London, and New York University.

When should I use RAG instead of fine-tuning?

Use RAG when your knowledge base changes frequently, when you need to cite specific source documents, when you want to avoid the cost of retraining, or when your data is confidential and cannot be used in a training dataset. Choose fine-tuning when you need to change the model's style, tone, or output format, and when the knowledge to add is stable and non-sensitive.

What is a vector database and why does RAG need one?

A vector database stores document embeddings (numerical representations of text) and supports fast approximate nearest-neighbour search. Most RAG systems rely on one because semantic retrieval ranks documents by the mathematical similarity between query and document embeddings rather than by keyword matching; keyword or hybrid retrieval can also feed a RAG pipeline, but embedding search is the usual default. Popular vector databases for RAG include Pinecone, Weaviate, Qdrant, ChromaDB, and pgvector (a PostgreSQL extension).
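The core operation of a vector database can be illustrated with a small in-memory store. The sketch below is an assumption-laden toy, not the API of any product named above: `TinyVectorStore`, its `upsert` and `search` methods, the document IDs, and the three-dimensional vectors are all hypothetical. It also uses exact brute-force search, whereas real vector databases typically use approximate indexes to stay fast at scale.

```python
import math
from typing import List, Tuple


class TinyVectorStore:
    """Hypothetical in-memory store: exact cosine search over unit vectors."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._vectors: List[List[float]] = []

    @staticmethod
    def _normalize(vector: List[float]) -> List[float]:
        """Scale a vector to unit length so a dot product equals cosine similarity."""
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0.0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in vector]

    def upsert(self, doc_id: str, vector: List[float]) -> None:
        """Insert a new embedding, or replace the one stored under doc_id."""
        unit = self._normalize(vector)
        if doc_id in self._ids:
            self._vectors[self._ids.index(doc_id)] = unit
        else:
            self._ids.append(doc_id)
            self._vectors.append(unit)

    def search(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k stored IDs with the highest cosine similarity to query."""
        q = self._normalize(query)
        scored = [
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec in zip(self._ids, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Made-up 3-dimensional embeddings; real models produce hundreds of dimensions.
store = TinyVectorStore()
store.upsert("refund-policy", [0.9, 0.1, 0.0])
store.upsert("shipping-times", [0.1, 0.9, 0.2])
store.upsert("support-hours", [0.0, 0.2, 0.9])

results = store.search([1.0, 0.2, 0.0], k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

Because `upsert` replaces the vector stored under an existing ID, refreshing a document only touches the index. This is the property that lets a RAG knowledge base change without retraining the model.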

See Also