In the first part, we delved into eight types of RAG architectures and covered how these models efficiently integrate retrieval with generative capabilities to enhance performance across various applications.
In today's edition of Where’s The Future in Tech, we’ll continue our exploration, focusing on another set of exciting RAG architectures that take things to the next level. From memory-enhanced systems to tree-structured retrievals and multimodal approaches, we’ll dive deeper into how each of these models operates and the architectural components that power them. This will give you an even broader understanding of the versatility of RAG models in tackling complex real-world challenges.
9. REALM: Retrieval-Augmented Language Model
REALM integrates retrieval directly into the language model's training process, allowing the model to fetch relevant information dynamically during inference. Unlike traditional RAG setups where retrieval is a separate component, REALM's retriever is trained alongside the language model, enabling end-to-end optimization.
Below is a breakdown of each component in the architecture.
Document encoder: The Document Encoder converts documents into dense vector embeddings using a transformer-based architecture, like BERT. It pre-processes the entire document corpus, encoding each passage into a vector that captures its semantic meaning. These embeddings are stored and used later during retrieval, ensuring efficient document matching.
Retriever: The Retriever finds relevant documents by performing Maximum Inner Product Search (MIPS) between the query and pre-encoded document embeddings. This allows the system to retrieve documents based on their similarity to the query vector. The retriever is trained to select the most useful documents to improve task-specific performance.
Reader: The Reader uses the retrieved documents along with the input query to perform masked language modeling. It generates answers by predicting missing tokens based on the context provided by the retrieved documents. This helps the model produce more informed and contextually accurate responses.
Joint training loop: In the Joint Training Loop, the entire system (Document Encoder, Retriever, and Reader) is trained together. The loss from the Reader's prediction is back-propagated through the Retriever and the Document Encoder, optimizing the whole pipeline. This ensures the retriever learns to fetch the documents that most improve the final prediction.
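To make the retrieval step concrete, here is a minimal sketch of MIPS over pre-encoded documents. The encode_text() helper is a hypothetical stand-in for a trained BERT-style encoder, so the scores are only illustrative; in REALM the document embeddings are periodically refreshed as the encoder is trained end to end.

```python
# Minimal sketch of REALM-style dense retrieval via Maximum Inner Product Search (MIPS).
import numpy as np

def encode_text(text: str, dim: int = 128) -> np.ndarray:
    """Hypothetical encoder: returns a unit-norm dense vector for a piece of text."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)

corpus = [
    "The Eiffel Tower is located in Paris.",
    "Photosynthesis converts light energy into chemical energy.",
    "The transformer architecture relies on self-attention.",
]

# Document Encoder: pre-compute embeddings for the whole corpus.
doc_matrix = np.stack([encode_text(d) for d in corpus])

def retrieve(query: str, k: int = 2):
    """Retriever: score every document with an inner product and keep the top-k."""
    q = encode_text(query)
    scores = doc_matrix @ q              # MIPS over pre-encoded documents
    top = np.argsort(-scores)[:k]
    return [(corpus[i], float(scores[i])) for i in top]

# Reader (not shown): would condition a masked language model on the query plus
# the retrieved passages, and the loss would flow back into retriever and encoder.
print(retrieve("Where is the Eiffel Tower?"))
```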
10. RAPTOR: Tree-Organized Retrieval
RAPTOR is designed for efficient retrieval by organizing the corpus into a tree structure. This method allows the system to quickly narrow down the search space by traversing the tree, reducing the number of documents that need to be processed. It integrates retrieval and generation in a way that’s optimized for large-scale datasets.
Document tree structure: RAPTOR organizes the corpus into a hierarchical tree structure, where each node represents a subset of documents. This allows the system to traverse the tree to find the most relevant documents efficiently, without having to search through the entire dataset.
Retriever: The Retriever uses the tree structure to perform efficient search operations. By narrowing down the search space progressively, RAPTOR can reduce retrieval time and improve the relevance of the documents fetched for the query.
Reader: After retrieving the relevant documents, the Reader processes them alongside the query to generate the final answer. Like other RAG models, the Reader uses the retrieved context to improve the accuracy of its predictions.
Training loop: RAPTOR’s training loop works similarly to REALM, where the retriever and reader are trained jointly. The feedback from the reader’s performance optimizes the retrieval process, ensuring that the system consistently fetches more relevant documents.
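Below is a minimal sketch of the tree-traversal idea. The Node class and the embed() helper are hypothetical placeholders; building the tree itself (RAPTOR does this by recursively clustering and summarizing chunks) is omitted, and the focus is on how traversal narrows the search space level by level.

```python
# Minimal sketch of RAPTOR-style tree traversal over a hierarchy of summaries.
from dataclasses import dataclass, field
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical encoder: returns a unit-norm dense vector for a piece of text."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

@dataclass
class Node:
    summary: str                                   # summary of everything beneath this node
    children: list = field(default_factory=list)
    passages: list = field(default_factory=list)   # leaf-level text chunks

def traverse(node: Node, query_vec: np.ndarray) -> list:
    """Walk down the tree, always following the child whose summary is most
    similar to the query, and return the passages stored at the leaf."""
    while node.children:
        node = max(node.children, key=lambda c: float(embed(c.summary) @ query_vec))
    return node.passages

root = Node("all topics", children=[
    Node("geography", passages=["The Nile is the longest river in Africa."]),
    Node("machine learning", passages=["Transformers use self-attention layers."]),
])

print(traverse(root, embed("What do transformers use?")))
```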
11. REVEAL for Visual-Language Models
REVEAL extends the RAG architecture to visual-language models, enabling the retrieval of images along with textual data. It enhances tasks like image captioning, visual question answering (VQA), and multimodal understanding by integrating visual context into the retrieval and generation process.
Visual encoder: The Visual Encoder processes images and converts them into vector embeddings, much like the Document Encoder for text. This allows images to be represented as dense vectors, enabling their use in a multimodal retrieval process.
Retriever: The Retriever fetches both text and image data based on the query. It compares the query’s embeddings to both textual and visual embeddings, returning relevant content from both domains. This enables a more comprehensive understanding of the query.
Reader: The Reader processes both textual and visual information in a unified manner. By combining these modalities, the reader generates responses that leverage both visual and textual context, providing more accurate and complete answers.
Joint training loop: In REVEAL, the model is trained to optimize retrieval and generation across both text and images. The retriever is fine-tuned based on feedback from the reader, which in turn learns to generate the best possible answers by considering both modalities.
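Here is a minimal sketch of retrieval over a mixed text-and-image index. The encode() helper is a hypothetical placeholder; a real system would use a shared vision-language embedding model so that text and image vectors live in the same space and can be scored against the same query vector.

```python
# Minimal sketch of multimodal retrieval in the spirit of REVEAL.
import numpy as np

def encode(item: str, dim: int = 64) -> np.ndarray:
    """Hypothetical encoder for text or an image reference."""
    rng = np.random.default_rng(abs(hash(item)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# A unified index mixing text passages and (references to) images.
index = [
    {"kind": "text",  "content": "A golden retriever playing fetch in a park."},
    {"kind": "image", "content": "img_0421.jpg (caption: dog catching a frisbee)"},
    {"kind": "text",  "content": "Recipe for a classic margherita pizza."},
]
vectors = np.stack([encode(e["content"]) for e in index])

def retrieve(query: str, k: int = 2):
    """Score text and image entries with the same query vector,
    so the reader receives context from both modalities."""
    q = encode(query)
    scores = vectors @ q
    top = np.argsort(-scores)[:k]
    return [{**index[i], "score": float(scores[i])} for i in top]

print(retrieve("a dog playing outside"))
```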
12. REACT
REACT is a RAG variant that leverages the active decision-making ability of agents during retrieval. It’s designed for dynamic systems where the AI decides in real-time whether to retrieve additional documents or adjust its response strategy based on intermediate results.
Decision-Making agent: The core of REACT is its decision-making component, which continuously evaluates whether the retrieved documents are sufficient to answer the query. If not, the agent can initiate further searches, refining the context progressively.
Retriever: The Retriever pulls documents based on the query, but with REACT, the system doesn’t just stop at one retrieval cycle. The agent can influence the retrieval process, requesting new or refined documents to better answer the question.
Reader: The Reader in REACT generates the final output based on the query and retrieved documents. However, its output may change dynamically based on feedback from the agent, making the system responsive to evolving needs.
Dynamic feedback loop: Unlike traditional RAG systems, REACT uses a dynamic feedback loop, where the agent's decisions affect future retrievals and responses. The continuous learning process helps the model adapt and improve over time.
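The loop below sketches this decide-then-retrieve cycle. The search(), is_sufficient(), and refine() helpers are hypothetical placeholders for a retriever, an LLM-based sufficiency judgement, and query refinement; the point is simply that retrieval happens inside a loop the agent controls, not in a single pass.

```python
# Minimal sketch of a REACT-style retrieve/decide loop.
def search(query: str) -> list[str]:
    """Hypothetical retriever: returns passages for a query."""
    return [f"passage about {query}"]

def is_sufficient(context: list[str], question: str) -> bool:
    """Hypothetical agent judgement: does the context answer the question?"""
    return len(context) >= 3          # stand-in heuristic for an LLM decision

def refine(question: str, context: list[str], step: int) -> str:
    """Hypothetical query refinement based on what has been gathered so far."""
    return f"{question} (refinement {step})"

def react_answer(question: str, max_steps: int = 4) -> str:
    context: list[str] = []
    query = question
    for step in range(max_steps):
        context += search(query)                    # Act: retrieve more evidence
        if is_sufficient(context, question):        # Reason: is this enough?
            break
        query = refine(question, context, step)     # otherwise adjust the query
    return f"Answer generated from {len(context)} passages."

print(react_answer("Who designed the Sydney Opera House?"))
```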
13. REPLUG: Retrieval Plugin
REPLUG is an architecture that incorporates external knowledge sources into the generation process via plug-in retrieval systems. It’s designed for scenarios where knowledge sources, like databases or APIs, need to be accessed dynamically to enhance responses.
Retrieval plugin: The Retrieval Plugin is the central feature of REPLUG, allowing the system to query external knowledge bases in real time. These knowledge sources can be anything from structured databases to online APIs, providing specialized knowledge to improve responses.
Retriever: The Retriever uses the plugin to fetch relevant external data based on the query. It acts as a bridge, connecting the external knowledge with the model’s generation process, ensuring that the retrieved data is appropriately integrated.
Reader: The Reader processes the query and the retrieved external data, generating responses by combining both internal model knowledge and external insights. This helps REPLUG provide more detailed and accurate answers, especially in specialized domains.
Plugin-based training loop: REPLUG’s training loop integrates external knowledge into the optimization process. By adjusting the retrieval and generation process to work effectively with external plugins, the system learns to use this additional knowledge source to enhance performance.
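A minimal sketch of the plugin idea follows. The plugin classes and the build_prompt() helper are hypothetical; the point is that any external source, a database or a web API, can sit behind a common retrieve() interface and have its results spliced into the prompt handed to the language model.

```python
# Minimal sketch of a REPLUG-style plugin interface for external knowledge.
from abc import ABC, abstractmethod

class RetrievalPlugin(ABC):
    @abstractmethod
    def retrieve(self, query: str) -> list[str]: ...

class SQLPlugin(RetrievalPlugin):
    def retrieve(self, query: str) -> list[str]:
        # Placeholder: would run a parameterised query against a database.
        return [f"[database row matching '{query}']"]

class WeatherAPIPlugin(RetrievalPlugin):
    def retrieve(self, query: str) -> list[str]:
        # Placeholder: would call an external HTTP API.
        return [f"[live weather data for '{query}']"]

def build_prompt(question: str, plugins: list[RetrievalPlugin]) -> str:
    """Gather evidence from every registered plugin and prepend it to the question."""
    evidence = [text for plugin in plugins for text in plugin.retrieve(question)]
    return "Context:\n" + "\n".join(evidence) + f"\n\nQuestion: {question}"

print(build_prompt("Will it rain in Mumbai tomorrow?", [SQLPlugin(), WeatherAPIPlugin()]))
```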
14. MEMO RAG
MEMO RAG is a RAG variant designed to store and leverage long-term memory across multiple interactions. It’s particularly useful for tasks where the system needs to remember information from previous queries and apply it in future contexts.
Memory bank: MEMO RAG introduces a memory bank that stores useful context from previous queries and their corresponding responses. This memory is updated after each interaction and can be accessed to help answer new queries more efficiently.
Retriever: The Retriever searches through both the external corpus and the memory bank to find the most relevant information. By considering past interactions, the retriever can provide more contextually appropriate responses, improving the system’s ability to handle long-term tasks.
Reader: The Reader in MEMO RAG processes the query and retrieves information from both the memory bank and the document corpus. It then generates a response, leveraging both past memory and current information.
Memory update mechanism: MEMO RAG continually updates its memory bank based on feedback and new interactions. This allows the system to refine its understanding over time and recall important details from earlier interactions, making it more effective for ongoing tasks.
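Here is a minimal sketch of a memory bank alongside a document corpus. The embed() helper is a hypothetical placeholder; what matters is that retrieval searches both the corpus and the store of past interactions, and that the store is written back to after every answer.

```python
# Minimal sketch of a MEMO RAG-style memory bank combined with a corpus.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical encoder: returns a unit-norm dense vector for a piece of text."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class MemoryBank:
    def __init__(self):
        self.entries: list[str] = []

    def add(self, question: str, answer: str) -> None:
        """Memory update mechanism: store the interaction for later reuse."""
        self.entries.append(f"Q: {question} A: {answer}")

    def search(self, query: str, k: int = 2) -> list[str]:
        if not self.entries:
            return []
        vecs = np.stack([embed(e) for e in self.entries])
        scores = vecs @ embed(query)
        return [self.entries[i] for i in np.argsort(-scores)[:k]]

corpus = ["The Great Wall of China is over 21,000 km long."]
memory = MemoryBank()

def answer(question: str) -> str:
    context = memory.search(question) + corpus       # past interactions, then corpus
    response = f"(answer grounded in {len(context)} snippets)"
    memory.add(question, response)                   # write the interaction back
    return response

print(answer("How long is the Great Wall?"))
print(answer("Remind me what we discussed about the Great Wall."))
```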
15. Attention-based RAG
Attention-based RAG enhances the standard retrieval-augmented architecture by integrating attention mechanisms to focus on the most relevant information during retrieval and generation. It’s designed to refine how attention is distributed across different parts of the input and retrieved documents.
Attention mechanism: The attention mechanism in this architecture allows the model to focus on the most relevant parts of the input and retrieved documents. Instead of treating all parts equally, the attention mechanism dynamically adjusts its focus based on relevance, improving the quality of generated outputs.
Retriever: The Retriever in Attention-based RAG works similarly to standard RAG models, fetching relevant documents based on the query. However, the retrieval is influenced by the attention mechanism, which guides the system toward more pertinent documents.
Reader: The Reader processes the query and retrieved documents, applying attention to extract the most important context for generating responses. By emphasizing the most relevant information, the attention mechanism improves the accuracy of predictions.
Training loop: In this model, the training loop is designed to optimize both the retriever and the reader through attention-guided learning. The attention mechanism helps fine-tune both components, ensuring that only the most valuable information is considered in each step.
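The sketch below illustrates attention over retrieved passages. The embed() helper is a hypothetical placeholder, and the softmax weights stand in for the attention scores that decide how much each passage contributes to the fused context the reader conditions on, instead of weighting all passages equally.

```python
# Minimal sketch of attention-weighted fusion of retrieved passages.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical encoder: returns a unit-norm dense vector for a piece of text."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

query = "What powers the transformer architecture?"
passages = [
    "Transformers rely on multi-head self-attention.",
    "Paris is the capital of France.",
    "Attention lets the model weigh tokens by relevance.",
]

q = embed(query)
P = np.stack([embed(p) for p in passages])
weights = softmax(P @ q)        # attention over passages, not uniform weighting
fused_context = weights @ P     # weighted combination handed to the reader

for passage, w in zip(passages, weights):
    print(f"{w:.2f}  {passage}")
```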
16. RETRO: Retrieval-Enhanced Transformer
RETRO (Retrieval-Enhanced Transformer) is a RAG architecture that improves generative performance by using a large-scale external retrieval process. It’s designed to combine the power of massive external knowledge with a transformer-based model for generating more accurate, informed responses.
External knowledge retrieval: RETRO leverages an external knowledge base to retrieve documents that can help improve the model's performance. By using a separate retrieval system, RETRO can access far more information than a typical language model.
Retriever: The Retriever searches through the knowledge base to fetch the most relevant documents for the query. Unlike in traditional RAG models, RETRO focuses on large-scale retrieval to pull in as much useful context as possible, improving generation quality.
Reader: The Reader processes both the input query and the retrieved documents to generate the final response. With access to external knowledge, the reader can produce more contextually accurate and informed outputs.
End-to-end integration: RETRO weaves retrieval directly into the transformer: retrieved chunks are incorporated through cross-attention layers as the model learns to generate, while the retrieval index itself is typically built with a frozen encoder. This integration lets the model draw on vast external knowledge bases without having to store all of that knowledge in its own parameters.
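To close, here is a minimal sketch of RETRO's chunk-level retrieval. The embed() helper and the toy database are hypothetical; in RETRO the input is split into fixed-size chunks, each chunk retrieves its own nearest neighbours from a very large external database, and those neighbours are attended to via cross-attention inside the transformer, which is only hinted at here.

```python
# Minimal sketch of RETRO-style per-chunk neighbour retrieval.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical encoder: returns a unit-norm dense vector for a piece of text."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

database = [
    "The mitochondrion is the powerhouse of the cell.",
    "Self-attention relates every token to every other token.",
    "The French Revolution began in 1789.",
]
db_vecs = np.stack([embed(d) for d in database])

def chunk(tokens: list[str], size: int = 4) -> list[list[str]]:
    """Split the input into fixed-size chunks of tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def neighbours_per_chunk(text: str, k: int = 1) -> list[tuple[str, list[str]]]:
    """For every input chunk, fetch its own nearest neighbours from the database."""
    result = []
    for c in chunk(text.split()):
        q = embed(" ".join(c))
        top = np.argsort(-(db_vecs @ q))[:k]
        result.append((" ".join(c), [database[i] for i in top]))
    return result

for c, nbrs in neighbours_per_chunk("tell me how attention relates tokens in a transformer"):
    print(c, "->", nbrs)
```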
Conclusion
As we explored in this part, RAG architectures have transformed the way AI systems combine retrieval and generation. From REALM’s document retrieval to RAPTOR’s tree-based structure and REACT’s dynamic decision-making, each model offers unique features that enhance performance across various applications. Whether it’s REPLUG’s external knowledge fetching or MEMO RAG’s long-term memory, RAG architectures are proving to be essential in improving AI capabilities. Understanding their architecture helps unlock new possibilities in fields like search systems, chatbots, and multimodal AI. The future of AI is here, and it’s built on the foundations of RAG.
Until next time,
Stay curious, stay innovative, and subscribe to get more such informative newsletters.