Understanding the LLM RAG pipeline

Retrieval Augmented Generation (RAG) operates on a simple yet powerful premise: before generating a response, the system first retrieves relevant information from a database or a corpus of documents [1].

Figure: a typical RAG pipeline (from Rayyan Shaikh [1]).
  1. First, an embedding model converts the user's query into a vector and matches it against a database, in effect letting the system understand the query's context. Here, context refers to new or relevant information that the system (the LLM) was not trained on.
  2. The system retrieves the most relevant information from the database or document corpus and passes it to the LLM along with the original query.
  3. The LLM generates a response that incorporates the retrieved data, as sketched in the code below.
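
To make the three steps concrete, here is a minimal, dependency-free sketch of the pipeline. The bag-of-words `embed` function and the `call_llm` stub are hypothetical stand-ins: in a production pipeline such as the one described in [1], these would be a real embedding model and completion API, with the corpus stored in a vector database such as Upstash Vector.

```python
import math
from collections import Counter

# A small document corpus standing in for information the LLM was not trained on.
CORPUS = [
    "Upstash Vector is a serverless vector database.",
    "RAG feeds retrieved documents to an LLM at inference time.",
    "Embeddings map text to vectors so similar texts are close together.",
]

# Step 1: embed the query. A bag-of-words Counter is a toy stand-in for a
# real embedding model (assumption for illustration only).
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 2: retrieve the top-k documents most similar to the query.
def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

# Step 3: build an augmented prompt and generate. `call_llm` is a
# hypothetical placeholder for whatever completion API you use.
def call_llm(prompt: str) -> str:
    return f"<LLM response conditioned on: {prompt!r}>"

def rag_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return call_llm(prompt)

print(rag_answer("What is a vector database?"))
```

The key design point is that retrieval happens at inference time: the prompt handed to the LLM already contains the relevant documents, so the model can answer from information that was never in its training data.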

The RAG pipeline opens up a range of new applications, as pre-trained large language models can draw on specialised, up-to-date information to generate more accurate responses, for example in chatbots or research assistants [2].

Sources

[1] Shaikh, R. (2024). How to Build an LLM RAG Pipeline with Upstash Vector Database. Medium, February 15, 2024.

[2] Silva, M. (2023). Retrieval augmented generation: Keeping LLMs relevant and current. Stack Overflow Blog, October 18, 2023.