Retrieval Augmented Generation (RAG) operates on a simple yet powerful premise: before generating a response, the system first retrieves relevant information from a database or a corpus of documents [1].
- First, an embedding model converts the user's query into a vector and uses it to search a database for matching context. Here, context refers to new or relevant information that the system (the LLM) was not trained on.
- The system retrieves the most relevant information from the database or document corpus and feeds it to the LLM.
- The LLM generates a response that incorporates this retrieved data (see the sketch after this list).
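To make the three steps concrete, here is a minimal sketch of the retrieve-and-augment portion of the pipeline. It assumes the `sentence-transformers` library and a toy in-memory corpus standing in for a real vector database such as Upstash Vector; the final LLM call is left as a `print` for illustration, since the actual generation step depends on whichever model client you use.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Embedding model (any sentence-transformers model works here).
model = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Index: embed the corpus once, ahead of query time.
documents = [
    "Upstash Vector is a serverless vector database.",
    "RAG retrieves relevant documents and passes them to the LLM.",
    "Embeddings map text to dense vectors for similarity search.",
]
doc_vectors = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vector = model.encode([query], normalize_embeddings=True)[0]
    # With normalized vectors, the dot product equals cosine similarity.
    scores = doc_vectors @ query_vector
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

# 2. Augment: prepend the retrieved context to the user's question.
question = "How does RAG give an LLM new information?"
context = "\n".join(retrieve(question))
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"

# 3. Generate: `prompt` would now be sent to the LLM of your choice.
print(prompt)
```

In a production pipeline, the in-memory similarity search would be replaced by a query to the vector database, but the shape of the flow (embed, retrieve top-k, assemble an augmented prompt) stays the same.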
The RAG pipeline opens up many new applications, as pre-trained large language models can draw on specialised information to generate more accurate responses, for example in chatbots or research assistants [2].
Sources
[1] Shaikh, R. (2024). How to Build an LLM RAG Pipeline with Upstash Vector Database. Medium, February 15, 2024.
[2] Silva, M. (2023). Retrieval augmented generation: Keeping LLMs relevant and current. Stack Overflow Blog, October 18, 2023.