10 Ways to Improve the Performance of Retrieval Augmented Generation Systems

“Retrieval augmented generation is the process of supplementing a user’s input to a large language model (LLM) like ChatGPT with additional information that you (the system) have retrieved from somewhere else. The LLM can then use that information to augment the response that it generates.” — Cory Zue

LLMs are an amazing invention, but they are prone to one key issue: they make stuff up. RAG makes LLMs far more useful by giving them factual context to draw on while answering queries.
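The core loop described above can be sketched in plain Python. This is a deliberately naive illustration, not a real implementation: production systems use embedding-based similarity search rather than the keyword overlap used here, and the document list and function names are invented for the example.

```python
# Minimal sketch of the RAG core loop: retrieve relevant documents,
# then augment the user's prompt with them before calling the LLM.
# Retrieval here is naive keyword overlap, purely for illustration.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Score each document by word overlap with the query; return the best."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Prepend the retrieved context to the user's question."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge base for demonstration.
docs = [
    "Our support desk is open 9am-5pm on weekdays.",
    "Refunds are processed within 14 days of a return.",
    "The mobile app supports iOS and Android.",
]

prompt = build_prompt("How long do refunds take?", docs)
print(prompt)
```

The resulting prompt, with the refund-policy document included as context, is what actually gets sent to the LLM; the model then "augments" its answer with the retrieved facts instead of guessing.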

Using the quick-start guide to a framework like LangChain or LlamaIndex, anyone can build a simple RAG system, like a chatbot for your docs, with about five lines of code.

But the bot built with those five lines of code isn't going to work very well. RAG is easy to prototype and very hard to productionize, i.e., to get to a point where users would be happy with it. A basic tutorial might get RAG working at 80%, but bridging the remaining 20% often takes serious experimentation. Best practices are yet to be ironed out and can vary by use case. Figuring them out is well worth our time, though, because RAG is probably the single most effective way to use LLMs.