Simple RAG Q&A System

A fully decoupled, production-ready Retrieval-Augmented Generation (RAG) API designed to reduce LLM hallucinations by grounding every answer in retrieved source documents, so responses are verifiable and reflect the most recently ingested content.

Live Demonstration: Ingesting a 19-page research paper and testing real-time retrieval via Swagger UI.

Project Impact & Architecture

  • Designed and deployed a containerized API whose knowledge base updates as soon as new documents are ingested, with no model retraining.
  • Engineered an automated document ingestion pipeline supporting PDF, TXT, and CSV formats.
  • Grounded answers in passages retrieved from Qdrant Cloud vector storage rather than the model's static parametric memory, so every answer can be traced back to its source chunks.
  • Hosted and maintained the live service on Render, deployed from an image published to Docker Hub.
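The first step of the ingestion pipeline, turning an uploaded PDF, TXT, or CSV file into plain text, can be sketched as a dispatch on the file extension. This is a minimal sketch, not the project's code: `load_document` is a hypothetical helper, flattening CSV rows into `column: value` pairs is an assumed design choice, and PDF support assumes the third-party `pypdf` package is installed.

```python
import csv
from pathlib import Path


def load_document(path: str) -> str:
    """Hypothetical helper: return the plain text of a PDF, TXT, or CSV file."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".txt":
        return p.read_text(encoding="utf-8")
    if suffix == ".csv":
        # Assumed design: flatten each row into "column: value" pairs so it chunks like prose.
        with p.open(newline="", encoding="utf-8") as f:
            rows = csv.DictReader(f)
            return "\n".join(
                ", ".join(f"{col}: {val}" for col, val in row.items()) for row in rows
            )
    if suffix == ".pdf":
        # Assumption: pypdf is installed; imported lazily so TXT/CSV work without it.
        from pypdf import PdfReader

        reader = PdfReader(str(p))
        # extract_text() can return an empty result for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Dispatching on the suffix keeps each format's quirks in one place; whatever the source format, the returned text can then be chunked and embedded the same way.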

Problem Statement

Standard pre-trained Large Language Models (LLMs) rely entirely on static "parametric memory" baked into their weights. This leads to costly retraining cycles for updated information and frequent "hallucinations" where the model invents answers without factual backing.

Solution & Features

I built a RAG system with two engines, a Retriever and a Generator, that fetches relevant facts from a vector database before the LLM generates an answer:

  • Document Ingestion: Automatically splits uploaded documents into chunks, converts them into dense vector embeddings with an OpenAI embedding model, and stores them in Qdrant Cloud.
  • Similarity Search: The Retriever embeds the user query and runs a fast vector similarity search in Qdrant Cloud to find the passages most relevant to it.
  • Transparent Generation: The Generator combines the user's question with the retrieved chunks in a single prompt and streams a response that cites the specific source chunks it used, so users can verify each answer.
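The three stages above (chunk, retrieve, generate with citations) can be illustrated end to end in plain Python. This is a hedged sketch under loud assumptions: `embed` is a hypothetical bag-of-words stand-in for the OpenAI embedding model, the in-memory `retrieve` replaces the Qdrant Cloud search, and the chunk size, overlap, and prompt wording are illustrative values rather than the project's settings. The LLM call itself is omitted; `build_prompt` only shows how numbered chunks let the answer cite its sources.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an OpenAI embedding: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[int, str]]:
    """In-memory stand-in for the vector-store search: top-k chunks with their indices."""
    q = embed(query)
    ranked = sorted(enumerate(chunks), key=lambda ic: cosine(q, embed(ic[1])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: list[tuple[int, str]]) -> str:
    """Label each retrieved chunk with its index so the generated answer can cite it."""
    context = "\n".join(f"[chunk {i}] {text}" for i, text in hits)
    return (
        "Answer using only the sources below and cite chunk numbers. "
        "If the answer is not in the sources, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


docs = [
    "Qdrant stores dense vectors and supports cosine similarity search.",
    "FastAPI exposes the ingestion and query endpoints.",
    "Docker images are pushed to Docker Hub and deployed on Render.",
]
hits = retrieve("Where are vectors stored for similarity search?", docs, k=1)
print(hits)  # [(0, 'Qdrant stores dense vectors and supports cosine similarity search.')]
```

Cosine similarity compares vector direction rather than magnitude, so long and short chunks are scored on comparable terms; Qdrant offers the same metric for dense vectors, which is why it is used here.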

Tech Stack

Python, FastAPI, LangChain, Qdrant Cloud, Docker, Render, OpenAI

Explore the Project