Production-Grade AI System

Agentic RAG
Research Platform

A production-ready AI research assistant with self-correcting LangGraph agents, hybrid BM25 + vector search over ArXiv papers, full observability, and a modern chat UI — all in a single docker compose up.

Python 3.12 FastAPI Next.js 14 LangGraph OpenSearch Docker Compose Langfuse Redis Airflow Ollama / OpenAI

Built for Production, Not Demos

Every component is designed for reliability, observability, and extensibility — not just a proof-of-concept.

🧠

Self-Correcting Agent

LangGraph agent with a 7-node graph whose core flow runs guardrail → retrieve → grade → rewrite → generate. Weak queries are automatically rewritten and retried, up to 2 times.
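The self-correction loop can be sketched in plain Python (the retrieve/grade/rewrite functions below are stubs standing in for the real LangGraph nodes, which call an LLM):

```python
# Minimal sketch of the rewrite-and-retry loop. All three helpers are
# illustrative stubs, not the project's actual node implementations.
MAX_RETRIES = 2

def retrieve(query: str) -> list[str]:
    # Stub: pretend only queries mentioning "RAG" hit relevant chunks.
    return ["chunk about RAG"] if "RAG" in query else []

def grade(chunks: list[str]) -> list[str]:
    # Stub grader: keep every chunk (the real grader asks an LLM).
    return chunks

def rewrite(query: str) -> str:
    # Stub rewriter: the real node asks the LLM to rephrase the query.
    return query + " RAG"

def answer(query: str) -> str:
    q, retries = query, 0
    while True:
        chunks = grade(retrieve(q))
        if chunks or retries >= MAX_RETRIES:
            break
        q, retries = rewrite(q), retries + 1
    return f"answer from {len(chunks)} chunks (retries={retries})"
```

A query that misses on the first pass gets one rewrite and then succeeds; the retry counter caps runaway loops.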

🔍

Hybrid Search (BM25 + kNN)

Reciprocal Rank Fusion pipeline over a unified OpenSearch index with both BM25 text and 1024-dim Jina v3 dense vectors.
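Reciprocal Rank Fusion itself is a small, well-defined formula: each document scores the sum of 1/(k + rank) across the ranked lists it appears in. A self-contained sketch (the constant k = 60 is the common default from the RRF literature, not necessarily this project's setting):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["p1", "p2", "p3"]   # BM25 keyword ranking
knn  = ["p3", "p1", "p4"]   # dense-vector kNN ranking
print(rrf_fuse([bm25, knn])[:2])  # → ['p1', 'p3']
```

Documents ranked well by both retrievers rise to the top even when neither list agrees on the exact order.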

🔌

Pluggable LLM Providers

Flip between local Ollama and any OpenAI-compatible API (OpenAI, Groq, Together, OpenRouter, vLLM) with zero code changes.
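Provider switching with zero code changes usually comes down to resolving one OpenAI-compatible endpoint from the environment. A hedged sketch (the variable names `LLM_PROVIDER`, `LLM_MODEL`, `OPENAI_BASE_URL` are illustrative, not necessarily the project's; Ollama does expose an OpenAI-compatible API at `/v1`):

```python
import os

def llm_config() -> dict:
    """Resolve the chat endpoint purely from environment variables.
    Variable names here are assumptions for illustration."""
    if os.getenv("LLM_PROVIDER", "ollama") == "ollama":
        return {
            "base_url": "http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
            "model": os.getenv("LLM_MODEL", "llama3.1"),
            "api_key": "ollama",  # Ollama ignores the key but clients require one
        }
    return {
        "base_url": os.environ["OPENAI_BASE_URL"],  # e.g. Groq, Together, vLLM
        "model": os.environ["LLM_MODEL"],
        "api_key": os.environ["OPENAI_API_KEY"],
    }
```

Any OpenAI-compatible client can then be constructed from this dict, so swapping Groq for a local vLLM server is an `.env` edit, not a code change.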

📊

Full Observability

Every span, prompt, token count, and latency is exported to Langfuse v3. In-app thumbs up/down writes feedback to the same trace.
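Conceptually, each traced step records a name, attributes, and a latency. A toy stand-in (this is not the Langfuse SDK, just an illustration of the shape of data the exporter ships):

```python
import time
from contextlib import contextmanager

SPANS: list[dict] = []  # stand-in for what gets exported to Langfuse

@contextmanager
def traced(name: str, **attrs):
    """Toy span recorder: captures name, attributes, and wall-clock latency."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({
            "name": name,
            "latency_ms": (time.perf_counter() - start) * 1e3,
            **attrs,
        })

with traced("generate", model="llama3.1", tokens=128):
    pass  # the LLM call would go here
```

In the real system the Langfuse SDK plays this role, and the thumbs up/down feedback is attached to the same trace ID.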

Answer Caching

Exact-match Redis cache returns repeated queries in under 1ms, keyed by query + model + top_k + search mode + categories.
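A deterministic key over those five fields is all exact-match caching needs. A sketch (field names, the `answer:` prefix, and the whitespace/case normalization are assumptions for illustration):

```python
import hashlib
import json

def cache_key(query: str, model: str, top_k: int,
              mode: str, categories: list[str]) -> str:
    """Deterministic Redis key from the inputs that define an answer.
    Normalization choices here are illustrative, not the project's."""
    payload = json.dumps(
        {
            "q": query.strip().lower(),   # assumed light normalization
            "model": model,
            "top_k": top_k,
            "mode": mode,
            "cats": sorted(categories),   # order-insensitive categories
        },
        sort_keys=True,
    )
    return "answer:" + hashlib.sha256(payload.encode()).hexdigest()
```

Hashing a canonical JSON payload keeps keys short and guarantees that the same logical request always maps to the same Redis entry.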

📅

Automated Ingestion

Airflow DAG fetches new cs.AI papers daily, parses PDFs with Docling, chunks by section, embeds with Jina, and indexes into OpenSearch.
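The "chunk by section" step can be sketched independently of Airflow and Docling. Assuming the parser yields (heading, text) pairs (an illustrative shape, not Docling's actual output type), with an assumed size cap:

```python
def chunk_by_section(parsed: list[tuple[str, str]],
                     max_chars: int = 2000) -> list[dict]:
    """Split parsed (heading, text) sections into size-bounded chunks,
    keeping the section heading on every chunk for citation context."""
    chunks = []
    for heading, text in parsed:
        for i in range(0, len(text), max_chars):
            chunks.append({"section": heading, "text": text[i:i + max_chars]})
    return chunks
```

Each chunk then gets embedded with Jina and indexed alongside its section heading, so retrieval hits carry enough context to cite.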

Architecture Overview

12+ services orchestrated via Docker Compose, from ingestion to serving.

Diagram views: Agentic RAG Architecture · Hybrid Search Pipeline · RAG Pipeline · Observability

Tech Stack

A carefully chosen production stack spanning frontend, backend, ML, and infrastructure.

FastAPI: Async Python backend
⚛️ Next.js 14: App Router + React 18
🔗 LangGraph 0.2: Agent state machine
🔎 OpenSearch 2.19: BM25 + k-NN + RRF
🦙 Ollama / OpenAI: Pluggable LLM layer
🧬 Jina v3: 1024-dim embeddings
🐘 PostgreSQL 16: Paper metadata + ORM
🔴 Redis 7: Exact-match cache
📡 Langfuse v3: Tracing + feedback
🌬️ Apache Airflow: Scheduled DAGs
🐳 Docker Compose: Full-stack runtime
📄 Docling: PDF → structured text

How the Agentic RAG Works

A 7-node LangGraph state machine with built-in guardrails and self-correction.

1

Guardrail Check

LLM scores the query 0–100 for relevance to AI/ML research. Off-topic questions are rejected early, saving retrieval cost.
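The guardrail reduces to a threshold check over the LLM's score. A sketch (the cutoff of 50 and the `score_fn` callback are illustrative stand-ins; the real judge is an LLM call):

```python
GUARDRAIL_THRESHOLD = 50  # assumed cutoff, not the project's configured value

def guardrail(query: str, score_fn) -> bool:
    """Accept the query only if the 0-100 relevance score clears the bar.
    score_fn stands in for the LLM judge."""
    return score_fn(query) >= GUARDRAIL_THRESHOLD
```

Rejecting off-topic queries here means no retrieval, grading, or generation tokens are ever spent on them.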

2

Hybrid Retrieval

BM25 keyword search and dense-vector kNN run in parallel over OpenSearch. Results are merged via Reciprocal Rank Fusion.
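In OpenSearch, the two legs can be expressed as one `hybrid` query whose sub-queries run in parallel; the fusion step is handled by a search pipeline configured separately. A sketch of the request body (field names `text` and `embedding` are assumptions, and the exact DSL and RRF pipeline configuration vary by OpenSearch version):

```python
query_vector = [0.0] * 1024  # placeholder for the 1024-dim Jina v3 embedding

hybrid_query = {
    "size": 10,
    "query": {
        "hybrid": {
            "queries": [
                # BM25 leg over the chunk text
                {"match": {"text": {"query": "retrieval augmented generation"}}},
                # kNN leg over the dense vectors
                {"knn": {"embedding": {"vector": query_vector, "k": 10}}},
            ]
        }
    },
}
```

The server merges the two result lists according to the attached search pipeline (rank-based fusion such as RRF), so the application sees a single fused ranking.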

3

Document Grading

Each retrieved chunk is graded for binary relevance. Irrelevant chunks are discarded before generation.
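Binary grading is a filter over the retrieved chunks. A sketch (the `judge` callback stands in for an LLM call returning "yes" or "no"; the real grader's prompt and output format are not shown here):

```python
def grade_chunks(question: str, chunks: list[str], judge) -> list[str]:
    """Keep only chunks the judge marks relevant to the question.
    judge is a stand-in for an LLM call returning 'yes' or 'no'."""
    return [c for c in chunks if judge(question, c) == "yes"]
```

Dropping irrelevant chunks before generation keeps the context window small and the answer grounded.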

4

Query Rewriting

If too many chunks are irrelevant, the agent rewrites the query and retries retrieval (up to 2 attempts).

5

Answer Generation

The LLM generates a grounded answer with inline citations back to the source ArXiv papers.
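Inline citations typically come from numbering the context chunks in the prompt so the model can refer back to them. A sketch (the `[n] (arxiv_id)` format and the instruction wording are illustrative, not the project's actual prompt):

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Number each chunk so the model can cite [n] and the UI can map
    [n] back to its ArXiv paper. Format is an illustrative assumption."""
    context = "\n\n".join(
        f"[{i}] ({c['arxiv_id']}) {c['text']}"
        for i, c in enumerate(chunks, 1)
    )
    return (
        "Answer using only the sources below; cite them inline as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Because the numbering is deterministic, the frontend can turn each `[n]` in the answer into a link to the corresponding ArXiv paper.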

6

Observability & Caching

Every step is traced to Langfuse, the answer is cached in Redis, and the frontend renders reasoning steps.

Explore the Full Project

Clone the repo, run docker compose up, and have the entire platform running in minutes.