Agentic RAG: From Overviews to Deep Research
- Dinos Papakostas
- Oct 29
- 3 min read
TL;DR: RAG in production is no longer "retrieve a few documents, paste into a prompt, and hope the answer is correct". The modern pattern is Agentic RAG: plan-execute loops, search across enterprise sources and the web, shared memory, tool use, and verification - shipped as a system with observability and governance, not a single prompt.

What changed?
Enterprise questions have grown more complex; context windows have grown too, but so has the noise. Tool use has become the default way to interact with language models, and the bar for trust has moved from "sounds plausible" to "is auditable".
In large organizations, single-pass Retrieval-Augmented Generation often produces answers that are technically correct but incomplete. For advanced use cases - like Deep Research that compiles full, grounded reports across internal knowledge and the web - you need agents that plan, iterate, and verify. Instead of one index lookup stuffed into a template, the system executes a dynamic plan: intent recognition, hybrid retrieval, internal connectors and external APIs, short- and long-term memory, and multi-agent orchestration with planners, specialist sub-agents, and reasoning models. Rigid, monolithic pipelines underperform because they leave no room for iterative control.
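To make the pattern concrete, here is a minimal plan-execute loop in Python. The names (`Step`, `plan`, `grounded`, `synthesize`) and the stopping criterion are illustrative placeholders rather than any specific framework's API: the planner decomposes the question, each step calls a tool, evidence accumulates in a scratchpad, and the loop re-plans until the answer is grounded.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    tool: str    # which tool to call, e.g. "internal_search" or "web_search"
    query: str   # the sub-question this step should answer

def answer(question: str,
           plan: Callable[[str, list[str]], list[Step]],
           tools: dict[str, Callable[[str], str]],
           grounded: Callable[[str, list[str]], bool],
           synthesize: Callable[[str, list[str]], str],
           max_steps: int = 8) -> str:
    """Plan-execute loop: decompose, gather evidence, verify, then synthesize."""
    scratchpad: list[str] = []               # shared evidence / short-term memory
    steps = plan(question, scratchpad)       # initial decomposition of the question
    while steps and len(scratchpad) < max_steps:
        step = steps.pop(0)
        scratchpad.append(tools[step.tool](step.query))  # execute one tool call
        if grounded(question, scratchpad):   # enough evidence to answer faithfully?
            break
        steps = plan(question, scratchpad)   # re-plan with what we've learned so far
    return synthesize(question, scratchpad)  # answer with citations back to the scratchpad
```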

The anatomy of an agentic workflow
In practice, Agentic RAG routes a user request through a planner that decomposes the problem, selects tools, and executes within a latency/cost budget. A hybrid retrieval layer mixes BM25 with embedding models, followed by a pointwise or listwise reranker, while evidence is accumulated in a scratchpad and then synthesized with citations back to sources. When appropriate, tools are increasingly exposed via the Model Context Protocol (MCP) to improve interoperability and simplify deployment.
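As a sketch of that retrieval layer, one common way to mix BM25 and embedding results is reciprocal rank fusion; the function below is an illustrative, library-free version, and the cross-encoder rerank at the end is left as a commented placeholder.

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge rankings from different retrievers (e.g. BM25 and dense) into one list.

    A document's fused score is the sum of 1 / (k + rank) over the lists it
    appears in; k dampens the influence of lower-ranked hits.
    """
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse the two candidate lists, then hand the top candidates to a reranker.
bm25_hits = ["doc_7", "doc_2", "doc_9"]    # lexical (keyword) retrieval
dense_hits = ["doc_2", "doc_4", "doc_7"]   # embedding (semantic) retrieval
candidates = reciprocal_rank_fusion([bm25_hits, dense_hits])[:50]
# reranked = reranker.rank(query, candidates)  # pointwise or listwise rerank step
```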

Evaluations, not vibes
Shipping Agentic RAG means treating evaluation as a first-class artifact. Beyond retrieval metrics like precision, recall, and nDCG, we score end-to-end task quality and faithfulness, and we retain traces of tool calls and agent handoffs for debugging. When ground truth is scarce, pairwise tournaments (e.g., with RAGElo) yield stable, comparative rankings across system variants and prompt changes. The result is a feedback loop that actually moves the needle, rather than one-off demos that decay over time.
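To illustrate the pairwise-tournament idea (this is not RAGElo's actual API), the sketch below aggregates LLM-judge preferences between system variants into Elo-style ratings; the variant names and outcomes are made up.

```python
from collections import defaultdict

def elo_rankings(judgements: list[tuple[str, str, float]],
                 k: float = 32.0, base: float = 1000.0) -> dict[str, float]:
    """Turn pairwise judge outcomes into a ranking of system variants.

    Each judgement is (variant_a, variant_b, score), where score is 1.0 if A's
    answer was preferred, 0.0 if B's was, and 0.5 for a tie.
    """
    ratings: dict[str, float] = defaultdict(lambda: base)
    for a, b, score in judgements:
        expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
        ratings[a] += k * (score - expected_a)
        ratings[b] += k * ((1.0 - score) - (1.0 - expected_a))
    return dict(sorted(ratings.items(), key=lambda kv: kv[1], reverse=True))

# An LLM judge compares answers from two pipeline variants per query;
# the outcomes aggregate into a single leaderboard across variants.
outcomes = [("agentic_v2", "single_pass", 1.0),
            ("agentic_v2", "agentic_v1", 0.5),
            ("agentic_v1", "single_pass", 1.0)]
print(elo_rankings(outcomes))
```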

Governance is part of the design
Enterprises don't just need plausible answers - they need verifiable outputs that respect both user privacy and company policies. That means respecting Single Sign-On (SSO) and Role-Based Access Control (RBAC) end-to-end, with source-level citations, configurable privacy scopes, and deployment in private cloud or on-premise when required. These constraints are where agentic systems shine: they can route, check, and adapt instead of forcing every question through a brittle, single-pass template.
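One common way to make RBAC concrete at the retrieval layer is to carry each document's access-control list from the source system and filter candidates against the user's groups before anything reaches the prompt. The `Document` shape and `search` callable below are hypothetical; in production the filter is usually pushed down into the search engine itself rather than applied after the fact.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: set[str] = field(default_factory=set)  # ACL propagated from the source system

def retrieve_for_user(query: str,
                      user_groups: set[str],
                      search: Callable[[str], list[Document]]) -> list[Document]:
    """Enforce RBAC at retrieval time: documents the user cannot see never reach the prompt."""
    candidates = search(query)  # hybrid retrieval as before
    return [doc for doc in candidates if doc.allowed_groups & user_groups]
```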
When simple RAG is enough (and when it isn't)
FAQ-style lookups in narrow domains with well-structured documents? A single-shot pipeline may still be the pragmatic choice. But as soon as questions span multiple sources, modalities, or compliance constraints, Agentic RAG pays for itself by being iterative, inspectable, and adaptable.
If you want to see such an approach operating end-to-end, our Deep Research agent is a concrete example of planning, retrieval across multiple sources, tool use, and verification, producing complete reports with references rather than brief snippets. And if you're building your own, our quickstart guide on developing agents with MCP is a good place to begin.
Are you leading knowledge work in your company and want to explore how to integrate AI, RAG, and Agents into your processes and workflows? Reach out to us for an initial conversation on turning your internal expertise into a valuable asset.