How we built an agentic retrieval system that turned a global leadership development firm's ten million pages of leadership research into a tool learners can actually trust.

The problem

Decades of original research, competency models, assessments, case studies, and facilitator guides: ten million pages of hard-won expertise. And almost none of it was reachable at the moment a learner actually needed an answer.

That was the situation at a global leadership development firm whose work serves Fortune 500 customers, coaches, and individual professionals. A learner with a question had to search across thousands of documents, compare overlapping frameworks, and somehow judge which source to trust. The firm's single greatest asset was, in practice, locked inside its own documents.

They wanted to turn that archive into something a learner could simply talk to: an experience that felt like asking an expert who'd read everything the firm had ever published. The hard part was never the chat. It was making the answers trustworthy enough to put the firm's name on.

Why ordinary search wasn't enough

The firm's content library had grown over decades. The same leadership concept might appear in a 1990s framework, a 2010s revision of that framework, a case study built around it, and a facilitator's guide for teaching it. Four documents, overlapping but not identical, each written for a different purpose. A learner searching the archive got all four back, with no signal about which to trust for their question.

Ordinary search treats all of that as the same flat text. Ask the archive a question and you'd get several plausible results with nothing to tell you which to rely on. For an individual learner, that meant confusion. For an enterprise learning director, it meant risk.

And the product would ship under the firm's name. Any answer it gave, to a Fortune 500 learning director or to an individual subscriber, had to be defensible. Not 'probably right.' Defensible: cited, traceable, and grounded in a passage the firm had written.

What we built

We ran the engagement on Traversaal Pro, our enterprise retrieval engine, working alongside the firm's product and content leaders to pin down what a good answer meant for a learner, an enterprise buyer, and the people responsible for protecting the firm's intellectual property.

The first job was making the content addressable. We went through the whole library, cleared out duplicates, organised everything by topic and authority, and gave every passage a clear line back to its original source. Without that clean structure underneath, nothing else would have produced answers the firm could trust.
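As a rough illustration of that groundwork, here is a minimal sketch in plain Python. It assumes exact-duplicate removal by hashing normalised text and a simple provenance record per passage; the field names, the `build_index` helper, and the keep-the-first-copy rule are all hypothetical, and the real pipeline was certainly more involved.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """One addressable passage with a line back to its source (hypothetical schema)."""
    passage_id: str   # stable id derived from the passage text
    source_doc: str   # the original document the passage came from
    doc_type: str     # e.g. "research", "framework", "case_study"
    topic: str
    text: str


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def build_index(raw):
    """Drop exact duplicates and attach provenance to each surviving passage.

    `raw` is an iterable of (source_doc, doc_type, topic, text) tuples.
    This sketch keeps the first copy it sees; a real pipeline would more
    likely keep the most authoritative one.
    """
    seen = set()
    index = []
    for source_doc, doc_type, topic, text in raw:
        digest = hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate of a passage we already hold
        seen.add(digest)
        index.append(Passage(digest[:12], source_doc, doc_type, topic, text))
    return index
```

Hashing only catches copies that are identical after normalisation. Overlapping-but-different passages, like a framework and its later revision, need the topic and authority metadata to tell them apart.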

On top of that sits an agentic retrieval layer. Instead of running one search per question, it plans how to search, deciding which kinds of document matter most for what the learner asked, weighing each by the authority its format carries, pulling across several content types, and going back for more evidence when a single source isn't enough to stand behind. That's what lets it answer a hard question with the right mix of research, frameworks, examples, and facilitation material.
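The plan-then-search loop described above can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the authority weights, the keyword-based `plan` function, the `min_evidence` threshold, and the `search` callable that stands in for the retrieval engine. The production agent plans with a model, not with keyword rules.

```python
from dataclasses import dataclass

# Hypothetical authority weights per document format.
AUTHORITY = {
    "research": 1.0,
    "framework": 0.9,
    "case_study": 0.6,
    "facilitator_guide": 0.5,
}


@dataclass
class Hit:
    doc_type: str
    source: str
    relevance: float  # 0..1, as returned by the search backend


def weighted(hit: Hit) -> float:
    """Relevance scaled by the authority the document's format carries."""
    return hit.relevance * AUTHORITY.get(hit.doc_type, 0.0)


def plan(question: str) -> list:
    """Decide which document types to try first (toy keyword rules)."""
    q = question.lower()
    if "teach" in q or "facilitat" in q:
        return ["facilitator_guide", "framework", "research", "case_study"]
    if "example" in q:
        return ["case_study", "framework", "research", "facilitator_guide"]
    return ["research", "framework", "case_study", "facilitator_guide"]


def retrieve(question: str, search, min_evidence: float = 1.2) -> list:
    """Search one document type at a time, going back for more until the
    accumulated weighted evidence is enough to stand behind."""
    evidence = []
    for doc_type in plan(question):
        evidence.extend(search(question, doc_type))
        if sum(weighted(h) for h in evidence) >= min_evidence:
            break  # enough support; stop searching
    return sorted(evidence, key=weighted, reverse=True)
```

The point of the loop is the stopping rule: a single strong source ends the search early, while a weak one sends the agent on to the next content type.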

How we built the agent with ADK

The agentic retrieval layer is built on ADK, the Agent Development Kit, an open-source framework for building and running production agents. ADK gave us a code-first way to define how the agent plans a search, which tools it can reach for, and how it chains several retrieval steps together, without rebuilding that orchestration from scratch. Just as important, the three things the firm cared about most are first-class parts of the framework rather than things we bolted on afterward: keeping answers defensible, being able to see how the agent reasoned, and proving quality before anything shipped.

Guardrails were how we made "defensible" a rule instead of a hope. ADK lets you run checks at the moments that matter, before and after each model call and each tool call, so we could enforce the firm's hard line at the point of action. The agent doesn't get to answer from thin air. A guardrail confirms that what it's about to say is backed by retrieved firm passages, holds the answer back when the evidence isn't there, and keeps the agent from wandering off the firm's own material or surfacing content it shouldn't. So grounding is checked on every turn, not just hoped for at the end.
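To make the grounding rule concrete, here is a minimal plain-Python sketch of the kind of check that could run after a model call. It deliberately does not use ADK's API; the `Claim` structure, the `enforce_grounding` function, and the refusal wording are hypothetical, and it assumes the draft answer arrives already split into claims with the passage ids each one cites.

```python
from dataclasses import dataclass, field

# Hypothetical refusal text returned when the evidence isn't there.
REFUSAL = "I can't answer that from the firm's published material."


@dataclass
class Claim:
    text: str
    passage_ids: list = field(default_factory=list)  # passages cited for this claim


def enforce_grounding(claims: list, retrieved_ids: set) -> str:
    """Release the answer only if every claim cites at least one passage
    and every cited passage was actually retrieved on this turn."""
    if not claims:
        return REFUSAL
    for claim in claims:
        cited = set(claim.passage_ids)
        if not cited or not cited <= retrieved_ids:
            return REFUSAL  # hold the answer back rather than guess
    return " ".join(
        f"{c.text} [{', '.join(c.passage_ids)}]" for c in claims
    )
```

The check is all-or-nothing by design: one unsupported claim is enough to withhold the whole answer, which is the stricter of the two reasonable choices.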

Observability is what lets the firm inspect any answer it gives. Because every run through ADK produces a full trace, the firm can see which documents the agent considered, how it weighed them, the passages it pulled, and which claim each citation supports. Learners see the clean citations; the firm's internal teams can open the whole trace and follow the agent's reasoning step by step. The answer is never taken on trust; it can be checked, and that was non-negotiable from day one.
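One way to picture the two views of a trace is the sketch below. It is not ADK's trace format: the `TraceStep` and `AnswerTrace` classes and their fields are hypothetical, chosen only to show how one record can serve both the learner and the internal reviewer.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TraceStep:
    action: str        # e.g. "search"
    doc_type: str      # which kind of document this step looked at
    considered: list   # passage ids the agent looked at
    selected: list     # passage ids it kept as evidence


@dataclass
class AnswerTrace:
    question: str
    steps: list = field(default_factory=list)
    citations: dict = field(default_factory=dict)  # claim text -> passage ids

    def learner_view(self) -> list:
        """Only the cited passage ids, deduplicated and sorted."""
        return sorted({pid for ids in self.citations.values() for pid in ids})

    def internal_view(self) -> str:
        """The whole trace as JSON, for step-by-step review."""
        return json.dumps(asdict(self), indent=2)
```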

Evaluations are what keep the quality bar holding as the library grows. Before the pilot reached a single customer, both teams agreed on what a good answer looked like for this audience, then turned that into an evaluation suite using ADK's built-in evaluation framework. ADK scores not just the final answer but the path the agent took to reach it, so we could measure grounding, relevance, source quality, citation accuracy, and usefulness against a fixed set of test cases. Those evals travel with the product. As the firm adds new material, the same bar is applied automatically, and a regression shows up in testing before a learner ever would.
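The idea of scoring both the answer and the path can be sketched without ADK's evaluation framework. In this plain-Python version the two metrics, the `EvalCase` and `RunResult` shapes, and the threshold dictionary are all hypothetical stand-ins; `agent` is any callable that takes a question and returns a `RunResult`.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    question: str
    expected_path: list      # document types the agent should consult, in order
    expected_sources: set    # passage ids a good answer may cite


@dataclass
class RunResult:
    path: list               # document types the agent actually consulted
    cited_sources: set       # passage ids the answer actually cited


def score_case(case: EvalCase, result: RunResult) -> dict:
    """Score the path taken and the accuracy of the citations."""
    path_score = 1.0 if result.path == case.expected_path else 0.0
    if result.cited_sources:
        correct = result.cited_sources & case.expected_sources
        precision = len(correct) / len(result.cited_sources)
    else:
        precision = 0.0  # an uncited answer never passes
    return {"path": path_score, "citation_precision": precision}


def run_suite(cases: list, agent, thresholds: dict) -> list:
    """Run every fixed test case and return (question, metric, score)
    for each metric that falls below its minimum."""
    failures = []
    for case in cases:
        scores = score_case(case, agent(case.question))
        for metric, minimum in thresholds.items():
            if scores[metric] < minimum:
                failures.append((case.question, metric, scores[metric]))
    return failures
```

Run against the same fixed cases each time new material is ingested, an empty failure list means the bar still holds; anything else is a regression caught before release.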

The result

Ten million pages of expert content are now usable through a single conversation. Every answer in the pilot is grounded in the firm's own material, with the citation right there for the learner to see.

The firm didn't just walk away with a product. It walked away with a repeatable way to bring new content in: every answer's sourcing visible, the guardrails enforcing grounding on each turn, and the quality bar holding automatically through evaluation as the library grows. What was a locked archive is now something a learner can talk to and the firm can defend.

Next engagement

Have an archive gathering dust?

Two or three clients at a time. 30-minute call, no pitch deck.

© 2026 Traversaal AI, Inc.