Relativity Fest 2025 was one for the record books. If you missed all the fun this year, there are still plenty of insights waiting for you.
In the session “Generative AI in e-Discovery: How to Test, Trust, and Thrive in a New AI Era,” a panel of industry leaders discussed best practices for using gen AI to yield transformative results during doc review with Relativity aiR.
The panel featured:
- Cristin Traylor, Sr. Director, AI Transformation & Law Firm Strategy, Relativity (moderator)
- Tara Emory, Special Counsel, Covington & Burling
- Robert Keeling, Partner, Redgrave
- CJ Mahoney, Counsel and Head of the e-Discovery Group, Cleary Gottlieb Steen & Hamilton
- Ben Sexton, Senior Vice President, Innovation & Strategy, JND eDiscovery
Understanding the “Why” Behind aiR’s Decisions
You can't make technology roar if you don’t understand what’s under the hood.
The panel began by diving into how aiR for Review’s generative AI reaches its conclusions.
The workflow begins when users input carefully crafted prompt criteria (more on this later) and point aiR at a set of documents. aiR then reviews the documents based on the prompt criteria.
During its review, aiR selects citations, builds rationales, and argues against its own conclusions to identify and test alternate considerations. All of these outputs are provided to the user to review and consider. Finally, it generates a recommendation of very relevant, relevant, borderline, not relevant, or junk for each document.
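For readers who like to see the moving parts, here is a minimal sketch of what one document’s output from that workflow might look like. The field names are hypothetical illustrations, not Relativity’s actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical shape of a single aiR for Review result, mirroring the outputs
# described above. Field names are illustrative, not Relativity's schema.
@dataclass
class AiRResult:
    doc_id: str
    recommendation: str                                   # "very relevant" | "relevant" | "borderline" | "not relevant" | "junk"
    citations: list[str] = field(default_factory=list)    # verbatim passages supporting the call
    rationale: str = ""                                    # why aiR reached its conclusion
    considerations: str = ""                               # aiR arguing against its own conclusion

example = AiRResult(
    doc_id="DOC-000123",
    recommendation="borderline",
    citations=["...a quoted passage from the document..."],
    rationale="Discusses the pricing terms at issue in the dispute.",
    considerations="The discussion may fall outside the relevant time period.",
)
```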
“The considerations aiR provides can change the way I view an entire document,” Tara said. “I've had my mind changed about whether a document is responsive or not based on the arguments. And that's exercising attorney judgment.”
This transparency behind aiR’s recommendations is key, helping attorneys sharpen that judgment to deliver effective results.
The Delicate Art of Prompt Iteration
Once aiR provides a recommendation, Cristin explained, your subject matter expert (SME) validates that the citations behind aiR’s decision actually appear in the document, which reassures users that they’re protected against hallucinations.
Then the real artistry comes in: prompt iteration. Seeing where your output meets your expectations, and where it needs refinement. aiR's rationale and considerations empower reviewers to hone their prompts quickly. This iterative process not only improves accuracy but also accelerates the review timeline—something that traditional TAR models struggle to match.
“I evolve my results,” Tara shared, “by using documents that are already tagged as ground truth so I can iterate them like sort of a control set. I want to see what happens when I run them through prompts and iterate them.”
The panel also stressed the importance of refining prompts not just through relevant content, but by clearly labeling what is not responsive as well. This strategy helps generative AI models better identify nonresponsive qualities, improving overall performance.
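To make Tara’s control-set approach concrete, here is a rough sketch of the iteration loop on toy data. Nothing here is Relativity’s API; it simply shows scoring one prompt draft against documents already tagged as ground truth and flagging the disagreements to revisit.

```python
# Toy ground truth: documents already tagged by humans (doc_id -> responsive?).
ground_truth = {"DOC-001": True, "DOC-002": False, "DOC-003": True, "DOC-004": False}

# Pretend responsiveness calls from one prompt draft run over the same documents.
prompt_v2_calls = {"DOC-001": True, "DOC-002": True, "DOC-003": True, "DOC-004": False}

# Where does the prompt disagree with the humans? Those documents tell you
# what to refine before the next iteration.
disagreements = [doc for doc, truth in ground_truth.items() if prompt_v2_calls[doc] != truth]
agreement_rate = 1 - len(disagreements) / len(ground_truth)

print(f"Agreement with ground truth: {agreement_rate:.0%}")    # 75%
print("Revisit before the next prompt draft:", disagreements)  # ['DOC-002']
```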
Great Samples Produce Great Results
“Sampling is the most important part of prompt iteration,” Ben explained. “Exposing your prompt to a diverse array of documents that cover the different facets of the population, and then exposing it towards threshold documents in the gray area so that you can draw bright lines between relevant and not relevant.
“How you do that is an area of art,” Ben continued—whether through stratified or diversity sampling, or establishing a pool of borderline documents by running a preliminary prompt against a random sample. “A lot of times we actually try to take a bigger sample that includes everything we want at once so we can iterate on one collective batch.”
So how do you know what to sample?
If your sample sizes are small, Robert cautioned, iterating on additional or larger samples will help avoid overfitting.
“It depends on what your goal is,” Tara advised. “Your goal could just be defensibility. You want a certain percentage of recall. And if you're doing that, you have to go on random samples. So, control set is not the only way to do it—though it tends to be the most efficient.”
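As a rough illustration of the “one collective batch” idea Ben describes, the sketch below folds a stratified sample together with a pool of borderline documents from a preliminary prompt run. The facet (custodian), the counts, and the document IDs are all invented for the example.

```python
import random

random.seed(7)

# Invented population: 500 documents spread across three custodians (the stratification facet).
population = [{"doc_id": f"DOC-{i:04d}", "custodian": random.choice(["A", "B", "C"])} for i in range(500)]

# Pretend these came from running a preliminary prompt and pulling the gray-area documents.
borderline_pool = [f"DOC-{i:04d}" for i in random.sample(range(500), 20)]

# One collective batch: a stratified slice of each facet plus the borderline pool.
per_stratum = 10
sample_ids = set(borderline_pool)
for custodian in ["A", "B", "C"]:
    stratum = [d["doc_id"] for d in population if d["custodian"] == custodian]
    sample_ids.update(random.sample(stratum, per_stratum))

print(f"Iteration batch: {len(sample_ids)} documents covering every custodian plus the gray area")
```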
Crunching the Numbers
Once you’re confident in your results from a qualitative perspective, you’ll want to validate your prompt with statistics before you can run it on the whole corpus, Cristin explained.
These are the same calculations you may be familiar with from earlier-generation technology-assisted review tools: recall, precision, and elusion rate. When you use aiR for Review, the Review Center in RelativityOne performs these calculations automatically.
Cristin walked through the process: “You'll have your subject matter expert again review the documents. You'll be able to see the progress through Review Center, and then you'll get your results at the end—your recall, precision, and elusion—all in Relativity. Then you can decide: ‘Do I want to go ahead and accept these results? Do I need to go back and iterate a little bit more on my prompts because maybe I'm not getting the results that I was hoping for? Or am I ready to run it on everything?’”
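To make those three metrics concrete, here is the arithmetic on a made-up validation sample. In practice Review Center computes this for you; the numbers below exist only to show the formulas.

```python
# Counts from an SME-reviewed validation sample (invented for illustration).
tp = 180   # aiR said relevant, SME agrees
fp = 20    # aiR said relevant, SME disagrees
fn = 15    # aiR said not relevant, SME found it relevant
tn = 385   # aiR said not relevant, SME agrees

recall = tp / (tp + fn)      # share of truly relevant documents the prompt caught
precision = tp / (tp + fp)   # share of predicted-relevant documents that really are relevant
elusion = fn / (fn + tn)     # relevant documents hiding in the predicted-not-relevant pile

print(f"Recall {recall:.1%}, precision {precision:.1%}, elusion {elusion:.1%}")
# Recall 92.3%, precision 90.0%, elusion 3.8%
```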
Walking the Borderline
When it comes to including or excluding borderline documents, there’s no one-size-fits-all approach.
“I love that you have the option,” CJ said, “because I don't think there is an answer that's right in every circumstance.”
CJ emphasized that borderline decisions should be data driven: “I like to have an independent assessment of how much borderline bumps recall and how much it hurts precision. And determine what I'm going to do based on that.
“And then, based on the matter, it might be best to just exclude. It might be a matter where I am using aiR as the final responsiveness call. Or I might review borderlines because of the drop in precision. But I think it's very matter specific and I do some sampling to figure it out each time.”
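Here is one way to run the kind of independent assessment CJ describes, with invented counts: compare recall and precision with and without the borderline band, and let those numbers drive the call.

```python
# Invented counts for illustration: how many truly relevant (tp) and not relevant (fp)
# documents sit in each recommendation band, out of 1,000 relevant documents total.
total_relevant = 1000

def metrics(tp: int, fp: int) -> tuple[float, float]:
    """Return (recall, precision) for a production set with these counts."""
    return tp / total_relevant, tp / (tp + fp)

# Very relevant + relevant bands only.
recall_core, precision_core = metrics(tp=820, fp=60)
# Same set with the borderline band added.
recall_all, precision_all = metrics(tp=820 + 90, fp=60 + 140)

print(f"Without borderline: recall {recall_core:.0%}, precision {precision_core:.0%}")
print(f"With borderline:    recall {recall_all:.0%}, precision {precision_all:.0%}")
# Without borderline: recall 82%, precision 93%
# With borderline:    recall 91%, precision 82%
```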
“My clients value efficiency and effectiveness,” Robert added. “So, oftentimes, that means either sampling or ignoring borderline documents. But it depends on the case. If it's being used for key doc review, my merits counsel, every time, will want to look at the borderline documents. So, it really depends on what you're using it for.”
“To me, it's about asking: ‘what did we negotiate,’” Robert concluded. “If I've already exceeded my negotiated recall level through the very responsive/responsive set, I don't need to look at borderline documents. I can declare victory and go home.”
Tara agreed. “If the two sides have agreed to a recall and you meet that recall, then I think it's objectively reasonable to just abide by what that is.”
What Works for You
That’s a lot of variables to weigh, but the panel agreed on one overarching standard: the most important factor is simply making sure the tool works.
Are the results aligned with your goals? Is the technology giving you what you need? Are you meeting the agreed-upon recall level? These are the real measures of success.
Robert had a pithy response: “Is it working at finding key documents? The thing speaks for itself.”
But Wait, There’s More
Ben summed it all up like this: “So that, in a nutshell, is a systematic way to do the first two weeks of a document review: when you don't know what you don't know, you're dialing in those review instructions and closing gaps. With aiR, we're doing that in 4 to 8 hours.”
This is just a sample (pun intended) of the topics the panel discussed, but it highlights the session’s key message.
By understanding how the technology works, honing prompt iteration, validating your results, and, most importantly, leading with human expertise and oversight, you have the power to completely transform your legal operations with generative AI solutions like Relativity aiR for Review—driving faster turnaround times, lower costs, and fewer headaches.
That’s how you test, trust, and thrive in this new AI era.
Graphics for this article were created by Caroline Patterson.