Document review consumes roughly 80 percent of e-discovery spend, according to RAND Corporation research. As seasoned litigators know, cases can settle over the prospect of review costs alone. Generative AI tools like Relativity aiR continue the lineage of technology-assisted review (TAR), aiming to reduce the cost and burden of review while increasing the transparency and accuracy of the process. But successful deployment requires more than flipping a switch.

A recent expert panel chat brought together practitioners from JND, Bayer, and Relativity to discuss practical workflows for generative-AI-based document review. Their collective experience reveals a process that shares DNA with traditional TAR but demands different discipline at key stages.

Defining the Review Population

The first step of conducting document review with generative AI, defining which documents require review, resembles traditional approaches. In most cases, practitioners still collect, apply date restrictions, run search terms, and apply other parameters to compress the population before review begins.
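For readers who want to see the mechanics, here is a minimal sketch of that culling step. The metadata fields, date range, and search terms are hypothetical, and in practice this filtering happens inside the review platform rather than in standalone code:

```python
from datetime import date

# Hypothetical document metadata; in a real matter this lives in the review platform.
documents = [
    {"id": "DOC-001", "sent": date(2021, 3, 14), "text": "Q1 pricing discussion with distributor"},
    {"id": "DOC-002", "sent": date(2018, 6, 2), "text": "Holiday party logistics"},
]

DATE_START, DATE_END = date(2020, 1, 1), date(2023, 12, 31)
SEARCH_TERMS = ["pricing", "rebate", "market share"]  # illustrative terms only

def in_scope(doc):
    """Apply the date restriction and search terms that compress the population before review."""
    if not (DATE_START <= doc["sent"] <= DATE_END):
        return False
    text = doc["text"].lower()
    return any(term in text for term in SEARCH_TERMS)

review_population = [d for d in documents if in_scope(d)]
print(f"{len(review_population)} of {len(documents)} documents enter the review population")
```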

The difference lies in timing. With generative AI review, finalizing the population early matters more than it did with linear review or even continuous active learning (CAL).

"When we're drawing samples for prompt iteration, we want them to reflect the full review set," explained Ben Sexton, SVP of Innovation and Strategy at JND. "This is vital for validation; control set metrics are only meaningful if they are representative of the underlying review population."

This creates pressure to lock down the review set before work begins. When late additions arrive, teams face a choice: fold them into the current review project (if the data is similar enough) or treat them as a separate project with independent sampling and validation. If folding into the current project, teams need to be mindful of the implications for prompt iteration and validation.

Michallynn Demiter, eDiscovery Squad Lead at Bayer, noted that this constraint echoes earlier technology: "It makes me think back to the TAR 1.0 days. You had to have your set defined up front before you got into the real process."

The exceptions requiring separate workflows, such as videos, images, and documents with too much or too little text, remain similar to those in traditional TAR projects. But cost pressures are shifting. Early AI review pricing created incentives to deduplicate aggressively at the file level and propagate coding afterward. Ben mentioned that “as pricing models evolve, that compression work may become unnecessary.”

Prompt Iteration: Learning What You Don’t Know

Traditional document review projects typically spend their first two to four weeks refining instructions. Review managers field hundreds of questions from reviewers. Protocols evolve daily as reviewers encounter new facts, issues, and edge cases.

“This early knowledge-building stage, of actually looking at documents, gives us context that is essential to defining the contours of relevance in our prompts,” stated Ben. 

Prompt iteration aims to compress that multi-week learning curve into a matter of days by systematically testing prompts against strategic samples before the full run begins.

The starting point is translating case requirements into review criteria. A production request might contain 50 document requests, but those often collapse into 10 or fewer distinct issues. Teams draft prompts that capture those issues, then test them iteratively.
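As a purely illustrative sketch (the issue names, criteria, and prompt wording below are hypothetical, not aiR's actual template), the collapsed issues and a first-draft prompt might be structured like this:

```python
# Hypothetical example: a long list of document requests collapsed into a few distinct
# issues, each with draft relevance criteria to be refined during prompt iteration.
issues = {
    "Pricing communications": "Documents discussing price setting, discounts, or rebates with competitors or distributors.",
    "Market allocation": "Documents referencing dividing customers, territories, or product lines among competitors.",
    "Regulatory contacts": "Communications with or about regulators concerning the products at issue.",
}

def build_review_prompt(issues):
    """Assemble a draft relevance prompt from the issue criteria (structure is illustrative only)."""
    lines = ["Mark a document responsive if it meets ANY of the following criteria:"]
    for name, criteria in issues.items():
        lines.append(f"- {name}: {criteria}")
    lines.append("Otherwise, mark the document not responsive and briefly explain why.")
    return "\n".join(lines)

print(build_review_prompt(issues))
```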

Relativity's prompt kickstarter accelerates this step by generating initial prompt drafts from case materials.

"We're getting pretty usable first drafts with minimal edits by the case team before we're getting started," Michallynn said.

Sampling for Prompt Iteration

The quality of prompt iteration depends on sample composition. Random samples reveal prevalence but may miss rare document types. “Stratified samples drawn from conceptual clusters promote diversity, which is critical when one document type dominates 90 percent of the population,” noted Ben.

A third category proves equally important: threshold documents. These are the gray-area records that two attorneys might code differently. Ben noted that “exposing the prompt to borderline documents during iteration helps define instructions that cut through ambiguity.”

"Every single time we've done a stratified sample across conceptual clusters, something has come up that the case team was not expecting," Michallynn observed. "There's tremendous value in getting that diversity up front."

Teams typically begin with 50 documents for initial testing, sometimes using known documents the case team has already identified. The reviewer must be a subject matter expert with deep case knowledge, and, if possible, the same person or team should evaluate all iteration rounds to ensure consistency. All told, prompt iteration generally involves reviewing anywhere from 200 to 500 documents.

As a final note on setting expectations, Ben offered: “You may be automating 5,000 hours of review, but you're creating, say, four hours of document review for a partner or SME in the case, who may not be fond of reviewing documents.”

That's an expectation best set at the outset of the project.

Knowing When to Stop

“Iteration generally ends when we start seeing diminishing returns,” stated Michallynn.

Both Ben and Michallynn look for acceptable performance from prompt iteration, recognizing that once fixing one edge case starts creating another, the prompt has likely reached its natural ceiling. Ben added: “It’s also important to ensure your process avoids overfitting, whereby your prompts are tuned to bat 1.000 on a small sample, but don’t generalize well across the full population.”

Low-Richness Issues

A common concern: Can AI find the needle in the haystack?

Generative AI-based review works differently from traditional machine learning-based TAR. Where traditional machine learning requires exemplars, gen AI evaluates each document on its own merits against the prompt criteria.

“That doesn’t mean one is better than the other,” Ben clarified, “but it’s an important difference. A single responsive document in a million-document population will be marked responsive based on its content, not its similarity to a training example.”

Validation

One of the benefits of gen AI review is that teams can estimate their recall and precision before processing the full population, a step that provides confidence and, in some contexts, a basis for disclosure to opposing parties.

Pre-run validation involves drawing a separate sample (distinct from the iteration samples to avoid "iterating on the control set"), having the SME code it, then running the prompt against it to measure recall and precision.
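The arithmetic behind those metrics is straightforward. Here is a sketch using hypothetical document IDs and boolean coding decisions, treating the SME's calls as ground truth:

```python
def recall_precision(sme_labels, prompt_labels):
    """Compute recall and precision of prompt decisions against SME coding on a control set.
    Labels are booleans keyed by document ID; True means responsive."""
    tp = sum(1 for doc, truth in sme_labels.items() if truth and prompt_labels.get(doc))
    fn = sum(1 for doc, truth in sme_labels.items() if truth and not prompt_labels.get(doc))
    fp = sum(1 for doc, truth in sme_labels.items() if not truth and prompt_labels.get(doc))
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# Hypothetical control set: SME coding vs. prompt decisions on the same documents.
sme = {"D1": True, "D2": True, "D3": False, "D4": False, "D5": True}
prompt = {"D1": True, "D2": False, "D3": False, "D4": True, "D5": True}
r, p = recall_precision(sme, prompt)
print(f"recall={r:.2f}, precision={p:.2f}")  # recall=0.67, precision=0.67
```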

In Ben’s view, “The benefits are substantial. Teams gain numerical measurements rather than intuitions about prompt quality.” If validation fails, they can return to iteration without having accrued processing costs on the full run. And when opposing parties require transparency, teams can disclose projected recall before reviewing the documents.

But validation adds time and requires SME effort. For internal investigations without opposing-party obligations, or for small data sets that process in under an hour, teams may choose to lean on a good prompt iteration strategy and traditional QC to evaluate their work.

"Don't let the perfect be the enemy of the good," advised Dan Wyman, Lead Sales Engineer at Relativity. "Align it to the use case you're working with."

Some matters warrant both pre-run validation and cradle-to-grave validation after processing.

“The former addresses prompt performance, whereas the latter addresses the entire review workflow, including any side processes or exceptions,” noted Ben.

Running the Review

Execution has become largely mechanical. Early adopters recall projects where 75,000 documents required estimated processing times of 76 hours. Today, similar populations complete in roughly two hours.

The operational work involves monitoring progress, spot-checking results as they populate, and addressing errors. With Relativity aiR, Michallynn and Ben note that most errors resolve with retry attempts. Persistent failures route to exception workflows for manual handling.
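This is not aiR's API, but the general pattern of retrying transient failures and routing persistent ones to an exception queue looks something like the sketch below, where analyze stands in for whatever call submits a document for AI analysis:

```python
import time

exception_queue = []  # documents routed to manual handling after repeated failures

def process_with_retry(doc_id, analyze, max_attempts=3, backoff_seconds=5):
    """Retry transient errors a few times; persistent failures go to the exception queue."""
    for attempt in range(1, max_attempts + 1):
        try:
            return analyze(doc_id)
        except Exception as error:  # in practice, catch only error types known to be transient
            if attempt == max_attempts:
                exception_queue.append((doc_id, str(error)))
                return None
            time.sleep(backoff_seconds * attempt)  # simple linear backoff before retrying
```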

Of course, the rapid pace of technological change creates its own challenge: "As someone who is process-oriented, you get something defined, you want to get others trained up on it, write documentation, and it's outdated immediately," Michallynn said.

Dan echoed that sentiment, sharing that “the technology as it stands today is as slow as it will ever be, and it's already dramatically better than it was even a year ago.”

This rapid pace of improvement is both a burden and an opportunity for teams adopting or scaling AI in legal workflows.

Broader Applications and Adoption Patterns

Anecdotally, panelists noted that adoption has grown faster among corporate defense teams than plaintiffs, likely a function of population size and review cost exposure. But a shift is emerging. Receiving parties are beginning to prefer generative AI review from producing parties, believing it yields more complete productions than traditional methods, especially around low-richness issues.

"We're involved in negotiations now where our clients on the receiving side, both law firms and government, are promoting the use of gen AI by the producing side, subject to transparency and disclosure of validation metrics" Ben said. "Early on, plaintiffs felt like it was just an ‘easy button’ for defendants, but I think they’re understandably considering that it may be to their benefit for opposing to use AI-based review as opposed to traditional methods."

Privilege review adoption lags responsiveness review but is growing. Additional review types such as confidential business information identification, PII detection, and vision analysis for images are expected to roll out in the coming year.

Dan shared Relativity’s vision for expanding AI capabilities: “Being able to pull generative AI into analyzing images will be a tremendous benefit. Identifying what else we can get out of that single push of a document to bring value to that analysis is an area we’re focusing on quite heavily.”

Practical Takeaways

For teams evaluating gen AI-based document review, the panelists offered several parting recommendations:

Front-load conversations: Discuss population stability, validation requirements, and SME involvement before beginning. These factors shape the entire workflow.

Invest in sampling strategy: Diverse, thoughtfully constructed samples accelerate prompt refinement and surface unexpected document types and case facts early.

Maintain reviewer consistency: If possible, the same SME should evaluate all iteration rounds. Different reviewers, even knowledgeable ones, may interpret issues differently enough to disrupt the process. The same is true for validation.

Right-size validation to the use case: High-stakes productions to opposing parties may warrant rigorous validation. Internal investigations with time pressure may not.

Treat late data strategically: New custodians with similar roles may fold into existing projects. New departments or substantially different data may require separate treatment.

Stay current: The technology improves rapidly. Capabilities that required workarounds six months ago may now be native features.

Document review remains the largest cost center in e-discovery. Generative AI offers meaningful efficiency gains, but realizing them requires adapting workflows, not simply substituting technology for reviewers. The practitioners in this discussion have spent months refining their approaches. Their experience offers a practical blueprint for teams beginning that journey.

Graphics for this article were created by Kael Rose.

Ben Sexton is the senior vice president of innovation and strategy at JND eDiscovery and a recognized subject matter expert in AI and e-discovery. He advises law firms, corporations, and public agencies on best practices for the use of AI and other emerging technologies across the e-discovery lifecycle. Ben is an active working group leader at the Sedona Conference, a 2025 AI Visionary, and a 2024 Relativity Innovation Award winner for his work in LLM-based document review. A frequent speaker, author, and educator, his mission is to demystify AI and equip decision-makers with clear, evidence-based guidance on how they can leverage the technology to meet their legal goals. He holds multiple industry certifications and a B.S. in mathematics.
