As lawyers are well versed in using technology in e-discovery, generative AI comes across as a natural extension of existing approaches. When Relativity released aiR for Review for its cloud-based platform, RelativityOne, A&M’s Forensic Technology Services became an early adopter of the tool, recognising its potential to change how document review is conducted from here on out.
Since then, we have used the product in dozens of cases, helping clients uncover key facts and identify relevant documents faster than ever before. From the start we knew that getting both internal and client buy-in would require clear measurement of generative AI’s improvements across the trifecta of speed, cost, and accuracy.
In litigation the stakes are high, and legal professionals understandably won’t shift to a new product unless the benefits are clear and the risks well understood. Many wonder whether the output is as good as that of traditional approaches, or whether they are sacrificing quality on the altar of speed and savings.
So what can you do to answer those questions and turn skeptics into advocates? Find a data set that allows you to compare aiR for Review’s results directly against human review, and use it to demonstrate that the tool actually reduces risk by producing more accurate output. This type of A/B comparison goes a long way, so at A&M, we made sure to have it at our disposal.
Today, I wanted to share four important lessons we’ve learned from developing this type of experiment and using it to drive successful adoption of generative AI for document review at A&M and with our clients.
Laying the Groundwork for Widescale Success
When diving into aiR for Review, set yourself up for a strong showing that you can share with internal teams or clients moving forward.
1. Take advantage of relatively small cases that can be good opportunities to show impact, fast.
Our experiment (which I’ll go into in detail below) involved about 30,000 documents and very effectively proved the benefits of using aiR with minimal resources. It allowed us to compare the first review, which required 1,350 hours of reviewer input and took over three weeks to complete, with the second, which was done in under 10 days (with actual aiR analysis time of only around 12 hours). These savings extrapolate to larger cases: projects that would once have taken 10-12 weeks can now be completed in two. When using aiR for Review, we no longer see document review projects that take over a month.
2. Establish a foundation for capturing performance criteria.
The case was well set up to measure quality because its data set allowed a direct comparison between manual review and aiR for Review. In general, though, we have had little difficulty establishing performance metrics in other scenarios, because we have built the foundation to do so. Our teams include experienced AI specialists, and sample sets are always subject to senior review. We’ve established best practices and find that we very rarely require more than three iterations to arrive at a stable prompt. In disputes work, we also conduct statistical sampling. In one recent case working its way through the Cayman courts, elusion testing showed that we were capturing 99.5 percent of the relevant documents, better than most human reviews. These types of defensible metrics are incredibly valuable in convincing clients to move forward. A simple sketch of how that kind of sampling calculation works appears below.
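To make the sampling concept concrete, here is a minimal Python sketch of the kind of elusion and recall calculation involved. Every figure, and the `estimate_recall` helper itself, is hypothetical and chosen purely for illustration; these are not the actual counts or results from the Cayman matter described above.

```python
def estimate_recall(sample_size, relevant_in_sample, produced_relevant, null_set_size):
    """Estimate elusion rate and recall from a random sample of the null set.

    The elusion rate is the share of sampled "not relevant" documents that a
    senior reviewer overturns to relevant; recall is then estimated by
    projecting that rate across the whole null set.
    """
    elusion_rate = relevant_in_sample / sample_size
    estimated_missed = elusion_rate * null_set_size
    recall = produced_relevant / (produced_relevant + estimated_missed)
    return elusion_rate, recall

# Hypothetical example: 4,500 documents coded relevant, a 22,000-document
# null set, and 1 relevant document found in a 1,000-document sample.
elusion, recall = estimate_recall(
    sample_size=1_000,
    relevant_in_sample=1,
    produced_relevant=4_500,
    null_set_size=22_000,
)
print(f"Elusion rate: {elusion:.2%}, estimated recall: {recall:.1%}")
```

In practice, the sample size would be chosen to support a defensible confidence interval, and the overturns would come from senior reviewer QC rather than a fixed illustrative figure.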
3. Recognise that you’ll need to develop some new processes, particularly around prompting.
We learnt quickly that, as with any new technology, there is a learning curve and a need to implement new approaches and ways of working. We’ve developed a comprehensive library of prompts tailored to different case types and have our teams at the ready to adopt aiR for Review’s new features, like its prompt kickstarter. The results so far have been remarkable. We are consistently achieving cost savings of 30-70 percent and reducing review timeframes from weeks to days on live cases, and this is only the beginning. Our commitment to integrating aiR into our practice means we can apply it to any review project moving forward.
4. Consolidate your results into standards you can share, which demonstrate both the impact of the tool and your familiarity with the technology.
For example, across multiple projects we have noticed that aiR for Review’s “not relevant” calls are almost 100 percent correct and that, even on a first pass, its “relevant” calls are correct over 90 percent of the time. By using the product consistently, we can understand where it works well and where we should add our expertise, and we can tell clients what to expect. The result is that we are now seeing rapid acceptance from our client base. They know that we have already built up a substantial caseload and have the benefit of AI specialists supporting our e-discovery consultants. Knowing that we understand what aiR for Review is doing and how to quickly get the best results from the product gives clients the confidence to shift to new ways of working.
One Case, Two Ways, and a Clear Winner
Comparing the quality of review processes can be challenging since, as a rule, no two legal cases are alike. What counts as success in one might mask problems in another. In a recent engagement, however, A&M was fortunate to have a natural experiment set up for us.
In this case, one client asked us to review the same collection of documents twice, in close succession, because new legal issues had arisen. The first review took place prior to the release of aiR for Review, so the project was conducted within RelativityOne using keyword searches and manual review, supplemented by a quality review by the client’s legal team. When asked to conduct a second review of essentially the same facts and documents, we jumped at the opportunity to use aiR for Review and compare the output.
For the second review, only two prompt criteria iterations were needed, as we used aiR for Review’s rationale and considerations to refine the AI-assisted predictions. We used the rationale to identify where the model was potentially over-prioritising certain criteria, such as document types or terminology that should have been excluded. aiR’s detailed reasoning played a direct role in improving accuracy and better aligning the prompt with the project scope. The entire process required the manual review of only approximately 2,200 documents, with just over 1,000 reviewed after each iteration.
So, what did we find? As the table below shows, aiR for Review proved to be twice as accurate as human reviewers when measured by the proportion of coding decisions that were overturned on review. Plus, it completed the entire review in nearly half the time.
|  | Manual Review | Relativity aiR for Review |
| --- | --- | --- |
| % Overturned | 16.4% | 7.9% |
| Project duration | 17 working days | 9 working days |
A key factor here is that the solution offers greater consistency: aiR for Review applies a single perspective across the entire document set, which means less risk, less work to clean up mistakes, and lower costs from start to finish.
The takeaway from this case study? aiR for Review’s results were remarkable. In a straight comparison between two near-identical reviews, generative AI proved to be twice as accurate at coding, with less than half the number of decisions overturned on review, generating significant savings for the client.
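For readers who want to see what those overturn percentages mean in volume terms, here is a small, purely illustrative Python sketch. The number of QC’d coding decisions is a hypothetical figure, not drawn from the case above; only the 16.4 percent and 7.9 percent overturn rates come from the table.

```python
# Hypothetical volume of coding decisions checked on QC; only the overturn
# rates below are taken from the comparison table.
qc_decisions_checked = 10_000

overturn_rates = {
    "Manual review": 0.164,
    "Relativity aiR for Review": 0.079,
}

for workflow, rate in overturn_rates.items():
    overturned = round(qc_decisions_checked * rate)
    print(f"{workflow}: ~{overturned:,} of {qc_decisions_checked:,} decisions overturned")

# Relative reduction in overturned decisions when moving to aiR for Review.
reduction = 1 - overturn_rates["Relativity aiR for Review"] / overturn_rates["Manual review"]
print(f"Overturn rate reduced by roughly {reduction:.0%}")
```

On these assumed volumes, the overturn rate falls by roughly half, which is the practical meaning of the accuracy gain described above: fewer decisions to revisit and less cleanup work downstream.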
When the Evidence is Clear, Change Follows
It's a luxury to be able to conduct a controlled experiment. But when you have one, take advantage!
By carefully structuring a side-by-side comparison, we were able to show that aiR for Review delivers greater accuracy, faster turnarounds, and measurable cost savings. With the right case study in hand, clear performance metrics, and a willingness to adapt workflows, any legal team can achieve the same results.
For A&M, the takeaway is simple: aiR for Review isn’t just a faster way to get the job done—it’s a smarter, more consistent way to deliver better outcomes.
Graphics for this article were created by Caroline Patterson.
