Your single source for new lessons on legal technology, e-discovery, compliance, and the people innovating behind the scenes.

Active Learning Validation 101: Quantifying Your Success

Jeff Gilles

Active learning accelerates review by assisting with the coding of documents through the use of artificial intelligence. It seems simple enough, but to ensure defensibility—and, frankly, stakeholders’ comfort with the technology—data-driven validation to back up the visible results is key.

A handful of typical metrics provide this validation for case teams: elusion, recall, richness, and precision.

Active learning projects in RelativityOne can now report on all of these data points. This is an exciting new development that we hope will expand the usefulness of active learning, and relieve administrators who were previously doing some of these calculations by hand.

Some may wonder, though: Does this availability of recall statistics signal a shift away from elusion as a reliable data point? Additionally, recall is a familiar term from sample-based learning; have we returned to the days of control sets and early innovations in technology-assisted review?

The answer is no. Elusion has not become passé—no need to throw out your “I heart elusion” t-shirt—because this change is additive. In fact, a well-rounded approach to statistically evaluating your active learning projects will give you a more complete toolset for validating your use of the technology to aid review.

Prioritized Review and Elusion

A common way to use active learning is in the context of prioritized review—especially for teams who are new to the workflow. In a prioritized review, reviewers are served the highest-ranking documents for coding as the active learning engine learns from those decisions. The most relevant documents are presented first, and as reviewers make decisions on them, the algorithm fine-tunes its predictive designations to issue more batches. At some point, “the good stuff” has all been seen, and the quality and relevance of any remaining high-ranking documents deteriorates. This drop in relevance signals that the review is nearing completion.

Once that happens, the next step is to run an elusion test, which concisely measures the impact of stopping the review and not producing the low-ranking, unreviewed documents (i.e., the “discard pile”). The resulting elusion rate tells you what percentage of documents in the discard pile may be relevant. This statistic was valuable before, and it remains so. But an elusion rate alone can be difficult to interpret.
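As a rough sketch of how an elusion test works in principle, the snippet below simulates a discard pile, draws a simple random sample from it, and computes the elusion rate. All numbers here are invented for illustration; in a real review, relevance is unknown until reviewers code the sampled documents.

```python
import random

# Hypothetical discard pile: 1 marks a relevant doc, 0 a non-relevant doc.
# In practice relevance is unknown until reviewers code the sample;
# here we simulate it (with ~3% true richness) so the sketch is runnable.
random.seed(7)
discard_pile = [1 if random.random() < 0.03 else 0 for _ in range(50_000)]

# Draw a simple random sample from the discard pile and "review" it.
sample = random.sample(discard_pile, 1_000)

# Elusion rate: share of sampled discard-pile documents found relevant.
elusion_rate = sum(sample) / len(sample)

print(f"estimated elusion rate: {elusion_rate:.1%}")
```

The estimate will land near the simulated 3 percent, with sampling error that shrinks as the sample grows.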

Knowledgeable practitioners often include the elusion rate as part of a broader description of the project. In that larger report, they might include the number of relevant documents found during the review, the total size of the project, the total size of the discard pile, and so on.

In some regard, what they are doing is providing details that recall (the ratio of discovered relevant documents to the total number thought to be in the data set) and richness (the percentage of relevant documents across the entire data set) reporting can provide automatically. These show statistical insight into how many documents in the overall data set were relevant, how many weren’t, how well the system identified them, and what stones might be left unturned if the review were to cease at a given point in time.

Thus, knowing the elusion rate alone does not give me a complete picture of a review. I may know that my elusion rate is 3 percent, but knowing whether my overall project richness was 4 percent or 40 percent tells me how strong a result that 3 percent figure is. With this in mind, elusion and richness allow me to calculate recall, and elusion and recall allow me to calculate richness.
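To make that relationship concrete, here is a minimal sketch using invented numbers (corpus size, review counts, and elusion rate are all hypothetical) showing how the elusion rate on the discard pile combines with the review results to yield recall and richness:

```python
# Hypothetical review statistics -- all numbers invented for illustration.
total_docs = 100_000          # size of the full data set
reviewed_relevant = 5_000     # relevant documents found during review
discard_pile_size = 50_000    # low-ranking, unreviewed documents
elusion_rate = 0.03           # estimated from an elusion sample

# Elusion estimates how many relevant documents remain in the discard pile.
missed_relevant = elusion_rate * discard_pile_size       # 1,500

# Total relevant documents thought to exist in the data set.
total_relevant = reviewed_relevant + missed_relevant     # 6,500

# Recall: discovered relevant documents over all relevant documents.
recall = reviewed_relevant / total_relevant              # ~0.769

# Richness: relevant documents as a share of the whole data set.
richness = total_relevant / total_docs                   # 0.065

print(f"recall = {recall:.1%}, richness = {richness:.1%}")
```

With these particular numbers, a 3 percent elusion rate corresponds to roughly 77 percent recall at 6.5 percent richness, which illustrates why the same elusion rate can represent very different outcomes depending on richness.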

There’s a basic principle at stake here, which is that you usually need two metrics to draw any conclusions, and the more the merrier. As a lover of data, I rarely stop at two, but two is much better than one.

When Elusion is Less Useful

Sometimes, an administrator wants to understand how the active learning engine is performing well before the review is nearing completion. They may want to look for problems or predict when the review might finish. In these situations, they want to understand both types of prediction error: false negatives and false positives.

But elusion, whose strength lies in its simplicity, measures only the false negative rate. Bringing in precision (which quantifies false positives by measuring the accuracy of the predicted designations) and richness (which estimates your total number of relevant documents) brings a more complete picture into focus, illuminating your current efficacy as well as the path remaining for the review.
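For completeness, precision can be sketched the same way, again with invented numbers: sample documents the engine predicted as relevant, have reviewers code them, and compute the share the engine got right.

```python
# Hypothetical QC sample of documents the engine predicted as relevant.
# These counts are invented for illustration.
sampled_predicted_relevant = 400   # predicted-relevant docs sampled for review
confirmed_relevant = 340           # reviewers agreed (true positives)
false_positives = sampled_predicted_relevant - confirmed_relevant  # 60

# Precision: share of predicted-relevant documents that truly are relevant.
precision = confirmed_relevant / sampled_predicted_relevant        # 0.85

# The remainder of the sample is the engine's false-positive share.
false_positive_share = false_positives / sampled_predicted_relevant  # 0.15

print(f"precision = {precision:.0%}")
```

Low precision flags a different problem than high elusion does: the engine is surfacing too much junk, rather than missing relevant material.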

Embracing Measured Flexibility

RelativityOne is often employed in remarkable and innovative ways by our community of users, and active learning is no different. Since active learning first launched in Relativity, we have seen users employ multiple projects within a workspace, using different workflow styles, and with varying document populations and review fields, to meet the unique needs of each matter. It’s an incredibly flexible piece of technology that can be adapted to so many use cases. This diversity of applications requires variety in project validation, and elusion alone doesn’t always fit the bill.

Historically, some teams have managed by creating their own project validation protocols. They have drawn random samples, calculated rates independently, and reported the metrics of interest to their colleagues, regulators, and courts. But such statistical work is not for everyone, so we’re happy to make it easier for all.

By accessing more validation metrics, your team is empowered to make active learning a viable and defensible option for more and more of your projects. Happy learning! And if you need any support, please don’t hesitate to reach out to our team. We’re here to help.

Read a Blog Post on How Active Learning Can Reduce Your Total Cost of Review


Jeff Gilles is a senior product manager at Relativity, where he helps guide development of our machine learning toolset. He joined Relativity in 2016, with eight years of experience developing advanced text analytics technology.