In a very short space of time, technology-assisted review (TAR) has evolved from an emerging technology to a key consideration in disclosure. It would not be a surprise if in the next few years, TAR is so ingrained that it will be an assumed part of the process for all matters.
The most fundamental purpose of TAR is to categorise documents as relevant or not relevant. There are many flavours of TAR, but the technology falls into two main categories: sample-based learning and active learning. The sample-based methodology compares its automatically categorised results against a human-reviewed sample set to determine accuracy. Active learning, by contrast, does not require sampling rounds; instead, it continually re-ranks documents by their likelihood of relevance based on previous coding decisions.
This article will walk through the active learning workflow and how it can be leveraged for your next case.
At Sky Discovery, we see this technology as a game changer, both for its simplicity of use and because users can set up a workflow in which only documents deemed relevant are viewed and tagged by a reviewer. Because the system serves up the documents most likely to be relevant first, the review team can put eyes on all of them, so the technology never needs to automatically tag documents as relevant for production.
Why is it Different?
A drawback of sample-based learning is the setup and administrative work required throughout the review process. The team must first review an initial sample set of documents, which the technology then uses to determine the relevance of the remaining documents. Those results are validated, and the process is repeated until the review team reaches a point of statistical confidence. At each stage, someone must set up a "round" of documents to be reviewed and provide the review team with a fresh sample set.
With active learning, the workflow is less cumbersome. Once the project is set up (a very straightforward process), the system simply queues documents that need to be reviewed by the team—there is no need to set up review batches. Each user will review the documents the system queues up for them. The system will learn from the review choice on each document, adjusting the queue so that the documents it determines most likely to be relevant are placed at the top.
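The queue-and-learn loop can be sketched in a few lines of code. This is a toy illustration only: the documents are made up, and a deliberately crude word-overlap score stands in for the far more sophisticated classifier a real TAR platform would use.

```python
def score(doc_words, relevant_words, nonrelevant_words):
    """Crude relevance rank 0-100: overlap with words from documents
    already coded relevant, penalised by overlap with non-relevant ones."""
    if not doc_words:
        return 50
    pos = len(doc_words & relevant_words) / len(doc_words)
    neg = len(doc_words & nonrelevant_words) / len(doc_words)
    return round(50 + 50 * (pos - neg))

# Hypothetical document population, each reduced to a bag of words.
documents = {
    "doc1": {"contract", "breach", "damages"},
    "doc2": {"lunch", "menu"},
    "doc3": {"contract", "termination"},
}
relevant_words, nonrelevant_words = set(), set()
coded = {}

# The reviewer codes one document at a time; the queue re-sorts after
# each decision so the highest-ranked uncoded document is served next.
while len(coded) < len(documents):
    queue = sorted(
        (d for d in documents if d not in coded),
        key=lambda d: score(documents[d], relevant_words, nonrelevant_words),
        reverse=True,
    )
    next_doc = queue[0]
    # Stand-in for the human coding decision.
    decision = "relevant" if "contract" in documents[next_doc] else "not relevant"
    coded[next_doc] = decision
    (relevant_words if decision == "relevant" else nonrelevant_words).update(
        documents[next_doc]
    )
```

After the first relevant call on doc1, the model ranks doc3 (which shares vocabulary with it) above doc2, so doc3 is served next, mirroring how the queue pushes likely-relevant documents to the top.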
The review process will continue until the review team is viewing fewer and fewer relevant documents. Once the team is satisfied that no relevant documents are being queued, the results can be validated using an elusion test workflow (more on that below).
Setting Up and Conducting the Review
There are four stages in the setup of an active learning project: creating an index, linking the index, selecting the “positive” choice field, and adding reviewers. You can see a simple step-by-step guide to project setup here.
Queuing Documents for Review
Once the project is set up, the review team will be able to commence the review. The system will serve up documents based on the queue being used. There is no need to batch documents for review; each user simply clicks on the Start Review button and away they go.
Active Learning Queue Types

Prioritization Review
This queue serves up the documents most likely to be relevant. Among them, the system also serves a selection of random documents to ensure the model sees a broader range of the document population.

Coverage Review
The goal of Coverage Review is to quickly separate the documents into the positive-choice and negative-choice categories. The documents served up during Coverage Review can be either relevant or non-relevant and are the most impactful for training the model. Coverage Review begins by serving the documents the model is most unsure about: those with a relevance rank near 50 (see below). As the review progresses, further documents are served based on previous coding decisions.
When using the Prioritization Queue, the documents that the system deems “relevant” will be served up for review first. As the review progresses, the number of relevant documents being given to reviewers should decrease.
For the Coverage Queue, documents in the middle range (those with a relevance rank around 50) will be queued up for review to enable as many documents to be categorised as quickly as possible.
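The difference between the two queues comes down to sort order. A minimal sketch, using hypothetical relevance ranks on a 0-100 scale:

```python
# Hypothetical relevance ranks (0-100) for a handful of documents.
ranks = {"a": 92, "b": 55, "c": 48, "d": 12, "e": 71}

# Prioritization queue: highest-ranked (most likely relevant) first.
prioritization = sorted(ranks, key=lambda d: ranks[d], reverse=True)

# Coverage queue: documents the model is least sure about first,
# i.e. those whose rank is closest to 50.
coverage = sorted(ranks, key=lambda d: abs(ranks[d] - 50))
```

Prioritization would serve a (92) first; Coverage would serve c (48) first, because a rank near 50 means the model cannot yet tell which side of the line the document falls on.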
The screenshot below is an example of the ranking distribution chart. This is what it should look like toward the end of the review: a small number of documents in the middle and higher rankings, and a large proportion of documents, deemed non-relevant, in the lower rankings.
At the point in the review where no new relevant documents are being served, it will become necessary to validate the review by means of elusion testing.
The elusion test is a process whereby a sample set of documents—which are deemed not relevant and have not been reviewed—is served up to the review team.
Based on this validation process, the system calculates the statistical likelihood that relevant documents remain among those it did not identify as potentially relevant, and which therefore were not reviewed. If the estimated number of potentially relevant documents exceeds the acceptable threshold, the review queue is reopened and the team continues its review. The elusion test is repeated until the desired level of confidence is reached.
The level of confidence is set by the review team. For example, you may ask for a test giving 95 percent confidence, with a margin of error of 2.5 percent, that all relevant documents have been reviewed (a standard confidence level in e-discovery).
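Those confidence and margin figures translate directly into an elusion sample size via standard proportion statistics. A small sketch, assuming the usual worst-case normal-approximation formula (the function names here are illustrative, not any platform's API):

```python
import math

def sample_size(z=1.96, margin=0.025, p=0.5):
    """Worst-case (p = 0.5) sample size for estimating a proportion.
    z = 1.96 is the z-score for 95 percent confidence."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

def projected_eluded(found_relevant, sample_n, discard_pile):
    """Point estimate of relevant documents remaining in the
    unreviewed, low-ranked population."""
    return found_relevant / sample_n * discard_pile

# 95% confidence with a 2.5% margin of error requires a sample of
# 1,537 unreviewed, low-ranked documents.
n = sample_size()
```

So if 3 relevant documents turned up in a 1,537-document elusion sample drawn from a 100,000-document discard pile, the point estimate would be roughly 195 relevant documents eluding the review, which would likely send the team back into the queue.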
TAR is becoming more important as the courts not only support, but increasingly expect, this type of technology to be implemented in the interest of proportionality and efficiency. As new rules are introduced, such as the disclosure pilot in England or the revised rules in the Victorian courts in Australia, the role of technology in the disclosure process is not only established but will only grow as the technology continues to develop.