Our History with Text Analytics Pt. 2

This is part two in a two-post series about our experience with text analytics in e-discovery.

In last week’s post, we described the somewhat rough beginning of our journey with analytics in e-discovery. We also talked about the growth in adoption of the technology that we started seeing in 2011. But prior to 2011, what were the barriers that made adoption of analytics tough in e-discovery—and are they still barriers today?

What Barriers Have We Seen?

Judicial acceptance

If you’re questioning whether computer-assisted review is defensible, ask Bennett Borden, chair of information governance and e-discovery at Drinker Biddle. Bennett is one of the most tech-savvy, well-versed, and well-spoken litigators on the subject of e-discovery and information governance I’ve ever met. He’s been using analytics and advanced investigative workflows in his cases for years. His response to the question of defensibility is:

“Absolutely. It’s more defensible than any process you’re currently using. The technology has been approved faster by the courts than any other technology, including search terms. It’s been proven that it’s more effective than human review, and the courts have never denied a producing party’s request to use it.”

Prior to several landmark decisions, however, it took a somewhat intrepid attorney like Bennett to be willing to use analytics technology regularly. It wasn’t until decisions like Da Silva Moore and last year’s Tax Court approval of computer-assisted review that case teams became more comfortable with the use of the technology. Of course, we took another major step toward having this barrier removed last week with Judge Peck’s Rio Tinto opinion:

“…case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.”

Check out our post from earlier this week for 3 important takeaways from the opinion.

Now that these rulings have come through, judicial acceptance is clear—and has paved the way for many more case teams to take advantage of computer-assisted review and text analytics.

High-touch education

It’s not enough to provide a product. You have to provide the education, support, and training around it to make users successful. This is especially true for analytics, which often gets pegged as a “black box.”

For us, a big part of getting users to adopt the technology is helping to get them comfortable with what it does, how it does it, where the technology works, where it falls down, and how it adds value. Over the years we’ve approached this from many angles—from short, self-paced trainings on our website, to in-depth Ask the Expert webinars, to white papers validating the defensibility of our approach, to live trainings, to certifications. Our advice team also provides hands-on guidance and support and delivers workflow recipes that go beyond user manuals.

Providing case teams with these resources will always be an essential component of fueling analytics adoption. As users become more comfortable with the software, and as we continue to make the product easier to use, we’ll keep working to deliver better educational resources.

Prescribed workflows

When we first brought Relativity Analytics to market, we had a lot of cool features, but we left it up to case teams to decide how best to apply them or construct a workflow. We thought the flexibility would be an advantage, but what we saw was that users wanted canned workflows to get started. It was enough of a challenge to get comfortable with a new technology, let alone decide specifically when and where to apply it in a review strategy.

When we released Relativity Assisted Review, that changed. Assisted Review, in addition to being a powerful way to defensibly leverage machine learning, was built with a more prescribed workflow that helped guide users through the process. For the most part, reviewers could do what they’ve always done, focusing on batches of documents while allowing the technology to work behind the scenes.

A broad feature set

In 2013, we rounded out analytics in Relativity with email threading, near-duplicate detection, and foreign language identification. Not every case is a candidate for Assisted Review, especially cases where the documents are predominantly spreadsheets or images, or where the question of responsiveness depends on numbers in a document. But you’d be hard-pressed to find a case where email threading isn’t useful. The addition of this functionality, and a few other techniques for evaluating structured data, provided more reasons to use Relativity Analytics.

Ease of use

Demystifying the use of these advanced technologies, and making it fast and easy for end users and administrators, is a goal we try to make progress on with each release. As I said in the first post, we want to make the use of analytics in Relativity as easy and as trusted as using your email. We’re not there yet, but as we—and other software providers in our space—make headway, we’ll continue to see analytics used in a greater percentage of e-discovery matters.

Below is the same chart we shared in the first post showing the amount of data in the Relativity universe using analytics since late 2009. It’s been updated to show where our five barriers to adoption (judicial acceptance, high-touch education, prescribed workflows, a broad feature set, and ease of use) were addressed, at least in part. This is not to suggest they’ve been overcome entirely.

[Chart: Cumulative GBs in Relativity Analytics, with milestones]

As the technology and the industry have evolved, legal teams have begun to dive in. The amount of data being run through Relativity Analytics year-over-year is growing steadily. In February alone we indexed 26,500 gigabytes.

Where Do We Go From Here?

Despite the recent adoption, we’re a long way from ubiquity. In fact, only 9 percent of cases in the Relativity universe are using analytics today. What excites us about that figure is that it means there is a lot more work to be done—that so many more case teams have yet to experience the benefits of this technology, and that we might get a chance to help them with that. Also, while 9 percent isn’t a huge share of the Relativity universe, it does represent a lot of data: 2.8 billion files and 5,710 matters.

When it comes to ease of use and more functionality, we’ll be focusing on the following areas in our roadmap for Relativity Analytics:

1. Making it easier to understand and search conceptually similar documents. Even once you’ve used techniques like clustering to pull together similar documents, there’s still a ton of information to go through. Providing intuitive visual metaphors for searching that information so you can quickly understand how one group of documents relates to another is an important element of unlocking the full power of analytics. It’s also important that you’re able to understand at a glance how your subjective coding information and metadata intersect with concepts in your case, and that it’s easy to take action on the documents. It’s not enough to have a graph or visualization—you have to be able to quickly move documents through review.

Below is a sneak peek of functionality we’ll be releasing later in the year. It’s a representation showing how we plan to bring clustered data to life to make it faster and easier for you to get your arms around your data sets.

2. Making analytics more automatic for admins and end users. Reducing index build time gets teams up and running more quickly with less fuss. The faster and more automated we can make the technology, the more likely you’ll be to actually use it.

3. Moving analytics earlier in the e-discovery process. Techniques like categorization, clustering, and threading are frequently used at the pre-processing stage, but that often requires data to flow from a processing application to an analytics application to a review application. We feel that applying this technology in a defensible way in a single system, with workflows that easily get you from pre-processing to review, is key to the successful use of these tools earlier in the e-discovery process.

Increased use of analytics in e-discovery is inevitable. The amount of information created and stored in the digital universe is predicted to reach 44 zettabytes (that’s 44 trillion gigabytes) by 2020, with enterprises responsible for about 85 percent of it, and that growth will leave case teams with no other option. Techniques like email threading, categorization, and forms of computer-assisted review will become an absolute necessity and will eventually be used on the majority of cases. For our part, we look forward to chipping away at the barriers to adoption and continuing the long strange trip we began back in late 2007.
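
For anyone who wants to check the unit conversion behind that forecast, here is a minimal sketch of the arithmetic. It assumes decimal (SI) prefixes, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the function and variable names are ours, purely for illustration, and the 85 percent share is treated as an exact figure even though the forecast says "about."

```python
# Decimal (SI) storage units, expressed in bytes (an assumption of this sketch).
GIGABYTE = 10**9
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zb):
    """Convert a whole number of zettabytes to gigabytes using SI prefixes."""
    return zb * ZETTABYTE // GIGABYTE


projected_zb = 44          # size forecast for 2020 cited in the post
enterprise_share_pct = 85  # "about 85 percent", treated as exact here

total_gb = zettabytes_to_gigabytes(projected_zb)
enterprise_gb = total_gb * enterprise_share_pct // 100

print(f"{total_gb:,} GB in total")
print(f"{enterprise_gb:,} GB attributable to enterprises")
```

Under these assumptions, one zettabyte works out to one trillion gigabytes, so the 44 zettabyte forecast is 44 trillion gigabytes, and the enterprise share comes to roughly 37.4 trillion of them.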

If you’re new to analytics or haven’t used it in a while, check out this analytics e-book, which introduces all the features in plain language and what they can do for your team.