Our History with Text Analytics Pt. 1

This is part one of a two-post series about our experience with text analytics in e-discovery. Check out part two here.

Legaltech 2008

Getting ready for Legaltech in 2008 was a wild ride, and the beginning of our long strange trip with analytics in e-discovery. It’s a trip we’re still on, with a mission that remains the same: make it fast, easy, and maybe even a bit fun to find your relevant and important documents. We want to make functionality like computer-assisted review, email threading, and finding conceptually similar documents as easy, and as trusted, as using email.

While we had analytics capabilities in Relativity as early as 2004, we started the task of reimagining them in late 2007. The product development work was spilling over into the first days of the New Year, getting dangerously close to our deadline at the end of January, when we planned to demo the technology at Legaltech. Andrew’s office was littered with sticky notes listing the tasks that he and our development team of three or four had to complete to get to releasable code.

It looked like we wouldn’t make it, but several all-nighters by Andrew and the team got it done. In late January 2008, we were ready to start demonstrating how these new capabilities could accelerate review workflows and amplify the efforts of case teams. We were super excited about the technology and the workflow possibilities in Relativity. Features like clustering, categorization, and the ability to locate similar documents would provide a massive benefit for our users. We couldn’t wait to show this stuff to the world. We were sure the use of analytics in e-discovery would spread like wildfire. We were wrong.

Level Setting and the Worst Software Demo Ever

Before I tell you about the worst software demo ever delivered, let’s level set with respect to terminology.

When we talk about analytics in e-discovery, we’re referring to functionality built around a search engine that not only indexes words for purposes of keyword searching, but also evaluates all of the text in all of the documents in a data set to determine how those documents relate to one another based on the concepts within them. (Analytics in general is a very broad category of technical capabilities and is typically defined differently in other industries, where it often focuses more on delivering insights from structured and transactional data.)

There are a few commonly used core technologies that do this type of indexing, including support vector machines (SVM), Bayesian methods, and natural language processing (NLP). We use latent semantic indexing (LSI), which is also used by a number of other providers in e-discovery. Check out our white paper, Document Categorization Using Latent Semantic Indexing, to learn more about LSI and why we use it.
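For readers who want a feel for what LSI does, here is a minimal sketch in plain Python. It is a toy illustration of the general technique (a truncated singular value decomposition of a term-document matrix), not a description of how Relativity Analytics is implemented. The function names, the sample documents, the use of raw word counts, and the power-iteration shortcut are all assumptions made for this example.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Build a raw-count matrix: one row per term, one column per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    return [[float(c[term]) for c in counts] for term in vocab]


def lsi_document_vectors(matrix, k, iterations=200):
    """Project documents into a k-dimensional concept space.

    Runs power iteration with deflation on the Gram matrix A^T A, whose
    eigenvectors are the right singular vectors of the term-document matrix A.
    """
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]

    def apply_gram(v):
        return [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]

    components = []  # (singular value, right singular vector) pairs
    for index in range(k):
        v = [1.0 + 0.1 * (i + index) for i in range(n)]  # fixed start vector
        for _ in range(iterations):
            w = apply_gram(v)
            for _sigma, u in components:  # remove directions already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:  # k exceeds the rank of the matrix
                break
            v = [a / norm for a in w]
        eigenvalue = sum(a * b for a, b in zip(v, apply_gram(v)))
        components.append((math.sqrt(max(eigenvalue, 0.0)), v))

    # Document j's coordinates are (sigma_1 * v_1[j], ..., sigma_k * v_k[j]).
    return [[sigma * v[j] for sigma, v in components] for j in range(n)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical sample documents: three about a dispute, two about lunch.
docs = [
    "contract breach damages",
    "contract breach lawsuit",
    "lawsuit damages settlement",
    "pizza lunch menu",
    "lunch menu order",
]
matrix = term_document_matrix(docs)
raw_columns = [[row[j] for row in matrix] for j in range(len(docs))]
vectors = lsi_document_vectors(matrix, k=2)

raw_similarity = cosine(raw_columns[0], raw_columns[2])    # keyword overlap only
concept_similarity = cosine(vectors[0], vectors[2])        # same concept
unrelated_similarity = cosine(vectors[0], vectors[3])      # different concepts
```

In this toy data set, the first and third documents share only one word, so their keyword-based similarity is low. In the two-dimensional concept space they land in nearly the same place, while the lunch documents land somewhere else entirely. A production system would typically weight terms rather than use raw counts and would rely on an optimized SVD routine.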

Relativity Analytics is what we call this technology in our software. It includes a number of individual features using both unstructured text (e.g., the body of an email or document) and structured text (e.g., author or recipient information for an email or document metadata). Those features range from Relativity Assisted Review, which is our predictive coding or technology-assisted review or computer-assisted review (whatever you prefer to call it) workflow; to document clustering, where Relativity gathers documents into conceptually similar groups without any user input; to email threading and near-duplicate detection. For a comprehensive walkthrough of the challenges these features solve, pick up our analytics e-book.
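To make one of these features a little more concrete, here is a rough sketch of near-duplicate detection using word shingles and Jaccard similarity, a common textbook approach. It is an assumption-laden illustration rather than the method Relativity uses. The function names, the shingle size, the 0.5 threshold, and the sample documents are all hypothetical.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, size=3):
    """Compare every pair of documents and keep pairs at or above the threshold."""
    sets = [shingles(doc, size) for doc in docs]
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            score = jaccard(sets[i], sets[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


# Hypothetical sample documents: two nearly identical emails and one unrelated.
docs = [
    "please review the attached contract before friday",
    "please review the attached contract before monday",
    "lunch menu for the team offsite next week",
]
pairs = near_duplicates(docs)
```

Here the first two documents differ by a single word, so they share four of their six distinct shingles and are flagged as a pair. Comparing every pair works for a handful of documents; at e-discovery scale, an approach like this would need hashing or indexing to avoid the all-pairs comparison.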

Okay, back to the demo. Our first demo of Relativity Analytics was maybe the worst software demo I’ve ever been a part of, and I’ve delivered or seen thousands. One of our Premium Hosting Partners, Planet Data, had invited their top client to our suite at the Warwick New York Hotel. They wanted us to show them the exciting new analytics features coming in the latest version of Relativity. However, instead of showing search results and end user functionality, we mostly showed admin setup. We didn’t have any other choice. We had only just gotten Relativity Analytics working, and it wasn’t the most elegant implementation. We had to spend a lot of time on setup before we could show anything else.

As we walked through it with a room full of attorneys in real time, we realized that what had seemed super cool to us only an hour before was actually super boring and complex. In fact, it felt like we were trying to launch a rocket ship to the moon directly from our laptop. Andrew and I spent most of our time clicking through admin pages, while the 12 people crammed into our small room stared uncomfortably at a projector screen. These were in-house counsel who wanted a succinct and impactful demonstration of how they could bring new efficiencies to e-discovery review. What they got was two nerds fumbling over a keyboard describing things they’d never care about, nor should they.

I think after the demo they shrugged their shoulders and said something like, “Oh, well. It doesn’t always come together the way it should.” They were being gracious.

Roadblocks to Adoption

Relativity Analytics, and the use of analytics in our space, has come a long way since that demo in early 2008. In the last few years we’ve seen increasing adoption, and 2014 was no exception: we eclipsed half a petabyte of data indexed to date, including close to 70 terabytes in the last three months of the year alone. Our Relativity Analytics business overall grew by 73% in 2014. This didn’t happen overnight, though. So what’s changed?

[Figure: Cumulative Gigabytes in Relativity Analytics]

It took several years before our customers started adopting analytics. While our technology—and our demo—improved (nowhere to go but up, right?), we didn’t start to see case teams use the technology more widely until 2011. Before that adoption began, we thought we must just be really bad at selling it.

While that might have been true, a few other barriers were preventing more widespread use, and once those barriers started to fall we began to see the uptick and excitement we’d expected from the beginning.

Have an idea of what those barriers might be? Let us know in the comments.

We’ve published a follow-up that takes a deep dive into each of those barriers and discusses where we plan to go from here with analytics, so check it out here. You can also see our thoughts on the recent Rio Tinto v. Vale opinion from Judge Andrew Peck here. If you haven’t already, feel free to subscribe to this blog via the sidebar to receive notifications when more posts like these are published.
