by Jeff Gilles
on February 14, 2018
Analytics & Assisted Review
Active learning has been a trending topic in the e-discovery industry for some time, yet to many, the tech behind it is daunting. It’s easy for technologists to jump into the nitty-gritty inner workings of active learning and AI, but many legal practitioners are left wondering what these innovations mean for their e-discovery projects.
For example, Relativity’s new active learning Assisted Review workflow is based on a Support Vector Machine (SVM) algorithm, but how does it actually work? How is it different from Latent Semantic Indexing (LSI)? And does the new active learning workflow mean LSI-based conceptual analytics is on its way out?
SVM is a tried and true approach to doing one thing particularly well: binary classification, which is really just a fancy way of saying “putting things into two buckets.” It will break things down by “X” and “Not X.” If you’re a credit card company, you might need to know if a transaction is fraudulent or not fraudulent. A website needs to know if a user is a person or not a person (i.e., a “bot”). In e-discovery, of course, the canonical example is “responsive” and “not responsive.” The SVM algorithm, inside an active learning Assisted Review workflow, will do this faster and more efficiently than ever before.
At its core, SVM, like LSI, operates on a spatial representation of your documents. With LSI, co-occurring terms are synthesized into latent dimensions that capture how those terms relate to one another. LSI lends itself to a conceptual, quite human-like view of the world, and you can use diverse features: keyword expansion to explore those term relationships, clustering to "automagically" organize your documents, and "find similar" to take a key document and surface others like it.
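Relativity's index internals aren't public, but the math underneath LSI can be sketched in a few lines: factor a term-document matrix with singular value decomposition and keep only the strongest latent "concept" dimensions. The tiny matrix and document topics below are purely illustrative.

```python
import numpy as np

# Illustrative term-document count matrix (rows = terms, columns = documents).
# Docs 0-1 discuss finance; docs 2-3 discuss a company picnic.
terms = ["revenue", "budget", "forecast", "picnic", "lunch", "menu"]
A = np.array([
    [1, 1, 0, 0],  # revenue
    [1, 1, 0, 0],  # budget
    [1, 0, 0, 0],  # forecast
    [0, 0, 1, 1],  # picnic
    [0, 0, 1, 1],  # lunch
    [0, 0, 1, 1],  # menu
])

# LSI is essentially a truncated SVD: keep the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_coords = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional point per document

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# In concept space, the two finance documents sit far closer to each
# other than to the picnic documents, even though no single pair of
# documents was compared directly.
assert cos(doc_coords[0], doc_coords[1]) > cos(doc_coords[0], doc_coords[2])
```

This is why "find similar" works even when two documents share few exact terms: proximity is measured in the latent dimensions, not in raw vocabulary.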
SVM, on the other hand, doesn’t invest in this conceptual synthesis. It seeks only to “split” the document space into two halves—corresponding to “X” and “Not X”—and then to rapidly classify any new documents by mapping them to one of the two sides. This approach is quick to stabilize, resilient, and performant—all valuable traits in the world of document review.
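This is not Relativity's implementation, but the same idea can be sketched with scikit-learn's linear SVM: learn a single separating hyperplane from documents a reviewer has already coded, then classify a new document by which side of the plane it lands on. The training documents and labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Illustrative training set: documents a reviewer has already coded.
train_docs = [
    "price fixing agreement with competitor",      # responsive
    "meeting to coordinate prices next quarter",   # responsive
    "happy hour this friday at the usual spot",    # not responsive
    "fantasy football league standings update",    # not responsive
]
labels = ["responsive", "responsive", "not responsive", "not responsive"]

# Map text into a vector space, then fit one separating hyperplane.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_docs)
clf = LinearSVC().fit(X, labels)

# A new document is projected into the same space; the side of the
# hyperplane it falls on determines its bucket ("X" or "Not X").
new_doc = vectorizer.transform(["let us coordinate on price before the quarter ends"])
prediction = clf.predict(new_doc)[0]
```

Note that the model never builds a conceptual map of the corpus; it only learns the boundary between the two buckets, which is what makes it fast to train and quick to stabilize as new reviewer decisions arrive.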
If you’re not sure when to opt for SVM, think of it like a pickup truck—designed to do one thing extremely well. I’ve been able to get tremendous mileage out of my family sedan. I use it as a kid hauler, a daily commuter, and an occasional cargo transport. But when I want to move a sectional couch or bring a lawnmower to the dump, a pickup truck is what I need.
When combined with the creativity of our user community, Relativity’s flexibility yields innovative workflows. We’re excited to see how users can leverage this increased breadth of analytics, bringing together very different algorithms to solve problems that any one of them could not solve alone.
For instance, when you think about email threading, concept clustering, and the SVM classifier, you have three completely independent capabilities, each with strengths and weaknesses. With this diversity of tools at your fingertips, you can bring the right, multi-faceted solution to each problem that you face.
The combined results from using both SVM and LSI—such as whether they corroborate or contradict one another—can yield different workflows, guiding your review project. For example, if you see the two algorithms in harmony, agreeing and complementing each other, it would imply a much higher confidence and you might fast-track a document downstream in your process.
On the flip side, you might apply SVM decisions as heat-maps onto your conceptual clusters and find a mostly (but not entirely) responsive cluster. The documents in that cluster tagged non-responsive might be placed into a QC workflow to figure out why the algorithms seem to disagree. For instance, LSI may have picked up on a nuanced term correlation during its index build, or perhaps SVM has determined a strong relationship between responsiveness and the presence of one particular term.
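The routing logic described above can be sketched as a simple rule over two independent signals. The document IDs and model outputs here are hypothetical stand-ins, not real Relativity fields.

```python
# Hypothetical per-document calls from two independent models.
svm_calls = {"DOC-1": "responsive", "DOC-2": "responsive", "DOC-3": "not responsive"}
lsi_calls = {"DOC-1": "responsive", "DOC-2": "not responsive", "DOC-3": "not responsive"}

def route(doc_id):
    """Fast-track documents the models agree on; QC the disagreements."""
    if svm_calls[doc_id] == lsi_calls[doc_id]:
        return "fast-track"  # models corroborate: higher confidence
    return "qc-review"       # models disagree: investigate why

queues = {doc_id: route(doc_id) for doc_id in svm_calls}
# queues == {"DOC-1": "fast-track", "DOC-2": "qc-review", "DOC-3": "fast-track"}
```

The same pattern extends naturally to more than two signals, or to thresholds on rank scores rather than hard labels.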
In a way, it’s like the perfect meal that combines textures, flavors, and presentation. It appeals to multiple senses all at once, sometimes working together, sometimes catching our attention or curiosity with nuanced tension. With the new active learning Assisted Review workflow, there’s a flavorful, fresh ingredient that’s just come into season, allowing administrators to act as a master chef, coordinating the experience and using the ingredients in the right proportions. For short-order cooks—such as reviewers, newer users, or those who don’t need to dive into the admin side of Relativity—this new ingredient can add plenty of flavor on its own, making it easy to get up and running quickly.
Jeff Gilles joined the Relativity team in 2016, where he helps guide development of Analytics, following eight years of experience developing advanced text analytics technology.