The Data Dump and the Search Scare: they’re two fears every e-discovery team shares. No matter how far e-discovery and the legal technology industry have come, these two scenarios continue to haunt our nightmares. In some ways, the fright is only escalating alongside growing data volumes and complex data types.
Still, legal teams need not be paralyzed by the prospect of these spooky scenarios. We sat down with Ben Sexton from JND eDiscovery to talk about how such scary situations arise in modern litigation, and how to overcome them.
First, you've worked on quite a few cases. Talk a bit about your experience.
I’ve had the opportunity to work behind the scenes on more than 250 cases for firms, corporations, and government agencies. My goal has always been to supply clients with the tools and knowledge they need to get to the truth and meet their discovery goals. Having spent my career on the technology side, I’ve had a unique lens into the tactics used on both sides of a dispute involving e-discovery.
What about e-discovery makes it such fertile territory for “spooky” scenarios?
First, motive. To the extent that Party A can drive up costs for the opposing party, suppress incriminating evidence, or generally make life hard for Party B, they move the dial closer to a favorable outcome for their client. An overproduction or “data dump” can skyrocket costs for the opposing party, and an underproduction can hide documents that party needs to substantiate its claims.
Second, opportunity. The ability to inflict damage is high. At the earliest phases of discovery, producing parties often have the upper hand negotiating critical details like the collection scope, which will determine what and how much data is turned over. If Party A is knowingly in possession of a smoking gun email, they may bury the email in a data dump, or steer the keyword negotiation to avoid hitting on that email altogether.
Even when malintent is not in play, all too often, parties can overcomplicate discovery simply by not knowing enough about the matter, the data, or the technology involved.
So tell me about a “data dump.” What is it and how does it happen?
Most commonly, data dumps result from a failure to negotiate fairly. As litigation begins, Party A submits a discovery request to Party B. Anticipating resistance from Party B, Party A casts an overly broad net (for example, requesting all documents that contain terms like “price,” “sell,” or “report”), expecting Party B to counter and eventually meet in the middle.
Party B sees the opportunity to data dump—knowing that it will make it more difficult for Party A to find the documents that really matter—and agrees to Party A’s broad request without countering with a targeted proposal. Party B then produces a massive amount of ESI, much of it with little to no relevance to the dispute. Because they’ve done exactly what Party A requested, they’ll likely get away with it. It’s a “be careful what you wish for” situation.
What do you do when you’re dumped on?
Actually, on the technology side, data dumps can be an interesting challenge. Getting dumped on usually means we’re asked to find innovative ways to unearth key tranches of documents hidden within a massive corpus. That could mean technology-assisted review (TAR), spiffy clustering workflows, or organizational tools like Fact Manager.
In a recent example, we had a case involving several million documents and a very tight review deadline. We put our heads together and developed a workflow that drew on multiple analytics tools to score each document on its likelihood of being relevant to the case. For example, an email that was predictively coded as Hot, lives in a key cluster, contains a key stock symbol, and was sent during the key timeframe would be ranked highly. This approach let us organize the review so that the first documents reviewed were the most important to the case according to multiple criteria, rather than relying on TAR in a silo. Documents ranked “10” were reviewed first, then “9,” and so on. The first five batches contained 100 times more relevant documents than we would have found reviewing front-to-back. We ultimately built the workflow into an application (LayerCake) and plan to reuse it on future cases.
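The composite ranking Ben describes can be sketched in a few lines. This is a minimal illustration, not JND’s actual LayerCake implementation: the signal names and weights below are hypothetical stand-ins for whatever each analytics tool produces.

```python
# Hypothetical sketch: combine independent analytics signals into a single
# review rank. Field names and weights are assumptions for illustration.

def composite_rank(doc):
    """Combine relevance signals into a 0-10 rank (10 = review first)."""
    score = 0
    score += 4 if doc.get("tar_hot") else 0           # predictive coding says Hot
    score += 3 if doc.get("in_key_cluster") else 0    # lives in a key cluster
    score += 2 if doc.get("has_key_term") else 0      # e.g., contains a key stock symbol
    score += 1 if doc.get("in_key_timeframe") else 0  # sent during the key timeframe
    return score

docs = [
    {"id": 1, "tar_hot": True, "in_key_cluster": True,
     "has_key_term": True, "in_key_timeframe": True},
    {"id": 2, "tar_hot": False, "in_key_cluster": True,
     "has_key_term": False, "in_key_timeframe": True},
]

# Batch the review queue from the highest rank down, so the documents
# most likely to matter are reviewed first.
queue = sorted(docs, key=composite_rank, reverse=True)
print([d["id"] for d in queue])
```

The point of the design is that no single tool decides the order: a document that scores well on several independent signals outranks one flagged by TAR alone.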
What’s another spooky situation you’ve seen?
Search term negotiations are common territory for “gamesmanship” as well. Whether too broad or too narrow, the wrong keyword list can easily make or break an e-discovery project. Keyword negotiations are usually fairly crude, relying on simple hit reports that say nothing about the richness of a query. That’s why, when possible, we use analytics to get a sense of what’s in our data set and what each keyword will or won’t return.
Wow, that’s high impact. Is that an inherent risk when using keywords?
To some extent, but there is a better way. A smarter workflow for both sides is to couple the search terms report with a richness report. Rather than judging a keyword’s value by hit count alone, a richness sample tells both parties how effective the term actually is at retrieving responsive documents. It’s already a common technique for validating analytics-based workflows. The producing party is usually able to significantly reduce their review spend by eliminating terms that don’t generate good results, and the receiving side can be confident that they aren’t getting dumped on.
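Richness sampling boils down to drawing a random sample from a term’s hits, having reviewers label the sample, and reporting the responsive fraction. Here is a minimal sketch under simulated data; in practice the labels come from human review, not a lookup table.

```python
import random

# Sketch of richness sampling for one search term. The corpus and labels
# below are simulated for illustration only.

def richness(hit_ids, is_responsive, sample_size=400, seed=42):
    """Estimate the fraction of a term's hits that are actually responsive."""
    rng = random.Random(seed)
    sample = rng.sample(hit_ids, min(sample_size, len(hit_ids)))
    responsive = sum(1 for doc_id in sample if is_responsive(doc_id))
    return responsive / len(sample)

# Simulated example: the term "price" hits 10,000 documents,
# but only about 5% of them are truly responsive.
hits = list(range(10_000))
labels = {doc_id: (doc_id % 20 == 0) for doc_id in hits}

estimate = richness(hits, lambda d: labels[d])
print(f"Estimated richness: {estimate:.1%}")
```

A term with 50,000 hits and 2% richness costs far more review time per responsive document than a term with 5,000 hits and 40% richness, which is exactly the comparison a bare hit report cannot make.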
Analytics also offers keyword expansion, which can help teams build on the keywords that are most effective for their case by highlighting similar terms that might return better results for their particular data set.
Is there a silver lining in all of this?
The good news is that attorneys are becoming increasingly collaborative and transparent. Opposing parties can and often do agree to an ESI protocol early in the process, and negotiate in good faith to arrive at the parameters for what data should and shouldn’t be produced.
Ben Sexton is director of e-discovery at JND eDiscovery.
Sam Bock is a member of the marketing communications team at Relativity, and serves as editor of The Relativity Blog.