5 e-Discovery Data Truths for Attorneys

Eric Kveton

Prepping data for review at the start of a case is one of the most technical parts of the e-discovery process, and a lot of it can be a black box to practitioners. Having some familiarity with the technical steps behind this data staging can help case teams better strategize and plan for e-discovery—and your project managers can help provide some lessons.

As a project manager who has handled cases of all sizes and at all stages, I work closely with case teams to help them better understand this preparation and plan for it accordingly. Here are the top five things attorneys could benefit from learning about this early phase of e-discovery.

1. Electronic data doesn’t mean “instant” data.

Unfortunately, getting data organized and into a review environment is rarely just a matter of pressing “Go.” It involves manual steps and setup, such as resolving file errors, on top of the time computers spend churning through the data during processing. Here are a few of the steps that may be involved:

  • Text extraction and optical character recognition (OCR), which turn document content, including scanned images of text, into something searchable
  • De-duplication (taking out duplicate emails across multiple custodians)
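  • De-NISTing (removing known system and program files, identified against a reference list maintained by the National Institute of Standards and Technology, thereby reducing the amount of data processed)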
  • Early culling (eliminating junk based on date ranges, email domains, and other filters)
  • Keyword searches (applying keywords to further narrow down the set, and making sure they aren’t overly inclusive)
  • Foldering (sorting data into a logical organizational structure for the review team)
  • Preparing default searches and applying analytics (setting up text analytics features and common searches the review team may want to leverage during the project)

Each of these steps requires servers—and the people controlling them—to run multiple processes that distill raw data into something useful, and then organize it in a meaningful way. That takes time.
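To make a few of these steps concrete, here is a minimal sketch of De-NISTing, early culling by date range, and de-duplication, assuming a tiny in-memory document set. The document fields, the `stage` function, and the hash list are hypothetical and for illustration only; real processing engines typically hash emails on a combination of metadata fields and handle far more edge cases.

```python
import hashlib
from datetime import date

# Hypothetical in-memory documents; a real pipeline would read files from disk.
documents = [
    {"custodian": "alice", "sent": date(2019, 3, 1), "body": b"Q1 forecast attached."},
    {"custodian": "bob", "sent": date(2019, 3, 1), "body": b"Q1 forecast attached."},
    {"custodian": "bob", "sent": date(2015, 6, 9), "body": b"Lunch on Friday?"},
    {"custodian": "alice", "sent": date(2019, 4, 2), "body": b"\x00system-file-bytes"},
]


def digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of the raw content."""
    return hashlib.sha256(data).hexdigest()


# Illustrative stand-in for a reference list of known system-file hashes.
KNOWN_SYSTEM_HASHES = {digest(b"\x00system-file-bytes")}


def stage(docs, start, end, known_hashes):
    """Apply De-NISTing, date culling, and de-duplication, in that order."""
    seen = set()
    kept = []
    for doc in docs:
        fingerprint = digest(doc["body"])
        if fingerprint in known_hashes:
            continue  # De-NISTing: drop known system files
        if not (start <= doc["sent"] <= end):
            continue  # early culling: outside the relevant date range
        if fingerprint in seen:
            continue  # de-duplication: same content already kept
        seen.add(fingerprint)
        kept.append(doc)
    return kept


staged = stage(documents, date(2018, 1, 1), date(2019, 12, 31), KNOWN_SYSTEM_HASHES)
print(f"{len(documents)} documents collected, {len(staged)} staged for review")
```

Even in this toy version, each filter is a separate pass of logic that someone has to configure and check, which is part of why staging takes time.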

2. It’s risky to say “process everything.”

While defensibility is important when considering what files to exclude, processing “everything” can mean a lot of unhelpful files get into your review workspace. Once there, they can end up costing a lot of extra money and slowing down review. To help make processing cheaper, limit the amount of data that is churned through servers. Think about data filters—such as file type—that are relevant to the case, as well as custodian restrictions. Also, consider taking out system files.
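As a rough illustration of how file type and custodian filters shrink the volume sent to processing, the sketch below applies both to a made-up collection inventory. The file names, sizes, custodians, and the `in_scope` helper are all assumptions for the example, not output from any real tool.

```python
from pathlib import PurePath

# Hypothetical collection inventory: (custodian, file name, size in MB).
inventory = [
    ("alice", "contract_v2.docx", 1.2),
    ("alice", "setup.exe", 45.0),
    ("bob", "budget.xlsx", 3.4),
    ("carol", "notes.pdf", 0.8),
    ("bob", "kernel32.dll", 0.7),
]

# Filters a case team might agree on; the values here are illustrative.
RELEVANT_EXTENSIONS = {".docx", ".xlsx", ".pdf", ".msg"}
RELEVANT_CUSTODIANS = {"alice", "bob"}


def in_scope(custodian: str, filename: str) -> bool:
    """Keep a file only if both its custodian and its file type are in scope."""
    extension = PurePath(filename).suffix.lower()
    return custodian in RELEVANT_CUSTODIANS and extension in RELEVANT_EXTENSIONS


selected = [row for row in inventory if in_scope(row[0], row[1])]
total_mb = sum(size for _, _, size in inventory)
selected_mb = sum(size for _, _, size in selected)
print(f"Processing {selected_mb:.1f} MB of {total_mb:.1f} MB collected")
```

Any filter like this should still be documented and agreed on, since defensibility depends on being able to explain what was excluded and why.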

3. There may be lots of file types to consider.

While you may be used to dealing with emails, PDFs, and Microsoft Word, Excel, and PowerPoint files, many other file types come out of specialized software and could prove very relevant to your project. What about the client’s engineering software in a patent suit? Make sure you think about what kinds of software and documents your clients may have.
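One simple way to spot unexpected file types early is to count file extensions in the collection and flag anything outside the familiar set. The sketch below assumes a hypothetical list of file names and a hypothetical `FAMILIAR` set; a real inventory would come from the collected data itself.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical file names from a client collection.
file_names = [
    "memo.docx",
    "budget.xlsx",
    "bracket.dwg",
    "housing.dwg",
    "deck.pptx",
    "scan.PDF",
    "assembly.step",
]

# File types the review team already expects to handle (illustrative).
FAMILIAR = {".docx", ".xlsx", ".pptx", ".pdf", ".msg", ".pst"}

# Count every extension, ignoring case.
counts = Counter(PurePath(name).suffix.lower() for name in file_names)

# Anything outside the familiar set deserves a conversation with the client.
unfamiliar = {ext: n for ext, n in counts.items() if ext not in FAMILIAR}
print(unfamiliar)
```

Here the flagged extensions are computer-aided design formats, the kind of engineering files that could matter in a patent suit.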

4. The smallest details in your data may make search terms less straightforward than they seem.

Search terms may seem like a great way to limit unnecessary information, but it’s important to note the full breadth of your data when building your list. For example, the word “environment” may be important to your case, but it can also show up in all those email signatures asking people not to print emails for the sake of protecting the environment. That can lead to lots of false positives.

To tackle that challenge, consider using text analytics tools rather than just search terms. Many—such as email threading and repeated content identification, which helps remove repetitious language like email footers from your analysis—can be set up by your project team during data staging, so they’re ready to go as soon as you jump in for review. These features dig into the depths of your data to understand its content and can be more accurate at finding relevant documents and preventing review of irrelevant information.
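The email footer problem can be shown with a small sketch. It assumes three made-up emails and uses a deliberately naive stand-in for repeated content identification: any line that appears in most documents is treated as boilerplate and removed before searching. The `strip_repeated_lines` function and its threshold are hypothetical and are not how any particular analytics product works.

```python
from collections import Counter

# Hypothetical emails that all share the same footer.
FOOTER = "Please consider the environment before printing this email."
emails = [
    "Please review the attached permit.\n" + FOOTER,
    "The environmental impact study is due Monday.\n" + FOOTER,
    "Lunch at noon?\n" + FOOTER,
]


def hits(texts, term):
    """Return the indexes of texts containing the term, ignoring case."""
    return [i for i, text in enumerate(texts) if term in text.lower()]


def strip_repeated_lines(texts, min_share=0.8):
    """Remove lines that appear in at least min_share of the documents."""
    line_docs = Counter()
    for text in texts:
        line_docs.update(set(text.splitlines()))  # count each line once per document
    threshold = min_share * len(texts)
    repeated = {line for line, n in line_docs.items() if n >= threshold}
    return [
        "\n".join(line for line in text.splitlines() if line not in repeated)
        for text in texts
    ]


raw_hits = hits(emails, "environment")
clean_hits = hits(strip_repeated_lines(emails), "environment")
print(f"Hits before stripping footers: {len(raw_hits)}, after: {len(clean_hits)}")
```

Before the footer is stripped, every email matches the term; afterward, only the email that actually discusses the topic does.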

5. Data expands.

We all know additional data can pile onto a case throughout the e-discovery lifecycle. But did you know that the data you already have may grow, too? You may start out with a 100 GB hard drive and be left with only 10 GB of data after taking an inventory of what you really need, yet processing those 10 GB can cause the data to grow to about 15 GB. Why does this happen?

Well, if you aren’t de-duplicating your data set, PST files (the typical Outlook data file format) expand because each email will also include the entire thread of earlier emails. OCR also expands file size. Another cause of expansion is imaging, which is performed on each file for review workflows like redactions; these image files increase the overall data size as well.
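The arithmetic behind that example is worth spelling out for planning purposes. The sketch below uses only the figures above (100 GB collected, 10 GB after culling, about 15 GB after processing); the 40 GB follow-up estimate and the `estimate_processed_size` helper are hypothetical, and the expansion rate from a single example should not be treated as a rule.

```python
collected_gb = 100  # size of the original hard drive
culled_gb = 10      # what remains after taking inventory
processed_gb = 15   # approximate size after processing

# Share of the collection removed by culling.
cull_reduction = 1 - culled_gb / collected_gb

# Growth of the culled set caused by processing.
expansion_rate = processed_gb / culled_gb - 1

print(f"Culling removed {cull_reduction:.0%} of the collection")
print(f"Processing grew the remaining data by {expansion_rate:.0%}")


def estimate_processed_size(culled_size_gb: float, rate: float) -> float:
    """Project the post-processing size from a culled size and an expansion rate."""
    return culled_size_gb * (1 + rate)


# Hypothetical planning question: what might 40 GB of culled data become?
estimate_gb = estimate_processed_size(40, expansion_rate)
print(f"40 GB of culled data may need roughly {estimate_gb:.0f} GB of hosted space")
```

A quick estimate like this can help case teams budget for hosting and processing before the data arrives, with the understanding that actual expansion depends on the file types and workflows involved.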

Though these considerations might sound complex, the good news is that a well-rounded project team—with attorneys, admins, litigation support pros, and technologists—will bring diverse expertise to your e-discovery strategy and a strong game plan to your data preparation. That means your review will be off to an efficient start.

