by Erika Namnath - H5
on June 22, 2017
This post was originally published by H5, a Blue-level Relativity Best in Service Partner. We thought it provided great tips and best practices for processing. It's the first in a series—check out the H5 blog for more e-discovery workflow tips.
The collection is done and you finally have the data. Now what? What’s the best way to tackle processing without breaking a sweat, the bank, or your timeline?
Alongside early case assessment (ECA), this phase of e-discovery presents an opportunity to undertake a defensible reduction of data and pass along only the material that will ultimately be necessary to review. Here are a few tips that may help.
Consider the data that has been collected. Are you dealing with a select set of individual, cherry-picked files? If so, you’re in luck—kick off the processing set and go have lunch!
More likely, you’re facing hard drive images and data that has come from a variety of sources in myriad formats. In most matters, it’s only the user-created content that is of interest.
The size of a hard drive collection can cause a mini heart attack, but user-created content is likely only a fraction of it. Processing everything rarely makes sense when so many of the files probably don’t matter.
Consider what you want to target (or omit). Can you focus on particular areas on the drive? Targeting the User folder may make sense. If you’re concerned you may miss something that way, try excluding files that are clearly not user-generated (e.g., program files, Windows folders, and other system-related data) and process the rest. More on this below.
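As a rough illustration of the exclusion approach, here is a minimal Python sketch that drops files under clearly system-related top-level folders and keeps everything else. The folder names in EXCLUDED_TOP_LEVEL and the function names are assumptions made for this example, not settings from any particular processing tool. Adjust them to the matter and the operating system involved.

```python
from pathlib import PureWindowsPath

# Hypothetical exclusion list: top-level folders that rarely hold user-created content.
EXCLUDED_TOP_LEVEL = {"windows", "program files", "program files (x86)"}


def top_level_folder(path: str) -> str:
    """Return the lowercased first folder of a Windows path, or '' for a file at the root."""
    p = PureWindowsPath(path)
    # Skip the drive anchor (for example 'C:\\') when one is present.
    parts = p.parts[1:] if p.anchor else p.parts
    # More than one remaining part means at least one folder precedes the file name.
    return parts[0].lower() if len(parts) > 1 else ""


def is_candidate(path: str) -> bool:
    """Keep a file unless it sits under an excluded system folder."""
    return top_level_folder(path) not in EXCLUDED_TOP_LEVEL


def select_for_processing(paths):
    """Filter an iterable of paths down to the ones worth processing."""
    return [p for p in paths if is_candidate(p)]
```

Because this excludes known noise instead of including only the User folder, anything stored in an unexpected location still makes it into the processing set.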
Litigation is an evolving, iterative process, and things change that could impact what needs to be processed. For example, consider the collection timeline. Did collection occur early and broadly due to time constraints? Often, a wider-than-necessary net is cast to include custodians and data sources on the fringe, “just in case.” Has the focus changed? Have some custodians been disqualified? Has the timeline been revised? Even minor changes can result in significant data reduction.
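To make the scope check concrete, the sketch below keeps a file only if its custodian is still active and its date falls inside the current timeline. The custodian names, the date range, and the in_scope function are all hypothetical placeholders. The real values would come from the case team.

```python
from datetime import date

# Hypothetical scope for illustration only.
ACTIVE_CUSTODIANS = {"custodian_a", "custodian_b"}
DATE_START = date(2015, 1, 1)
DATE_END = date(2016, 12, 31)


def in_scope(custodian: str, last_modified: date) -> bool:
    """Return True when the custodian is still active and the date is within range."""
    is_active = custodian.lower() in ACTIVE_CUSTODIANS
    in_range = DATE_START <= last_modified <= DATE_END
    return is_active and in_range
```

Rerunning a check like this whenever the custodian list or the timeline changes shows how much of the original collection still needs to be processed.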
Prioritize what you know to be the most important from a document-count perspective. Identifying high-priority custodians and targeting email content will help yield the most documents in the shortest time and get reviewers started while you supplement with other data sources and custodians.
Each step you take to limit the size of the data set saves time and chips away at costs that grow with every gigabyte carried into review. This is the perfect time to implement an ECA workflow to cull your data down even further. Here are some processing techniques to employ:
OCR can eat up time and money, so make sure every page counts. Does OCRing every image type make sense? How often does OCR of a .png or .jpg result in usable text? Think about where your data is coming from and how aggressive you need to be.
Consider targeting image types more likely to yield text of reasonable quality. TIFFs and non-searchable PDFs are a great place to start. You can always run additional OCR later if you find that unexpected image types might benefit.
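One way to picture a targeted first OCR pass is the sketch below. It always selects TIFFs, selects PDFs only when no text was extracted from them, and leaves other image types for a possible later pass. The FileRecord type and its extracted_text_chars field are assumptions made for this example. They stand in for whatever text-extraction count your processing tool reports.

```python
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    # Hypothetical field: number of text characters already extracted during processing.
    extracted_text_chars: int


TIFF_EXTENSIONS = {".tif", ".tiff"}


def needs_first_pass_ocr(record: FileRecord) -> bool:
    """Select TIFFs and PDFs with no extracted text; skip everything else for now."""
    ext = os.path.splitext(record.path)[1].lower()
    if ext in TIFF_EXTENSIONS:
        return True
    if ext == ".pdf":
        # A PDF that yielded no text is treated as non-searchable.
        return record.extracted_text_chars == 0
    # Other image types (.png, .jpg, and so on) wait for a later pass if needed.
    return False
```

Treating zero extracted characters as "non-searchable" is a simplification. A PDF with only a few stray characters of text may still be worth OCRing, so a small threshold could be a better fit for some collections.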
While it can be helpful to get rid of as much noise as possible, you need to be confident about what you’re leaving on the cutting room floor.
The world of data is rife with complexities. Don’t underestimate the knowledge and ability of technical and statistical experts to provide guidance. Tools and methods to analyze, parse, sample, and otherwise manage data are constantly being developed and enhanced. Working in close partnership with your vendor at each stage to consider the inputs, goals, and time and cost constraints for each particular matter can lead to a much less stressful, more efficient, and likely more defensible engagement.
Erika Namnath joined Blue-level Relativity Best in Service Partner H5 in August 2005 and is Associate Director of Technical Services within the eDiscovery Group. She focuses on litigation support services, including media management, processing, imaging, and production. She has designed and implemented strategies for a variety of use cases, including data reduction and subjective culling, responsive review, expert search, and workflow prioritization.