Processing 102: Collection & Review Teamwork

One of the key components of a successful e-discovery project is proper communication across the different teams responsible for its many stages. Often we refer to this as working “end to end” on the Electronic Discovery Reference Model (EDRM). From legal hold and collection requests to the format of production and each step in between, teams must collaborate to ensure success.

If you manage processing jobs, your work is the link between the left-hand side of the EDRM and the right—those who gather the data, and those who analyze it.

To be the best bridge on your e-discovery team, make sure you’re asking the collection pros the right questions—and giving proper notice to the review gurus, too.

Questions to Ask the Collection Team

For processing teams, it is critical to know your data. Request information from the legal hold and collection teams to gain a better understanding of the data coming your way.

1. What am I looking at and where did it come from?

In addition to the volume of data, ask about the source devices when applicable. Knowing the custodians and devices from which the data was collected can be important not only for determining the processing specifications but also for deciding the order in which the data is ingested. When data is collected from shared locations, consider assigning the source itself as a custodian and noting which custodians had access to that location. This may broaden the scope of a custodian’s data for review.

While looking to the collection teams, be sure you also look downstream; there may be important decisions to make based on the agreed-upon production format. A common issue raised late in the process is the format of email productions. As emails can come in many different formats, production specifications may require a normalized format such as .MSG. Legacy mail formats and Lotus Notes can pose challenges when producing native files, so it is important to know about them when selecting the appropriate processing tool.

2. What time zones and languages are important to this case?

As you stage data for processing, be certain you’re applying the correct time zone to the different data sources. Processing in the correct time zone(s) will help reviewers see communications in the custodians’ “real time” during their review. When certain communications transpired may be very important as the facts of the case emerge.
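To illustrate why the time zone matters, here is a minimal Python sketch. It assumes timestamps arrive in UTC and that each data source is tagged with an IANA time zone name; the function name `to_custodian_time` and the sample values are hypothetical, not taken from any particular processing tool.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+


def to_custodian_time(sent_utc: datetime, tz_name: str) -> datetime:
    """Convert a timestamp to the custodian's local time zone."""
    if sent_utc.tzinfo is None:
        # Assumption: naive timestamps coming out of processing are UTC.
        sent_utc = sent_utc.replace(tzinfo=timezone.utc)
    return sent_utc.astimezone(ZoneInfo(tz_name))


# Hypothetical email sent at 02:30 UTC on March 1, 2023.
sent = datetime(2023, 3, 1, 2, 30, tzinfo=timezone.utc)
local = to_custodian_time(sent, "America/Chicago")
print(local.isoformat())  # 2023-02-28T20:30:00-06:00
```

In this example the message falls on a different calendar day once it is shown in the custodian's local time, which is the kind of shift reviewers need to be aware of.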

Just as time zones are captured, the languages associated with the collected data should be captured as well. The native language and geographical location of your custodians should accompany the data. Be sure to consider the different languages that may be in the collection and enable these languages for any files that require optical character recognition (OCR). Identifying the languages involved will help ensure your OCR is complete at the time of processing. This will save your team from having to generate images and run OCR post-processing.

3. Is de-duplication acceptable?

While de-duplication can be helpful to narrow the scope of review, there are some important considerations. Knowing the production requirements should drive the scope used for de-duplication. Some productions require de-duplication, while others require duplicates to be included. When de-duplication is required, check with the case team to help identify important custodians. In cases with these types of important players, take care to ensure the ingestion or de-duplication of the data is done in the proper order so the most critical custodians' data is preserved. To de-duplicate files but still keep track of each custodian that maintained a copy of those files, you might also consider creating an “all custodians” field to house that information for each file.
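As a rough sketch of the idea above, the following Python example de-duplicates by file hash, keeps the copy belonging to the highest-priority custodian, and records every custodian who held a copy in an "all custodians" field. The record layout (`hash`, `custodian`, `path`), the function name, and the sample names are assumptions made for illustration; real processing tools expose this differently.

```python
def deduplicate(files, custodian_priority):
    """Keep one copy per hash, preferring higher-priority custodians.

    files: iterable of dicts with 'hash', 'custodian', and 'path' keys
           (a hypothetical layout).
    custodian_priority: custodian names, most important first.
    """
    rank = {name: i for i, name in enumerate(custodian_priority)}
    # Ingest priority custodians first; unlisted custodians go last.
    ordered = sorted(files, key=lambda f: rank.get(f["custodian"], len(rank)))

    unique = {}
    for f in ordered:
        kept = unique.get(f["hash"])
        if kept is None:
            # First copy seen becomes the retained record.
            unique[f["hash"]] = {**f, "all_custodians": [f["custodian"]]}
        elif f["custodian"] not in kept["all_custodians"]:
            # Duplicate: drop the file but remember who had it.
            kept["all_custodians"].append(f["custodian"])
    return list(unique.values())


files = [
    {"hash": "abc", "custodian": "Lee", "path": "lee/memo.docx"},
    {"hash": "abc", "custodian": "Patel", "path": "patel/memo.docx"},
    {"hash": "def", "custodian": "Lee", "path": "lee/notes.txt"},
]
result = deduplicate(files, custodian_priority=["Patel", "Lee"])
for record in result:
    print(record["path"], record["all_custodians"])
```

Here Patel's copy of the memo is retained because Patel is listed first, while the "all custodians" field still shows that Lee held a copy.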

Notice to Give to the Review Team

Once all the processing specifications have been decided and the data has been processed, you can begin communicating nuances in the processing results to the case team and reviewers. This can be done with error reports, but it also may be helpful to create places to display this information for reviewers. Getting this information to the teams early can prevent surprises down the road.

1. Here are the errors we encountered.

Processing errors can be caused by many different factors but are often related to corrupt or password-protected files. Work with the case team to establish a workflow for dealing with errors. Consider whether they will request passwords from custodians, or if they’ll receive replacement files instead. Including these files for reviewers will call attention to them and enable their team to build a plan to address them. If there are proprietary file formats or files a processing tool cannot access properly to extract the appropriate text and metadata, work with the review team to develop a method to address them.

2. We’re missing some data. Here’s why.

While errors and unprocessable files are important to call out for the review team, you will also want to identify files without extracted text or OCR. They’ll need to address files that are not searchable within a collection. As these files will likely not be returned when running search terms, case teams can decide whether they’ll require a separate review workflow with specialized attention from the team.
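One way to surface these files is to split the processed set by whether any text is available. The sketch below assumes each document is a dict with an `extracted_text` field; the field name, the function name, and the `min_chars` threshold are hypothetical and would need to match your own processing output.

```python
def split_by_searchability(docs, min_chars=1):
    """Separate documents with usable text from those without it."""
    searchable, needs_attention = [], []
    for doc in docs:
        # Treat missing, None, or whitespace-only text as not searchable.
        text = (doc.get("extracted_text") or "").strip()
        if len(text) >= min_chars:
            searchable.append(doc)
        else:
            needs_attention.append(doc)
    return searchable, needs_attention


docs = [
    {"id": "DOC-001", "extracted_text": "Quarterly forecast attached."},
    {"id": "DOC-002", "extracted_text": ""},
    {"id": "DOC-003", "extracted_text": None},
    {"id": "DOC-004"},
]
searchable, needs_attention = split_by_searchability(docs)
print([d["id"] for d in needs_attention])  # ['DOC-002', 'DOC-003', 'DOC-004']
```

The second list is what you would hand to the case team as the set that search terms will not reach.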

3. You’ll see these date ranges present in the data set, but they may not look the way you’d expect.

It is important to consider if and how dates should be used to cull the review set. You may be capturing a variety of different dates. While date fields like “date sent” and “date received” are fairly straightforward, others, like “date created” and “date last modified,” can be affected when a user copies or moves the files to different locations. In addition, some files store dates as internal metadata. For those without internal metadata, many processing applications will pull from external metadata or operating system metadata—which could mean conflicting results with the other data in the set. For example, we’ve seen projects where emails have been unnecessarily included in a data set because their dates reflected a system-wide conversion to a new document management system—not because of the email data itself.

All of this may mean that the review team sees unexpected results in the data set, or that their requested date ranges are overly inclusive or exclusive—so it’s important for you to help them understand what will be there and why.
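To help the review team see where a date filter may behave unexpectedly, you could report documents whose inclusion depends on which date field is used. This is only a sketch: the field names (`date_sent`, `date_created`, `date_last_modified`), the function name, and the sample records are assumptions, not the schema of any specific tool.

```python
from datetime import date

DATE_FIELDS = ("date_sent", "date_created", "date_last_modified")


def date_filter_report(docs, start, end, fields=DATE_FIELDS):
    """List documents whose date fields disagree about the range."""
    report = []
    for doc in docs:
        # For each populated date field, check whether it is in range.
        hits = {
            f: start <= doc[f] <= end
            for f in fields
            if doc.get(f) is not None
        }
        # Flag the document only when the fields give mixed answers.
        if len(set(hits.values())) > 1:
            report.append((doc["id"], hits))
    return report


docs = [
    # Hypothetical email sent in 2015 but re-created during a 2019 migration.
    {"id": "EM-1", "date_sent": date(2015, 6, 1), "date_created": date(2019, 1, 15)},
    # Hypothetical email whose dates agree.
    {"id": "EM-2", "date_sent": date(2019, 2, 3), "date_created": date(2019, 2, 3)},
]
report = date_filter_report(docs, date(2019, 1, 1), date(2019, 12, 31))
print(report)
```

Only the first email is flagged: filtering on its creation date would pull it into a 2019 review set even though it was sent in 2015.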

Ensuring your project goes smoothly requires input from many different stakeholders. Just as the processing team reaches out for direction from other teams, it is crucial they communicate back to the teams with their findings.
