Attorneys & Speaking the eDiscovery Language

Attorneys and lit support work together all the time, but with different areas of focus (technology versus law), they may not always speak the same language. We’ve mentioned closing this gap before, and we often hear that what these folks really need is to literally get on the same page: of a dictionary.

To help, we sat down with a panel of former litigation support professionals to find out what e-discovery terms most often trip up conversations between them and their attorneys. Check out the full list below to prepare for your next conversation with lit support.

Byte: A unit of digital information that most commonly consists of eight binary digits or bits. In word processing, a single character is typically one byte. In e-discovery, you’ll commonly hear about data sizes that are in one of three categories:

Megabyte = one million bytes
Gigabyte = one billion bytes
Terabyte = one trillion bytes
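
For readers who want to see the arithmetic, here is a minimal Python sketch of the three units above. It assumes the decimal definitions given in this list (some tools instead count in powers of 1,024), and the names UNITS, to_bytes, and describe are made up for illustration.

```python
# Decimal units, matching the definitions in the list above.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12}


def to_bytes(amount: float, unit: str) -> int:
    """Convert a size such as 2.5 GB into a number of bytes."""
    return int(amount * UNITS[unit])


def describe(num_bytes: int) -> str:
    """Express a byte count in the largest of the three units that fits."""
    for unit in ("TB", "GB", "MB"):
        if num_bytes >= UNITS[unit]:
            return f"{num_bytes / UNITS[unit]:.2f} {unit}"
    return f"{num_bytes} bytes"


# A single plain-text character typically takes up one byte.
one_character = len("A".encode("utf-8"))

print(one_character)
print(to_bytes(2.5, "GB"))
print(describe(1_500_000_000_000))
```

Note that nothing here tells you how many documents a given size contains, which is exactly the point the panel makes below.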

Note from our panel: “Please don’t ask us how many documents are in a megabyte, gigabyte, or terabyte of data. The number can vary greatly, so we can’t give you a good answer until the data is processed.”

Custodian: Just a fancy e-discovery term for people who possess or manage data that is potentially relevant to the case.

De-duplication: This is almost exactly what it sounds like—the process of comparing electronic records and removing duplicate records from the data set to help reduce the number of documents subject to review.

DeNIST: Another common method of reducing the number of documents subject to attorney or computer review, DeNISTing removes files that are highly unlikely to have evidentiary value, such as files not created by a user (executables, system files, and the like).

Fun Fact: The “NIST” in deNIST stands for National Institute of Standards and Technology, and the process uses a list of known files maintained by that agency. Software like Relativity will compare all electronically stored information (ESI) in a collection against the list and remove any files that match it.
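
To make the comparison step concrete, here is a rough Python sketch. It assumes the reference list identifies known files by hash value, as the NIST National Software Reference Library does. The tiny KNOWN_SYSTEM_FILES set, the sample file names, and the choice of SHA-256 are all stand-ins invented for illustration; this is not how Relativity or any other platform actually implements the step.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hash value for a file's content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


# Hypothetical stand-in for the NIST reference list of known files.
KNOWN_SYSTEM_FILES = {
    fingerprint(b"operating system library"),
    fingerprint(b"installer executable"),
}


def denist(collection: dict) -> dict:
    """Keep only the files whose hash value is NOT on the reference list."""
    return {
        name: content
        for name, content in collection.items()
        if fingerprint(content) not in KNOWN_SYSTEM_FILES
    }


# Made-up collection: one user document and two system files.
collection = {
    "contract_draft.docx": b"Draft agreement between the parties",
    "system32.dll": b"operating system library",
    "setup.exe": b"installer executable",
}

kept = denist(collection)
print(sorted(kept))
```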

Extracted text: Text that’s been pulled from the native electronic file and normalized so important content can be searched for in nearly any database, repository, or file system. (See also OCR)

Hash value: Think of this like a file’s fingerprint—it’s a unique, identifying number calculated by a “hash algorithm.” If two documents are completely and totally identical, they will return the same hash value—which comes in handy during the de-duplication process, or if you need to ensure documents have not been altered.
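
Here is a small Python sketch of how a hash value can drive de-duplication. The sample emails are invented, and SHA-256 is an assumption: review platforms may use other algorithms, such as MD5 or SHA-1, and may hash selected fields rather than the whole file.

```python
import hashlib
from typing import Dict


def hash_value(content: bytes) -> str:
    """Calculate a file's fingerprint from its content."""
    return hashlib.sha256(content).hexdigest()


def deduplicate(documents: Dict[str, bytes]) -> Dict[str, bytes]:
    """Keep the first copy of each document and drop exact duplicates."""
    seen = set()
    unique = {}
    for name, content in documents.items():
        digest = hash_value(content)
        if digest not in seen:
            seen.add(digest)
            unique[name] = content
    return unique


# Made-up sample set: the first two emails are identical,
# the third differs by a single character.
documents = {
    "email_001.txt": b"Please review the attached agreement.",
    "email_002.txt": b"Please review the attached agreement.",
    "email_003.txt": b"Please review the attached agreement!",
}

unique = deduplicate(documents)
print(sorted(unique))
```

Because changing even one character produces a different hash value, the third email survives de-duplication. The same property lets you confirm a document has not been altered: hash it again later and compare the result with the original value.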

Metadata (application vs system): For this one, turn to the Sedona Conference definitions:

“Application metadata is metadata created as a function of the application software used to create the document or file.” In other words, any information about the document itself, the author, comments and prior edits, when the document was created, viewed, modified, saved, or printed.
“System metadata is information created by a computer’s operating system or by the user and which is maintained by the operating system.” Think: a file’s location and the time and date stamps indicating when the file was created, opened, or changed.
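
As an illustration of system metadata, this Python sketch asks the operating system what it knows about a file. The temporary sample file and the system_metadata helper are hypothetical, and which time stamps are available (and what they mean) varies by operating system.

```python
import os
import tempfile
from datetime import datetime, timezone


def system_metadata(path: str) -> dict:
    """Collect the location, size, and time stamps the operating system keeps."""
    info = os.stat(path)
    return {
        "location": os.path.abspath(path),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }


# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
    handle.write(b"Sample memo text")
    sample_path = handle.name

metadata = system_metadata(sample_path)
for field, value in metadata.items():
    print(f"{field}: {value}")

os.remove(sample_path)
```

Application metadata, by contrast, lives inside the file itself and has to be read with tools that understand the file's format.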

Native file: Think of this as the original source file. It’s a file saved in the format of the original application (e.g., Microsoft Word or Excel).

NSF: Short for Notes Storage Facility, this is simply a container file for IBM Lotus Notes.

OCR: Short for Optical Character Recognition, this is a method of scanning printed material so that it’s converted into searchable electronic text and able to be copied and pasted into a new document.

Note: OCR is dependent on the quality of the printed copy and is considered 85 percent accurate. Extracted text, on the other hand, is 100 percent accurate, as it’s pulled directly from the native file.

Processing: The electronic process of extracting and recording metadata from documents to serve them up in a software platform for review.

Note from our panel: This one sounds like common sense, but attorneys often may not understand what the process entails, as it’s pretty involved. The one thing you need to know: There’s more to it than plugging a hard drive into Relativity or your review platform.

PST: Also known as a personal storage folder, this is a container where Microsoft Outlook stores its data. PST files are created when a mail account is first set up, and an additional PST file can be created for backing up and archiving Outlook folders, messages, and files.

TIFF: Tagged Image File Format—essentially a picture of a document and one of the most widely supported and commonly used file delivery formats for productions.

Unitization: The assembly of individually scanned pages into documents. To unitize documents, a case team member will review each individual page in an image collection to determine which ones belong together as documents.

ZIP: A “.ZIP file” is a common format that compresses one or more files or directories for quick and easy storage or transmission.
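
To show what compression looks like in practice, this Python sketch packs two invented files into a ZIP archive held in memory and then unpacks them again. The file names and contents are made up, and how much space is saved depends heavily on the data.

```python
import io
import zipfile

# Made-up files: one long and repetitive, one short.
files = {
    "memo.txt": b"Quarterly results " * 200,
    "notes.txt": b"Call client Monday",
}

# Write both files into a single compressed archive held in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, content in files.items():
        archive.writestr(name, content)

# Read the archive back: list its contents, compare sizes, restore the files.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    sizes = {
        info.filename: (info.file_size, info.compress_size)
        for info in archive.infolist()
    }
    restored = {name: archive.read(name) for name in names}

for name, (original, compressed) in sizes.items():
    print(f"{name}: {original} bytes originally, {compressed} bytes in the archive")
```

The restored files are byte-for-byte identical to the originals, so compressing a file for storage or transmission does not change its content.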

What e-discovery terms trip you or your team up? Share your thoughts in the comments.

Kristy Esparza is a member of the marketing team at Relativity, specializing in content creation and copywriting.
