
Email Threading 101: An Introduction to an Essential e-Discovery Tool

Jacob Cross

Originally published more than a year ago, this post is a helpful look at one of the easiest wins in all of e-discovery analytics. We've updated it to reflect the current capabilities of Relativity, and are republishing it to give you another look.

The average businessperson sends about 40 emails per day. It may not sound like much, but if you do the math over roughly 260 working days, it means a single employee creates about 10,400 emails per year (more if they work weekends).

With these numbers, it’s no wonder email is a dominant data form in e-discovery. Even in cases involving only a few custodians, you’re looking at a data set filled with thousands of emails, one for every time your custodian hits send.

Yet somehow, email threading still isn’t used in every case—which is shocking to us, given its two huge benefits. First, it prevents your team from reviewing information multiple times; second, and even more importantly, it reduces the likelihood of coding mistakes.

Avoid Déjà vu

Email threading identifies email relationships—threads, people involved in a conversation, attachments, and duplicate emails—and groups them together so you can view them as one coherent conversation.

First, text analytics will identify which of the documents in your data set are emails, then look for embedded messages within those emails. For example, if Rick writes Daryl an email, Daryl's reply will likely contain Rick’s original message at the bottom.


These are called segments. Daryl's reply, if it were part of an e-discovery collection, would have two segments: his own message and Rick's original message quoted beneath it.
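To make the idea of segments concrete, here is a minimal Python sketch that splits an email body into its segments. It assumes quoted messages are introduced by an "-----Original Message-----" line, which is only one of many formats real mail clients use. The names `SEPARATOR` and `split_segments` and the sample messages are ours for illustration, not part of any product.

```python
import re

# Assumption: quoted text is introduced by a line such as
# "-----Original Message-----". Real threading engines recognize
# many more header formats than this single pattern.
SEPARATOR = re.compile(r"^-+\s*Original Message\s*-+$", re.MULTILINE)


def split_segments(body: str) -> list[str]:
    """Return the segments of an email body, newest first."""
    parts = SEPARATOR.split(body)
    return [part.strip() for part in parts if part.strip()]


# Made-up sample messages: an original email and a reply that quotes it.
original_body = "Want to grab lunch tomorrow?"
reply_body = """Sounds good, see you at noon.

-----Original Message-----
Want to grab lunch tomorrow?"""

print(len(split_segments(original_body)))  # 1
print(len(split_segments(reply_body)))     # 2
```

The reply yields two segments because it carries the earlier message along with the new text, while the original message on its own yields just one.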

An algorithm compares and matches segments, grouping emails and attachments from the same conversation—known as a thread group—together into a neat thread. Next, the technology analyzes the text, sent time, attachments (and their text), and the sender of each email to determine uniqueness or inclusiveness.
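As a rough illustration of the grouping step, the sketch below puts emails into thread groups by matching their oldest segment. This is a deliberate simplification: as noted above, real threading also weighs sent time, sender, and attachments. The function names, email IDs, and sample text are hypothetical.

```python
from collections import defaultdict


def normalize(segment: str) -> str:
    """Collapse whitespace and case so trivially different copies compare equal."""
    return " ".join(segment.lower().split())


def group_threads(emails: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group email IDs by their oldest segment.

    Each email is a list of segments, newest first, so the oldest
    segment (the start of the conversation) is the last item.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for email_id, segments in emails.items():
        root = normalize(segments[-1])
        groups[root].append(email_id)
    return dict(groups)


# Made-up data: E1 and E2 share a conversation, E3 starts a new one.
emails = {
    "E1": ["Want to grab lunch tomorrow?"],
    "E2": ["Sounds good, see you at noon.", "Want to grab lunch  tomorrow?"],
    "E3": ["Q3 numbers attached."],
}

thread_groups = group_threads(emails)
print(list(thread_groups.values()))  # [['E1', 'E2'], ['E3']]
```

Matching only on the first message keeps the example short, but it would wrongly merge two unrelated conversations that happen to open with identical text, which is one reason production tools look at more than the text alone.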

Inclusive email messages contain the most complete content—all the text and attachments in a whole email thread group. Conversely, non-inclusive emails are those with text and attachments that are contained in another (inclusive) email. In other words, if a user reviews only the inclusive messages, they will have read all content in the email thread group.

By reviewing only the inclusive messages, your team bypasses redundant content and reduces the number of documents they need to review.
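One simplified way to picture inclusiveness: treat each email as a list of segments, and call an email non-inclusive when its segments all appear at the tail of a longer email in the same thread group. The Python sketch below follows that assumption and ignores attachments and exact duplicates, both of which a real analysis would consider. All names and sample messages are hypothetical.

```python
def is_contained(shorter: list[str], longer: list[str]) -> bool:
    """True if every segment of `shorter` appears, in order, at the end of `longer`.

    Segments are stored newest first, so the quoted history sits at the tail.
    """
    return len(shorter) < len(longer) and longer[-len(shorter):] == shorter


def inclusive_ids(thread: dict[str, list[str]]) -> list[str]:
    """Return the IDs of emails whose content is not fully contained in another email."""
    return [
        email_id
        for email_id, segments in thread.items()
        if not any(is_contained(segments, other) for other in thread.values())
    ]


# Made-up thread group: E3 continues E2, while E4 branches off E1.
thread = {
    "E1": ["Can we talk pricing?"],
    "E2": ["Sure, when?", "Can we talk pricing?"],
    "E3": ["Friday at 2.", "Sure, when?", "Can we talk pricing?"],
    "E4": ["Looping in legal.", "Can we talk pricing?"],
}

print(inclusive_ids(thread))  # ['E3', 'E4']
```

Here a reviewer who reads only E3 and E4 sees every segment in the thread group. E4 is inclusive even though it is short, because its reply text appears nowhere else.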

Give Your Reviewers the Complete Picture

As you can verify in your own inbox, emails can have tens—sometimes even hundreds—of segments, some of which are potentially important to your investigation (“Want to commit fraud with me?”) and others that are just fluff (“Thanks for your help!”).

When you don’t use email threading for e-discovery, you’re setting your reviewers up to only see portions of those long chains—sometimes just the “fluff,” even though there may be more to the story.

For example, say there are 10 messages in an email thread group. If these 10 messages were batched to reviewers without using email threading, they would be mixed into the data set as separate messages with no particular order or grouping.

Not only does this open the door to reviewing the first message 10 times (once on its own, once as a segment in the second message, once as a segment in the third message, and so on), but it also increases the chance that your reviewer will miss potentially responsive information. When they’re only getting part of a larger conversation, it’s difficult to make an accurate coding decision.
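To put rough numbers on that redundancy, the short Python calculation below assumes the simplest case: a single linear chain of 10 messages in which every reply quotes everything before it. Real threads branch and trim quoted text, so treat the figures as an illustration only.

```python
# Assumption: a linear chain where message k quotes all k - 1 earlier messages.
thread_length = 10

# Message 1 has 1 segment, message 2 has 2, and so on up to message 10.
segments_per_message = list(range(1, thread_length + 1))

total_segments_read = sum(segments_per_message)      # reviewing all 10 emails
unique_segments = thread_length                       # reviewing only the last, inclusive email
redundant_reads = total_segments_read - unique_segments

print(total_segments_read)  # 55
print(unique_segments)      # 10
print(redundant_reads)      # 45
```

Under that assumption, reviewing every message means reading 55 segments to cover 10 pieces of unique content, while the single inclusive message covers the same content in one pass.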

For example, how would you code the following exchange?


If you were a reviewer, you’d likely code this email as non-responsive, as it doesn’t contain any relevant information. But, had your team used email threading, you would have noticed that there's more to this conversation.

First, you would see there are three emails in the thread group:


Then, you would choose to review the inclusive message to ensure you're getting the whole conversation, just as you would in a native email client like Microsoft Outlook:


If pricing is important to your case, this email is definitely responsive.

Another benefit of email threading in e-discovery is the ability to see the organization and coding status of these conversations at a glance. With email threading visualization, as you drill into each piece of a thread, you're able to see an illustration of where it lives in the chain. So, for this final email about setting up a pricing conversation, you can see that it's inclusive (fully shaded) and has been marked as responsive (its box is blue).


So, in addition to avoiding duplicative reviews of the same email, you're able to perform a quick QC as you're reviewing a document—verifying which component of a conversation you're looking at, and getting insight into any coding decisions that have been made so far.

Email threading is right for every case, regardless of how big or small, because it makes review easier, more efficient, and more accurate.

Jacob Cross was a member of Relativity’s customer success team, where he helped Relativity users make the most of the platform.
