The life of an e-discovery professional isn’t always an easy one. Beyond the difficulties we face with increasingly complicated data and evolving case law to meet the demands of new data sources, perhaps the most basic and painful challenge in our career is explaining what it is we even do.
To people outside the world of e-discovery, it’s not the easiest concept to grasp. But that’s all starting to change with the generative artificial intelligence gold rush and the public’s fascination with emerging tools like ChatGPT, Midjourney, and more.
A Whole New Generative AI World
Generative AI is a viral sensation, and the hot takes tend to land at opposite extremes: it’s going to take away our jobs (or humanity as a whole), or it’s going to make us 1,500 percent more productive in our lives with one single prompt.
Those of us who’ve worked with artificial intelligence in the form of machine learning know that there’s no magic wand with analytics. At its most basic level, AI is a tool that we can use to make our lives easier. Given the vast scalability of generative AI, it can also be a tool that creates tremendous noise and chaos.
As the dust begins to settle around discussions of AI use, demand for AI ethics is growing. We’re seeing more focused scrutiny by governing bodies and the potential for AI regulation, such as the EU AI Act. Many of these discussions about ethics focus on addressing key, universal priorities: transparency, bias, and privacy.
And from this environment, an opportunity emerges for the e-discovery professional. Not only has the general public become more aware of what AI is, but there’s a growing need for AI practitioners who understand the strengths and weaknesses of these tools. The development of AI ethics regulations is a given, and who is well-suited to transition to an AI ethics practitioner? e-Discovery professionals, of course!
To borrow from a superhero movie: e-Discoverists might not be the hero the world asked for, but we’re the hero the world needs right now.
e-Discovery Professionals as AI Ethics Practitioners
Not only have many of us in e-discovery been using AI tools like machine learning for five, even ten years, but we’ve also been using them in an adversarial system: litigation!
We use technology-assisted review (“TAR”) knowing that we have to establish a defensible process because the other side of the “v” is going to scrutinize our work product. That could mean a negotiated ESI protocol requiring disclosure of our process, or it could mean knowing we have to defend ourselves in motion practice if the other side claims our use of analytics didn’t sufficiently lead to the production of all relevant documents. That experience has taught us a few things about working with AI:
- Data Quality: e-Discovery professionals understand the basic rule of data quality: garbage in, garbage out. If you don’t have a rich data set, or if you have data that is full of “noisy” content, your results will likely take longer to optimize. We also know that if your data set is supplemented, thereby introducing new diversity into the training data, additional training of the model is needed to optimize the results.
- Iterative: We understand that using AI is an iterative process. We don’t kick off a TAR project thinking the initial relevancy score will be the last. We know that’s simply a starting point toward our ultimate end point of defensible accuracy. How many iterations of refining the model that process takes will vary from project to project.
- Validation: We know AI outputs have to be validated, and that a certain threshold of accuracy has to be met to establish a defensible workflow. Not only do we have to take the time to review the results and confirm their accuracy, sometimes we also have to do math! Precision? Recall? We know her.
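For anyone who wants to see the math, here is a minimal sketch of how precision and recall fall out of a validation sample. The function name and the sample counts are hypothetical, chosen only for illustration; they don’t come from any particular review platform or matter.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from counts in a human-coded validation sample.

    true_positives:  documents the model called relevant that reviewers agreed were relevant
    false_positives: documents the model called relevant that reviewers coded not relevant
    false_negatives: relevant documents the model failed to flag
    """
    predicted_relevant = true_positives + false_positives
    actually_relevant = true_positives + false_negatives
    # Guard against empty denominators so a tiny sample doesn't crash the check.
    precision = true_positives / predicted_relevant if predicted_relevant else 0.0
    recall = true_positives / actually_relevant if actually_relevant else 0.0
    return precision, recall


# Hypothetical validation sample, for illustration only.
precision, recall = precision_recall(
    true_positives=80, false_positives=20, false_negatives=10
)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Precision answers “of what the model flagged, how much was actually relevant?” and recall answers “of what was actually relevant, how much did the model find?” What threshold counts as defensible is a judgment for the case team, not something the code decides.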
All of this means that e-discovery practitioners understand not just what AI is or what it can do, but how it can be most effectively applied to the work we do every day—and how to advise others on the same.
AI For Not Bad
Reid Blackman’s book, Ethical Machines: Your Concise Guide to Totally Unbiased, Transparent, and Respectful AI, level sets exactly what I mean when talking about AI ethics. Broadly, AI ethics falls into two categories: AI For Good and AI For Not Bad.
As you can likely guess, the regulatory variety of AI ethics falls in the AI For Not Bad category. In other words: what sort of requirements should be in place to make sure AI is being responsibly used and the rights of individuals are being protected? Or, to put it even more simply, what risk mitigation needs to take place when using AI systems?
Within the world of AI For Not Bad, there are those three previously mentioned priorities: transparency, bias, and privacy.
- Transparency: Think of an AI output as one created by an entire ecosystem: What is the quality of the underlying data? How diverse is that data? What sort of subject matter experts were used to train the algorithm? Has the training been updated to reflect new requirements? Has the training considered underrepresented groups? How often are the outputs being validated? Most consumers use an algorithm and get a result but don’t have any insight into what led to the result they received. Depending on how the AI system is being used, people may want to know why or how the system came to its output.
- Bias: There are multiple types of biases that can be associated with AI. One is algorithmic bias, in which repeated outcomes of an AI system create inaccurate and unfair results to the detriment of a specific group or category. Sometimes that bias is caused by an insufficiently diverse data set; other times it’s caused by the people training the AI models injecting their own biases into the system. Another bias to be mindful of is automation bias, the human tendency to see a machine-created output as more accurate than a human-created output. These two biases, coupled with the vast scalability of generative AI, mean there’s a legitimate need to focus on AI For Not Bad.
- Privacy: Who runs the AI world? Data, and lots of it. AI systems require very large volumes of training data (hence the term “large language model”) to robustly train those models. It shouldn’t be surprising that large volumes of data are likely to include data about consumers. It should be even less surprising that data sets with robust consumer data are likely to be more valuable to companies building AI systems.
AI Ethics + e-Discovery
If you, as an e-discovery professional, recognized some of the AI For Not Bad priorities as concepts you’re pretty darn familiar with, that’s the exact reason for this article! For long-time practitioners of AI systems like us, these concepts are firmly rooted in our operational brains.
In case the parallels weren’t obvious, consider the following conceptual pairings:
- Transparency + Early Case Assessment: Looking at the base set of data for noise that can be removed, or for potential gaps of information, is a common part of our workflow initiation. We want to know what sort of data we’re dealing with, and AI systems are the same.
- Transparency + Defensibility Binders: Whether you call them defensibility binders or decision logs, we track all of the key decision points when executing an AI workflow that might receive scrutiny from the opposing party. Doing so puts us in the position of being able to easily defend our process, if needed.
- Addressing Algorithmic Bias + Validation: What does an e-discovery professional love? QC workflows, which in essence are aimed at mitigating algorithmic bias (or errors) in a workflow. Not only do we check for errors by human reviewers, but also errors made by the algorithm. All errors are equal in our eyes, and thus all receive scrutiny to ensure defensible results, from elusion testing to reviewer QC.
- Addressing Automation Bias + Iteration: We are not one and done when using AI. We expect the first iteration to be the worst and expect the model to improve over time—until it hits a threshold we deem defensible.
- Privacy + Privacy: Regardless of the type of review being conducted, an e-discovery professional is aware of the potential of private personal information being present in a production set. How it is addressed from case to case may vary, but we understand the importance of and challenges with identifying that information.
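The elusion testing mentioned above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name and all of the numbers are hypothetical, and a real validation protocol would also address how the sample is drawn and what confidence interval applies.

```python
def elusion_estimate(sample_size, relevant_in_sample, null_set_size):
    """Estimate how many relevant documents remain in the unreviewed (null) set.

    A random sample is drawn from the documents the model scored as not relevant,
    and human reviewers code that sample. The share they find relevant is the
    elusion rate, which is then projected across the whole null set.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    elusion_rate = relevant_in_sample / sample_size
    estimated_missed = round(elusion_rate * null_set_size)
    return elusion_rate, estimated_missed


# Hypothetical numbers, for illustration only.
rate, missed = elusion_estimate(
    sample_size=500, relevant_in_sample=5, null_set_size=100_000
)
print(f"elusion rate={rate:.1%}, estimated relevant documents left behind={missed}")
```

A low elusion rate supports the argument that little of value was left behind, and the record of that test is exactly the kind of entry that belongs in a defensibility binder.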
Future So Bright
The myriad legal and regulatory issues surrounding the use of AI are unlikely to ease up anytime soon. There’s no doubt that people with a practical working knowledge of AI systems will be in demand. The organizations that need that knowledge may not realize they need e-discovery professionals, but we can let them know we’re here as a resource.
Whether it’s inserting ourselves into the dialogue of AI ethics, becoming certified to conduct AI audits, or educating our colleagues about the ways we can assist a broader community, the future is bright.
And the next time someone asks you what you do? Tell them you’re a long-time practitioner of artificial intelligence in litigation.
Graphics for this article were created by Natalie Andrews.