AI in e-Discovery: Legal Ethics' Next Frontier

Editor's Note: The first in a two-part series, this article exemplifies a ton of collaboration between author Brittany Roush and members of Relativity's legal team, including Beth Kallet-Neuman and Mark Bussey.

Relativity declared 2022 as the “Year of AI” in reference to our internal efforts to research, integrate, and build significant AI advancements into our software, but little did we know it was a prescient prediction for the world at large. OpenAI’s ChatGPT launched in November 2022 and, in the wake of its release, it’s fair to say that the conversation around AI has gone from “interested” to “frenetic.”

Every industry is looking for ways to integrate generative AI into its workflows to support workers, automate mundane tasks, and reduce costs. The legal industry is no different; Legalweek, Relativity Fest London, and the CLOC Global Institute were all abuzz with conversations about how tools like GPT-4 can be used in e-discovery, investigations, and litigation.

It’s an exciting time to be working in AI, but at the risk of being a Buzzkill Brittany, it’s also a very scary time to be working in AI [insert terminator gif]. What happens right now will have long-term consequences for our society; that’s especially true in high-risk industries like ours, where the poor application of AI could impact litigation and investigations—and more specifically, the people involved in those matters.

The legal industry has several unique challenges when it comes to the use of AI. For the purposes of this article, we are discussing two specific types of AI: generative AI and classification models.

Generative AI: Generative artificial intelligence or generative AI is a type of artificial intelligence system capable of generating text, images, or other media in response to prompts. Generative AI models learn the patterns and structure of their input training data, and then generate new data that has similar characteristics. ChatGPT is built with generative AI.
Classification Models: Machine learning models that assign a label to a given example of input data. A sentiment analysis model, which labels text by the sentiment it expresses, is one example of a classification model.
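To make the "assign a label to an input" idea concrete, here is a deliberately tiny sketch in Python. It is not how any production sentiment model works (and not how Relativity's does): the word lists and the classify_sentiment function are hypothetical stand-ins for what a real model would learn from training data.

```python
# Toy illustration only: hand-written word lists stand in for a trained model.
POSITIVE = {"great", "pleased", "thanks", "happy"}
NEGATIVE = {"angry", "terrible", "unacceptable", "upset"}


def classify_sentiment(text: str) -> str:
    """Assign exactly one label to one input: positive, negative, or neutral."""
    # Lowercase each word and strip surrounding punctuation before lookup.
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentiment("This delay is unacceptable!"))
```

The point is the shape of the task rather than the method: one input goes in, one label from a fixed set comes out. A generative model, by contrast, produces new text instead of choosing a label.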

Data Privacy and Confidentiality

The use of artificial intelligence is under heightened scrutiny, not least from a privacy standpoint. In late March 2023, the Italian Data Protection Authority issued an emergency order prohibiting the use of ChatGPT over the following concerns:

There were no controls to stop people under the age of 13 from using ChatGPT.
ChatGPT suffered a data breach where user conversations and payment information were exposed. As we know now, users are often inputting confidential and proprietary information into ChatGPT, creating a cascading risk of exposure in any data breach or security incident.
ChatGPT shared inaccurate information about people, raising questions about the harm the model could do by spreading inaccurate content about a person.
People were not told their data was collected for training purposes, which is a common issue amongst generative AI models and not specific to ChatGPT. This is a legal problem that must be tackled across jurisdictions as this technology is developed. Already, a lawsuit has been brought against OpenAI regarding their use of personal data.
GDPR requires that companies meet one of the six legal justifications for collecting personal data. OpenAI did not provide a legal justification for collecting personal information while scraping the internet for data to train GPT, and even now their privacy terms do not include any information on why that data was collected.

Whilst that emergency ban has since been lifted—which in and of itself is newsworthy, since it demonstrates that there is a way for technology companies to work with regulators to address such concerns appropriately—it remains a useful outline for some of the privacy risks presented when leveraging AI.

The problem of personal data in training stems from the fact that globally, and particularly in the US, data privacy laws have been lacking over the years, leading to a proliferation of data brokers: companies that buy, aggregate, disclose, and sell personal information. Data brokerage is a $300 billion+ industry. Data brokers gather information from public records and social media, along with information they purchase from third-party apps, loyalty programs, subscription programs, online surveys, cell phone companies (location data), and internet service providers (browser cookies). All this data is compiled to create a profile of a person that’s then used for various purposes, including training AI models.

Some strides are being made in addressing the issues stemming from data brokerage. Massachusetts is considering banning the sale of cell phone location data, and California law requires data brokers to register with the state and provide consumers with information on their business. Until more regulation is in place, users of AI bear a level of personal responsibility for understanding the training data used in AI development and the ethical concerns that using those models may pose.

Intellectual Property Concerns

Not only is personal data at issue, but there are concerns that large language models may very well be using copyrighted material as training data without the consent of the intellectual property owners. For example, there are several lawsuits against Midjourney, Stability AI, and DeviantArt for copyright infringement and violation of publicity rights. GitHub, Microsoft, and OpenAI are facing similar lawsuits for their GPT-based technologies.

Intellectual property concerns are a significant issue for attorneys who use these tools. They pose several important questions:

Does using AI where intellectual property rights have been violated constitute an ethical quandary for an attorney’s licensure?
Does it mean that the results of AI can change over time as data and material are removed due to copyright infringement?
And if so, what does that mean for defensibility of the AI over time?

Right now, there are no answers, as it hasn’t been determined whether the use of copyrighted material will fall under fair use exemptions. According to Foley & Lardner LLP, attorneys should exercise caution when using AI where copyright infringement is at stake: “Because it is uncertain whether generative AI companies will be allowed to use these models in the future if they are found liable for copyright infringement, there is concern about end users or companies that have produced output using such models and whether liability can extend to such end users or companies.”

On the flip side, users are perpetuating this issue themselves by prompting chatbots to create materials based on confidential and proprietary information, such as privileged details of a legal matter or copyrighted materials owned by a company. If a user has not opted out of sharing their data with the company creating the generative AI, and the company has a policy to use that data to train future models, then that information can become part of the training data. This poses a number of questions that will need to be answered in the coming months and years, such as:

If generative AI is used in conjunction with intellectual property to create new materials, is the output of generative AI intellectual property itself? And if so, does the AI own part of the copyright?
What is the impact on the model’s responses when users share copyrighted, confidential, or privileged information? If an attorney were using ChatGPT to write their briefs and the model were actively learning and adjusting responses based on user inputs (ChatGPT’s model stopped “learning” as of 2021), could the opposing party glean insight into their strategy with the right prompts?
What bias is generated in the model by sharing copyrighted, confidential, or privileged information? The concern is not only for bias against protected classes, but bias when it comes to providing accurate responses to prompts.
While people have a right to be forgotten, does that right extend to the materials that person has entered into the model?

ABA Model Rules of Professional Conduct

In the United States, the American Bar Association Model Rules of Professional Conduct (MRPC) serve as models for the ethics rules of most jurisdictions. These rules are not binding law, but they are a model for state regulators of the legal profession (e.g., bar associations) to adopt. The rules are flexible enough to allow state-specific adaptations, and all 50 states and the District of Columbia have adopted ethics rules based on at least part of the MRPC.

In the context of the ABA Model Rules of Professional Conduct, using generative AI or other AI models without thorough training and an understanding of how the model produces results, and of what pitfalls may exist with those results, can be deleterious to one’s license. U.S. attorneys should be paying close attention to Rule 1.6: Confidentiality of Information. As previously discussed, unless you are opting out, data fed to ChatGPT is stored on OpenAI’s systems indefinitely, and users then lose control of their data. As attorney-client data is typically privileged and/or confidential information, using ChatGPT or other unsecured generative AI programs may lead to violating this rule where it applies.

There is also a component of Rule 1.1: Competence that underpins any discussion regarding attorney licensure in the United States, especially as it relates to data privacy obligations. “Not understanding” AI is not an excuse that attorneys can rely on, as demonstrated by a recent case, Mata v. Avianca, involving a law firm representing a man suing Avianca Airlines. The law firm used ChatGPT to draft a brief and ChatGPT hallucinated, referencing made-up judicial opinions and even made-up quotes. The attorneys and the law firm were sanctioned under Rule 11(c)(4) of the Federal Rules of Civil Procedure, and the following was noted in the Opinion and Order on Sanctions:

Many harms flow from the submission of fake opinions. The opposing party wastes time and money in exposing the deception. The Court’s time is taken from other important endeavors. The client may be deprived of arguments based on authentic judicial precedents. There is potential harm to the reputation of judges and courts whose names are falsely invoked as authors of the bogus opinions and to the reputation of a party attributed with fictional conduct. It promotes cynicism about the legal profession and the American judicial system. And a future litigant may be tempted to defy a judicial ruling by disingenuously claiming doubt about its authenticity.

Attorneys need to understand not just how AI works, but also its potential pitfalls. While attorneys don’t need to become data scientists to use AI, they do need to have enough training to be considered competent in the use of AI, especially if they will be using it on a case.

It is likely that, as the use of AI becomes more democratized, courts will have far less patience for the misuse of AI and will respond with harsher sanctions. Additionally, attorneys need to be aware of how data is stored and protected, and how (or if) client data is used to improve vendor models so they do not risk exposing protected client data. Ultimately, attorneys must ensure they are representing their clients to the best of their abilities and meeting their professional standards of conduct when it comes to AI.

Come back next week for the second part of this two-part series, in which we'll talk through concerns around sentiment analysis and GDPR, the dangers of "black box" AI, and what the way forward looks like for legal practitioners.

Graphics for this article were created by Sarah Vachlon.

Brittany Roush is a senior product manager at Relativity, working on features like search and visualizations. She has been at Relativity since 2021. Prior to Relativity, she spent over a decade conducting investigations, managing e-discovery projects, collecting data, and leading a data breach notification practice. She has a passion for building better tools for investigators and PMs to make their lives and cases easier (at least partly because her friends in the industry would hunt her down if she didn’t).