The future of AI and journalism at stake: OpenAI battles news giants in copyright lawsuit

The publishers are seeking the destruction of ChatGPT's dataset in this high-stakes case

What just happened? A coalition of news organizations led by The New York Times faced off against OpenAI in federal court on Tuesday, continuing a legal battle that could shape the future of AI and journalism. The hearing, centered on OpenAI's motion to dismiss, marks a critical juncture in a high-stakes copyright infringement case that asks a fundamental question: Can AI companies use copyrighted news articles to train their language models without consent or compensation?

The case consolidates lawsuits from three publishers: The New York Times, The New York Daily News, and the Center for Investigative Reporting. The publishers argue that OpenAI's practices amount to copyright infringement on a massive scale, potentially threatening the future of journalism.

The publishers' legal team contends that OpenAI and its financial backer, Microsoft, have profited from journalistic work that was scanned, processed, and recreated without proper authorization or payment. Jennifer Maisel, a lawyer for The New York Times, drew a parallel to criminal investigations, stating in court, "We have to follow the data."

Ian Crosby, another attorney representing the Times, emphasized the substitutional nature of ChatGPT and Microsoft's Bing search engine, arguing that these AI-powered tools have become alternatives to the publishers' original work for some users. This point is crucial because evidence of market substitution weighs against a fair use defense.

OpenAI's defense rests on the doctrine of fair use, a principle in US law that allows copyrighted material to be used for purposes such as education, research, or commentary. Joseph Gratz, representing OpenAI, argued that the company's AI models are not designed to regurgitate entire articles but rather to recognize patterns in data.

The hearing delved into the technical aspects of large language models, with OpenAI and Microsoft's legal team explaining to Judge Sidney Stein how ChatGPT processes and analyzes data. They described a system that breaks down text into "tokens" and learns to recognize patterns rather than simply retrieving and reproducing copyrighted content.
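To make the "tokens" concept concrete, the minimal sketch below uses OpenAI's open-source tiktoken tokenizer to split a sentence into integer token IDs and map each one back to a text fragment. It illustrates only the pre-processing step described to the court; it is not OpenAI's training code, and the sample sentence is our own.

```python
# Illustrative only: split text into tokens with OpenAI's open-source
# tiktoken library (pip install tiktoken). This shows the tokenization step
# described in court, not OpenAI's training pipeline.
import tiktoken

# cl100k_base is the encoding used by GPT-3.5/GPT-4-era models.
encoding = tiktoken.get_encoding("cl100k_base")

sentence = "A coalition of news organizations faced off against OpenAI."
token_ids = encoding.encode(sentence)                   # text -> integer token IDs
fragments = [encoding.decode([t]) for t in token_ids]   # each ID maps back to a text fragment

print(token_ids)   # a list of integers, one per token
print(fragments)   # word and sub-word pieces that join back into the sentence
```

The model is trained to predict the next token given the ones before it, which is why OpenAI describes the process as pattern recognition over token sequences rather than retrieval of stored articles.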


However, the publishers raised concerns about a feature called "retrieval augmented generation," which allows ChatGPT to incorporate up-to-date information from the web into its responses. Steven Lieberman, attorney for The New York Daily News, characterized this as "free riding," suggesting that readers might turn to AI-generated content instead of visiting publishers' websites.
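For readers unfamiliar with the technique, the sketch below shows the general retrieval-augmented generation pattern: fetch current text from the web, then inject it into the prompt the model answers from. The URL and helper names are hypothetical placeholders and the language-model call itself is omitted; this is the generic pattern, not OpenAI's implementation.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern at issue:
# fetch up-to-date text from the web, then place it in the prompt the model sees.
# The URL and helper names are hypothetical; the model call itself is omitted.
import requests

def fetch_page_text(url: str) -> str:
    """Download a page and return its raw body (a real system would parse the HTML)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

def build_prompt(question: str, retrieved_text: str) -> str:
    """Inject the freshly retrieved article into the prompt. This injection of
    publisher content is the step the plaintiffs characterize as 'free riding'."""
    return (
        "Answer the question using only the article below.\n\n"
        f"ARTICLE:\n{retrieved_text[:4000]}\n\n"   # truncated to fit a context window
        f"QUESTION: {question}"
    )

if __name__ == "__main__":
    article = fetch_page_text("https://example.com/")  # placeholder URL
    prompt = build_prompt("What did the court decide today?", article)
    print(prompt[:500])  # the text that would be sent to the chat model
```

Because the answer is generated from the publisher's freshly fetched text, the reader may never click through to the original site, which is the harm the plaintiffs allege.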

The stakes in this case are extraordinarily high. The New York Times is seeking billions of dollars in damages and calling for the destruction of ChatGPT's dataset. Such an outcome could be catastrophic for OpenAI, potentially forcing the company to rebuild its AI models using only authorized works. "If you're copying millions of works, you can see how that becomes a number that becomes potentially fatal for a company," Daniel Gervais, co-director of the intellectual property program at Vanderbilt University, told NPR.

The tech and publishing worlds now await Judge Stein's decision on whether to dismiss the case or allow it to proceed to trial.