OpenAI Sued by Authors for Using Their Books to Train ChatGPT Without Consent

Two authors have filed a lawsuit against OpenAI, alleging that the company used their work to help train ChatGPT. At issue is that their books are copyrighted, and the authors claim that OpenAI did not obtain their consent to use the works to train the LLM.

According to the suit, authors Paul Tremblay and Mona Awad claim that ChatGPT can generate “very accurate summaries” of their works. They go on to claim that these summaries are “only possible” if ChatGPT was trained on their works, which they view as a violation of copyright law.

Lawyers for both OpenAI and the authors who brought the suit did not respond to questions from CNBC. LLMs such as ChatGPT are trained on enormous amounts of text data, typically gathered by crawling the internet. That includes sites like Wikipedia and archived books.

Filed in San Francisco, the lawsuit alleges that “much” of the material in OpenAI’s training data is based on copyrighted materials, including books by the two authors. But the case faces a major problem: proving exactly how and where ChatGPT obtained its training data.

Without that information, it could prove difficult to show evidence of damage. So far, the complaint references exhibits of the summaries that ChatGPT was able to generate. It also notes that the LLM gets information wrong. But the two authors state that the majority of the summaries are accurate, which they claim shows that “ChatGPT retains knowledge of particular works in the training dataset.”

The complaint goes on to say, “At no point did ChatGPT reproduce any of the copyright management information Plaintiffs included with their published works.” This complaint is quite similar to those raised by artists. Last year, there was a wave of pressure from artists who stated that AI tools such as Stable Diffusion, DALL-E 2, and others were trained on their artwork without consent.

It will be some time before the lawsuit moves forward, and how it is decided in court could change how LLMs are trained in the future.

ODSC Team

ODSC gathers the attendees, presenters, and companies that are shaping the present and future of data science and AI. ODSC hosts one of the largest gatherings of professional data scientists, with major conferences in the USA, Europe, and Asia.