Sarah Silverman Sues ChatGPT's Creator Over Alleged Copyright Infringement

By Nadeem Sarwar
July 9, 2023 7:40 pm EST

Jason Kempin/Getty Images

OpenAI, the creator of ChatGPT, is facing a lawsuit from renowned comedienne and actress Sarah Silverman for allegedly using her book as training material without due credit or explicit permission, then benefitting financially from it. Silverman, Emmy Award winner and author of "The Bedwetter," is joined in her legal challenge against OpenAI by fellow authors Richard Kadrey and Christopher Golden.

The lawsuit argues that a substantial portion of the material used to train the LLMs behind ChatGPT is protected by copyright law, and that because it was used without due credit or compensation, OpenAI should answer for it in court. The suit resembles the one filed by Getty Images, which sued Stability AI for using its vast cache of stock images to train AI models without paying for them.

Silverman's legal challenge claims that "OpenAI relied on harvesting mass quantities of textual material from the public internet," including digital copies of Silverman's book, without explicit permission. It further says the company profited from the alleged infringement, for which Silverman and her co-plaintiffs are now seeking damages and restitution of profits.

The copyright allegations run deep

T. Schneider/Shutterstock

Labelling OpenAI's conduct as "unfair, immoral, unethical, oppressive, unscrupulous or injurious to consumers," the authors' legal representatives say OpenAI trained its models on, and reaped financial benefits from, their "stolen" copyright-protected works without proper attribution. The lawsuit also claims that OpenAI (and Meta, in a separate lawsuit) not only violated the copyrights of the authors and publishing houses but also relied on supposedly illegal sources to obtain the books used in AI training.

The lawsuit claims the only way OpenAI could have acquired the vast cache of books used to train its LLM was by accessing so-called "shadow libraries," a term for websites like Library Genesis, Z-Library, Sci-Hub, and Bibliotik. Based on OpenAI's own past revelations, the lawsuit estimates that one of the two datasets of books used to train large language models like GPT may have contained as many as 294,000 titles, but notes that the company never disclosed the sources of these books.

The lawsuit claims that when prompted, ChatGPT was able to provide an accurate summary of Silverman's book, indicating it was included in the training data. "Plaintiffs never authorized OpenAI to make copies of their books," among other things, the lawsuit says, going on to claim that the authors "have been injured by OpenAI's acts of direct copyright infringement." This isn't the first lawsuit OpenAI has faced. The company was hit with legal action earlier this year over ChatGPT generating false information.
