Anthropic Trained Its AI Model Using Millions of Copyrighted Books. A Judge Just Ruled in Favor of the Company, but There’s One Big Caveat

  • This legal victory could set an important precedent for other AI companies facing similar lawsuits.

  • The judge upheld the fair use argument as valid.

  • However, Anthropic still faces a new trial regarding the creation of a library containing illegally downloaded books for which it didn’t pay.

Javier Pastor

Senior Writer
  • Adapted by: Alba Mora

Anthropic has achieved a significant legal victory in the ongoing battle over copyright and intellectual property rights within the AI industry. This positive ruling for Anthropic could set an important precedent for other cases involving AI companies that have faced lawsuits for using copyrighted works to train their models. However, the win isn’t absolute.

Anthropic wins. In the lawsuit filed by three authors against Anthropic, the company was accused of downloading millions of copyrighted books without permission. The plaintiffs also pointed out that the company had purchased some of these books in order to scan and digitize them to train its AI models.

Senior District Judge William Alsup ruled that “the training use was a fair use.” Companies developing AI models have often relied on the concept of fair use to justify their training practices, even when those practices involve copyrighted materials.

Fair use. This legal principle allows limited use of protected material without permission from the copyright owner. In copyright law, judges determine whether an activity qualifies as fair use by examining whether that use is “transformative,” that is, whether something new is created from the original works. According to Alsup, “The technology at issue was among the most transformative many of us will see in our lifetimes.”

Important caveats. The judge acknowledged that the training process could be considered fair use. However, he also said that authors retain the right to sue Anthropic for copyright infringement.

The company argued that accessing “all these copies [was] at least reasonably necessary for training LLMs.” Alsup emphasized that, despite the purchases made, Anthropic built a substantial library of works without compensating their authors.

“Anthropic downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies in its library even after deciding it would not use them to train its AI (at all or ever again). Authors argue Anthropic should have paid for these pirated library copies. This order agrees.”

The Thomson Reuters precedent. In early 2025, Thomson Reuters won a lawsuit against Ross Intelligence, an AI startup founded in 2020. The lawsuit claimed that Ross Intelligence had reproduced material from Thomson Reuters’ legal research division, Westlaw. The judge rejected the defense’s arguments, ruling that fair use didn’t apply in that case.

In contrast, the recent ruling in Anthropic’s favor supports the use of copyrighted material, provided that companies purchase the works they use to train their models. Notably, Anthropic had already achieved a small legal victory in a previous case against Universal Music.

Anthropic’s downloading practices. The trial revealed that Anthropic co-founder Ben Mann downloaded large datasets, such as Books3 and LibGen (Library Genesis), during the winter of 2021. These datasets consist of massive collections of books, many of which are protected by copyright.

Meta’s situation. Many companies developing AI models have trained their systems on a wide range of data sources, including copyrighted works. Meta downloaded 81.7 TB of copyrighted books via BitTorrent to train its AI models. As such, the company could face legal challenges similar to Anthropic’s.

Potential financial penalties. According to Wired, the minimum fine for copyright infringement in this context is $750 per book. Alsup indicated that Anthropic’s illegally downloaded library consists of at least seven million books, exposing the company to potentially enormous fines. No date has been set for the new trial yet.

The never-ending battle between AI and copyright. This is just one chapter in a long-running saga surrounding AI and copyright issues. Companies such as Google, OpenAI, and Perplexity have also been aggressive in training their models, often using public and private data from across the Internet.

Copyright infringement lawsuits are accumulating. Cases like Anthropic’s could set a troubling precedent for all companies if they don’t purchase the books they use to train their models.

Image | Iñaki del Olmo

Related | Everyone Wants to Use ChatGPT Again. The Problem: The Reason Why They Want to Use It Infringes on Copyright
