French publishers have had enough. They've sued Meta for copyright infringement. They're not the first, and they won't be the last. But the real issue is broader: AI companies have been training their models on copyrighted content for years, and it's business as usual.
Business as usual. It's been more than two years since Getty Images sued Stability AI, accusing it of using its photos without permission to train Stable Diffusion, its image-generating AI. That lawsuit was one of the first in a long line of cases making the same claim. Yet, despite all this time, little progress has been made. It's as if what Stability AI and other AI companies did has been pushed to the courts' back burner.
Copy what? Suspicions about this practice have existed for years, even before ChatGPT's public launch in November 2022. Months earlier, in June of that year, DALL-E was accused of relying on copyrighted images from creators who received nothing in return.
In another case, Microsoft, OpenAI, and GitHub were sued just weeks before ChatGPT's debut for training GitHub Copilot on code from developers who never gave permission. But in July 2024, a California judge dismissed most of the plaintiffs' claims.
Few verdicts, few consequences. So far, recent rulings seem to favor AI companies. OpenAI, for example, won a lawsuit that challenged its practices. However, that victory might be short-lived: OpenAI still faces another major case from The New York Times, which argues that the newspaper has suffered demonstrable harm.
Fair use? The New York Times lawsuit against OpenAI, which reached its first court hearing in January 2025, is one of the most important in this legal battle. Sam Altman's company insists it makes "fair use" of the content it uses to train its models. But the contradiction is striking: while claiming fair use, OpenAI has also signed multimillion-dollar deals with Reddit and with publishers to license content and avoid further lawsuits.
Meta's case: a different level. AI companies go to extreme lengths to secure high-quality training data, but Meta's case stands out. It was recently revealed that Meta downloaded more than 80 terabytes of books via BitTorrent to train its Llama models. Many of those books were copyrighted, sparking widespread criticism and, now, a new lawsuit from French publishing groups.
No punishment. Despite this massive, ongoing infringement of intellectual property, there has been little accountability. No court has yet delivered a ruling that seriously penalizes these copyright violations. Instead, the infractions continue largely unchecked, overlooked in favor of the advantages AI models provide.
Image | Emil Widlund (Unsplash) | Meta
Related | Meta Trained Llama Using Copyrighted Books. Mark Zuckerberg Knew It and Didn’t Care