To be very clear on this point - this is not related to model training.
It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data.
Buying used copies of books, scanning them, and training on them is fine.
1. A Settlement Fund of at least $1.5 Billion: Anthropic has agreed to pay a minimum of $1.5 billion into a non-reversionary fund for the class members. With an estimated 500,000 copyrighted works in the class, this would amount to an approximate gross payment of $3,000 per work. If the final list of works exceeds 500,000, Anthropic will add $3,000 for each additional work. (Rough math sketched after this list.)
2. Destruction of Datasets: Anthropic has committed to destroying the datasets it acquired from LibGen and PiLiMi, subject to any legal preservation requirements.
3. Limited Release of Claims: The settlement releases Anthropic only from past claims of infringement related to the works on the official "Works List" up to August 25, 2025. It does not cover any potential future infringements or any claims, past or future, related to infringing outputs generated by Anthropic's AI models.
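A minimal back-of-envelope sketch of the payout arithmetic implied by the three terms above. Only the $1.5B floor, the ~500,000-work estimate, and the $3,000-per-additional-work figure come from the settlement as reported; how the fund is actually administered per claimant is an assumption left out entirely.

```python
# Rough settlement math, illustrative only: administration details are not public.
FUND_FLOOR = 1_500_000_000   # non-reversionary minimum, USD
BASE_WORKS = 500_000         # estimated copyrighted works in the class
PER_EXTRA_WORK = 3_000       # added for each work beyond the estimate, USD

def fund_size(total_works: int) -> int:
    """Total fund: the $1.5B floor, plus $3,000 per work above 500,000."""
    extra = max(0, total_works - BASE_WORKS)
    return FUND_FLOOR + extra * PER_EXTRA_WORK

for works in (450_000, 500_000, 600_000):
    fund = fund_size(works)
    print(f"{works:>7,} works -> ${fund / 1e9:.2f}B fund, ~${fund / works:,.0f} gross per work")
# 450,000 works -> $1.50B fund, ~$3,333 gross per work (the floor still applies)
# 500,000 works -> $1.50B fund, ~$3,000 gross per work
# 600,000 works -> $1.80B fund, ~$3,000 gross per work
```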
I can't help but feel like this is a huge win for Chinese AI. Western companies are going to be limited in the amount of data they can collect and train on, and Chinese (or any foreign AI) is going to have access to much more and much better data.
Everything talks about a settlement with the 'authors'; is that meant to be shorthand for copyright holders? Because there are a lot of academic works in that library where the publisher holds exclusive copyright and the author holds nothing.
By extension, if the big publishers are getting $3000 per article, that could be a fairly significant windfall.
After their recent change of tune to retain data for longer and to train on our data, I deleted my account.
Try to do that. There is no easy way to delete your account. You need to reach out to their support via email. Incredibly obnoxious dark pattern. I hate OpenAI, but everything with Anthropic also smells fishy.
We need more and better players. I hope that xAI will give them all some good competition, but I have my doubts.
This is sad for open source AI. Piracy for the purpose of model training should also be fair use, because otherwise only big companies like Anthropic, which can afford to pay off publishers, will be able to train. There is no way to buy billions of books just for model training; it simply can't happen.
After the book publishers burned Google Books' Library of Alexandria, they are now making it impossible to train an LLM unless you engage in the medieval process of manually buying paper copies of works just to scan & destroy them...
Is this legal: scan billions of pirated books, train an LLM on them, and generate a billion public-domain books with it, so that nobody ever needs copyrighted books anymore?
Also, if there is a software library with an annoying Stallman-style license, can one use an LLM to generate a compatible library, but in the public domain or under a commercial license? So that nobody needs to respect software licenses anymore? Can we also generate a free Photoshop, Linux kernel, and Windows this way?
Maybe I would think differently if I was a book author but I can't help but think that this is ugly but actually quite good for humanity in some perverse sense. I will never, ever, read 99.9% of these books presumably but I will use claude.
I wonder who will be the first country to make an exception to copyright law for model training libraries to attract tax revenue like Ireland did for tech companies in the EU. Japan is part of the way there, but you couldn't do a common crawl type thing. You could even make it a library of congress type of setup.
How do legal penalties and settlements work internationally? Are entities in other countries somehow barred from filing similar suits with more penalties?
I think that one under-discussed effect of settlements like this is the additional tax on experimentation. The largest players can absorb a $1.5B hit or negotiate licensing at scale. Smaller labs and startups, which often drive breakthroughs, may not survive the compliance burden.
That could push the industry toward consolidation: fewer independent experiments, more centralized R&D inside big tech. I feel this might slow the pace of unexpected innovations and increase dependence on incumbents.
This definitely raises the question: how do we balance fair compensation for creators with keeping the door open for innovation?
This is exactly what could impede LLM training datasets in the Western world, which will mechanically lead to "richer" LLM training datasets in countries where IP law doesn't wall off that data for training.
But then, the countries with the freedom to add everything to the training dataset would have to distribute the weights for free in IP-walled countries (because the models would be plainly "illegal" and "blocked" over there, unless free as in free beer, I guess); basically, only something like what DeepSeek does could work.
If powerful LLM hardware becomes somewhat affordable (look at Nvidia's massive push on LLM-specific hardware), "local" companies may run those foreign-trained LLM models at reasonable speed, but "here".
It is a good opportunity to ask: is it true that Anthropic can demand indemnification from users for actions, related to the use of Claude, that end up with the company being sued? Even for a mere accusation, the user has to cover the bills for lawyers and proceedings. Anthropic also takes control of the legal process and can handle it as it pleases, settle or not, with the user footing the bill. Without limit. Whether the user is an individual or an organization doesn't matter.
Sounds harsh, if true. It would make Claude practical basically only for hobby projects, where you keep the results entirely to yourself (be it information, a product using Claude, or a product made by using Claude). Difficult to believe; I hope I heard it wrong.
On a related note: when I listen to Suno, when I create "Epic Power Metal", the singer is very often indistinguishable from the famous Hansi Kürsch of Blind Guardian.
I'm wondering: if they had purchased all the books in the pirate stash, in physical or DRM-free ebook form, could they have stayed out of trouble? Use the stash because it's already pre-digitized and accessible, and give money to the publishers.
It would take time, sure, to compile the lists and make bulk orders, but wouldn't it be cheaper in the end than the settlement? (Rough numbers sketched below.)
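Back-of-envelope on that question, with the average unit price purely an assumption; only the ~500,000-work count and the $1.5B figure come from the settlement. (Legally, a later purchase would not have cured the past infringement, which is what the damages attached to.)

```python
# Hypothetical bulk purchase vs. the actual settlement, illustrative numbers.
WORKS = 500_000        # estimated works in the class (from the settlement)
AVG_PRICE = 30         # assumed average price per legitimate copy, USD
SETTLEMENT = 1.5e9     # reported settlement floor, USD

purchase_cost = WORKS * AVG_PRICE
print(f"bulk purchase ~${purchase_cost / 1e6:.0f}M vs. settlement ${SETTLEMENT / 1e9:.1f}B "
      f"({SETTLEMENT / purchase_cost:.0f}x more)")
# -> bulk purchase ~$15M vs. settlement $1.5B (100x more)
```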
This settlement highlights the growing pains of the AI industry as it scales rapidly. While $1.5B is significant, it's a fraction of Anthropic's valuation and funding. It underscores the need for better governance in AI development to avoid future legal pitfalls. Interesting to see how this affects their competition with OpenAI.
From a systems design perspective, $3,000 per book makes this approach completely unscalable compared to web scraping. Strictly, both are O(n) in the number of works; the difference is a constant factor of several orders of magnitude per work. But the practical effect is the same: legally compliant data acquisition has fundamentally different economics than the 'move fast and break things' approach most labs took initially.
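A toy model of that cost gap; the scraping unit cost is an assumption, and the licensed figure is just the settlement's implied per-work price. Both regimes scale linearly, but the constant factor is brutal:

```python
# Data-acquisition cost vs. corpus size under two regimes (illustrative).
SCRAPE_COST_PER_WORK = 0.01      # assumed marginal crawl/storage cost, USD
LICENSED_COST_PER_WORK = 3_000   # per-work price implied by the settlement, USD

for n in (10_000, 100_000, 1_000_000):
    scrape = n * SCRAPE_COST_PER_WORK
    licensed = n * LICENSED_COST_PER_WORK
    print(f"{n:>9,} works: scraping ~${scrape:,.0f} vs. licensed ~${licensed / 1e9:.2f}B")
#    10,000 works: scraping ~$100 vs. licensed ~$0.03B
#   100,000 works: scraping ~$1,000 vs. licensed ~$0.30B
# 1,000,000 works: scraping ~$10,000 vs. licensed ~$3.00B
```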
This will be paid to rights holders, not authors. Published authors sign away the rights to financial exploitation of their books under the terms of the contracts offered. I expect some authors will sue publishers in turn. This has happened before, when authors realised that they were not getting paid royalties on sales of ebooks.
So this is a straight-up victory for Anthropic, right?
They pay out (relative) chump change as a penalty for explicitly pirating a bunch of ebooks, and in return they get a ruling that they can train on copyrighted works forever, for the purchase price of the book (not the price that would be needed to secure the rights!)
They also agreed to destroy the pirated books. I wonder how large of a portion of their training data comes from these shadow libraries, and if AI labs in countries that have made it clear they won't enforce anti-piracy laws against AI companies will get a substantial advantage by continuing to use shadow libraries.
So the article notes Anthropic states they never publicly released a frontier model that was trained on the downloaded copyrighted material. So were Claude 2 and 3 only trained on legally purchased and scanned books, or do they now use a different training pipeline that does not rely on books at all?
(Sorry, meta question: how do we insert in submissions the "'Also' <link> <link>..." text that appears below the title and above the comment input? The text field on the "submit" page creates a user's post when the "url" field is also filled. I am missing something.)
It is a very good deal for them: they did not have to acquire the books and had them in a very convenient format (no digitization needed), they saved tons of time (5+ years), they got access to rare books, and the LLM is not considered a derived work, when it actually clearly is one.
> "The technology at issue was among the most transformative many of us will see in our lifetimes"
A judge making a ruling based on his opinion of how transformative a technology will be doesn't inspire confidence. There's an equivocation on the word "transformative" here -- not just transformative in the fair use sense, but transformative as in world-changing, impactful, revolutionary. The latter shouldn't matter in a case like this.
> Companies and individuals who willfully infringe on copyright can face significantly higher damages — up to $150,000 per work
Settling for 2% ($3,000 of the $150,000 statutory maximum per work) is a steal.
> “In June, the District Court issued a landmark ruling on A.I. development and copyright law, finding that Anthropic’s approach to training A.I. models constitutes fair use,” Aparna Sridhar, Anthropic’s deputy general counsel, said in a statement.
This is the highest-order bit, not the $1.5B in settlement. Anthropic's guilty of pirating.
I feel like there could be a business opportunity for authors here: selling their books to LLM companies. For the LLM companies, it could be cheaper than a lawsuit, and the authors get paid.
I don’t understand how training an LLM on a book and then selling its contents via subscriptions is fine but using a probabilistic OCR to read a book and then selling its contents is a crime that deserves jail time.
> A trial was scheduled to begin in December to determine how much Anthropic owed for the alleged piracy, with potential damages ranging into the hundreds of billions of dollars.
It has been admitted; Anthropic knew that this trial could totally bankrupt them had they maintained their innocence and continued to fight the case.
But of course, there's too much money on the line. By settling (and admitting it profited off pirated books), Anthropic signaled it knew there was no way it could win that case, and that the risk was not worth taking.
> The pivotal fair-use question is still being debated in other AI copyright cases. Another San Francisco judge hearing a similar ongoing lawsuit against Meta ruled shortly after Alsup's decision that using copyrighted work without permission to train AI would be unlawful in "many circumstances."
I wonder how many authors will see real money out of this (if any). The techbros prayed to the new king of America with the best currency they had, money, so the king may intervene, as he has many times.
What about the neural networks already fed with those books?
If the court chooses to protect the writers, those models should be deleted and retrained with all of this material removed.
It doesn't set precedent, but the message to other AI companies is clear: if you're going to bet your model on gray-area data, have a few billion handy for settlements.
It's the concentration of power and monopolies driving this trend of ignoring fines and punishments. The fine system was not designed for these monstrous beasts. Legal codes were designed to deter the common man from wrongdoing. They did not anticipate technological superpowers playing winner-takes-all in a highly connected world and growing beyond the control of law. Basically, it's the law of the jungle for these companies. Law and punishment are never going to have any effect on them, as long as they can grab enough market share and customer base. Same as any mafia.
We are entering a world filled with corporate mafias that are above the law (fines do them insignificant damage). These mafias will grip the world by providing the essential services that make up the future world. The state will become much weaker, as policymakers can be bought by lobbying and punishments can be offset by VC funding.
For legal observers, Judge William Haskell Alsup’s razor-sharp distinction between usage and acquisition is a landmark precedent: it secures fair use for transformative generative AI while preserving compensation for copyright holders. In a just world, this balance would elevate him to the highest court of the land, but we are far from a just world.
This weirdly seems like it's the best mechanism to buy this much data.
Imagine going to 500k publishers to buy it individually. $3k per book is way cheaper. The copyright system is turning into a data marketplace in front of our eyes.
Reminder that just recently, Anthropic raised a $13 billion Series F at a $183 billion post-money valuation.
In March, they were worth $61.5 billion.
In six months they've created roughly $120 billion in value. That's almost 700 million dollars per day. Avoiding being slowed down by even a few days is worth a billion-dollar payout when you are on this trajectory. This lawsuit, and any lawsuit AI model companies are likely to get, will be a rounding error at the end of the fiscal year.
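Quick sanity check on that math (dates approximate, figures as quoted above):

```python
# Valuation delta and per-day rate over roughly six months.
pre, post = 61.5e9, 183e9    # March valuation vs. September post-money valuation
days = 183                   # ~six months
gain = post - pre
print(f"${gain / 1e9:.1f}B created, ~${gain / days / 1e6:.0f}M per day")
# -> $121.5B created, ~$664M per day ("almost 700 million dollars per day")
```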
They know that superintelligent AI is far larger than money, and even so, the money they'll make on the way there is hefty enough for copyright law to not be an issue.
... in one economy and for specific authors and publishers. But the offence is global in its impact on authors worldwide, and the consequences for other IPR laws remain to be seen.
Smart move: now that they're an established player, and they have a few billions of investors' money to spend, they cement a jurisprudence that says stealing IP to train your models is a billion-dollar offense.
What a formidable moat against newcomers, definitely worth the price!
Somehow, excuses like "we torrented it, but we configured low seeding", "the temptation was too strong because there was money to be made", "we tried getting a license, but then ignored it", and more ludicrous excuses actually worked.
Internal Meta emails seemed to show that people knew about the blatant breach of copyright, and yet Meta won the case.
I guess there are tiers of laws even between billionaire companies.
$1.5B is nothing but a hand-slap for the big gold rush companies.
It's less than 1% of Anthropic's valuation -- a valuation utterly dependent on all the hoovering up of others' copyrighted works.
AFAICT, if this settlement signals that the typical AI foundation model company's massive-scale commercial theft doesn't result in judgments that wipe out a company (and its execs), then we have confirmation that it's a free-for-all for all the other AI gold rush companies.
Then making deals to license rights, in sell-it-to-us-or-we'll-just-take-it-anyway deals, becomes only a routine and optional corporate cost reduction exercise, but not anything the execs will lose sleep over if it's inconvenient.
Let us not forget that this one is the good, ethical AI company. The one founded by splinter AI safety cultists who thought that OpenAI wasn't deep enough in the safety cult for their liking. And here they are, keeping the humans safe. By robbing them.
Because it turns out that nobody in the whole safety cult cares a whit for the human mind, the human experience, human art. Maybe for something they call "human values" in some abstract thought experiment, but never for any human decency. No, the human mind is just ones and zeros, just like a computer, no soul and no spark, to people in the cult. The cult thinks that an LLM reading a book is just the same mechanically as a human reading it.
Your brain is just emergence, your honor. Fair use. Blah blah Dennett Hofstadter Yudkowsky.
I'm excited for the moment when these models are able to treat copyrighted work in a fair-use way that pays out to authors the way Spotify does when you listen to a song. Why? Because authors receiving royalties for their works when they get used in some prompt would likely encourage them to become far more accepting of LLMs. (A sketch of the pro-rata idea follows this comment.)
Also, passing the cost on to consumers of generated content, since companies would now need to pay royalties on the back end, should increase the cost of generating slop and hopefully push back against that trend.
This shouldn't just be books, but all written content, like scholarly journals and essays, news articles and blogs, etc.
I realize this is just wishful thinking, but there's got to be some nugget of aspirational desire to pay it forward.
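A minimal sketch of that Spotify-style idea, purely hypothetical: a royalty pool split pro rata by attributed usage. The hard, unsolved part is attributing an LLM's output to individual works at all; the names and counts below are invented.

```python
# Hypothetical pro-rata royalty pool; no such attribution system exists today.
def pro_rata_payout(pool_usd: float, usage: dict[str, int]) -> dict[str, float]:
    """Split a royalty pool in proportion to attributed usage counts."""
    total = sum(usage.values())
    return {work: pool_usd * count / total for work, count in usage.items()}

payouts = pro_rata_payout(1_000_000, {"Novel A": 70_000, "Journal B": 25_000, "Blog C": 5_000})
for work, amount in payouts.items():
    print(f"{work}: ${amount:,.2f}")
# Novel A: $700,000.00 / Journal B: $250,000.00 / Blog C: $50,000.00
```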
I guess this settlement could be a landmark moment. $1.5 billion is a staggering figure, and I hope it sends a clear signal that AI companies can't just treat creative work as free training data.
I'm gonna say one thing. If you agree that something was unfairly taken from book authors, then the same thing was taken from people publishing on the web, and on a larger scale.
Book authors may see some settlement checks down the line. So might newspapers and other parties that can organize and throw enough $$$ at the problem. But I'll eat my hat if your average blogger ever sees a single cent.
Rainbows End was prescient in many ways.
You can search LibGen by author to see if your work is included. I believe this would make you a member of the class: https://www.theatlantic.com/technology/archive/2025/03/searc...
If you are a member of the class (or think you are) you can submit your contact information to the plaintiff's attorneys here: https://www.anthropiccopyrightsettlement.com/
Can only imagine the pitch: "Yes, please give us billions of dollars. We are going to make a huge investment, like paying off our lawsuits."
It’s not precedent-setting, but surely it’ll have an impact.
Is there a way to make your content on the web "licensed" in a way where it is only free for human consumption?
I.e., effectively making AI crawling piracy, and thus subject to the same kind of penalties as here?
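Not as a license, no: today the closest widely used signal is a robots.txt opt-out, which is voluntary and only honored by compliant crawlers; it doesn't legally convert crawling into piracy. A small sketch with Python's stdlib showing how to check a site's policy for the published AI crawler user-agent tokens (the tokens are real; the URLs are placeholders):

```python
# Check how a site's robots.txt treats known AI crawlers.
# Note: robots.txt is a voluntary opt-out, not an enforceable license.
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]  # published UA tokens

def ai_crawl_policy(site: str, page: str) -> dict[str, bool]:
    """Return {user_agent: allowed} for a page, per the site's robots.txt."""
    rp = RobotFileParser()
    rp.set_url(f"{site}/robots.txt")
    rp.read()  # fetch and parse robots.txt
    return {ua: rp.can_fetch(ua, page) for ua in AI_CRAWLERS}

print(ai_crawl_policy("https://example.com", "https://example.com/my-post"))
```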
Even mighty AI with billions must kneel to the copyright industry. We are forever doomed. Human culture will never be free from the grasp of rent-seeking.
OpenAI and Google will follow soon now that the precedent has been set, and will likely pay more.
It will be a net win for Anthropic.
TBH I'm just going to plow all that money back into Anthropic... might as well cut out the middleman.
Seriously, how will this money propagate to the authors (if at all) or will it just stay with the publishers?
Unless, of course, the transformation malfunctioned and you got the good old verbatim source, many examples of which have been compiled in similar lawsuits.
Taken right from the VC's handbook.
https://en.wikipedia.org/wiki/Hansi_K%C3%BCrsch
I'm not sure if he even knows, but those are almost certainly his tracks they trained on.
The first of many.
I haven't had this in a while, but I always hate it when I'm blocked by Cloudflare/Datadome/etc.
It is all part of the playbook.
At least if you're a regular citizen.
That's a weird way for Anthropic to announce they're going out of business.
Same racket the media cartels and patent trolls have been forcing for 40-50 years.
https://www.youtube.com/watch?v=sdtBgB7iS8c
Do you feel safe?