To be very clear on this point - this is not related to model training.
It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data.
Buying used copies of books, scanning them, and training on them is fine.
1. A Settlement Fund of at least $1.5 Billion: Anthropic has agreed to pay a minimum of $1.5 billion into a non-reversionary fund for the class members. With an estimated 500,000 copyrighted works in the class, this would amount to an approximate gross payment of $3,000 per work. If the final list of works exceeds 500,000, Anthropic will add $3,000 for each additional work. (Rough math sketched after this list.)
2. Destruction of Datasets: Anthropic has committed to destroying the datasets it acquired from LibGen and PiLiMi, subject to any legal preservation requirements.
3. Limited Release of Claims: The settlement releases Anthropic only from past claims of infringement related to the works on the official "Works List" up to August 25, 2025. It does not cover any potential future infringements or any claims, past or future, related to infringing outputs generated by Anthropic's AI models.
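A minimal back-of-envelope sketch of the payout arithmetic implied by the three terms above. Only the $1.5B floor, the ~500,000-work estimate, and the $3,000-per-additional-work figure come from the settlement as reported; how the fund is actually administered per claimant is an assumption left out entirely.

```python
# Rough settlement math, illustrative only: administration details are not public.
FUND_FLOOR = 1_500_000_000   # non-reversionary minimum, USD
BASE_WORKS = 500_000         # estimated copyrighted works in the class
PER_EXTRA_WORK = 3_000       # added for each work beyond the estimate, USD

def fund_size(total_works: int) -> int:
    """Total fund: the $1.5B floor, plus $3,000 per work above 500,000."""
    extra = max(0, total_works - BASE_WORKS)
    return FUND_FLOOR + extra * PER_EXTRA_WORK

for works in (450_000, 500_000, 600_000):
    fund = fund_size(works)
    print(f"{works:>7,} works -> ${fund / 1e9:.2f}B fund, ~${fund / works:,.0f} gross per work")
# 450,000 works -> $1.50B fund, ~$3,333 gross per work (the floor still applies)
# 500,000 works -> $1.50B fund, ~$3,000 gross per work
# 600,000 works -> $1.80B fund, ~$3,000 gross per work
```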
I can't help but feel like this is a huge win for Chinese AI. Western companies are going to be limited in the amount of data they can collect and train on, and Chinese (or any foreign AI) is going to have access to much more and much better data.
Everything talks about a settlement with the 'authors'; is that meant to be shorthand for copyright holders? Because there are a lot of academic works in that library where the publisher holds exclusive copyright and the author holds nothing.
By extension, if the big publishers are getting $3000 per article, that could be a fairly significant windfall.
After their recent change of tune to retain data for longer and to train on our data, I deleted my account.
Try to do that. There is no easy way to delete your account. You need to reach out to their support via email. Incredibly obnoxious dark pattern. I hate OpenAI, but everything with Anthropic also smells fishy.
We need more and better players. I hope that xAI will give them all some good competition, but I have my doubts.
This is sad for open source AI. Piracy for the purpose of model training should also be fair use, because otherwise only big companies like Anthropic, which can afford to pay off publishers, will be able to train. There is no way to buy billions of books just for model training; it simply can't happen.
After the book publishers burned Google Books' Library of Alexandria, they are now making it impossible to train an LLM unless you engage in the medieval process of manually buying paper copies of works just to scan & destroy them...
Is this legal: scan billions of pirated books, train an LLM on them, and generate a billion public-domain books with it, so that nobody ever needs copyrighted books anymore?
Also, if there is a software library with an annoying Stallman-style license, can one use an LLM to generate a compatible library, but in the public domain or under a commercial license? So that nobody needs to respect software licenses anymore? Can we also generate a free Photoshop, Linux kernel, and Windows this way?
Maybe I would think differently if I was a book author but I can't help but think that this is ugly but actually quite good for humanity in some perverse sense. I will never, ever, read 99.9% of these books presumably but I will use claude.
I wonder who will be the first country to make an exception to copyright law for model training libraries to attract tax revenue like Ireland did for tech companies in the EU. Japan is part of the way there, but you couldn't do a common crawl type thing. You could even make it a library of congress type of setup.
How do legal penalties and settlements work internationally? Are entities in other countries somehow barred from filing similar suits with more penalties?
I think that one under-discussed effect of settlements like this is the additional tax on experimentation. The largest players can absorb a $1.5B hit or negotiate licensing at scale. Smaller labs and startups, which often drive breakthroughs, may not survive the compliance burden.
That could push the industry toward consolidation: fewer independent experiments, more centralized R&D inside big tech. I feel this might slow the pace of unexpected innovations and increase dependence on incumbents.
This definitely raises the question: how do we balance fair compensation for creators with keeping the door open for innovation?
This is exactly what could impede LLM training datasets in the Western world, which will mechanically lead to "richer" LLM training datasets in countries where IP law doesn't wall off that data for training.
But then, the countries with the freedom to add everything to the training dataset would have to distribute the weights for free in IP-walled countries (because the models would be plainly "illegal" and "blocked" over there, unless free as in free beer, I guess); basically, only something like what DeepSeek does could work.
If powerful LLM hardware becomes somewhat affordable (look at Nvidia's massive push on LLM-specific hardware), "local" companies may run those foreign-trained LLM models at reasonable speed, but "here".
It is a good opportunity to ask: is it true that Anthropic can demand indemnification from users for actions, related to the use of Claude, that end up with the company being sued? Even for a mere accusation, the user has to cover the bills for lawyers and proceedings. Anthropic also takes control of the legal process and can handle it as it pleases, settle or not, with the user footing the bill. Without limit. Whether the user is an individual or an organization doesn't matter.
Sounds harsh, if true. It would make Claude practical basically only for hobby projects, where you keep the results entirely to yourself (be it information, a product using Claude, or a product made by using Claude). Difficult to believe; I hope I heard it wrong.
On a related note: when I listen to Suno, when I create "Epic Power Metal", the singer is very often indistinguishable from the famous Hansi Kürsch of Blind Guardian.
I'm wondering: if they had purchased all the books in the pirate stash, in physical or DRM-free ebook form, could they have stayed out of trouble? Use the stash because it's already pre-digitized and accessible, and give money to the publishers.
It would take time, sure, to compile the lists and make bulk orders, but wouldn't it be cheaper in the end than the settlement? (Rough numbers sketched below.)
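Back-of-envelope on that question, with the average unit price purely an assumption; only the ~500,000-work count and the $1.5B figure come from the settlement. (Legally, a later purchase would not have cured the past infringement, which is what the damages attached to.)

```python
# Hypothetical bulk purchase vs. the actual settlement, illustrative numbers.
WORKS = 500_000        # estimated works in the class (from the settlement)
AVG_PRICE = 30         # assumed average price per legitimate copy, USD
SETTLEMENT = 1.5e9     # reported settlement floor, USD

purchase_cost = WORKS * AVG_PRICE
print(f"bulk purchase ~${purchase_cost / 1e6:.0f}M vs. settlement ${SETTLEMENT / 1e9:.1f}B "
      f"({SETTLEMENT / purchase_cost:.0f}x more)")
# -> bulk purchase ~$15M vs. settlement $1.5B (100x more)
```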
This settlement highlights the growing pains of the AI industry as it scales rapidly. While $1.5B is significant, it's a fraction of Anthropic's valuation and funding. It underscores the need for better governance in AI development to avoid future legal pitfalls. Interesting to see how this affects their competition with OpenAI.
From a systems design perspective, $3,000 per book makes this approach completely unscalable compared to web scraping. Strictly, both are O(n) in the number of works; the difference is a constant factor of several orders of magnitude per work. But the practical effect is the same: legally compliant data acquisition has fundamentally different economics than the 'move fast and break things' approach most labs took initially.
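A toy model of that cost gap; the scraping unit cost is an assumption, and the licensed figure is just the settlement's implied per-work price. Both regimes scale linearly, but the constant factor is brutal:

```python
# Data-acquisition cost vs. corpus size under two regimes (illustrative).
SCRAPE_COST_PER_WORK = 0.01      # assumed marginal crawl/storage cost, USD
LICENSED_COST_PER_WORK = 3_000   # per-work price implied by the settlement, USD

for n in (10_000, 100_000, 1_000_000):
    scrape = n * SCRAPE_COST_PER_WORK
    licensed = n * LICENSED_COST_PER_WORK
    print(f"{n:>9,} works: scraping ~${scrape:,.0f} vs. licensed ~${licensed / 1e9:.2f}B")
#    10,000 works: scraping ~$100 vs. licensed ~$0.03B
#   100,000 works: scraping ~$1,000 vs. licensed ~$0.30B
# 1,000,000 works: scraping ~$10,000 vs. licensed ~$3.00B
```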
This will be paid to rights holders, not authors. Published authors sign away the rights to financial exploitation of their books under the terms of the contracts offered. I expect some authors will sue publishers in turn. This has happened before, when authors realised that they were not getting paid royalties on sales of ebooks.
So this is a straight-up victory for Anthropic, right?
They pay out (relative) chump change as a penalty for explicitly pirating a bunch of ebooks, and in return they get a ruling that they can train on copyrighted works forever, for the purchase price of the book (not the price that would be needed to secure the rights!)
They also agreed to destroy the pirated books. I wonder how large of a portion of their training data comes from these shadow libraries, and if AI labs in countries that have made it clear they won't enforce anti-piracy laws against AI companies will get a substantial advantage by continuing to use shadow libraries.
So the article notes Anthropic states they never publicly released a frontier model that was trained on the downloaded copyrighted material. So were Claude 2 and 3 only trained on legally purchased and scanned books, or do they now use a different training pipeline that does not rely on books at all?
(Sorry, meta question: how do we insert in submissions the "'Also' <link> <link>..." text that appears below the title and above the comment input? The text field on the "submit" page creates a user's post when the "url" field is also filled. I am missing something.)
It is a very good deal for them: they did not have to acquire the books and had them in a very convenient format (no digitization needed), they saved tons of time (5+ years), they got access to rare books, and the LLM is not considered a derived work, when it actually clearly is one.
> "The technology at issue was among the most transformative many of us will see in our lifetimes"
A judge making a ruling based on his opinion of how transformative a technology will be doesn't inspire confidence. There's an equivocation on the word "transformative" here -- not just transformative in the fair use sense, but transformative as in world-changing, impactful, revolutionary. The latter shouldn't matter in a case like this.
> Companies and individuals who willfully infringe on copyright can face significantly higher damages — up to $150,000 per work
Settling for 2% ($3,000 of the $150,000 statutory maximum per work) is a steal.
> “In June, the District Court issued a landmark ruling on A.I. development and copyright law, finding that Anthropic’s approach to training A.I. models constitutes fair use,” Aparna Sridhar, Anthropic’s deputy general counsel, said in a statement.
This is the highest-order bit, not the $1.5B in settlement. Anthropic's guilty of pirating.
I feel like there could be a business opportunity for authors here: selling their books to LLM companies. For the LLM companies, it could be cheaper than a lawsuit, and the authors get paid.
I don’t understand how training an LLM on a book and then selling its contents via subscriptions is fine but using a probabilistic OCR to read a book and then selling its contents is a crime that deserves jail time.
> A trial was scheduled to begin in December to determine how much Anthropic owed for the alleged piracy, with potential damages ranging into the hundreds of billions of dollars.
It has been admitted; Anthropic knew that this trial could totally bankrupt them had they maintained their innocence and continued to fight the case.
But of course, there's too much money on the line. By settling (and admitting it profited off pirated books), Anthropic signaled it knew there was no way it could win that case, and that the risk was not worth taking.
> The pivotal fair-use question is still being debated in other AI copyright cases. Another San Francisco judge hearing a similar ongoing lawsuit against Meta ruled shortly after Alsup's decision that using copyrighted work without permission to train AI would be unlawful in "many circumstances."
I wonder how many authors will see real money out of this (if any). The techbros prayed to the new king of America with the best currency they had, money, so the king may intervene, as he has many times.
What about the neural networks already fed with those books?
If the court chooses to protect the writers, those models should be deleted and retrained with all of this material removed.
It doesn't set precedent, but the message to other AI companies is clear: if you're going to bet your model on gray-area data, have a few billion handy for settlements.
It's the concentration of power and monopolies driving this trend of ignoring fines and punishments. The fine system was not designed for these monstrous beasts. Legal codes were designed to deter the common man from wrongdoing. They did not anticipate technological superpowers playing winner-takes-all in a highly connected world and growing beyond the control of law. Basically, it's the law of the jungle for these companies. Law and punishment are never going to have any effect on them, as long as they can grab enough market share and customer base. Same as any mafia.
We are entering a world filled with corporate mafias that are above the law (fines do them insignificant damage). These mafias will grip the world by providing the essential services that make up the future world. The state will become much weaker, as policymakers can be bought by lobbying and punishments can be offset by VC funding.
For legal observers, Judge William Haskell Alsup’s razor-sharp distinction between usage and acquisition is a landmark precedent: it secures fair use for transformative generative AI while preserving compensation for copyright holders. In a just world, this balance would elevate him to the highest court of the land, but we are far from a just world.
This weirdly seems like it's the best mechanism to buy this much data.
Imagine going to 500k publishers to buy it individually. $3k per book is way cheaper. The copyright system is turning into a data marketplace in front of our eyes.
Reminder that just recently, Anthropic raised a $13 billion Series F at a $183 billion post-money valuation.
In March, they were worth $61.5 billion.
In six months they've created roughly $120 billion in value. That's almost 700 million dollars per day. Avoiding being slowed down by even a few days is worth a billion-dollar payout when you are on this trajectory. This lawsuit, and any lawsuit AI model companies are likely to get, will be a rounding error at the end of the fiscal year.
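Quick sanity check on that math (dates approximate, figures as quoted above):

```python
# Valuation delta and per-day rate over roughly six months.
pre, post = 61.5e9, 183e9    # March valuation vs. September post-money valuation
days = 183                   # ~six months
gain = post - pre
print(f"${gain / 1e9:.1f}B created, ~${gain / days / 1e6:.0f}M per day")
# -> $121.5B created, ~$664M per day ("almost 700 million dollars per day")
```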
They know that superintelligent AI is far larger than money, and even so, the money they'll make on the way there is hefty enough for copyright law to not be an issue.
... in one economy and for specific authors and publishers. But the offence is global in its impact on authors worldwide, and the consequences for other IPR laws remain to be seen.
Smart move: now that they're an established player, and they have a few billions of investors' money to spend, they cement a jurisprudence that says stealing IP to train your models is a billion-dollar offense.
What a formidable moat against newcomers, definitely worth the price!
Somehow, excuses like "we torrented it, but we configured low seeding", "the temptation was too strong because there was money to be made", "we tried getting a license, but then ignored it", and more ludicrous excuses actually worked.
Internal Meta emails seemed to show that people knew about the blatant breach of copyright, and yet Meta won the case.
I guess there are tiers of laws even between billionaire companies.
$1.5B is nothing but a hand-slap for the big gold rush companies.
It's less than 1% of Anthropic's valuation -- a valuation utterly dependent on all the hoovering up of others' copyrighted works.
AFAICT, if this settlement signals that the typical AI foundation model company's massive-scale commercial theft doesn't result in judgments that wipe out a company (and its execs), then we have confirmation that it's a free-for-all for all the other AI gold rush companies.
Then making deals to license rights, in sell-it-to-us-or-we'll-just-take-it-anyway deals, becomes only a routine and optional corporate cost reduction exercise, but not anything the execs will lose sleep over if it's inconvenient.
Let us not forget that this one is the good, ethical AI company. The one founded by splinter AI safety cultists who thought that OpenAI wasn't deep enough in the safety cult for their liking. And here they are, keeping the humans safe. By robbing them.
Because it turns out that nobody in the whole safety cult cares a whit for the human mind, the human experience, human art. Maybe for something they call "human values" in some abstract thought experiment, but never for any human decency. No, the human mind is just ones and zeros, just like a computer, no soul and no spark, to people in the cult. The cult thinks that an LLM reading a book is just the same mechanically as a human reading it.
Your brain is just emergence, your honor. Fair use. Blah blah Dennett Hofstadter Yudkowsky.
I'm excited for the moment when these models are able to treat copyrighted work in a fair-use way that pays out to authors the way Spotify does when you listen to a song. Why? Because authors receiving royalties for their works when they get used in some prompt would likely encourage them to become far more accepting of LLMs. (A sketch of the pro-rata idea follows this comment.)
Also, passing the cost on to consumers of generated content, since companies would now need to pay royalties on the back end, should increase the cost of generating slop and hopefully push back against that trend.
This shouldn't just be books, but all written content, like scholarly journals and essays, news articles and blogs, etc.
I realize this is just wishful thinking, but there's got to be some nugget of aspirational desire to pay it forward.
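A minimal sketch of that Spotify-style idea, purely hypothetical: a royalty pool split pro rata by attributed usage. The hard, unsolved part is attributing an LLM's output to individual works at all; the names and counts below are invented.

```python
# Hypothetical pro-rata royalty pool; no such attribution system exists today.
def pro_rata_payout(pool_usd: float, usage: dict[str, int]) -> dict[str, float]:
    """Split a royalty pool in proportion to attributed usage counts."""
    total = sum(usage.values())
    return {work: pool_usd * count / total for work, count in usage.items()}

payouts = pro_rata_payout(1_000_000, {"Novel A": 70_000, "Journal B": 25_000, "Blog C": 5_000})
for work, amount in payouts.items():
    print(f"{work}: ${amount:,.2f}")
# Novel A: $700,000.00 / Journal B: $250,000.00 / Blog C: $50,000.00
```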
I guess this settlement could be a landmark moment. $1.5 billion is a staggering figure, and I hope it sends a clear signal that AI companies can't just treat creative work as free training data.
I'm gonna say one thing. If you agree that something was unfairly taken from book authors, then the same thing was taken from people publishing on the web, and on a larger scale.
Book authors may see some settlement checks down the line. So might newspapers and other parties that can organize and throw enough $$$ at the problem. But I'll eat my hat if your average blogger ever sees a single cent.
Rainbows End was prescient in many ways.
You can search LibGen by author to see if your work is included. I believe this would make you a member of the class: https://www.theatlantic.com/technology/archive/2025/03/searc...
If you are a member of the class (or think you are) you can submit your contact information to the plaintiff's attorneys here: https://www.anthropiccopyrightsettlement.com/
Can only imagine the pitch: "Yes, please give us billions of dollars. We are going to make a huge investment, like paying off our lawsuits."
It’s not precedent-setting, but surely it’ll have an impact.
Is there a way to make your content on the web "licensed" in a way where it is only free for human consumption?
I.e., effectively making AI crawling piracy, and thus subject to the same kind of penalties as here?
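Not as a license, no: today the closest widely used signal is a robots.txt opt-out, which is voluntary and only honored by compliant crawlers; it doesn't legally convert crawling into piracy. A small sketch with Python's stdlib showing how to check a site's policy for the published AI crawler user-agent tokens (the tokens are real; the URLs are placeholders):

```python
# Check how a site's robots.txt treats known AI crawlers.
# Note: robots.txt is a voluntary opt-out, not an enforceable license.
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]  # published UA tokens

def ai_crawl_policy(site: str, page: str) -> dict[str, bool]:
    """Return {user_agent: allowed} for a page, per the site's robots.txt."""
    rp = RobotFileParser()
    rp.set_url(f"{site}/robots.txt")
    rp.read()  # fetch and parse robots.txt
    return {ua: rp.can_fetch(ua, page) for ua in AI_CRAWLERS}

print(ai_crawl_policy("https://example.com", "https://example.com/my-post"))
```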
Even mighty AI with billions must kneel to the copyright industry. We are forever doomed. Human culture will never be free from the grasp of rent-seeking.
OpenAI and Google will follow soon now that the precedent has been set, and will likely pay more.
It will be a net win for Anthropic.
TBH I'm just going to plow all that money back into Anthropic... might as well cut out the middleman.
Seriously, how will this money propagate to the authors (if at all) or will it just stay with the publishers?
Unless, of course, the transformation malfunctioned and you got the good old verbatim source, many examples of which have been compiled in similar lawsuits.
Taken right from the VC's handbook.
https://en.wikipedia.org/wiki/Hansi_K%C3%BCrsch
I'm not sure if he even knows, but those are almost certainly his tracks they trained on.
The first of many.
I haven't had this in a while, but I always hate it when I'm blocked by Cloudflare/Datadome/etc.
It is all part of the playbook.
At least if you're a regular citizen.
That's a weird way for Anthropic to announce they're going out of business.
Same racket the media cartels and patent trolls have been forcing for 40-50 years.
https://www.youtube.com/watch?v=sdtBgB7iS8c
Do you feel safe?