In a copyright infringement case, a U.S. District Court granted summary judgment in part for AI software firm Anthropic PBC, holding that its use of the plaintiff authors’ books for training LLMs qualified as fair use.
Anthropic argued that it copied the plaintiffs’ books only for the purpose of training LLMs. The authors, however, contended that it did so for at least two uses: first to build a vast, central library of potentially useful content, and second to train specific LLMs using shifting sets and subsets of that content — over time selecting the more well-organised and well-expressed works for training.
Importantly, the authors did not allege that the LLMs generated infringing output. The Court noted that additional software sat between the user and the underlying LLM to ensure that no infringing output ever reached users.
Under U.S. copyright law, fair use of a copyrighted work for purposes such as news reporting, research, etc. does not constitute infringement. In determining whether the use made of a work in any particular case is a fair use, these factors are to be considered:
- the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
- the nature of the copyrighted work;
- the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
- the effect of the use upon the potential market for or value of the copyrighted work.
The Court evaluated each of these factors as applied to the training copies and to the purchased and pirated library copies.
Key findings include:
- Use of books for training LLMs was fair use: The Court found that using the books to train LLMs was exceedingly transformative and was a fair use. Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different. The use did not displace demand for copies of authors’ works, or not in the way that counts under the Copyright Act.
- Digitisation of books purchased in print form was also fair use: The digitisation of the books that Anthropic purchased in print form for its central library was also deemed fair use. The Court found this was merely a shift to more convenient, space-saving, and searchable digital copies for the central library, without the addition of new copies, creation of new works, or redistribution of existing copies.
- No entitlement to use pirated copies for central library: Anthropic had no entitlement to use pirated copies for its central library. Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy. The use of pirated copies also displaced market demand for authors’ books.
A trial is scheduled for December on the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory.
This decision marks the first court ruling on the fair use defence in the context of generative AI. A similar ruling was issued shortly after by U.S. District Judge Vince Chhabria in favour of Meta, rejecting claims that Meta violated copyright law by using plaintiffs’ books to train its models. However, the scope of that ruling remains narrow: the case is not a class action, so the ruling applies only to the thirteen authors in the case. As Judge Chhabria emphasised, this outcome does not establish the legality of Meta’s use of copyrighted materials to train its language models; it merely reflects that the plaintiffs in that case made the wrong arguments and failed to develop a record in support of the right one. In particular, he pointed to their failure to raise (in their complaint or summary judgment motion) or substantiate the issue of market dilution, especially in light of LLM training’s potential to flood the market with competing works.