Last month, the U.S. Copyright Office released a report from its study of the fair-use implications of training AI systems on copyrighted works without permission. As is widely known, the Register of Copyrights was then abruptly, and somewhat mysteriously, removed from office.

https://www.copyright.gov/ai/

An outsized political drama has erupted around an agency that normally attracts little public attention. Some news outlets even suggested the Office had shifted toward rejecting fair-use defenses for AI training. Personally, I do not find the report’s conclusions shocking; they strike me as a reasonable landing point for an eventual infringement analysis. In the United States, many people seem to believe, perhaps naively, that once a use is deemed fair it remains fair forever. Given the vast sums now invested in AI, a report that questions blanket fair-use assumptions was bound to feel jarring to those who regarded the doctrine as an all-purpose shield.

The report’s through-line is clear: “training a generative AI foundation model on a large and diverse dataset will often be transformative” (p. 45). Because a “transformative” use serves a purpose different from the original’s expressive purpose, it is less likely to substitute for the work in its market and therefore weighs in favor of fair use. Yet the report also cautions that “where a model may generate expressive content or reproduce copyrighted expression, training-purpose use cannot be deemed ‘non-expressive’”. In other words, claims that AI training is nothing more than a technical process, or that it is simply “learning” the way a human does, do not automatically confer fair-use status. The Office’s position is that fair-use analysis of training must consider not just the technical steps but the purpose of the use and the eventual model’s impact on the markets for the underlying works, including its ability to mimic an author’s style.

Although the study is impressively researched, I believe the Office overreaches when it assesses the market effects that AI models might have. This concerns the fourth fair-use factor: “the effect of the use upon the potential market for or value of the copyrighted work.” The Office signals that virtually any “effect” on a potential market should be counted, including the possibility that a model could generate material similar in content or style to works in the training data, thereby depressing sales, diluting the market, or undermining future licensing opportunities, even though copyright law traditionally does not protect “style.” Because both the empirical record and the case law are still thin, I believe the Office has drawn premature conclusions here.

On the first fair-use factor, the report leans heavily on lessons from Andy Warhol Foundation v. Goldsmith, making transformativeness the analytical core. Google Books is certainly important in showing that massive copying can be permissible when tied to a transformative purpose and when the final output matters, but Warhol probed more deeply into the nature of transformativeness itself and how it differs from the derivative-works right. That ruling therefore offers more direct guidance for evaluating AI training and underscores the need for a careful, fact-specific copyright inquiry going forward.

For the record, I drafted these comments only a few days after the report’s release but left them in my drafts folder when the unseemly political drama unfolded. I have now touched them up slightly and made them public as part of cleaning out that folder.

The Hidden Risks of NVIDIA’s Open Model License

Scattered media reports have recently, and mistakenly, described NVIDIA’s open-weights AI model “Nemotron 3” as open source. Because there is concern that such reports encourage users to ignore the usage risks of the NVIDIA Open Model License Agreement (version dated October 24, 2025; hereinafter the NVIDIA License), which is…

The Current State of the Theory that GPL Propagates to AI Models Trained on GPL Code

When GitHub Copilot launched in 2021, the fact that its training data included a vast amount of Open Source code publicly available on GitHub attracted significant attention and sparked lively licensing debates. Beyond questions about conditions such as the attribution that most licenses require, a particularly large volume of commentary suggested…

The Legal Hack: Why U.S. Law Sees Open Source as “Permission,” Not a Contract

In Japan, the common view is to treat an Open Source license as a license agreement, or a contract. This is also the case in the EU. However, in the United States—the origin point for almost every aspect of Open Source—an Open Source license has long been considered not a contract, but a “unilateral permission”…

Evaluating OpenMDW: A Revolution for Open AI, or a License to Openwash?

Although the number of AI models distributed under Open Source licenses is increasing, AI systems in which every related component, including the training data, is open remain at an early stage, even as a few promising systems have emerged. In this context, this past May, the Linux Foundation, in collaboration…

A Curious Phenomenon with Gemma Model Outputs and License Propagation

While examining the licensing details of Google’s Gemma model, I noticed a potentially puzzling phenomenon: you can freely assign a license to the model’s outputs, yet depending on how those outputs are used, the original Terms of Use might suddenly propagate to the resulting work. Outputs vs. Model Derivatives: The Gemma Terms of Use distinguish…

Should ‘Open Source AI’ Mean Exposing All Training Data?

DeepSeek has had a major global impact. This appears to stem not only from the emergence of a new force in China that threatens the dominance of major U.S. AI vendors, but also from the fact that the AI model itself is being distributed under the MIT License, which is an Open Source license. Nevertheless,…

Significant Risks in Using AI Models Governed by the Llama License

Although it has already been explained that the Llama model and the Llama License (Llama Community License Agreement) do not, in any sense, qualify as Open Source, the Llama License contains several additional issues worth noting. While not directly relevant to whether it meets Open Source criteria, these provisions may nonetheless cause the…

The Hidden Traps in Meta’s Llama License

— An Explanation of Llama’s Supposed “Open Source” Status and the Serious Risks of Using Models under the Llama License — It is widely recognized—despite Meta’s CEO persistently promoting the notion that “Llama is Open Source”—that the Llama License is in fact not Open Source. Yet few individuals have clearly articulated the precise reasons why…