The House IP Subcommittee’s “Artificial Intelligence and Intellectual Property: Part I—Interoperability of AI and Copyright Law” hearing has two former General Counsels of the US Copyright Office squaring off over whether using copyrighted works to train artificial intelligence (AI) models qualifies as a fair use of those works under US copyright law. Sy Damle (General Counsel, 2016-2018) testified that “the training of AI models will generally fall within the established bounds of fair use.” (S. Damle introductory statement, Tr. pg. 6). This testimony prompted Jon Baumgarten (General Counsel, 1976-1979) to issue a letter to the Subcommittee contesting Damle’s assertion, emphatically stating that “I could not disagree more regarding Mr. Damle’s categorical treatment of fair use.” (emphasis original). Baumgarten asserted that “the question of fair use is subject to detailed analysis of various factors in each case,” (emphasis mine) and cited as support SCOTUS’s recent decision in Warhol Foundation v. Goldsmith.
A detailed analysis of Goldsmith, however, yields little comfort to content owners when assessing the applicability of fair use to non-expressive uses of copyrightable content in AI learning models. See Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 143 S. Ct. 1258 (2023). Goldsmith involved the recasting by Andy Warhol, in or about 1984, of a photograph of the music artist Prince into an orange-hued silkscreen rendition (Orange Prince). In addition to Orange Prince, Warhol created an entire series of silkscreens from the same photograph in 1984 (the Prince Series), and he sold several of them as stand-alone art objects. The photograph was taken by Lynn Goldsmith, who claimed copyright infringement when the Warhol Estate licensed Orange Prince to Condé Nast after Prince’s passing in 2016 to illustrate an article about Prince’s life and music. At the time, Goldsmith was also licensing her original photograph to several magazines that were writing articles about Prince’s life and music.
The Warhol Estate reacted to Goldsmith’s assertion of copyright infringement by suing her in federal district court in New York for a declaratory judgment, asserting among other things that Warhol’s use of the original photograph was a transformative fair use. The district court agreed, but was reversed by the Second Circuit, which found the degree of new expression insufficient to justify a finding of fair use. The Warhol Estate sought Supreme Court review, but only on the first fair use factor. SCOTUS accordingly did not engage in a “detailed analysis of various factors,” but rather conducted a tightly limited review: “Although the Court of Appeals analyzed each fair use factor, the only question before this Court is whether the court below correctly held that the first factor, ‘the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes,’ §107(1), weighs in Goldsmith’s favor.” Id. at 1272.
Goldsmith’s Doctrine of Granular Uses
Theoretically, this would have meant that SCOTUS would assess the Second Circuit’s formulation that in order to qualify as a transformative work (and not merely an infringing derivative work), “the secondary work’s use of its source material [must be in] service of a ‘fundamentally different and new’ artistic purpose and character, such that the secondary work stands apart from the ‘raw material’ used to create it.” Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 11 F.4th 26, 42 (2d Cir. 2021). What SCOTUS did instead was remarkable. It compared, at a granular level of detail, the challenged use by Warhol with the commercial uses that Goldsmith was able to demonstrate with concrete evidence:
- “A typical use of a celebrity photograph is to accompany stories about the celebrity, often in magazines. For example, Goldsmith licensed her photographs of Prince to illustrate stories about Prince in magazines such as Newsweek, Vanity Fair, and People.” See 143 S. Ct. at 1269.
- “In that context, the purpose of the image is substantially the same as that of Goldsmith’s photograph. Both are portraits of Prince used in magazines to illustrate stories about Prince. Such ‘environment[s]’ are not ‘distinct and different.’” Id. at 1278–79.
- “[T]he Court finds significant the degree of similarity between the specific purposes of the original work and the secondary use at issue. …. These variations in aesthetics did not stop the photos from serving the same essential purpose of depicting Prince in a magazine commemorating his life and career.” Id. at 1278 n.11.
- “AWF’s licensing of the Orange Prince image thus ‘supersede[d] the objects,’ i.e., shared the objectives, of Goldsmith’s photograph, even if the two were not perfect substitutes.” Id. at 1279 (emphasis added).
- “The Foundation now owns Mr. Warhol’s image of Prince and it recently sought to license that image to a magazine looking for a depiction of Prince to accompany an article about Prince. Ms. Goldsmith seeks to license her copyrighted photograph to exactly these kinds of buyers. And because the purpose and character of the Foundation’s challenged use and the purpose and character of her own protected use overlap so completely, Ms. Goldsmith argues that the first statutory factor does not support a fair-use affirmative defense.” Id. at 1289 (Gorsuch, J., concurring) (emphasis added).
- “Instead, it requires courts to ask whether consumers treat a challenged use ‘as a market replacement’ for a copyrighted work or a market complement that does not impair demand for the original.” Id. at 1290 (Gorsuch, J., concurring).
In making this comparison, the Court signaled that it was mostly focused on the possibility that the secondary work would supplant the demand for the original work: “[T]he first factor relates to the problem of substitution—copyright’s bête noire. The use of an original work to achieve a purpose that is the same as, or highly similar to, that of the original work is more likely to substitute for, or ‘supplan[t],’ the work.” Id. at 1274.
Moreover, while the hyperfocus on the actual use demonstrated by Goldsmith was surprising, even more so was just how granular SCOTUS’s use analysis is by virtue of the negative examples the majority and concurring opinions offered:
- “Only that last use, however, AWF’s commercial licensing of Orange Prince to Condé Nast, is alleged to be infringing. We limit our analysis accordingly. In particular, the Court expresses no opinion as to the creation, display, or sale of any of the original Prince Series works.” Id. at 1278.
- “Under both factors, the analysis here might be different if Orange Prince appeared in an art magazine alongside an article about Warhol.” Id. at 1279 n.12.
- “If, for example, the Foundation had sought to display Mr. Warhol’s image of Prince in a nonprofit museum or a for-profit book commenting on 20th-century art, the purpose and character of that use might well point to fair use. But those cases are not this case. Before us, Ms. Goldsmith challenges only the Foundation’s effort to use its portrait as a commercial substitute for her own protected photograph in sales to magazines looking for images of Prince to accompany articles about the musician.” Id. at 1291 (Gorsuch, J., concurring).
Thus, the concern is not over supplanting a theoretical demand for a theoretical work, or for a class of works. Rather, the concern is for specific, granular substitution for specific, granular works. Both the majority and concurring opinions, in their negative examples, raise vexing questions of exactly how granular the original demonstrated use needs to be. Would fair use be more likely if the Warhol Estate licensed Orange Prince for magazine articles about the general Minnesota music scene, or a pop history of the 1980s, and Goldsmith had not? Or had never tried?
What is implied, but pivotal, in Goldsmith then is the existence of EVIDENCE. Evidence that Goldsmith actually used the work in a particular context. Evidence of the scope of the market (magazines about the life of Prince, as opposed to a magazine article about the work of Andy Warhol). This is not the “potential” market of fourth fair use factor jurisprudence. Rather, Goldsmith appears to require actual evidence of the actual market under first factor analysis— evidence of the “objectives” of why the work was created in the first place, and to whom it was sold.
Reaffirmation of Google v. Oracle’s Context-Shifting Fair Use
Many folks expected the Goldsmith Court to revisit the rather broad, almost unfettered transformativeness analysis applied by SCOTUS in Google v. Oracle. There, the Court found that taking API code used in a desktop environment and adapting it for use in smartphones was a transformative use. See Google LLC v. Oracle Am., Inc., 141 S. Ct. 1183 (2021). As explained in Goldsmith:
[ ] Google put Sun’s code to use in the “distinct and different computing environment” of its own Android platform, a new system created for new products. Ibid. Moreover, the use was justified in that context because “shared interfaces are necessary for different programs to speak to each other” and because “reimplementation of interfaces is necessary if programmers are to be able to use their acquired skills.” 143 S. Ct. at 1277 n.8.
Finding that a fair transformation exists by taking a work (in Google, source code) and using it in a “new system created for new products” seemed to many to blow the doors off of the limits of fair use. It would seem, for example, that taking a novel and transforming it to the completely different environment of an audiovisual work would similarly qualify as a fair use under that rubric. Such a permissive view of transformativeness was surprising, and it was expected that Goldsmith would clarify Google if not overrule it.
The Goldsmith Court instead attempted to harmonize Google: “In other words, the same concepts of use and justification that the Court relied on in Google are the ones that it applies today.” Id. at 1283 n.18. The “context-shifting” fair use cases of Vanderhye, Google Books, and HathiTrust therefore not only remain good law, but the boundaries of the context-shifting doctrine still appear quite vast. See generally A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630 (4th Cir. 2009); Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015); Authors Guild, Inc. v. HathiTrust, 755 F.3d 87 (2d Cir. 2014).
Challenges for Content Owners in AI Training
No content owner will ever be able to demonstrate that it was their work, and their work alone, that enabled a usable AI model. Let’s use ChatGPT as an example. The current commercial offering of ChatGPT is based on the large language model GPT-4. We do not know how many or what works were used to train GPT-4. We do know, however, that an earlier predecessor, GPT-2, required 8 million textual works to enable its predictive generative qualities. Under Goldsmith’s granular use doctrine, coupled with the continued vitality (and probable renewed vigor) of the context-shifting fair use cases, it will be virtually impossible for any one content owner to defeat a claim of transformativeness under factor one of the fair use analysis regarding its inclusion in a large language model (LLM) or foundation model such as GPT-2 and GPT-4. As Damle alluded to, the analysis might differ when one examines smaller-scale models, such as a model fine-tuned from an LLM or foundation model, but everything will depend on the details in light of the evidentiary burden imposed by Goldsmith.
The Poor Fit of Collective Licensing
Baumgarten’s open letter suggests that the demands of advancing AI technologies can be handled by looking to the collective licensing products that arose to meet the demands of photocopying technology. However, there are at least two key distinctions between what collective licensing for reprography offered in the past, and what is needed to continue advances in AI technology today.
First, reprography-based collective licensing is premised on the assumption that individual works are desired by the licensee, and that the licensee will on a monthly or yearly basis select various works like meals from a menu. AI models, however, are not interested in any work qua work; rather, they need a massive collection of as many works as possible to get at the patterns lurking within the vast combination of works. That is, AI models need the menu, not the meals.
Second, a pricing model based on the number of copies, downloads, users, or seat licenses simply does not work in the AI training context. Just as the works are not accessed qua work, as noted above, they are also not kept in any fashion that would enable return of the work in a time-based subscription model. Rather, in the case of text at least, the AI models access the works and essentially immediately explode them into parts much too small for any useful human consumption, and then mix and store those packets into asynchronous chunks. These chunks cannot be untangled, and continuing to pay fees for the work going forward makes no sense, as the work qua work does not exist anymore in the model. Though the precise techniques deployed may differ for visual and audio works, the challenges of untangling specific works from a model similarly exist.
Moreover, the impediments to effective data science caused by limitations imposed by those actors that do purport to offer text and data mining (TDM) licenses are well documented. See, e.g., Michael W. Carroll, Copyright and the Progress of Science: Why Text and Data Mining Is Lawful, 53 UC Davis Law Review 893 (2019) (“Although some publishers have cooperated to enable some cross-publisher TDM research through the Crossref consortium, from the researcher’s perspective this solution is still only patchwork and technologically unnecessarily cumbersome.”).
This is not to say that entities that control large collections of copyrighted works cannot offer valuable features that support a commercial solution. In addition to the legal liability and formatting conveniences identified by Baumgarten, those wishing to make meaningful corpora available could help solve privacy concerns, such as ensuring that biometric data or other PII is scrubbed from the subject content, or that content is cataloged and tracked in a fashion that complies with regulatory regimes, such as the currently proposed EU AI Act. However, it is not clear that such a regime should follow a licensing model, as opposed to, say, an access or service model, similar to the distinction between software that is downloaded for on-premises installation and software that is merely accessed, as in software-as-a-service offerings.
The assessment of whether and how fair use applies to the use of content to train AI models is nuanced and conceptually complex. We need no greater demonstration of that than the differing opinions of two highly respected former USCO General Counsels. However, the Goldsmith opinion does not appear to bolster the position of content owners. Whether a mutually satisfactory arrangement between AI developers and content owners can be struck commercially is a work in progress.