I. Executive summary
Empirical research shows that large generative AI models may memorize training data which may include or consist of copyright-protected works and other protected subject-matter, or parts thereof. When prompted appropriately, these models may produce outputs that closely resemble such works and other subject-matter. Under what conditions may the resemblance between such pre-existing works or subject-matter and an AI-generated output, which in technical terms is referred to as “plagiaristic output,” be regarded as an actionable reproduction? Who would be prima facie liable for the doing of such acts of reproduction: would it be solely the user of the AI model inputting the prompt resulting in the infringing output or could it be that the developer and provider are also to be deemed liable? If prima facie liability is established, who may benefit from exceptions and limitations (“exceptions”) under copyright/related rights, and under what conditions?
Mindful also of the requirement that a fair balance be struck between the protection of copyright and related rights, the freedom to conduct a business and innovate, and the freedom of expression and information, this study starts by mapping relevant acts under copyright and related rights in the transition from input/training to output generation. It then considers actionable reproduction, allocation of liability, and potential defences to prima facie infringement and beneficiaries thereof under EU and UK law. Key findings include:
- Input/training phase and TDM exceptions: Exceptions for text and data mining (“TDM”) under EU and UK laws allow, under certain conditions, extraction and reproduction for TDM purposes, but not subsequent restricted acts, e.g. reproduction and/or communication/making available to the public through output generation. Furthermore, Article 53(1)(c) and recital 106 of Regulation 2024/1689Footnote 1 (“AI Act”) indicate that TDM is not an end in itself, but rather a step in the development and offering of AI models. The EU AI Act requires providers of general-purpose AI models to put in place a policy to comply with EU copyright and related rights rules, including Article 4(3) of Directive 2019/790 (“DSM Directive”). Subject to transitional periods, such an obligation also applies to (i) new versions/releases of general-purpose AI models placed on the EU market prior to the entry into force of the AI Act and, by operation of Article 111(3) of the AI Act generally, to (ii) general-purpose AI models placed on the market in the EU twelve months before the entry into force of the AI Act.
- Infringement through output generation: The test for actionable reproduction differs between copyright and related rights because of the different rationales of protection. Under copyright, the taking of a sufficiently original part of a protected work is actionable; under related rights, the taking of any part of the protected subject-matter or, alternatively, of a part that reflects the investment of the relevant rightholder is actionable.
- Liability: Liability for infringing outputs may extend beyond users of AI models to developers and providers of such models, based on factual considerations. This is supported by case law of UK courts and the Court of Justice of the European Union (“CJEU”). AI developers and providers could be held liable as secondary infringers, accessories/joint tortfeasors, or even as primary infringers. A finding of primary/direct liability may also be foreseeable having regard to CJEU case law on internet platform operators. This case law also suggests that the contractual limitation/exclusion of liability of, e.g. AI model providers in relation to infringing activities performed by users of their services may turn out to be ineffective vis-à-vis rightholders in some instances, with the result that liability could be found to subsist alongside users of such models.
- Defences: For an unlicensed act to fall within the scope of application of a given exception, relevant conditions thereunder must be satisfied, including having regard to the three-step test and, insofar as the UK is concerned, the requirement that the dealing at hand is fair. The use and the volume of the use must be justified in light of their purpose and having regard to their effects on the market for the original work or protected subject-matter.
While it is clear that each case will need to be assessed on its own merits and facts and that no sweeping conclusions should be drawn, this study shows that the generative AI output phase raises several issues under copyright and related rights. If AI is to develop in a sustainable manner and in compliance with the aforementioned fair balance principle, then relevant issues related to this phase deserve more careful scrutiny, whether it is in the context of risk assessment and compliance, licensing initiatives, or in contentious scenarios.
II. Introduction
One of the best-known frames of Todd Phillips’s 2019 Joker film starring Joaquin Phoenix is that of Phoenix’s “Joker” inside a lift.Footnote 2 Let’s imagine a situation in which the user of a generative Artificial Intelligence (“AI”) model inputted the following prompt: “Create an image of Joaquin Phoenix Joker movie, 2019, screenshot from a movie, movie scene.” As discussed in detail in a recent New York Times article,Footnote 3 the output could look like that shown in Figure 1.
This situation is neither unique nor unprecedented.Footnote 4 If, by using another AI image generator tool, a request was made to provide “a video game plumber in the style of Mario” or “an image of Dua Lipa,” the results could be those shown in Figures 2 and 3.
In all the examples above, the AI-generated outputs undeniably resemble the appearance of characters “Joker,” as played by Phoenix, and Nintendo’s “Mario” from the Super Mario franchise, as well as the likeness of singer-songwriter Dua Lipa.
Empirical research suggests that generative AI models may sometimes memorize training data and, when appropriately prompted, reproduce such data verbatim or nearly verbatim.Footnote 5 While the first two examples above may also be indicative of what has been referred to in the literature as a “Snoopy problem,”Footnote 6 “Italian plumber problem”Footnote 7 or “Pikachu Paradox,”Footnote 8 claimants in generative AI lawsuits in both the US and the UK have been submitting evidence (to which the reader is referred in order to appreciate the technical aspects of generative AI)Footnote 9 of alleged verbatim/semi-verbatim regurgitation of input data by generative AI models beyond issues of character copyright.Footnote 10
Furthermore, the generation of an output that closely resembles an actual person or their personal attributes, e.g. voice, demonstrates how legal questions extend beyond copyright and related rights (as well as trade marks) into areas like personality and image/publicity rights.Footnote 11 Issues connected to deepfakes and AI clones have generated several headlines over the past few months alone.Footnote 12 Therefore, it is unlikely that a statement like “the circumstances in which the output is similar to any given input will be rare”Footnote 13 can be regarded as generally accurate.
All this raises a crucial question: Could the similarity seen in the examples above, often referred to as “plagiaristic outputs,” be regarded in legal terms as actionable reproduction under copyright and related rights? If so, who would be prima facie liable for the doing of such acts of reproduction: would it be solely the user of the AI model inputting the prompt resulting in the prima facie infringing output or could it be that the developer and provider are also to be deemed liable? If prima facie liability is established, who would be the beneficiary of exceptions, and under what conditions?
So far, the discussions surrounding AI and copyright/related rights have mainly focused on the training/input phase and issues such as the lawfulness of unlicensed TDM. Insofar as the output phase is concerned, the discourse has mostly revolved around protectability considerations. In turn, the analysis of liability aspects connected to AI-generated outputs has remained largely underdeveloped.Footnote 14
This study intends to fill this gap and explore pertinent aspects arising from AI-generated outputs that reproduce third-party protected content. The analysis is limited to copyright and related rights, although it is evident – as seen above – that relevant legal issues exceed them. Furthermore, the present work takes an international, EU and UK perspective. While other jurisdictions could have also been considered – notably the US, Singapore and Japan given their relevance to the AI development discourse – the analysis does not specifically encompass those legal systems given the substantial differences from EU/UK copyright laws, notably having regard to defences to copyright infringement. Insofar as the US is concerned, there are nevertheless references to fair use case law in what follows, as appropriate. As regards jurisdictions like Singapore and Japan, with regard to TDM-specific exceptions relevant to the input/training phase, the reader is referred to earlier work by the author of the present contribution.Footnote 15
The preambles to the WIPO Internet Treaties emphasize the need to fairly balance, on the one hand, the rights of authors, performers and producers of phonograms and, on the other hand, the larger public interest, particularly education, research and access to information. Similarly, the fair balance principle is referred to in multiple EU copyright instrumentsFootnote 16 and the Court of Justice of the European Union (“CJEU”) has employed it as a material standard of interpretation in its case law.Footnote 17 Building on this foundational requirement in copyright law, this study is structured as follows:
- Part III considers copyright and related rights issues in AI development processes. Section 1 maps relevant acts in the transition from unlicensed TDM/AI training to output generation. It also critically discusses and rejects the argument advanced by some commentators that no expression would be copied in the input/training phase and, thus, no reproduction aspects would come into consideration in either that or the output generation phase. Section 2 analyzes Article 53(1)(c) of the recently adopted AI Act, read in combination with recital 106 in the preamble thereof. The provision mandates that general-purpose AI models trained outside of the EU must comply with EU law, including Article 4(3) of the DSM Directive, if made available for use there. Issues of temporal and territorial application of the provision are also tackled in this section.
- Turning to the test for actionable reproduction, Part IV shows that a distinction must be drawn between the authorial right of reproduction (Section 1), which is premised on originality, and the right of reproduction of other rightholders (Section 2), which is instead rooted in an investment logic. While the latter has traditionally been viewed as having no threshold conditions, the former requires a qualitative assessment of what has been taken. This is also consistent with pre-Infopaq case law of UK courts,Footnote 18 as well as more recent judgments. In any event, it is clear that no reproduction is actionable without proof of derivation.
- Part V discusses allocation of liability for infringing outputs. While any such determinations will depend on specific circumstances and contractual relationships, existing case law on platform liability in the EU and UK suggests that liability for the use of AI models may not always solely rest with the user but could instead extend to developers and providers of such models (Section 1). In turn, terms of service of AI model providers that seek to remove any liability potentially arising from the use of their models might turn out to be ineffective towards third-party rightholders or users of their services (Section 2).
- Defences to prima facie infringement are subsequently reviewed in Part VI. CJEU case law indicates that exceptions must be construed so as to ensure a fair balance of rights and interests and compliance with the three-step test. Furthermore, exceptions would likely be available to users of AI models, rather than providers thereof (Section 1). In any event, a justification for the use made of a third-party work/protected subject-matter without authorization is required, as is the consideration of the amount taken and the effects on the market for the original work/protected subject-matter. This is exemplified by reference to quotation and pastiche (Section 2).
On a final, preliminary note, a couple of observations are warranted given the terminology adopted in the AI Act. First, the more general term “user” is used instead of “deployer,” given the definition of the latter in the AI Act (recital 13 and Article 3(4)), notably the circumstance that the notion of “deployer” there excludes users acting in the course of a personal and non-professional activity. Those are instead considered in the present contribution. Second, while the notion of “developer” encompasses that of “provider” in the AI Act (Article 3(3)), the two roles are not conflated here, as it is foreseen that it might not always be the case that providers of AI models have also developed them and vice versa.Footnote 19
III. From AI training to output generation
When the “data” consists of protected works and/or subject-matter, or parts thereof, training of generative AI models is premised on the doing of acts restricted by copyright and related rights. TDM activities, defined as “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations” (Article 2 No 2 of the DSM Directive), are an essential step in any AI development project. Considering the legislative history of Articles 3 and 4 of the DSM Directive and the rapid developments in the field of generative AI since the adoption of that piece of legislation, it has been discussed whether these provisions were intended to cover activities related to the development of generative AI.Footnote 20 The AI Act now appears to have settled this issue for good by linking the DSM Directive’s TDM exceptions to the development of general-purpose AI models, including generative AI models.Footnote 21 Importantly, the AI Act recognizes the relevance of TDM to AI training, but in no way does it indicate that TDM is synonymous with AI training or that everything in between TDM and AI training is covered by Articles 3 or 4 of the DSM Directive.
That said, in the context of generative AI, a distinction needs to be drawn between the input/training and the output phase, with the latter entailing the generation of content – whether text, audio, image, or video – in response to instructions (prompts) given by users of resulting AI models. What follows is a review of relevant acts under copyright and related rights with regard to these two key phases. After that, a discussion is provided of Article 53(1)(c) of the AI Act, which requires compliance with Article 4 of the DSM Directive – including the need to identify and comply with the reservations of rights by rightholders under paragraph 3 – for providers looking to place general-purpose AI models on the EU market, “regardless of the jurisdiction in which the copyright-relevant acts underpinning the training of those general-purpose AI models take place.”
1. Mapping of restricted acts under copyright and related rights
As discussed in greater detail elsewhere,Footnote 22 during TDM activities the main acts under UK and EU law are those of extraction (e.g. if the data is incorporated in a database) and reproduction. Insofar as the latter is concerned, the claim that copyright would not restrict non-expressive uses of protected content appears erroneous.Footnote 23 Several considerations support this conclusion, with two in particular worth highlighting: the broad construction of the right of reproduction and the fact that specific exceptions to the right of reproduction to allow (under certain conditions) TDM have been adopted in multiple jurisdictions over the past several years.
Starting with the scope of the right of reproduction (whether of authors or other rightholders): the relevant statutes require that the right receive a broad construction and interpretation. At the international level, Article 9(1) of the Berne Convention mandates the protection of reproduction “in any manner or form.” The Agreed Statements to the WIPO Internet Treaties further clarify that the right “fully appl[ies] in the digital environment.”Footnote 24 Article 2 of the InfoSoc Directive, which was adopted to implement the WIPO Internet Treaties (recitals 15 and 61) into the EU legal order, states that the right encompasses “the direct or indirect, temporary or permanent reproduction by any means and in any form, in whole or in part” of works and other protected subject-matter.
Consistent with the requirement of a high level of protection (recitals 9–11), CJEU case law also indicates that the concept of “reproduction” must be construed broadly. This is because (i) recital 21 links a broad understanding of “reproduction” to legal certainty, and (ii) the already mentioned wording of Article 2 uses expressions such as “direct or indirect,” “temporary or permanent,” “by any means” and “in any form.”Footnote 25 While, as discussed in Part IV, a distinction appears warranted as regards the test of actionable reproduction under copyright vis-à-vis related rights, in both cases the right of reproduction is engaged regardless of the end goal of the act in question.
Turning to TDM-specific exceptions adopted in some jurisdictions: if no copyright-relevant act were undertaken, then the provisions introduced by legislatures over the past several years – ranging from Japan (Article 30-4 of the Copyright Act) to Singapore (Section 244 of the Copyright Act 2021), and from the UK (section 29A of the Copyright, Designs and Patents Act (“CDPA”)) to other EU Member States prior to the adoption of the DSM DirectiveFootnote 26 and the DSM Directive itself – would have been unnecessary, if not altogether misleading. The fact that legislatures have deemed such exceptions necessary indicates that TDM activities entail the doing of copyright-relevant acts.Footnote 27
The same conclusion applies in jurisdictions, like the US, where no specific exceptions for TDM exist. Even the decision of the 2nd Circuit in Google Books – often referenced in support of the view that unlicensed TDM could generally fall within the scope of application of the fair use doctrine under Section 107 of the US Copyright Act,Footnote 28 and one which treated the possibility of mining the Google Books corpus as favouring a finding of fair use – does not dispute that all this was premised on the prior scanning – that is: reproduction – by Google of third-party books and, thus, on a prima facie infringing activity.Footnote 29 Recent case law, notably the US Supreme Court ruling in Andy Warhol Foundation, Footnote 30 confirms that the analysis of whether the unlicensed use of protected content is fair use also substantially depends on whether there is a market for licensing content for training data. In light of emerging licensing practices and models, it is thus doubtful that fair use will be broadly applicable to AI training on protected content.Footnote 31
A further important point to note is that the scope of the TDM exceptions (including in the EU and the UK) is limited to specified restricted acts, as shown in Table 1. The subsequent undertaking of, e.g. acts of reproduction and communication/making available to the public, including but not necessarily limited to output generation, is thus not covered by either section 29A CDPA or Articles 3 and 4 of the DSM Directive. To exemplify: if an AI model lawfully trained on the repertoire of a certain Italian singer-songwriter under the Italian transposition of Article 4 of the DSM Directive (absent an appropriate reservation of rights by the relevant rightholdersFootnote 32 ) allowed a user of that model to reproduce fully or partly a song of that artist, such an act of reproduction would not fall within the scope of (the Italian transposition of) Article 4 of the DSM Directive.
2. Compliance with EU law regardless of the jurisdiction in which the training is performed
Before considering actionable reproduction in the context of AI-generated outputs, it is worth examining the requirements under Article 53(1)(c) of the AI Act.Footnote 33 The provision mandates compliance with inter alia Articles 3 and 4 of the DSM Directive, irrespective of where the acts of extraction and reproduction for TDM purposes have taken place, if the resulting AI model is made available in the EU. Recital 106 further clarifies that this is necessary in order to create a level playing field among AI providers. The AI Act also imposes an obligation to publicly disclose a sufficiently detailed summary of the content used for training the general-purpose AI model (recital 107 and Article 53(1)(d)). Two aspects of Article 53(1)(c) warrant closer examination: its temporal application and territorial reach.
Starting with the temporal application of the provision: Article 2(8) of the AI Act provides that the Regulation will inter alia not apply to development activities “regarding AI systems or models prior to their being placed on the market or put into service.” Thus, for systems or models trained lawfully outside of the EU, there is no requirement to comply with Article 4 of the DSM Directive until (if ever) the resulting systems or models are placed on the EU market. A question then arises whether the obligations under Article 53 of the AI Act, including the one at (c), apply to AI systems and models (lawfully) trained outside the EU and made available in the EU before the entry into force of the AI Act. Although some commentators have ruled out that this could be the case based on the general EU principle of non-retroactivity,Footnote 34 it appears more appropriate to qualify the answer.
The principle of non-retroactivity is an expression of legal certainty and the protection of legitimate expectations. That said, as also recently recalled by the Grand Chamber of the CJEU, “according to settled case-law, a new rule of law applies from the entry into force of the act introducing it, and, while it does not apply to legal situations that have arisen and become final under the old law, it does apply to their future effects, and to new legal situations.”Footnote 35 It follows that it would not be a breach of the principle of non-retroactivity to extend the application of rules, including those under Article 53(1) of the AI Act, to future effects of situations (such as the placing of an AI model on the EU market) that arose before the entry into force of the AI Act. To exemplify: if an AI model was updated/upgraded following its “first” entry on the EU market prior to the AI Act becoming enforceable, then the AI Act could apply to the effects resulting from such subsequent updates/upgrades. That is so because those situations could not be considered as having become definitive before that date. Such a conclusion is also in line with the Regulation itself. Articles 111(3) and 113(b) indicate that the obligation on general-purpose AI models to keep and disclose sufficiently detailed summaries of the content used for training will apply from 2 August 2025 for new models and from 2 August 2027 for models placed on the market or put into service before 2 August 2024.
Turning to the territorial reach of the provision: Article 53(1)(c) of the AI Act implies that it is irrelevant where the acts of extraction and reproduction at the basis of the TDM processes described above take place and whether such acts are to be deemed lawful under the laws of those jurisdictions. If the resulting general-purpose AI model is made available in the EU, all the copyright-relevant steps leading to such a situation must comply with EU law. Conversely, if no AI system or model is placed on the EU market, relevant TDM processes undertaken outside of the EU shall not be affected by EU rules. While such an approach raises concerns from the perspective of general principles, including sovereignty and international comity, as well as potentially the principle of territoriality under copyright law,Footnote 36 it is not unprecedented, having regard inter alia to international and EU copyright law.
At the international level, for example, Article 2 of the Phonograms Convention requires contracting parties to give rightholders the right to prohibit the importation of copies of recordings made “without the consent” of rightholders. The obligation applies even if the act of copying would not have required rightholders’ consent in the first place in the country of exportation. Turning to the Berne Convention, it is commonlyFootnote 37 understood that Article 5(2) therein adopts a lex loci protectionis (law of the place of protection) approach to the localization of copyright-relevant acts.Footnote 38 At the EU level, the determination of the law applicable to copyright infringements is made through the lex generalis found in Regulation 864/2007Footnote 39 (“Rome II”). Like the Berne Convention, Article 8(1) Rome II adopts a lex loci protectionis approach to determine the applicable law in cross-border infringements.Footnote 40
Over time, courts around the world – including in the EU and the UK – have considered themselves competent to rule, under local laws, in disputes concerning situations consisting of multiple infringing acts committed in different jurisdictions, that is: in decentralized situations. As discussed in greater detail elsewhere,Footnote 41 various criteria have been adopted to this end, including accessibility (the court seized is competent if it is located in a country from which the allegedly infringing content is accessibleFootnote 42 ) and targeting (courts located in the territory at which the allegedly infringing content is targeted have jurisdiction). The latter in particular has gained traction as a criterion to determine applicable law and jurisdiction in several legal systems, including in the UK and at the level of the CJEU.
Insofar as the UK is concerned, case law indicates that, where a certain act originates outside the UK territory but its effects are felt in the UK, then such act shall be deemed to be targeted at the UK territory, with the result that UK law is applicable.Footnote 43 In a copyright context, this approach was approved of in TuneIn, where the Court of Appeal of England and Wales upheld the decision at first instance that the operators of an online platform would be infringing UK law when giving UK customers access to radio stations that had not been licensed for the UK.Footnote 44 It was also confirmed and discussed from a broader intellectual property perspective in the recent judgment of the UK Supreme Court in Lifestyle Equities. Footnote 45
Turning to the CJEU, in the copyright field, that court has expressly adopted a targeting approach for the sui generis (database) rightFootnote 46 and the right of distribution.Footnote 47 The CJEU has also indicated that EU copyright rules apply to infringements whose effects are ultimately felt in the EU, irrespective of where the users’ acts of reproduction and the servers are located.Footnote 48 Along the same lines, the CJEU has indicated that the focus should be on the intended final result of the various acts, thus adopting a functional approach. For example, in Mircom, C-597/19, the particular technique used in peer-to-peer (“P2P”) file sharing activities, known as “seeding,” was considered. This consists of the downloading of a file through a P2P network and the simultaneous provision for uploading of parts (“pieces” or “seeds”) of it, which can be very fragmentary when compared to the whole. As a result, the pieces that are downloaded and re-uploaded by users are not of the copyright-protected works or other protected subject-matter themselves; instead, they are pieces of the files containing such works or subject-matter. Endorsing the analysis contained in Advocate General (“AG”) Szpunar’s Opinion and reasoning by analogy with how the World Wide Web functions,Footnote 49 the CJEU considered that the fact that the pieces are unusable per se is irrelevant. What is made available through the seeding technique – irrespective of where the seeds are downloaded and re-uploaded and, thus, where the various acts of reproduction and communication/making available to the public take place – is the file containing the protected content, with the result that users ultimately have access to the complete work or subject-matter. The technical means or process used to undertake an act of communication are irrelevant.
If the users have consented to the automatic re-uploading of the pieces that they download, they act in full knowledge of the consequences of their actions and, as a result, perform acts restricted by Article 3 of the InfoSoc Directive. It is not required that users manually initiate the re-uploading: what matters is that they have consented to the use of the relevant software after receiving information regarding its characteristics.Footnote 50
In sum: consistent case law of the CJEU indicates that consideration of the ultimate objective sought is necessary to characterize a given situation from a legal standpoint.Footnote 51 It follows from all this that, if the acts of extraction and reproduction undertaken during TDM processes are functional – as they are – to the training of AI models, which, in turn, are made available for use in the EU, the application of EU law also to those acts appears justified having regard to the broader process of which they are part.
IV. Reproduction issues through output generation
Following the analysis above, the focus shall now shift to the output generation phase: “[i]f an AI output is identical or similar to parts of an original copyrighted work included in the training data, copyright issues may arise regarding whether the original work’s copyright was infringed, and if so, who is responsible.”Footnote 52 To address these issues, the test for actionable reproduction must therefore be determined. If an AI model generated an output like those discussed in Part II – that is, outputs similar (if not highly similar) to the content used during the input/training phase – would such an output be prima facie infringing under EU or UK law? And would it matter if the training of the AI model had lawfully occurred under the UK or EU TDM exceptions? While the answer to the first question might be in the affirmative, that to the second would be in the negative because, as stated, TDM exceptions only encompass the input/training phase (whether fully or only in part).
1. The right of reproduction under copyright
In accordance with international law, under EU and UK law, the right of reproduction encompasses a work in its entirety or part thereof. It applies to instances of literal and non-literal copying and, therefore, also to situations that, formally, might fall under the right of adaptation.Footnote 53 That said, copyright does not protect against independent creation. While this aspect has been indicated as having the potential to be problematic in the context of generative AI outputs,Footnote 54 recent studies suggest that methods can be devised to identify whether specific content, such as text sequences, was included in certain training data.Footnote 55 Moreover, the transparency obligation under Article 53(1)(d) of the EU AI Act is intended to mitigate the evidentiary problems related to establishing access to and/or derivation from pre-existing works.
In Infopaq, C-5/08, the CJEU introduced an EU-wide test of actionable reproduction for authorial worksFootnote 56 : there is reproduction in part when the part of a work that has been copied is original in the sense that it is its author’s own intellectual creation.Footnote 57 This broad understanding – which extends to the copying of short extracts of a work,Footnote 58 as long as the choice, sequence and combination of elements is sufficiently originalFootnote 59 – is in line with the objective of the InfoSoc Directive to introduce a high level of protection of authors.Footnote 60 Therefore, the prima facie infringement test requires determining, first, whether the claimant’s work or part thereof is protected and, secondly, whether the defendant took a protected work or part thereof. The appropriateness of such an approach has also been upheld by members of the UK judiciary writing extra-judicially,Footnote 61 as well as by UK courts.
To establish prima facie infringement under section 16 CDPA, a claimant needs to demonstrate that the defendant has done any of the restricted acts in relation to the work as a whole or any substantial part of it. Even before the CJEU judgment in Infopaq, C-5/08, the notion of “substantial taking” was not intended in a quantitative sense.Footnote 62 Put succinctly: “the quality relevant for the purposes of substantiality is the literary originality of that which has been copied.”Footnote 63 As of today, the UK test remains informed by CJEU case law: when considering whether a substantial part has been reproduced, it is necessary to focus on (a) what has been reproduced and (b) whether that part expresses the author’s own intellectual creation.Footnote 64 In Sheeran, Zacaroli J confirmed that “[t]o amount to an infringement […] the copying must be of either the original work or a “substantial part” of it […] This is a qualitative, not quantitative, question. The test is whether the part in question contains elements which are the expression of the intellectual creation of the author of the work […] The essential consideration is to ask whether a defendant has taken that which conferred originality on the claimant’s copyright work (or a substantial part of it).”Footnote 65 In Pasternak, Johnson J also approved of this approach and regarded the cited passage from Sheeran as summarizing the test of infringement in a useful and succinct fashion.Footnote 66
2. The right of reproduction under related rights
The notion of reproduction “in part” as adopted by the CJEU in Infopaq, C-5/08 in relation to works in Article 2(a) of the InfoSoc Directive does not extend to the other subject-matter listed in that provision. That is because the right of reproduction vested in holders of related rights (performers in relation to fixations of their performances; phonogram producers in relation to their phonograms; producers of the first fixations of films in respect of the original and copies of their films; broadcasting organizations in respect of fixations of their broadcasts) protects “not intellectual creation but financial investment.”Footnote 67
The traditional and broadly accepted view is that related rights are not subject to any threshold condition.Footnote 68 In turn, any reproduction of protected subject-matter would be actionable.Footnote 69 That said, in Pelham I, C-476/17, the CJEU adopted an admittedly odd test to determine actionable reproduction of the phonogram producer’s related right vis-à-vis third-party freedom of artistic expression. The Court ruled that any reproduction of a phonogram is actionable insofar as it is recognizable to one’s ear (the ear of whom, however, is unclearFootnote 70 ) and the part reproduced reflects the investment made by the rightholder. This finding would be consistent with the objective of guaranteeing a high level of protection and safeguarding the specific objective of the exclusive right of the phonogram producer.Footnote 71
Despite the specific context of Pelham I, C-476/17, some commentators have suggested that the test of recognizability could be generally applicable as a limitation to the scope of the right of reproduction, at least for rightholders other than authors.Footnote 72 This suggestion is flawed and should be rejected, including by the CJEU when it decides the referral in Mio, C-580/23, concerning the authorial right of reproduction.Footnote 73 There are two key reasons for this conclusion.
The first lies in the inherent difficulties and shortcomings of a recognizability-based approach, including the uncertainty regarding the benchmark to use to this end: should the part be recognizable to, e.g. an average person, a specialized person/expert (e.g. a musicologist), or should it be machine-detectable? Depending on the choice made, the scope of protection would vary (very) significantly. The second is that, as seen, the main – though not necessarily only (think of performers’ rightsFootnote 74 ) – rationale of protection of related rights is one of investment. Even if one were to reject the view that related rights are not subject to any threshold condition, the need to safeguard investment as the benchmark adopted to determine if the reproduction at hand is actionable would be preferable to a broader applicability of the approach adopted by the CJEU in Pelham I, C-476/17. This way, there would be reproduction “in part” within the meaning of Article 2(b)-(e) of the InfoSoc Directive when what is reproduced without the permission of the relevant reproduction rightholder interferes with the opportunity, which the rightholder has, of realising meaningful returns on their investment.Footnote 75 This approach could be employed in non-InfoSoc situations tooFootnote 76 and would also be aligned with the qualified infringement test adopted by the CJEU in CV-Online Latvia, C-762/19 in relation to the EU sui generis database right.Footnote 77
V. Liability for infringing AI-generated outputs
If AI-generated outputs that reproduce third-party copyright works and/or protected subject-matter may be infringing, allocation of the resulting liability needs to be considered next. Specifically: how is the traditional divide between primary/direct and secondary/indirect/accessory liability to be construed in the context of generative AI? If the terms of service of AI model providers exclude any liability resulting from uses thereof, are such terms enforceable against users of the models and/or effective in relation to third-party rightholders whose rights may have been infringed – and, if so, under what circumstances?
Both aspects will be tackled in what follows. Insofar as the first issue is concerned, drawing on case law of the CJEU and UK courts, it will be shown how not only users of AI models, but also developers and providers thereof might be considered liable for the doing of acts restricted by copyright and/or related rights through output generation. As a result, turning to the second issue, at least in certain circumstances, limitations of liability by AI developers and providers vis-à-vis users could be deemed unenforceable against the latter, and ineffective vis-à-vis third parties whose rights have been infringed.
1. Liability for the provision of AI models that could be used to infringe third-party rights
It is clear by now that AI-generated outputs may raise questions of actionable reproduction under copyright and related rights. The user inputting the prompt resulting in a prima facie infringing output (and subsequently using that output) might be regarded as the one directly undertaking restricted acts.Footnote 78 Nevertheless, any resulting liability could also encompass parties other than users. Insofar as AI model developers are concerned, as seen above in Part III, any potentially applicable TDM exception under EU and UK law would only cover the extraction and reproduction during the input/training phase, not other acts. As for providers of AI models, their liability could potentially be established not only on a secondary/indirect/accessory basis, but also on a primary/direct basis, consistent with case law in the UK and EU.
Under UK law, an AI model provider could be held secondarily liable for, e.g. possessing and/or dealing with an infringing copy (created, for example, by the developer of the AI model at hand) in the course of business (section 23 CDPA), or for supplying articles specifically designed or adapted for making copies of a copyright work (section 24(1) CDPA). As in all cases of secondary infringement, actual or constructive knowledge would in any event need to be proved. Liability could also be established where the provider of an AI model was in principle eligible for the safe harbours under the Electronic Commerce Regulations 2002 and yet played an “active role” as per relevant case law.Footnote 79 The same would be true under EU law and the regime now found in the Digital Services Act.Footnote 80
Besides secondary liability, AI model providers could, in principle, also be held liable as accessories or even primary infringers. Accessory liability could arise through authorization (section 16(2) CDPA). Another possibility might be joint tortfeasance by means of procuring the commission of the tort, as has been considered to be the case in a number of situations concerning providers of P2P software/services,Footnote 81 or through a common design between multiple subjects pursuant to which the tort was committed. Recently, the UK Supreme Court considered these as distinct bases for imposing accessory liability and suggested a narrower scope of application of liability based on the construction of the subjective element needed in this regard. To this end, it would be necessary that a person knew the essential facts which made the act done wrongful, even if the tort is one of strict liability. This applies whether the claim is based on procuring the infringement of a right or on participation in a common design.Footnote 82 Even from this perspective, it is arguable that – at least in certain situations – the provision of a prompt requesting the generation of an output that is substantially identical or highly similar to a pre-existing work (as in the examples given in the opening to Part II) could be considered such as to confer on the provider of the AI model, which is used to infringe, knowledge of the features of the tort committed by the user. Specifically in a copyright context, accessory and primary liability were reviewed and found to subsist in the already mentioned judgment of the Court of Appeal of England and Wales in TuneIn.Footnote 83
The primary/direct liability of internet platforms for communicating/making available to the public infringing content uploaded and shared by users of their services is also well-established at the EU level, having regard to both legislation (notably but not exclusively Article 17 of the DSM Directive) and case law, including in relation to Article 3 of the InfoSoc Directive (which the UK transposed through section 20 CDPA). In YouTube, C-682/18 and C-683/18, the CJEU upheld the finding in the 2017 judgment in Ziggo, C-610/15 that a platform operator may be directly liable for users’ infringing acts. To this end, however, several factual elements must be considered to determine the subsistence or not of the indispensable/essential role played by the platform operator and the deliberate nature of its intervention. Among other things, the CJEU gave weight to the consideration whether a platform would implement appropriate technological measures, as they would be expected from a reasonably diligent operator, to counter – credibly and effectively – the undertaking of infringing acts.Footnote 84
Translating all this to generative AI, and also considering the approach taken in the proposed AI Liability Directive,Footnote 85 to mitigate the risk of liability a diligent AI model provider would need to implement effective measures to prevent users from using certain expressions, names, words and phrases when inputting prompts. Obviously, if primary liability was established on the side of a platform operator or – for what concerns here – AI model provider under the InfoSoc Directive or EU trade mark instruments,Footnote 86 no immunity (e.g. hosting safe harbour) would be available. A somewhat similar approach would be required where the AI provider fell within the scope of application of Article 17 of the DSM Directive.Footnote 87 In such a situation, and where no authorization was provided by relevant rightholders, the conditions of Article 17(4) would need to be cumulatively satisfied for an AI provider to remove the risk of its own direct liability, including by implementing measures that “distinguish adequately between unlawful content and lawful content.”Footnote 88
2. Liability aspects vis-à-vis contractual considerations
Having considered the above, another issue that arises is the enforceability (and effectiveness) of terms of service used by AI model providers to exclude their liability for infringements committed by users.Footnote 89 While the answer will obviously depend on the specific circumstances at hand, in YouTube, C-682/18 and C-683/18, the CJEU referred to the terms of service of a platform operator, and found that – even if the terms require (i) that users respect third-party rights and (ii) that users hold all the necessary rights, agreements, consents and licences for the content that they upload – the existence of such terms alone could not be enough to exclude the operator’s own liability.
This is because the terms of service are merely one of the several criteria that need to be taken into account and balanced in order to determine if the platform operator in question is liable for the copyright-restricted acts. With specific regard to Cyando’s cyberlocker Uploaded, the CJEU noted that its general terms and conditions prohibited users from infringing third-party copyrights.Footnote 90 Yet, the Court suggested that the liability of the platform should be assessed considering all the relevant aspects of the platform’s functions and activities, thus materially broadening the assessment beyond the wording of the platform’s terms of service. Consequently, the referring court (German Federal Court of Justice) applied the guidance received from the CJEU and concluded that liability for unauthorized acts of communication to the public would exist under certain conditions for the operator of Uploaded.Footnote 91
The above thus suggests that even if the terms of service of an AI model provider seek to exclude the provider from liability for users’ infringing acts, such terms alone may not rule out the liability of the provider. Therefore, rightholders could potentially enforce their rights against both users and providers of AI models.
VI. Availability of exceptions
Having reviewed the elements of prima facie liability for outputs incorporating third-party protected content, this final part will consider the availability of exceptions. To start, besides recalling that under EU law it is not possible to exceed the catalogue of exceptions available under national law (e.g. by directly invoking the EU Charter of Fundamental RightsFootnote 92 ), it is worth noting that all exceptions need to be construed in light of the three-step test. Furthermore, in the context of AI-generated outputs produced at the request of users, the beneficiaries of exceptions will likely be the users rather than AI model developers or providers.
The analysis that follows will also consider the applicability of the quotation exception under Article 5(3)(d) of the InfoSoc Directive and the exception for pastiche under Article 5(3)(k) of the InfoSoc DirectiveFootnote 93 , as also implemented into UK law. The reason for this focus is that the former has been claimed to be applicable without the need to consider the purpose of the use of the protected content at issue,Footnote 94 while the latter has been invoked in some ongoing contentious proceedings concerning AI-generated outputs and is furthermore at the centre of the pending CJEU referral in Pelham II, C-590/23.
1. Compliance with the three-step test and beneficiaries of exceptions
The requirement that a fair balance is struck between protection of copyright and related rights and the rights and interests of third parties is a fundamental tenet of the copyright system. The CJEU frequently refers to this requirement, including when construing exceptions. This, in turn, has inter alia led to the characterization of the three-step test in Article 5(5) of the InfoSoc Directive as an instrument that also contributes to the fair balance between exclusive rights and exceptions.Footnote 95 As is discussed in greater detail elsewhere,Footnote 96 Article 5(5) of the InfoSoc Directive mandates that the exceptions provided for in paragraphs 1 to 4 therein shall only apply in certain special cases, which do not conflict with a normal exploitation of the work or other subject-matter and do not unreasonably prejudice the legitimate interests of the rightholder.
Under Article 5, and with the exception of Article 5(1), EU Member States enjoy the discretion to decide whether to incorporate an exception from the list into their own laws. That freedom is nevertheless limited in two fundamental ways: first, it cannot be used to compromise the objectives of that legislation and the functioning of the EU internal market; second, a national exception must also comply with Article 5(5).Footnote 97 From a UK perspective it is noted that, while the CDPA does not refer to the three-step test, case law indicates that the requirements thereunder have traditionally been considered covered by the notion of “fair dealing.”Footnote 98 That said, more recent case law indicates that UK defences also need to be construed in light of the three-step test.Footnote 99
A further, key consideration is that the availability of exceptions is premised on lawful access to the copyright-protected work or other subject-matter: if the source from which the user has derived a certain work or subject-matter is unlawful, no exception will be available to them. This conclusion is compliant with the requirements under the three-step test: holding otherwise would encourage the circulation of unlicensed works and inevitably reduce the volume of sales or of other lawful transactions relating to the protected works. This would be contrary to the requirements that exceptions not conflict with a normal exploitation of the work and not unreasonably prejudice the legitimate interests of rightholders.Footnote 100 Consider the quotation exception: under both Article 5(3)(d) of the InfoSoc Directive and the corresponding UK provisionFootnote 101 (discussed below at §2), the exception requires that the copyright-protected work or other subject-matter being quoted “has already been lawfully made available to the public.” If a user generated an output – whether text, visual, or music – through an AI model and such an output reproduced third-party protected content used unlawfully during the model’s input/training phase, arguably no quotation exception could be invoked by that user, given that the work or subject-matter at issue would have been derived from an unlawful source.
It follows that all exceptions require fulfilling the relevant requirements thereunder, as well as the three-step test. To exemplify: if a music historian generated music through an AI model “in the style of Oasis” (as in the Britpop “lads”) in order to review and discuss the latter’s technique and features in the context of their PhD dissertation, then a defence, e.g. quotation, might potentially be available to them if the AI-generated output reproduced protectable elements from Oasis’s repertoire. Conversely, the owner of a night club who generated “a song in the style of Oasis” to exploit commercially in their establishment without relevant licences would most likely be ineligible for the application of any exception if the AI-generated output reproduced protected elements of Oasis’s repertoire.
Assuming that an exception is potentially applicable, a further preliminary point to clarify is who is entitled to benefit from the exception: would it be solely the user of the AI model who generates the output, or would it be also the developer and/or provider of the model? In light of relevant case law and recent legislative developments, the answer is that, in principle, it is the former and not the latter who may benefit from the exception. It is the user generating a certain output through an AI model who enters into a dialogue with an earlier work or other protected subject-matter, not the developer or provider of the model. Similarly, having regard to the construction of the right of communication to the public under Article 3 of the InfoSoc Directive in the context of internet platforms, it is the users who upload content incorporating third-party copyright works and other protected subject-matter that could invoke the application of an exception, not the operator of the platform where such content is being uploaded and shared. In the same sense, Article 17 of the DSM Directive provides for mandatory exceptions and makes it clear that the beneficiaries thereof are not online content-sharing service providers (“OCSSPs”) but rather users of their services. Along similar lines, it could be also recalled that the action launched by the Republic of Poland against Article 17 of the DSM Directive was based on the freedom of expression and information of users, not OCSSPs.Footnote 102
2. “Quotation” and “pastiche” under EU and UK law
Exceptions under Article 5(3)(d) and (k) of the InfoSoc Directive and Article 17(7) of the DSM Directive are rooted within freedom of expression and information.Footnote 103 Insofar as the quotation exception is concerned, its applicability is subject to fulfilment of the following requirements: first, the quotation must relate to a work or other subject-matter which has already been lawfully made available to the public; second, unless this turns out to be impossible, the source (including the author’s name) must be indicated; third, the use at hand must be in accordance with fair practice and to the extent required by the specific purpose. In Pelham I, C-476/17, the CJEU considered that the concept of “quotation” must be understood with reference to its usual meaning in everyday language and be justified by the purpose “of illustrating an assertion, of defending an opinion or of allowing an intellectual comparison between that work and the assertions of that user.” In summary, there must be “the intention of entering into ‘dialogue’ with that work.”Footnote 104
The approach adopted in Pelham I, C-476/17 may appear more permissive than that in Spiegel Online, C-516/17, in which the Grand Chamber held that the user must “establish a direct and close link between the quoted work and [their] own reflections, thereby allowing for an intellectual comparison to be made with the work of another.” This would be so because Article 5(3)(d) of the InfoSoc Directive allows quotations insofar as they enable criticism or review, all of which suggests that “use of the quoted work must be secondary in relation to the assertions of that user, since the quotation of a protected work cannot […] be so extensive as to conflict with a normal exploitation of the work or another subject matter or prejudices unreasonably the legitimate interests of the rightholder.”Footnote 105 Despite their seemingly different nuances, the judgments can nevertheless be reconciled: they both suggest that a permissible quotation is one that is justified by its purpose.
The same understanding of quotation is also found under UK law: the provision in section 30(1ZA) CDPA was derived from Article 5(3)(d) of the InfoSoc Directive and framed within “fair dealing.”Footnote 106 By referring to “use of a quotation from that work,” section 30(1ZA) implies that the defence be only available to limited, partial reproductions of a work.Footnote 107 Recently, the High Court of England and Wales considered – and ruled out – the applicability of section 30(1ZA) in Pasternak due to lack of acknowledgment of the work being quoted by the defendant.Footnote 108 Given that the conditions under section 30(1ZA) – like those under Article 5(3)(d) – are cumulative, the quotation defence could not find application.Footnote 109
Turning to pastiche, it can first be noted that at the time of writing the notion has not yet been defined at the EU level by the CJEU. Nevertheless, the UK Intellectual Property Enterprise Court (“IPEC”) recently considered and rejected the application of the corresponding UK defence (section 30A CDPA) in Shazam, in a case concerning copyright protection of a fictional character and reproduction thereof through an interactive dining show. Having established that copyright would subsist in the work at hand, the analysis turned to whether section 30A CDPA could be invoked successfully given that the defendants had argued that their show would not be infringing because it qualified as parody or, alternatively, pastiche. Insofar as the latter is concerned, relevant guidance on section 30A released by the UK Intellectual Property Office explains that, further to the introduction of this defence in 2014, an artist would be able to use small fragments from a range of films to compose a larger pastiche artwork.Footnote 110 This is consistent with the framing of the defence within fair dealing.
The IPEC noted that, in his Opinion in Pelham I, C-476/17, AG Szpunar held that: “[a]s for the concept of pastiche, it consists in the imitation of the style of a work or an author without necessarily taking any elements of that work.”Footnote 111 Agreeing with scholarly literatureFootnote 112 and thus substantially discarding the position of AG Cruz Villalón in Deckmyn, C-201/13Footnote 113 , Deputy High Court Judge John Kimbell KC considered that the everyday meaning of pastiche indicates that it is distinct from, and operates outside of, the genres of parody and caricature. Pastiche entails the imitation of the style of pre-existing works and the use or assemblage of pre-existing works in new works. It also needs to be noticeably different from the original work.Footnote 114
The approach taken by the IPEC appears to be overall correct. In any event, the difference between “parody,” “caricature” and “pastiche” relates to the different fields of application of these concepts: it does not entail a broader scope of “pastiche” vis-à-vis “parody” or “caricature,”Footnote 115 nor does it mean that “caricature” and “pastiche” are not also autonomous concepts of EU law.Footnote 116 “Parody,” “caricature” and “pastiche” are ways through which one’s own freedom of expression may be exercised by entering into a “dialogue” with an earlier work or protected subject-matter and/or the ideas conveyed therein. Considering the requirements of the three-step test, a national exception for pastiche still requires a justification for both the use of someone else’s work or other protected subject-matter and the amount thereof that has been reproduced, thus also entailing a consideration of the effects of the use on the market for the original work or protected subject-matter.
VII. Conclusion
Generative AI has been booming over the past several months, yet the legal uncertainties surrounding the lawfulness of model training and resulting applications, including output generation, have the potential to impact very substantially on the development, investment,Footnote 117 and use of resulting models. As the study has explained, the liability landscape is likely to become increasingly complex and risky, touching upon multiple stages of the “life” of an AI model.
By focusing on liability issues surrounding AI-generated outputs, the analysis has mapped relevant acts under copyright and related rights, from an international and EU/UK perspective. It has then turned to actionable reproduction, allocation of liability, and potential defences to prima facie infringement, along with their beneficiaries. The main findings may be summarized as follows.
Starting with the input/training phase and TDM exceptions under EU and UK laws, these cover acts of extraction and reproduction for TDM purposes (where the relevant conditions have been met), not subsequent restricted acts – and, even within the input/training phase itself, not necessarily all relevant acts. Article 53(1)(c) of the AI Act, read alongside recital 106 in the preamble thereof, indicates that TDM activities are not – at least insofar as AI development is concerned – an end in themselves but are rather a step in and functional to the training of AI models and the subsequent development and offering of the models. The obligations under that article, including that to respect EU copyright rules and the one of transparency, also apply (subject to transitional provisions) to new releases of generative AI models placed on the EU market prior to the entry into force of the AI Act, as well as to all models placed on the market twelve months before the entry into force of the Act.
Turning to actionable reproduction, the test differs between copyright and related rights because of the different rationale of protection. The copying of a sufficiently original part of a copyright-protected work is actionable, whereas (at least) any copying of a part that reflects the investment of the relevant rightholder is covered by the reproduction rights of related rightholders. Despite the approach in Pelham I, C-476/17, recognizability cannot and should not be intended as a general benchmark under either regime.
Prima facie liability for infringing outputs may extend beyond users of generative AI models to their developers and providers, depending on the assessment of the relevant facts and circumstances. This is supported by case law of UK courts and the CJEU. Under UK law, AI developers and providers could be liable as secondary infringers, accessories/joint tortfeasors, or primary infringers. A finding of primary/direct liability is also foreseeable having regard to case law of the CJEU in relation to internet platform operators. Furthermore, contractual limitations or exclusions of liability for providers may be unenforceable.
Finally, for an unlicensed act to be covered by an exception, relevant requirements must be fulfilled, including having regard to the three-step test and/or, insofar as the UK is concerned, the notion of fair dealing. The unauthorized taking must be overall justified in light of the requirement that a “fair balance” is struck between different rights and interests: the purpose and amount of the taking, as well as effects on the market for the original work or protected subject-matter, need to be considered too.
In sum: while it is clear that each case will need to be decided on its own merits and that, therefore, one should refrain from drawing sweeping conclusions, the discussion undertaken here shows how the generative AI output phase raises several questions of liability under copyright law. If the goal of policymakers and relevant stakeholders is to ensure the balanced and sustainable development of AI, including in the context of the seemingly revived proposal for an AI Liability Directive, then the issues related to the generation and dissemination of AI outputs need to be given ample attention and a greater role in the debate than has been the case so far, whether in the context of risk assessment and compliance, licensing initiatives,Footnote 118 or contentious scenarios.
Acknowledgments
The author wishes to thank the European Journal of Risk Regulation Editorial Team and the anonymous peer reviewer for their comments. All internet sources were last accessed on 8 October 2024.
Competing interests
This study was prepared at the request of the International Federation of the Phonographic Industry (IFPI), but all views and opinions expressed – as well as any errors – are solely attributable to the author.