My Brilliant Friend: Generative Artificial Intelligence and its Evolving Impacts on Copyright Law

The Federal Government has recently released its Interim Response as part of the Safe and Responsible AI in Australia Consultation. The response indicates that, like those of the EU and US, the Australian Government has its focus on ‘high-risk’ applications such as law enforcement, biometric identification and emotion recognition.[1] However, it is also concerned about copyright issues, particularly the training of Generative AI on copyrighted material, and is considering updating existing laws to provide for remedies in cases of copyright infringement through AI.[2]

While the Interim Response does not yet affect the law in this area,[3] other jurisdictions are moving swiftly ahead with copyright litigation arising out of the use of AI. Perhaps most notably, the US Authors Guild has commenced a class action (whose plaintiffs include George Saunders, Jonathan Franzen and George R.R. Martin) against ChatGPT developer OpenAI, as have authors Paul Tremblay and Mona Awad.[4] These cases are testing whether existing laws are apt to remedy copyright infringement caused by the training of AI as well as by outputs created using AI.

In contrast to the US and EU, no legal actions have so far been commenced in Australia against users or developers of AI for copyright infringement. But they are not hard to picture. Former head of the ACCC Rod Sims has suggested that ChatGPT and Google’s Bard are likely using copyrighted Australian news publications as part of their training data, and should have to pay for access to that content.[5]

In this piece we consider this fast-moving issue and flag some of the questions raised by Generative AI for Australian Copyright Law, including the potential for Generative AI to breach copyright, and questions about fair dealing, authorisation, authorship, evidence, and moral rights. In concluding, we briefly consider suggestions to strike the balance between copyright holders and users and developers of AI.

How Generative AI works (a precis):

Generative AI can be contrasted with traditional AI in that traditional AI needs to be ‘supervised’, that is, programmed to compile existing data rather than to create new information.[6] For example, the humble Google search engine, proficient in speedily sorting through masses of data otherwise unmanageable by humans, compiles and displays existing data that matches a search term entered by the user; it does not create new information.

By contrast, Generative AI, taking similarly huge volumes of data, does not simply display or compile existing data but synthesises it to create content that is convincingly ‘new’.[7] Generative AI therefore mimics human creativity in that its outputs are free-form, despite being prompted by a human.

Take the often-mentioned example of OpenAI’s text-based model ChatGPT: ChatGPT uses the masses of data it has been trained on to write human-like, conversational responses to a user’s prompt.[8] While users who ask similar questions may receive the same response (e.g. ‘what colour is the sky’), ChatGPT is generally able to produce unique responses to each user’s prompts and questions.[9] In addition to ChatGPT, OpenAI have created DALL-E, which uses its large collection of images to produce new images, as well as Jukebox, which generates music using a dataset of 1.2 million songs.

Generative AI and copyright infringement:

Much of the unease around Generative AI stems from its capacity to create material that could compete with human-made literature, music, visual arts, film, code, games, and so on – works which are often subject to copyright. The creative outputs of Generative AI may infringe copyright if they substantially reproduce existing copyrighted works; but it is also possible that training a Generative AI can itself infringe copyright if copyrighted works are used as part of the training data.[10]

Matulionyte argues that in Australian law copyright infringement may occur at four stages:[11]

When a data set is prepared for training an AI; the data set constitutes a lasting copy of that data which will be an infringement if the data includes work subject to copyright used without authorisation.
When an AI is trained on the data; in this process of ‘ingesting’ the data, the AI may create temporary copies of the training material which will not be permitted under copyright law unless the initial copy of those works was lawful.
When a user of AI uses copyrighted material as part of their prompt: for example, copying a short story into the AI so that the AI can produce a summary of it.
When an AI model’s output copies an existing work, meaning it contains a substantial part of the existing work.

On the other hand, Samuelson has questioned whether Generative AI ‘copies’ for the purpose of copyright infringement at all:

Training a model begins by tokenizing the contents of works ingested as training data into component elements. The model uses these tokens to discern statistical correlations – often at staggeringly large scales – among features of the content on which the model is being trained.[12]

Indeed, if a Generative AI model keeps only a ‘component element’ of the training data, there is a question about whether that component part counts as a substantial copy of an original work. However, in some cases, Generative AI has been shown to reproduce its training data wholesale, and in such cases it would clearly have copied the material for the purposes of training, and its outputs would also likely constitute a copy of the (potentially copyrighted) training data.[13] In one example, when ChatGPT was prompted to provide computer code for a certain function, it provided incomplete and incorrect code because it had copied a practice exercise from a textbook in which the reader was invited to ‘fill in the blank’.[14] In such cases, it does not appear that the AI has retained sufficiently small ‘tokens’ from the training data for there to be no copying.
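
The tokenising step described in the passage quoted from Samuelson can be illustrated, very loosely, with a toy example. The sketch below is an assumption-laden simplification and not a description of how ChatGPT or any commercial model is built: real systems use sub-word tokenisers and neural networks, whereas this example splits a sentence into words and merely counts which word follows which. The function names `tokenize` and `bigram_counts` and the sample sentence are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (a crude, hypothetical tokeniser)."""
    return text.lower().split()


def bigram_counts(tokens):
    """Count how often each token is immediately followed by each other token."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


# A toy 'training work' (illustrative only).
toy_corpus = "the sky is blue and the sea is blue"

tokens = tokenize(toy_corpus)
stats = bigram_counts(tokens)

# What is retained is a table of correlations between tokens,
# not the sentence itself.
for token, followers in stats.items():
    print(token, dict(followers))
```

What the toy 'model' keeps is a table of statistical relationships rather than the sentence itself, which is the basis of the argument that no copy is retained. Equally, where such statistics are detailed enough relative to the size of the source, the original text may be largely recoverable from them, which is broadly consistent with the reports of wholesale reproduction noted above.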

Fair dealing and Generative AI:

In the United States, questions are being raised as to the applicability of the fair use doctrine to Generative AI.[15] In that jurisdiction, fair use can be argued in relation to any works, though whether fair use is made out is determined by an open-ended analysis considering the purpose of the work, whether it is commercial, and whether it would have an effect on the market for the original work.[16] The fair use analysis also considers how substantially the output copies the original work, defined both in terms of the quantity of the copying as well as the quality of what is copied (how much the new output transforms the old work, whether it relies on important parts of the original work, and so on).[17]

As such, in Authors Guild v Google, Inc. (2d Cir. 2015), Google did not infringe copyright by displaying small random snippets of published books because neither the quantitative nor the ‘qualitative heart’ of the books was displayed; the copying was consequently unlikely to diminish the market for the full books, and the use of the copy was transformative (users being able neither to read the entirety of the book nor to choose a selection from it).[18]

While Google was able to display small snippets of books for the purpose of providing a sample that may cause a consumer to buy the book in question, training a Generative AI model to compete with texts, artworks, music etc. available on the market might mean that copying for training a Generative AI and any copying done in its outputs is not fair for the purposes of the exceptions.

Furthermore, Australia’s Copyright Act 1968 (Cth) (Copyright Act) has a narrower range of fair dealing exceptions which only apply in specific circumstances, such as research, news reporting and parody.[19] It should therefore be noted that fair dealing exceptions will apply much more narrowly in Australia in relation to Generative AI. In addition to the requirement of fitting into a prescribed exception, in each case a copy that falls into that exception must also be fair, a question that includes consideration of the factors set out in s 40(2) of the Act, which, similar to the US test, are open-ended questions ‘of degree … or of fact and impression’.[20] This dual requirement in Australia, of an exception as well as fairness, may well make fair dealing arguments in relation to Generative AI harder to make out.

Authorisation – who infringes copyright?

Under the Copyright Act, one who authorises the infringement of copyright is also considered to have infringed copyright; whether authorisation has taken place is determined by the power of the person to prevent the act, the nature of the relationship between the persons, and any steps taken by the alleged authorising person.[21] As the High Court has recently stated, whether authorisation has been given will require ‘close focus upon all the facts’.[22]

It is therefore a live question whether the circumstances of using an AI model can be said to amount to an authorisation by the developer. Authorisation can include ‘indifference’, but this requires the authoriser to have had reason to suspect that the other person would infringe copyright and then to have failed to take steps within their control to prevent it.[23]

It is not clear under what circumstances a developer would suspect that users would infringe copyright. For example, OpenAI’s terms stipulate that users are to ensure that all inputs and outputs do not violate ‘any applicable laws’; furthermore, the terms purport to make the users of ChatGPT responsible for all activities that occur using ChatGPT and to require users to indemnify OpenAI for any lawsuits arising from the use of its service.[24]

There may also be disputes about whether it is the user or developer of a Generative AI model who is responsible for outputs of the model that infringe copyright. This may particularly be the case where the infringing output is caused by the user giving a prompt that asks the AI to draw on copyrighted material but where the prompt can only work because the AI has been trained on copyrighted material. For example, a user may ask a Generative AI model to render a copyrighted poem or lines of code.

In the trade mark case Hells Angels Motorcycle Corporation v Redbubble, Redbubble’s website allowed its users to upload images to the Redbubble site so they could print those images on blank merchandise provided by Redbubble and its suppliers. When users uploaded files with Hells Angels’ trade marks to the site for printing on Redbubble’s merchandise, Australian courts found Redbubble and not its users to be infringing Hells Angels’ trade marks because Redbubble was the ‘source of origin’ for goods bearing Hells Angels’ trade marks.[25]

As with Redbubble, it is the users of Generative AI models who prompt the models to create outputs. However, where copyright infringement does not rely on the ‘use’ of a trade mark but instead the copying of material, AI users rather than developers may be liable for an output that is similar to or contains a substantial part of a copyrighted work. Arguably, developers should also take steps to prevent breaches of copyright by their users, for instance, by preventing prompts that may cause their AI models to breach copyright.

Authorship:

Copyright attaches only to original works, which requires ‘independent intellectual effort’ on the part of the creator.[26] Purely ‘automated’ works that lack adequate human intervention will therefore not be afforded copyright protection.[27]

However, what constitutes ‘independent intellectual effort’ in the context of AI is not always self-evident. In many cases, it may be argued that users of AI work alongside AI rather than exploiting a fully automated process. Users often meticulously refine and revise their prompts to AI models, sometimes going through many rounds of revision before they are satisfied with a final output that they use. Those efforts might be considered independent intellectual effort, though it may be difficult to define the necessary quantity or quality of such revisions.[28] As others have pointed out, it will be wise for users to keep records of their prompts and their revisions to demonstrate intellectual effort.[29]

The US, however, may go in a different direction. In a case currently on appeal, the US Copyright Office has ruled that only the parts of an AI-generated artwork that had been edited by a human using Photoshop could be copyrighted, despite evidence that the plaintiff had revised and refined the AI prompt 624 times.[30]

Transparency – evidentiary issues:

Plaintiffs in cases involving Generative AI may face evidentiary issues because of the volume of material which is used to train Generative AI models. For example, in the forthcoming US class action Tremblay et al v OpenAI, Inc. et al, the plaintiffs allege that OpenAI’s ChatGPT was trained on their novels copied from illegal torrent sites.[31] However, because the material provided publicly by OpenAI about GPT-4 includes ‘no information about its dataset at all’,[32] the plaintiffs have had to rely on evidence that previous versions of ChatGPT were trained on datasets of books that were so large that:

The only “internet-based books corpora” that have ever offered that much material are notorious ‘shadow library’ [torrent] websites like Library Genesis (aka Libgen), Z-Library (aka B-ok), Sci-Hub and Bibliotik.[33]

The argument would be that, despite the lack of more material evidence, the fact that GPT-4 was advertised as being trained on an even larger dataset means that it too must likely contain pirated material. The lack of available evidence may make it difficult to know when a particular copyrighted work is being used. Further, claims that Generative AI models are trained on copyrighted works may be difficult to make out.[34] It is not clear to what extent the discovery process, including preliminary discovery, may solve these problems. As Matulionyte notes:

In most cases, they [creators] are not consulted or informed when their works are used in an AI training context and are not being compensated for such uses. In addition, they are not able to enforce rights when their works are used […] without their authorisation due to a lack of transparency of such uses.[35]

Moral rights:

Users and developers of AI need to be aware that outputs of Generative AI models may infringe moral rights under the Copyright Act, which, despite often being labelled ‘non-economic’ rights, can give rise to monetary damages.[36] Generative AI outputs may be distortions of an author’s existing work if they subject it to derogatory treatment, and therefore infringe the right of integrity; Generative AI outputs that copy the works they are trained on and do not attribute the original author will also risk infringing the right of attribution.[37]

One example of these issues in practice is the proliferation of what are commonly termed ‘deepfakes’: images, audio files or videos created using AI to convincingly depict a person saying or doing something that the creator of the fake video prescribes.[38] Notwithstanding the harms caused by deepfakes that may require addressing by defamation, privacy/e-safety and criminal law, deepfakes may infringe moral rights by either not attributing the author, or, more likely, distorting copyrighted material in ways that are damaging to the author.[39] Take the example of ‘Heart on my Sleeve’, a recording of a song written by TikTok user ghostwriter977 but recorded using AI-generated versions of the voices of The Weeknd and Drake. Universal Music Group (UMG) was able to have it taken down on the (tenable, to say the least) basis that the AI models must have been trained on the copyrighted works of The Weeknd and Drake, whom UMG represents.[40]

In such cases Australia’s fair dealing exception of parody[41] will be difficult to make good. As examples like ‘Heart on my Sleeve’ demonstrate, the mere fact that an AI-generated work shares elements of an artist’s existing work does not mean that such deepfakes are created for the purpose of parody or satire; in law, to qualify as parody or satire, the work must have an element of direct criticism of the original work.[42]

Conclusion – where does Australia go from here?

As Generative AI continues to develop and we begin to see litigation in the copyright space on such issues in Australia, our legal system will need to balance its consideration of copyright owners, users of Generative AI, and its developers. If the legitimate interests of copyright owners are not protected, our legal system risks stifling incentives to create. At the same time, consideration needs to be given to the possibilities for innovation and creativity using Generative AI, which has the potential to benefit not just developers of Generative AI models but also a range of creators from artists to coders; it is therefore imperative that Australian law fosters creativity in this emerging field.[43]

One proposal is the creation of a licensing scheme under which creators’ work can be used for the progression of AI while creators are compensated, preserving a fair incentive for copyright holders.[44] Such a scheme would be administered by a statutory body and would be a ‘one-stop-shop’ so that obtaining licences is not administratively difficult for AI developers; it would also need to ensure that creators can be represented, are able to opt out for uses they object to, and will themselves receive remuneration rather than just secondary right holders such as publishers or music labels.[45] Lastly, AI developers would need to be subject to mandatory duties to keep their information transparent, as is the case in the EU’s proposed Act.[46] Ultimately, if copyright holders and creators are unable to see how their work is being used, our legal system will provide little to no protection.

David Hing and David Stano

Image Credit: Pepifoto via iStock by Getty Images.

Notes:

[1] Australian Government Department of Industry, Science and Resources (2024) Safe and Responsible AI in Australia Consultation: Australian Government’s Interim Response, 17 January 2024, p.14

[2] Ibid., pp. 15, 22

[3] Australia is not yet moving forward with mandatory guidelines (only a voluntary scheme); Ibid.

[4] Emily St. Martin (2023) ‘Bestselling authors Mona Awad and Paul Tremblay sue OpenAI over copyright infringement’, LA Times 1 July 2023, accessed: https://www.latimes.com/entertainment-arts/books/story/2023-07-01/mona-awad-paul-tremblay-sue-openai-claiming-copyright-infringement-chatgpt

[5] Rod Sims, in Sam Buckingham-Jones (2023) ‘AI should pay for news content: Rod Sims’, Australian Financial Review 23 April 2023, accessed: https://www.afr.com/companies/media-and-marketing/ex-accc-boss-rod-sims-says-chatgpt-ai-should-pay-for-news-content-20230416-p5d0vb

[6] Bernard Marr (2023), ‘The Difference Between Generative AI and Traditional AI: An Easy Explanation for Anyone’ Forbes 24 July 2023, accessed: https://www.forbes.com/sites/bernardmarr/2023/07/24/the-difference-between-generative-ai-and-traditional-ai-an-easy-explanation-for-anyone/?sh=40151b8c508a

[7] Dr Rita Matulionyte (2023) ‘Generative AI and Copyright: Exception, Compensation or Both?’ Intellectual Property Forum 134, p. 36

[8] McKinsey (2023) ‘What is Generative AI?’, McKinsey 19 January 2023, accessed: https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-generative-ai

[9] OpenAI (2023) ‘Terms of use’ OpenAI March 14 2023, at 3(b), accessed: https://openai.com/policies/terms-of-use

[10] Dr Rita Matulionyte (2023) ‘Generative AI and Copyright: Exception, Compensation or Both?’ Intellectual Property Forum 134, pp. 36-37

[11] Ibid., pp. 34-35

[12] Pamela Samuelson (2023) ‘Generative AI meets copyright’, Science 381(6654), p.159

[13] Henderson et al (2023) ‘Foundation Models and Fair Use’, Stanford University, Pre-Print, p.5, accessed: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4404340

[14] Dr Anton Hughes (2023) ‘Generative AI: Challenges at the Intersection of Copyright and Legal Practice’ Intellectual Property Forum 133, p.28

[15] Henderson et al (2023) ‘Foundation Models and Fair Use’, Stanford University, Pre-Print, p.3, accessed: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4404340

[16] Ibid. p.5

[17] Ibid., pp.8-15

[18] Ibid., pp.5-7

[19] Copyright Act 1968 (Cth) ss 40-42

[20] Universal Music Publishing Pty Ltd v Palmer (No 2) [2021] FCA 434 at [301]-[312]

[21] Copyright Act 1968 (Cth) ss 36(1), 36(1A)

[22] Real Estate Tool Box v Campaigntrack [2023] HCA 38, at [64], citing with approval Gibbs J in University of New South Wales v Moorhouse (1975) 133 CLR 1, at 12

[23] Real Estate Tool Box v Campaigntrack [2023] HCA 38, at [66]

[24] OpenAI (2023) ‘Terms of use’, at 1, 3(a) and 7(a)

[25] Hells Angels Motorcycle Corporation (Australia) Pty Ltd v Redbubble Ltd [2019] FCA 355, at [468]

[26] IceTV Pty Limited v Nine Network Australia Pty Limited [2009] HCA 14, at [33]

[27] Telstra Corporation Limited v Phone Directories Company Pty Ltd [2010] FCAFC 149, at [149]

[28] The boundaries would be tested by use of a project like Google’s Poem Portraits, in which users are to input a single word and, if desired, a photo, to generate a unique poem. Reference to the project from Arts Law (no date) ‘Artificial Intelligence (AI) and Copyright’ Arts Law Information Sheets, accessed via: https://www.artslaw.com.au/information-sheet/artificial-intelligence-ai-and-copyright/

[29] Kim O’Connell et al (2023), ‘It copies, right? Generative AI & Copyright Law’, King & Wood Mallesons 11 May 2023, accessed: https://www.kwm.com/au/en/insights/latest-thinking/generative-ai-and-copyright-law.html#data

[30] Kate Knibbs (2023) ‘Why This Award-Winning Piece of AI Art Can’t Be Copyrighted’, Wired 6 September 2023, accessed: https://www.wired.com/story/ai-art-copyright-matthew-allen/

[31] Counsel for Individual and Representative Plaintiffs and the Proposed Class, ‘Complaint, Class Action, Demand for Jury Trial’ (28 June 2023), in Paul Tremblay and Mona Awad et al v OpenAI, Inc. et al (N.D. Cal.) [40]

[32] Ibid., at [35]

[33] Ibid.

[34] Mona Awad, one of the most prominent plaintiffs, has since filed a notice voluntarily dismissing herself from the action. The remaining plaintiffs also argue that the outputs of ChatGPT infringe the authors’ copyright in their novels. However, a look at the plaintiff’s evidence raises doubt that ChatGPT’s outputs copy the novels. Prompted to summarize various sections of Tremblay’s The Cabin at the End of the World, and Awad’s 13 Ways of Looking at a Fat Girl and Bunny, ChatGPT provided an accurate but superficial SparkNotes-like description of the plot and themes of the novels. It is difficult to regard the passages as a reproduction of the expression of the novels rather than a grasping at their underlying ideas and style. That ChatGPT can summarise the novels may also provide valuable evidence that ChatGPT was trained on copies of the novels, but there may be questions around the legality of such copies (e.g. fair use); Counsel for Individual and Representative Plaintiffs and the Proposed Class, ‘Complaint, Class Action, Demand for Jury Trial: Exhibit B’ (28 June 2023), in Paul Tremblay and Mona Awad et al v OpenAI, Inc. et al.

[35] Dr Rita Matulionyte (2023) ‘Generative AI and Copyright: Exception, Compensation or Both?’ Intellectual Property Forum 134, p.35

[36] For example, see Perez v Fernandez (2012) 260 FLR 1

[37] Dr Anton Hughes, ‘Generative AI: Challenges at the Intersection of Copyright and Legal Practice’ Intellectual Property Forum 133 (September 2023), p.29

[38] Zachary Boswell (2023) ‘Fiction to Fact: The rise of deepfakes and their legal implications in Australia’ UTS LSS, pp.40-43

[39] Ibid.

[40] Dr Anton Hughes, ‘Generative AI: Challenges at the Intersection of Copyright and Legal Practice’ Intellectual Property Forum 133 (September 2023), p.29

[41] Copyright Act 1968 (Cth), s 41A

[42] Universal Music Publishing Pty Ltd v Palmer (No 2) [2021] FCA 434, at [353]-[354]

[43] Pamela Samuelson (2023) ‘Generative AI meets copyright’, Science 381(6654), p.159

[44] Dr Rita Matulionyte (2023) ‘Generative AI and Copyright: Exception, Compensation or Both?’ Intellectual Property Forum 134, p. 36

[45] Ibid., p. 38

[46] Proposal for an Artificial Intelligence Act 21 April 2021 accessed via eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206 See the note to article 3.5 which seeks to provide transparency of AI systems as is necessary ‘for individuals to exercise their right to an effective remedy’ whilst not disproportionately affecting the right to protection of intellectual property.