Who owns the copyright on a book written by artificial intelligence? Above all, is such material protected by copyright at all? And how should the rights holders of the millions of works that artificial intelligence consults and processes be remunerated?
These questions, which arose in the theoretical and academic sphere, have now become the subject of public debate. On the one hand, public access to a tool such as ChatGPT has made more people aware of what these systems can do with written text as well. On the other hand, a number of court cases concerning content created with artificial intelligence have already been opened, and they may significantly affect the matter. A lawsuit filed in the US courts on 16 January by three artists – cartoonist Sarah Andersen, digital artist Kelly McKernan and illustrator Karla Ortiz – against the platforms Stability AI, Midjourney and DeviantArt has shone a spotlight on an issue already well known to professionals in the sector: artificial intelligence, at least for the moment, does not invent anything. For the time being, AI systems process data and, on that basis, create texts and images that are reworkings of other texts and images which are, in turn, covered by copyright. Some of the holders of those rights feel that this process of reworking by artificial intelligences infringes their rights.
But let us proceed in order. To put a first full stop to this story we have to start with the behaviour of a monkey, more precisely a macaque, named Naruto. The animal had picked up a camera left within its reach in the forest by a British nature photographer, David Slater, and had taken some pictures that soon became very popular and circulated on several websites. The association PETA (People for the Ethical Treatment of Animals) filed a lawsuit demanding that Naruto be recognised as the owner of the rights to the pictures he had taken and, as such, be paid for their use (PETA offered to handle the transactions on the animal’s behalf). In 2016, however, San Francisco federal judge William Orrick rejected the claim, stating that there can be no copyright in the absence of human creative activity. In short: if a work is not created by a human, it cannot be protected by copyright.
This principle of law is still valid today, even taking into account the differences between the US and European legal frameworks. “In current legislation and international standards, from the Berne Convention to the 1996 WIPO Copyright Treaty and the TRIPS Agreement (Trade-Related Aspects of Intellectual Property Rights), it is so far inconceivable that a protected work exists without an author, and that author must be a human person,” explains lawyer Luciano Daffarra. According to Daffarra, when discussing today’s generative artificial intelligences, i.e. algorithms capable of generating new content on the basis of a predefined text or image database, it is right that the materials produced by AIs should not be granted copyright protection. Even if the results are astounding, they do not possess the characteristics of creativity and originality needed to deserve specific protection.
Can we instead consider as the author of such materials the person who feeds the artificial intelligence, gives it instructions on how to realise the work and, indeed, intervenes on the final output? According to Daffarra, even in this hypothesis the recognition of authorial rights is questionable: “In my opinion no, there are no prerequisites for conferring protection on the product of this technological process. The alleged author, in such a case, merely carries out a technological activity. Suppose I give instructions to a machine so that it produces a sonnet with the verses, stanzas and rhymes typical of the ‘dolce stil novo’. If, at the end of the operation, I appropriate the output of the machine and claim authorship without declaring where the work comes from and without proof of its actual origin, I could obtain copyright protection for myself. But if this were not the case – if I declared that I had used artificial intelligence to create the poem, or if its origin could be traced through log files – there would be no legitimate protection, because the work would not be the fruit of human creativity. We would be faced with the appropriation, by a person, of a work done by a machine drawing on a database of billions of pieces of information, images and other content, many of which are themselves autonomously protected works.”
However, what has been said so far does not rule out the possibility that the framework will change in the future as a result of technological innovation. “My idea is that in the future – in the event of developments in artificial intelligence that allow the actual creation of original works through the use of predictive or decisional models, thus going beyond the current, merely generative technology – we may consider attributing specific protection to material created with the aid of algorithms.” This, however, could not be, according to Daffarra, the specific protection of copyright. “It would be,” the lawyer argues, “a sui generis right, similar to the one that already protects the creation of simple databases on account of the producer’s technological investment, without this implying the recognition of a true copyright vested in the machine.”
Still open – and by no means a trivial matter – is the issue of the protection of the data with which these intelligences are trained and built. This is the subject of the lawsuit against Stability AI, Midjourney and DeviantArt mentioned above, which brings together legal questions but also philosophical ones: what do we consider a work of art to be, or what may it be in the future; what do we mean by originality, novelty and creativity; what counts as plagiarism. At the basis of how these AIs work is data mining, the technological process through which algorithms collect and process huge amounts of data in order to produce, for a given input, answers comparable to those a human being might give to the same question. These are the technologies behind automatic translators, but also behind the tools that, in Google services or e-mail clients, suggest the most probable reply to a message we have received. Moving up the scale, data mining allows the creation of increasingly complex texts and images, to the point of simulating the work of human creativity.
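To make the idea concrete, here is a deliberately minimal sketch in Python – an invented three-line corpus and a simple word-frequency count – of the statistical principle behind such suggestions: learn which word most often follows another in the material already seen, and propose it as the continuation. Real systems are vastly more sophisticated, but the underlying logic of mining existing material to predict the most probable continuation is the same.

```python
from collections import Counter, defaultdict

# Invented mini-corpus standing in for the huge text collections mined in practice.
corpus = [
    "thank you for your message",
    "thank you for your help",
    "thank you for the report",
]

# For every word, count which word follows it and how often (a simple bigram model).
followers = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1

def suggest_next(word):
    """Return the continuation most frequently seen after `word` in the corpus."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]

print(suggest_next("thank"))  # -> "you"  (seen three times)
print(suggest_next("for"))    # -> "your" (seen twice, against "the" once)
```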
Given these new technological developments, we must then ask who should pay for the exploitation of the works included in these databases. US law, which is more favourably oriented towards large companies in the digital sector, treats text and data mining as a form of fair use and therefore allows such content to be freely consulted and used. The European Union allows data mining on lawfully accessible content, but gives the owners of the individual works in a database the possibility of reserving their rights and thus opposing use by third parties. In this sense, rights holders choose whether or not to allow the exploitation of their content according to their own interests and commercial strategies. For instance, the DeviantArt platform, named here in reference to the class action pending before the San Francisco court, has tagged all the content hosted on its site to prevent artificial intelligence platforms from using it.
On this issue, however, it is necessary to develop technological standards that allow publishers to protect their rights within a system of rules shared with tech companies and easy to apply. The Text and Data Mining Reservation Protocol Community Group set up within the W3C is working on exactly this, co-chaired by Giulia Marangoni, representing the Italian Publishers Association, and by Laurent Le Meur of the French EDRLab.
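Purely as an illustration of the direction this work is heading, the sketch below (in Python) shows the general shape of such a machine-readable reservation: before mining a page, a crawler checks whether the site exposes an opt-out signal and skips the content if it does. The specific signal names used here – a `tdm-reservation` HTTP header and a `noai` robots-style directive – are assumptions for the example, inspired by the W3C work and by DeviantArt's tagging described above, not a definitive standard.

```python
import urllib.request

def mining_is_reserved(url):
    """Rough check for an opt-out signal before text/data mining a page.

    The signals checked here ("tdm-reservation" header, "noai" directive) are
    illustrative assumptions: their exact names and semantics are defined by
    the evolving standards, not by this sketch.
    """
    with urllib.request.urlopen(url) as response:
        headers = response.headers
        body = response.read(200_000).decode("utf-8", errors="ignore").lower()

    # Hypothetical HTTP header reserving text and data mining rights.
    if headers.get("tdm-reservation", "").strip() == "1":
        return True
    # Hypothetical robots-style directive in the HTTP headers or page meta tags.
    if "noai" in headers.get("x-robots-tag", "").lower() or 'content="noai"' in body:
        return True
    return False

if __name__ == "__main__":
    page = "https://example.com/some-artwork-page"  # placeholder URL
    if mining_is_reserved(page):
        print("Rights reserved: this content should be excluded from the dataset.")
    else:
        print("No reservation signal found (which does not by itself imply consent).")
```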
Even before giving structure and legal discipline to what is produced with AI algorithms, there is the problem of compliance with existing rules and of correctly framing technologies that we still do not understand in depth and that, above all, keep evolving while we try to understand them (and for which we still lack even proper definitions and terms).
An example may help here. Many will remember the matches between the computer Deep Blue and world chess champion Garry Kasparov at the end of the last century, and the astonishment with which the first defeat of a human by a machine at chess was greeted. Was Deep Blue already an expression of the artificial intelligence we know today, or was it ‘only’ a system that, having stored all the chess knowledge of the time (thousands of games by the great champions, data on openings and endgames, and so on), was able to beat a human across the chessboard simply through superior computing power? Technological developments in the field provide an initial answer: one of the best performing systems today, AlphaZero, has no such ‘library’ behind it. It was given only the basic rules of the game and learned to play by itself. And the level of these tools is now so high that there are world championships (WCCC) in which chess programs compete against each other. Bottom line: is ChatGPT more like Deep Blue or like AlphaZero? It is very difficult to say. Artificial intelligences are so-called black-box models, i.e. systems whose inner workings cannot be inspected; only their output can be observed.
What we can analyse now is the current state of affairs, which does not suggest that artificial intelligences can actually be considered ‘creative’ rather than merely ‘generative’. It is true that they are already able, for instance, to produce texts in the style of a certain author, but this depends entirely on how they were trained and on the content they have had access to. And this leads to complex considerations, especially when we are talking about content protected by copyright, which may have been used without the authorisation of its owners, perhaps because it was available online unlawfully.
By way of example, we know that ChatGPT responded as follows to some ‘provocative’ questions about access to protected content.
“I cannot promote or provide you with information on illegal or copyright infringing sites. Downloading copyrighted material without permission is a violation of copyright laws and may result in legal penalties. I advise you to find legal sources to purchase or read the material you are looking for.”
“[…] as an AI model trained by OpenAI, I do not have the ability to access or download copyrighted material. I rely only on publicly available information and knowledge to provide you with accurate and reliable answers.”
However, ChatGPT also stated that it had read ‘many books by Italo Calvino’, while at the same time claiming that ‘as the AI model of OpenAI, I don’t have the ability to read books, but I can give you general information about…’.
There is therefore a potential issue of copyright infringement both upstream (in the dataset used for training) and downstream, depending on the uses made of these systems once they become more open and customisable. If it is likely that, among the billions of items in current datasets, only a fraction are infringing, what can we expect once it becomes possible to instruct the system with (even protected) content of our choice? Will it be possible to use artificial intelligence to obtain the summary of a textbook, or to prepare a report or a dissertation from a bibliography of our choice? Will it be possible to feed the machine all the works of our favourite writer so that it produces a story invented by us but written as that author would have written it, or so that it translates a work into another language? In other words, can AI be used to produce creative works? If so, there is no doubt that the content produced, in the absence of the consent of the rights holders of the works used to train the artificial intelligence, will have to be considered illegal.
But on the subject of the protection of content generated by artificial intelligence there is another aspect to consider. We are certainly under the spell of technological novelty and enjoy challenging it to explore its possibilities, but we must not forget that it is currently limited to answering our questions – questions that may be simple, but that may also be so complex, so structured and so articulated as to deserve protection in their own right. And if the question deserves protection, why should the answer not deserve it too?