With AI taking the entire world by storm, it is imperative to stay on top of developments, assess and re-assess risks and opportunities, and know what the latest applications in the sector are. On 3 July, the Italian Publishers Association (AIE) held the international online seminar “Artificial Intelligence: one year later. Evolutions, developments and opportunities for the publishing industry”. Coming one year after the first event on this topic, it was a good moment to take stock of where the publishing industry stands: accelerating uptake where it makes sense, and keeping a cautious watching brief where the dust hasn’t settled yet.
The event is part of a wider programme of initiatives and training courses planned by the AIE to give the industry a comprehensive overview of the different impacts and opportunities of AI in publishing, as explained in the opening remarks by Andrea Angiolini, AIE Delegate to Innovation and Editorial director and head of digital publishing programs at Il Mulino publishing house.
The first session, moderated by Giulia Marangoni (International Relations Office, AIE), addressed the evolution of the regulatory framework on copyright and AI, as well as the impact of AI on publishing houses from different perspectives: the availability of innovative tools, the relationship with authors, and the emerging opportunities for licensing content for AI.
As explained by Piero Attanasio (Head of Institutional Relations – AIE), one key development in the last year was the adoption of the AI Act in the European Union. It is the first attempt at regulating providers of so-called General Purpose AI (the likes of GPT-4 and Claude), which will soon have to provide details about the works they used for training. Regardless of future behaviour, it is yet to be clarified whether the training of models so far happened lawfully, as a commercial exception for Text and Data Mining, applicable where the rightsholder doesn’t reserve rights, has been available in the EU only since mid-2021. When redefining publishers’ relations with AI builders and providers, potentially through licensing, it is important to ensure they are not released from liability, should they have used content unlawfully. Although most obligations don’t apply to publishers, the sector will have to reflect on ethical implications and on what level of transparency is appropriate in disclosing to readers where AI was used in the publishing process.
Peter Schoppert (NUS Press – National University of Singapore) offered an analysis of licensing opportunities for copyrighted content. For now, it looks like copyrighted content may play a role in several phases of the AI cycle: pre-training, fine-tuning/alignment, and Retrieval Augmented Generation (RAG, a new AI architecture linking Large Language Models to datastores); further applications will doubtless emerge in the future. RAG carries particular potential for the licensing of publishers’ high-quality, curated content. In June 2024, Fortune Business Insights reported that data licensing for AI is already a 2.9-billion-dollar market, and the first deals in the publishing space have emerged. To determine a publisher’s licensing strategy, careful consideration should be given to whether short-term revenue gains may affect the brand and the value of IP assets in the long term; what type of institution or actor they are licensing to; and how clear and enforceable the licensing conditions are, including how narrowly the permitted uses are defined.
From a technological perspective, Giuseppe Attanasio (Postdoctoral Researcher at the University of Lisbon and the Center for Responsible AI) went through the latest functions and opportunities offered by AI platforms, such as multimodality and new forms of interactivity.
The second session of the seminar, moderated by Cristina Mussinelli (Digital Publishing Consultant, AIE), focused on international case studies from different publishing segments (STM, educational and trade), highlighting challenges and opportunities in each of them.
Claudio Colaiacomo (Vice President – Public Affairs & Academic Relations – Elsevier BV) and Joris van Rossum (Programme Director, STM Solutions) explained that in scholarly publishing, a suite of applications and services is already available to publishers for different purposes, e.g. to assist the authoring process, to summarise topics, to support research discovery, to screen manuscripts, and to recommend reviewers. Potential is seen especially in addressing research integrity issues, which, at the same time, risk being exacerbated by the very same technology.
In education, as illustrated by Anna Helminen (Head of Data & Research, Sanoma Learning), different levels and combinations of blended learning are being selected depending on the uses. AI is used internally to analyse usage and improve products and efficiency, to edit and version existing content, to adapt learning content to students’ performance, needs and preferences, and to support teachers in managing their materials and personalising their ways of teaching.
Chantal Restivo-Alessi (Chief Digital Officer and CEO International Foreign Language for HarperCollins Publishers) explained that in trade, marketing is one of the key use cases, with AI aiding in campaigns on social media, formatting and layout, newsletters, and summaries of content.
Overall, it emerged that many publishers and authors across all sectors are currently experimenting, with multimodality, the increasing availability of models and their potential customisation making such experimentation more approachable for many. The key to success is to start from an analysis of the objectives and to define a framework that ensures a safe and sound transition: operating through pilots, especially at the beginning; agreeing on principles, checklists (including to ensure compliance with legislation) and an evaluation/feedback process; ensuring training is available to bring all the staff along; and always keeping a human in the loop.
Safeguards are considered a necessity to protect privacy, confidentiality and intellectual property and to guarantee the security of environments and content. As an example, many publishers have recommended that final responsibility for the submitted content should stay with the author, that GenAI tools shouldn’t be credited as authors, and that original works shouldn’t be uploaded to publicly available GenAI tools.
The quality, veracity and trustworthiness of information, which are at the core of what publishers provide and the reason why caution is necessary, are at clear risk of being undermined. This emerged distinctly as the main threat deriving from the indiscriminate use of the technology, together with the embedding and amplification of potential bias and the adverse environmental impacts.