Artificial hallucination: GPT on LSD?

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Critical Care


Gernot Beutel 1*, Eline Geerits 1 and Jan T. Kielstein 2

To the Editor,

In the following, we comment on the publication of Salvagno et al. [1], as we share not only the enthusiasm but also the concerns about the potential risks of Generative Pre-trained Transformer (GPT) in scientific writing, automated draft generation and article summarisation. In fact, their paper sparked an immediate interest in trying this disruptive technology ourselves, using the identical prompt (the command or action sentence used to communicate with ChatGPT) as Salvagno et al., referring to the discussion of the paper by Suverein et al., "Early Extracorporeal Cardiopulmonary Resuscitation for Refractory Out-of-Hospital Cardiac Arrest" [2]. Unfortunately, the same prompt provided by Salvagno et al. [1] resulted in a completely different response from ChatGPT. Even after correcting the typo made by the authors ("Sovereign" instead of "Suverein"), we obtained the following result (Fig. 1).
Additionally, the response to a prompt asking for a summary of each paper did not correspond to the original publications [2,4,5] and contained incorrect information about the study period and the participants. Even more disturbing, the command "regenerate response" led to different results and conclusions [3]. So the question arises whether artificial intelligence could suffer from artificial hallucination, and if so, what is the pathogenesis of this hallucination?
In general, "hallucinations" of ChatGPT or similar large language models (LLMs) are characterized by generated content that is nonsensical or unfaithful to the provided source, e.g. due to errors in encoding and decoding between text and representations. However, it should be noted that artificial hallucination is not a new phenomenon, as discussed in [6]. On a more visual note, it first appeared in 1968 as a malfunction of the supercomputer HAL9000 in the movie "2001: A Space Odyssey" [7]. For those who do not recall: the American spaceship Discovery One is on a mission to Jupiter, carrying mission pilots and scientists. The supercomputer HAL9000 controls most of the operations. As the journey continues, a conflict arises between HAL9000 and the astronauts concerning a malfunction of an antenna. While mission control sides with the astronauts and confirms that the computer has made a mistake, HAL9000 continues to blame any problem on human error.
But why does ChatGPT communicate the result of the prompt, like HAL9000, as a confident statement that is not true? What are the underlying reasons for ChatGPT to give different answers to the same prompt? Is it operating under the influence?
Let's take a closer look at the given publication [1]: that ChatGPT and other LLMs do not know where the AI is getting its specific responses from. Because the model draws on different sources with varying information, the same prompt can lead to different answers and conclusions. So source control is lacking. (4) In addition, the "temperature" of an LLM affects the output and the extent of the artificial hallucination. "Temperature" can be translated as the degree of confidence an LLM has in its most likely response. A higher temperature makes the answer less confident. ChatGPT uses a temperature of 0.7 for its predictions, allowing the model to generate more diverse responses, or in other words, to "hallucinate."

In our opinion, LLMs such as ChatGPT will have a substantial impact on medical information processing, but as new technologies they should be critically questioned. Even more importantly, the limits and risks of these technologies should be understood by the users, including those working at the bedside. A prerequisite for using LLMs in a productive manner is to avoid fundamental errors like those on board the spaceship Discovery One, where a computer overruled human intelligence and the obvious reality. Hence, the results of LLMs should be evaluated by medical experts before they are used in research or clinical practice. ChatGPT makes one quickly forget that, despite its enormous computational power and incredible database, it is still not intelligent but merely programmed to recognize patterns and compile sentences based on probability calculations.
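The effect of "temperature" on the reproducibility of an answer can be illustrated with a minimal sketch. It assumes the common softmax-with-temperature formulation, in which the model's raw scores are divided by the temperature before being turned into probabilities. The candidate words, the scores and the function names are hypothetical and chosen only for illustration; this is not ChatGPT's actual implementation.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one candidate at random, weighted by its probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates for the next word (e.g. a study year) and
# made-up scores; a real model ranks tens of thousands of candidates.
tokens = ["2017", "2018", "2019"]
logits = [2.0, 1.0, 0.5]

# A low temperature concentrates probability on the top candidate,
# a high temperature spreads it across the alternatives.
for temperature in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

# Repeated sampling with the same input, as in "regenerate response",
# can therefore return different words from one run to the next.
rng = random.Random()
print([sample_token(tokens, logits, 0.7, rng) for _ in range(5)])
```

In this sketch the most likely candidate stays the same at every temperature; only the share of probability left to the less likely candidates changes. Under this assumption, that residual share is what allows an identical prompt to yield a different study period or conclusion on regeneration.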
As LLMs can hallucinate artificially, we should remember the words of LSD advocate Timothy Leary: "Think for yourself and question authority." This also applies to ChatGPT!