Beware the self-fulfilling prophecy: enhancing clinical decision-making with AI

The integration of artificial intelligence (AI) in healthcare through large language models (LLMs) presents a paradigm shift with both promising opportunities and significant challenges. As we stand at the cusp of this transformation, it is crucial to critically examine its implications, particularly the potential pitfalls. Here, we discuss the latent risk of over-reliance on AI-driven clinical decision-making and propose potential solutions to mitigate this risk. While acknowledging the potential of LLMs to enhance healthcare delivery, we emphasize the importance of implementing a post-decision review process and utilizing its outcomes to create curated, publicly available datasets. This approach aims to avoid a “self-fulfilling prophecy” where AI systems amplify existing biases. By establishing this feedback loop, we can synergize AI capabilities with the expertise of clinicians, ensuring responsible and effective integration of LLMs in healthcare.

Our recent study [1] evaluated the accuracy of GPT-4.0 in predicting the risk of endotracheal intubation within 48 h for 71 patients receiving high-flow nasal cannula (HFNC) oxygen therapy, comparing its performance to that of clinical physicians. The area under the receiver operating characteristic curve (AUROC), reflecting the predictive efficacy of GPT-4.0, was comparable to that of specialist physicians (0.821 vs. 0.782, p = 0.475) and superior to that of non-specialist physicians (0.821 vs. 0.662, p = 0.011) [1]. Similar findings have been reported in other clinical scenarios [2,3,4], highlighting the potential for near-future application of LLMs in augmenting clinical decision-making and answering medical questions.
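To make the comparison concrete, the sketch below shows how such an AUROC comparison could be computed. It is illustrative only: the predicted probabilities and outcomes are synthetic, and the crude paired bootstrap used for the p-value is our own stand-in, not necessarily the statistical test used in the original study (which may, for example, have used DeLong's test).

```python
# Illustrative sketch only: comparing two sets of risk predictions against
# observed intubation outcomes via AUROC. All values are synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=71)                               # hypothetical outcomes (1 = intubated within 48 h)
p_gpt = np.clip(0.6 * y_true + rng.normal(0.3, 0.20, 71), 0, 1)    # hypothetical GPT-4.0 probabilities
p_doc = np.clip(0.5 * y_true + rng.normal(0.3, 0.25, 71), 0, 1)    # hypothetical physician probabilities

auc_gpt = roc_auc_score(y_true, p_gpt)
auc_doc = roc_auc_score(y_true, p_doc)

# Crude paired bootstrap for the AUROC difference (stand-in for a formal test).
diffs = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), len(y_true))
    if len(np.unique(y_true[idx])) < 2:                            # resample must contain both classes
        continue
    diffs.append(roc_auc_score(y_true[idx], p_gpt[idx]) -
                 roc_auc_score(y_true[idx], p_doc[idx]))
diffs = np.array(diffs)
p_two_sided = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
print(f"AUROC GPT-4.0 {auc_gpt:.3f} vs physician {auc_doc:.3f}, bootstrap p = {p_two_sided:.3f}")
```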

The integration of LLMs in clinical decision-making holds the promise of substantially increasing the efficiency of healthcare delivery and promoting the homogenization of healthcare services. By providing rapid, data-driven insights, LLMs can assist clinicians in making timely and accurate decisions, thus potentially improving patient outcomes [3, 5]. Moreover, AI can help standardize care practices across different healthcare settings, ensuring that patients receive consistent, high-quality care regardless of location or level of development.

However, these promising results also raise significant concerns. To illustrate, consider a hypothetical scenario from our study: a patient with respiratory failure receiving HFNC oxygen therapy has an AI-predicted 70% risk (95% CI: 60–85%) of requiring intubation within the next 48 h. Based on this prediction, delaying intubation could increase the patient's mortality risk, prompting physicians to opt for immediate intubation and mechanical ventilation. This decision would seemingly validate and reinforce the AI's prediction, leading to higher predicted risks for similar cases in the future, since the AI learns from real-world outcome data. The situation resembles a self-fulfilling prophecy, akin to the narrative of Shakespeare's play “Macbeth.” Yet the patient might have avoided unnecessary intubation had the clinical decision been based on comprehensive clinical judgment rather than on the AI prediction alone.
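The feedback loop can be made explicit with a toy simulation. The sketch below assumes a hypothetical patient phenotype whose true intubation need is 40% while the model currently predicts 70%; once clinicians act on the prediction, every case is logged as an intubation and naive retraining drives the estimate upward, whereas decisions grounded in clinical judgment let the estimate drift back toward the true rate. All numbers and the update rule are invented for illustration.

```python
# Toy simulation (no real data): acting on a model's prediction contaminates
# the labels it later retrains on. A hypothetical phenotype truly needs
# intubation 40% of the time, but the model currently predicts 70%.
import numpy as np

rng = np.random.default_rng(42)
TRUE_RISK = 0.40    # assumed true probability of needing intubation
THRESHOLD = 0.50    # clinicians intubate pre-emptively above this predicted risk

def retrain_cycles(predicted_risk, act_on_model, cycles=5, n=500):
    for c in range(cycles):
        truly_needs = rng.random(n) < TRUE_RISK
        if act_on_model and predicted_risk >= THRESHOLD:
            # Pre-emptive intubation: every case is recorded as "intubated",
            # so the training label no longer reflects true need.
            observed = np.ones(n, dtype=bool)
        else:
            observed = truly_needs
        # Naive retraining: blend the old estimate with the observed event rate.
        predicted_risk = 0.5 * predicted_risk + 0.5 * observed.mean()
        print(f"  cycle {c}: predicted risk = {predicted_risk:.2f}")
    return predicted_risk

print("Decisions driven by the model's prediction:")
retrain_cycles(0.70, act_on_model=True)     # estimate drifts towards 1.0
print("Decisions driven by clinical judgment:")
retrain_cycles(0.70, act_on_model=False)    # estimate returns towards 0.40
```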

Potential over-reliance on AI could lead to a loss of clinical intuition, reducing the clinician's role to merely following algorithmic recommendations without critical judgment. A potential solution to mitigate these concerns involves a post-decision review process in which, after clinical decisions are made with AI involvement, all data are archived and reviewed by multiple specialist physicians. This review process can help correct any suboptimal judgments and create a refined dataset. Making these curated data publicly available can further enhance the reliability of AI systems. LLMs such as GPT could then learn from these high-quality datasets, potentially improving their predictive capabilities and yielding more accurate and reliable predictions in the future.
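One way to picture such a post-decision review pipeline is as a structured record attached to each AI-assisted decision. The schema below is a hypothetical sketch; the field names and the simple majority-vote consensus rule are our own assumptions rather than any existing system or the process described above.

```python
# Hypothetical sketch of a post-decision review record; field names and the
# majority-vote consensus rule are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PostDecisionReview:
    case_id: str                                  # de-identified case identifier
    ai_predicted_risk: float                      # AI-estimated probability of intubation
    clinical_decision: str                        # e.g. "immediate intubation", "continue HFNC"
    observed_outcome: str                         # outcome documented within 48 h
    reviewer_verdicts: List[str] = field(default_factory=list)
    consensus_label: Optional[str] = None         # label released into the curated dataset

    def add_review(self, verdict: str) -> None:
        """Record one specialist's independent judgment of the decision."""
        self.reviewer_verdicts.append(verdict)

    def finalize(self, min_reviews: int = 3) -> None:
        """Adopt the majority verdict once enough specialists have reviewed the case."""
        if len(self.reviewer_verdicts) >= min_reviews:
            self.consensus_label = max(set(self.reviewer_verdicts),
                                       key=self.reviewer_verdicts.count)
```

In such a scheme, only cases with a finalized consensus label would be de-identified and released into the curated public dataset used for retraining.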

This solution leverages the collective expertise of specialist physicians to refine AI-driven clinical decisions, ensuring that AI recommendations are critically evaluated and validated. However, there are challenges, such as ensuring the privacy and security of patient data, obtaining consistent and thorough reviews from specialists, and addressing the legal and ethical concerns raised by applications of AI in healthcare. For example, the European Union’s General Data Protection Regulation (GDPR) requires that algorithms be transparent before they are used in patient care [5]. Despite these challenges, this approach offers a pathway towards a future in which AI and clinicians work synergistically to enhance patient outcomes, improve the efficiency of healthcare delivery, and reduce disparities in healthcare services across locations and levels of development.

Availability of data and materials

No datasets were generated or analysed during the current study.

References

  1. Liu T, Duan Y, Li Y, Hu Y, Su L, Zhang A. ChatGPT achieves comparable accuracy to specialist physicians in predicting the efficacy of high-flow oxygen therapy. Heliyon. 2024;10: e31750.

  2. Cabral S, Restrepo D, Kanjee Z, Wilson P, Crowe B, Abdulnour R-E, et al. Clinical reasoning of a generative artificial intelligence model compared with physicians. JAMA Intern Med. 2024;184:581.

  3. Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, et al. Large language models encode clinical knowledge. Nature. 2023;620:172–80.

  4. Nori H, King N, McKinney SM, Carignan D, Horvitz E. Capabilities of GPT-4 on medical challenge problems. arXiv; 2023. http://arxiv.org/abs/2303.13375

  5. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44–56.


Acknowledgements

None.

Funding

T.L. was supported by National High Level Hospital Clinical Research Funding [BJ-2023–173].

Author information

Contributions

Writing – original draft, T.L. and Y.D.; writing – review & editing, T.L. and Y.D.

Corresponding author

Correspondence to Yaocong Duan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Liu, T., Duan, Y. Beware the self-fulfilling prophecy: enhancing clinical decision-making with AI. Crit Care 28, 276 (2024). https://doi.org/10.1186/s13054-024-05062-3
