Machine learning for the real-time assessment of left ventricular ejection fraction in critically ill patients: a bedside evaluation by novices and experts in echocardiography

Background: Machine learning algorithms have recently been developed to enable the automatic and real-time echocardiographic assessment of left ventricular ejection fraction (LVEF), but they have not been evaluated in critically ill patients.
Methods: Real-time LVEF was prospectively measured in 95 ICU patients with a machine learning algorithm installed on a cart-based ultrasound system. Real-time measurements taken by novices (LVEFNov) and by experts (LVEFExp) were compared with LVEF reference measurements (LVEFRef) taken manually by echo experts.
Results: LVEFRef ranged from 26 to 80% (mean 54 ± 12%), and the reproducibility of measurements was 9 ± 6%. Thirty patients (32%) had an LVEFRef < 50% (left ventricular systolic dysfunction). Real-time LVEFExp and LVEFNov measurements ranged from 31 to 68% (mean 54 ± 10%) and from 28 to 70% (mean 54 ± 9%), respectively. The reproducibility of measurements was comparable for LVEFExp (5 ± 4%) and for LVEFNov (6 ± 5%) and significantly better than for reference measurements (p < 0.001). We observed a strong relationship between LVEFRef and both real-time LVEFExp (r = 0.86, p < 0.001) and LVEFNov (r = 0.81, p < 0.001). The average difference (bias) between real-time and reference measurements was 0 ± 6% for LVEFExp and 0 ± 7% for LVEFNov. The sensitivity to detect systolic dysfunction was 70% for real-time LVEFExp and 73% for LVEFNov. The specificity to detect systolic dysfunction was 98% for both LVEFExp and LVEFNov.
Conclusion: Machine learning-enabled real-time measurements of LVEF were strongly correlated with manual measurements obtained by experts. The accuracy of real-time LVEF measurements was excellent, and the precision was fair. The reproducibility of LVEF measurements was better with the machine learning system. The specificity to detect left ventricular dysfunction was excellent both for experts and for novices, whereas the sensitivity could be improved.
Trial registration: NCT05336448. Retrospectively registered on April 19, 2022.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13054-022-04269-6.


Introduction
*Correspondence: frederic.michard@bluewin.ch
The assessment of left ventricular ejection fraction (LVEF) is part of the point-of-care echocardiographic evaluation of critically ill patients [1][2][3]. It has the disadvantage of being time-consuming and operator-dependent. Machine learning algorithms have recently been developed to facilitate, automate, and decrease the variability of echocardiographic measurements [4][5][6][7]. Several algorithms have been designed specifically for the real-time assessment of LVEF [8][9][10]. They have been trained to recognize specific ultrasound images, enable instantaneous image quality control, and measure LVEF automatically in just a few seconds. However, clinical validation studies remain scarce and have been done in ambulatory cardiac patients [8][9][10].
In critically ill patients, we compared real-time LVEF measurements taken with a new machine learning algorithm to reference manual measurements taken by experts in echocardiography.

Methods
We prospectively studied critically ill patients who required an echocardiographic evaluation during their ICU stay and in whom it was possible to obtain transthoracic images enabling a manual and quantitative evaluation of left ventricular systolic function. Real-time LVEF measurements were taken with a machine learning algorithm (Real-Time EF, GE Healthcare, Chicago, USA) installed on a cart-based ultrasound system (Venue, GE Healthcare). The real-time LVEF software is a neural network algorithm that has been trained with thousands of cardiac images to automatically detect the 4-chamber view of the heart, locate landmarks on the left ventricular wall, and detect end-diastolic and end-systolic times from the mitral valve motion. Once the endocardial border is detected, the algorithm provides immediate user feedback regarding image quality using color-coding. When image quality is considered acceptable (green or yellow endocardial border displayed on screen), left ventricular volumes are automatically estimated with the single-plane Simpson disk method, enabling LVEF calculation from real-time end-diastolic and end-systolic volumes.
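To make the last step concrete, the sketch below shows how a single-plane method of disks turns a traced ventricular cavity into a volume, and how LVEF follows from the end-diastolic and end-systolic volumes. It is only an illustration of the textbook formulas, not the vendor's implementation: the function names, the number of disks, and the input values are assumptions chosen for the example and are not study data.

```python
import math


def simpson_single_plane_volume(diameters_cm, long_axis_cm):
    """Single-plane method of disks: V = (pi / 4) * sum(d_i^2) * (L / n).

    diameters_cm: cavity diameters measured perpendicular to the long axis,
                  one per disk (hypothetical input, e.g. from a traced border).
    long_axis_cm: length L of the ventricular long axis.
    Returns the volume in mL (1 cm^3 = 1 mL).
    """
    n = len(diameters_cm)
    if n == 0 or long_axis_cm <= 0:
        raise ValueError("need at least one disk and a positive long axis")
    disk_height = long_axis_cm / n
    return math.pi / 4.0 * sum(d * d for d in diameters_cm) * disk_height


def ejection_fraction(edv_ml, esv_ml):
    """LVEF (%) = 100 * (EDV - ESV) / EDV."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Sanity check on a cylinder (constant 4 cm diameter, 8 cm long, 20 disks):
# the disk sum must reproduce the analytical cylinder volume pi * r^2 * L.
cylinder_volume = simpson_single_plane_volume([4.0] * 20, 8.0)

# Illustrative volumes only (not taken from the study).
lvef = ejection_fraction(edv_ml=120.0, esv_ml=54.0)
```

With the illustrative volumes above, the stroke volume is 66 mL out of 120 mL, which corresponds to an LVEF of 55%.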
Real-time LVEF measurements obtained by a novice (LVEFNov) and by an expert (LVEFExp) were compared with LVEF measurements taken manually by an expert in critical care echocardiography (LVEFRef). Seven novices (all residents in our department and beginners in echocardiography) and two experts (senior intensivists with the European Diploma in Advanced Critical Care Echocardiography) participated in data collection. Measurements taken in triplicate were averaged for comparisons, and the intra-operator reproducibility was assessed by calculating the coefficient of variation (standard deviation divided by the mean) expressed as a percentage.
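As a minimal sketch of the reproducibility metric described above, the following code averages a triplicate and computes its coefficient of variation. The text does not state whether the sample or the population standard deviation was used; the sample standard deviation (n - 1 denominator) is assumed here. The function name and the triplicate values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(measurements):
    """Coefficient of variation (%) = 100 * SD / mean.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    if len(measurements) < 2:
        raise ValueError("need at least two measurements")
    m = mean(measurements)
    if m == 0:
        raise ValueError("mean is zero, coefficient of variation is undefined")
    return 100.0 * stdev(measurements) / m


# Hypothetical triplicate of LVEF readings (%), not study data.
triplicate = [50.0, 55.0, 60.0]
averaged_lvef = mean(triplicate)               # value used for comparisons
cv_percent = coefficient_of_variation(triplicate)
```

For this triplicate the mean is 55% and the sample standard deviation is 5 percentage points, so the coefficient of variation is about 9%.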
The quality of echo images was classified as good, fair, or poor by the experts, and as green (optimal), yellow (acceptable), or red (not acceptable for real-time LVEF measurements) by the machine learning algorithm.
Results are expressed as mean ± standard deviation (SD). Agreement between real-time and reference LVEF measurements was tested using the Bland-Altman method. Statistical comparisons were made with a t-test. A p value < 0.05 was considered statistically significant.
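The agreement analysis can be sketched as follows: the bias is the mean of the paired differences (real-time minus reference), and its standard deviation describes the precision. The 95% limits of agreement (bias ± 1.96 SD) are the conventional Bland-Altman summary and are included here as an assumption, since the text reports bias ± SD only. The function name and the paired values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(test_values, reference_values):
    """Return (bias, sd_of_differences, (lower_loa, upper_loa)).

    Differences are computed as test minus reference. The limits of
    agreement use the conventional bias +/- 1.96 * SD (assumed here).
    """
    if len(test_values) != len(reference_values):
        raise ValueError("paired measurements must have the same length")
    if len(test_values) < 2:
        raise ValueError("need at least two pairs")
    diffs = [t - r for t, r in zip(test_values, reference_values)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical paired LVEF values (%), not study data.
real_time = [52.0, 48.0, 61.0, 40.0]
reference = [50.0, 50.0, 60.0, 42.0]
bias, sd, (lower_loa, upper_loa) = bland_altman(real_time, reference)
```

A bias close to zero with a wide SD, as in the abstract's 0 ± 6% and 0 ± 7%, indicates good accuracy on average but only moderate precision for an individual measurement.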

Results
According to the experts' judgement, the quality of echo images was good, fair, and poor in 41, 43, and 11 patients, respectively. The average difference (bias) between real-time and reference LVEF measurements was comparable when images were of good quality (n = 41) and of fair or poor quality (n = 54), both for experts and for novices (Table 1). Results did not change significantly after excluding the 11 patients with poor image quality (Table 1).
According to the machine learning algorithm, the quality of echo images was flagged green, yellow, and red in 80, 15, and 0 patients, respectively. Results did not change significantly after excluding the 15 patients in whom images were flagged yellow (non-optimal) (Table 1).

Discussion
An increasing number of anesthesiologists and intensivists have been trained to perform qualitative echocardiographic assessments [1][2][3]. However, quantitative evaluations remain challenging for many, particularly for novices. In the present study, we tested an artificial intelligence-enabled tool specifically designed to facilitate and automate the bedside measurement of LVEF. Our findings suggest that this tool enables a clinically acceptable estimation of LVEF when compared to manual measurements. They also suggest that the real-time LVEF tool enables novices to assess LVEF with better reproducibility than experts can achieve manually. Several machine learning algorithms have been designed to assess LVEF from a parasternal long-axis view or from an apical 2- or 4-chamber view [8][9][10]. Comparison studies published so far have yielded promising results. Indeed, close correlations and good agreement have been reported between LVEF measurements taken by skilled operators and by machine learning algorithms, particularly when the algorithm detects and analyzes the apical 4-chamber view [9,10]. However, clinical validation studies remain scarce and have been done in ambulatory cardiac patients. Our study appears to be the first evaluation done in critically ill patients, in whom transthoracic echocardiography is often challenging, in particular when patients are mechanically ventilated. Our findings suggest that the real-time LVEF algorithm may help clinicians, including beginners in echocardiography, to accurately measure LVEF in just a few seconds. Such a tool may further increase the adoption of point-of-care echocardiographic evaluations in critically ill patients.
Our study has limitations. Because ultrasound evaluations are time-consuming, we studied hemodynamically stable patients to ensure comparability between measurements taken at each step of the evaluation (LVEF measurements were first taken by a trainee, then by an expert both manually and with the automatic method). Also, we did not assess the ability of the new real-time LVEF method to track changes in LVEF. Only a small number of patients had severely impaired left ventricular systolic function (LVEFRef < 30%, n = 4) or a hyperkinetic ventricle (LVEFRef > 70%, n = 2). Therefore, future studies will need to assess the clinical value of the real-time LVEF algorithm during hemodynamic instability, in patients with a very low or supranormal LVEF, and during therapeutic interventions (e.g., inotropic stimulation) known to induce significant changes in systolic function.

Conclusion
Machine learning-enabled real-time measurements of LVEF were strongly correlated with manual measurements obtained by experts. The accuracy of real-time LVEF measurements was excellent, and the precision was fair. The reproducibility of LVEF measurements was better with the machine learning system, including for novices. The specificity to detect left ventricular systolic dysfunction was excellent both for experts and novices, whereas the sensitivity could be improved. Studies are needed to confirm our findings in mechanically ventilated patients with cardiogenic shock or hyperdynamic states.