Study design, patients and ethics
A retrospective cohort study was conducted of patients admitted to the intensive care unit (ICU) of a large university hospital; the Amsterdam University Medical Centres, location AMC between April 2020 and April 2021. We analysed all patients who (1) were intubated for COVID-ARDS, defined according to the Berlin criteria , and (2) received CT scans at 10 cmH2O and 20 cmH2O PEEP, with a recruitment manoeuvre between scans. Per hospital protocol, each patient with COVID-ARDS who underwent a CT scan was imaged at those two PEEP levels with a recruitment manoeuvre in between. These recruitment manoeuvres were performed as part of standard practice to inform the physician about the recruitability of consolidated lung tissue. The institutional review board approved the study protocol and waived the need of informed consent.
CT scan and data collection
Non-enhanced chest CT scans were acquired at 10 cmH2O PEEP (PEEP before recruitment) and after a recruitment manoeuvre at a PEEP level of 20 cmH2O (PEEP after recruitment). Both scans were acquired in the end-expiratory phase. In-between the two CT scans, a Hamilton C2 ventilator was used to deliver 3 sustained inflations lasting 10 s by an inspiratory hold, increasing the airway pressure to 40 cm H2O for the entire hold.
Shortly before the CT scans, clinically available respiratory parameters and blood gas results were collected. Formulas used to calculate additional respiratory parameters and more details on data collection are listed in the supplementary materials. Besides that, routine laboratory results and patient demographics were collected.
CT Quantitative analysis
Lung tissue in the CT scans was segmented by an open-source artificial intelligence algorithm  and then manually adjusted using ITK-Snap . Segmentation consisted of drawing the outline of the lungs in each CT slice, excluding hilar vessels, the main bronchi, and if present pleural effusions, pneumothorax and pneumomediastinum areas. The segmentation method depended on the available slice thickness: for the patients with a CT scan composed of 3-mm slices, all slices were segmented. For patients with a CT scan consisting of 1-mm slices, a reduced number of slices were extrapolated and segmented in order to make the quantitative analysis more efficient. This was done similar to previous studies that validated the use of a reduced amount of slices for an accurate evaluation of lung aeration [14, 15]. The distance between the 1-mm slices was set at 20 mm, as previously suggested .
The determination of gas and tissue volumes was performed by analysing CT numbers of all lung voxels in Hounsfield units (HU). Lung regions were classified as normally aerated (from − 900 to − 501 HU), poorly aerated (from − 500 to − 101 HU), non-aerated (from − 100 to 100 HU) and hyper-inflated (from − 1000 to − 901 HU) to allow for comparison with previous ARDS literature .
To calculate lung volume, the number of lung voxels was multiplied by the volume of one voxel in millilitres to form the total volume of the lung irrespective of aeration of the tissue. As lung tissue is assumed to be a composition of air (− 1000 HU) and lung parenchyma with a similar density to water (0 HU), lung weight could be calculated using the tissue fraction of the lung derived from CT numbers [16, 17]. End-expiratory lung volume was calculated using the gas fraction of the lung. Recruitment was defined as the decrease in non-aerated lung weight after the recruitment manoeuvre, divided by total lung weight before the recruitment manoeuvre.
LCA was applied on clinically available respiratory parameters, blood gas analysis, CT-derived gas and tissue volumes (at PEEP before recruitment) and routine laboratory results. Outcomes and demographics were not considered during model design and LCA.
Variables that correlated considerably (Spearman correlation coefficient > 0.7 and p value < 0.05) or were mathematically coupled were excluded from the LCA, as variables are assumed to be locally independent and a violation of that assumption could introduce bias [18, 19]. Data used in the LCA were imputed (supplementary materials) and transformed to resemble a normal distribution, which was verified using histograms, Q–Q plots and the Shapiro–Wilk test. Variables were scaled by subtracting the mean and dividing by the standard deviation. The best fitting latent model in terms of number of latent classes was evaluated by the Bayesian information criterion (BIC) score and Lo–Mendell–Rubin adjusted likelihood ratio test (LMR-LRT) . Entropy, median class assignment probabilities and the amount of class assignment probabilities above 90 per cent were calculated. During model design and LCA, key steps and considerations described by Sinha et al.  were taken into account. Subphenotype characteristics were displayed using a profile plot containing standardized mean differences (SMDs) of subphenotype defining variables. To perform the LCA, open-source package ‘Flexmix’ was used with model driver ‘FLXMCmvcombi’ which allows for both binary and Gaussian indicators as input.
Continuous data were expressed as mean with the standard deviation or median with the interquartile range according to statistical distribution. Categorical data were presented as numbers with percentages. Demographical parameters, clinical parameters, CT-derived lung volumes and weight and outcomes were compared between subphenotypes. Differences in mean, median and proportion between subphenotypes were tested using t-test, Wilcoxon signed-rank test or chi-squared test as appropriate. Tests were two-sided with a type I error set at 5%.
For a simplified subphenotype identification, a nested parsimonious model was created considering all LCA variables and tested for subphenotype prediction capacity. To create the nested model, all variables were entered in a LASSO regression, while tuning its λ parameter in order to arrive at a 4 variable model. Next, these variables were entered in a generalized linear model (GLM) validated with fivefold cross-validation. Finally, to quantify, assess and compare the subphenotype prediction performance of the nested models, areas under the receiver operating characteristics curve (AUROC) and bootstrapped confidence intervals were calculated . An additional nested parsimonious model excluding CT-derived parameters was also created and tested, as well as standard ICU severity scores PaO2/FiO2, Apache II and SOFA.
Survival was visualized using Kaplan–Meier curves and analysed using a Cox proportional hazards model. Time until successful extubation and survival was analysed in a Fine and Gray competing risks regression model and displayed in a cumulative incidence function plot. The subdistribution hazard ratio (SHR) was calculated representing time until successful extubation in the presence of diverging survival. Both the survival and the competing risk analysis were corrected for predefined confounders: age, gender and Apache II  score. Survival and time until successful extubation were treated as right-censored data, with censoring representing having left the ICU alive or completing the 60-day follow-up period.
All analyses were performed with R through the R-studio interface, version 4.0.3.