Peripheral blood transcriptomic sub-phenotypes of pediatric acute respiratory distress syndrome

Background Acute respiratory distress syndrome (ARDS) is heterogeneous and may be amenable to sub-phenotyping to improve enrichment for trials. We aimed to identify subtypes of pediatric ARDS based on whole blood transcriptomics. Methods This was a prospective observational study of children with ARDS at the Children’s Hospital of Philadelphia (CHOP) between January 2018 and June 2019. We collected blood within 24 h of ARDS onset, generated expression profiles, and performed k-means clustering to identify sub-phenotypes. We tested the association between sub-phenotypes and PICU mortality and ventilator-free days at 28 days using multivariable logistic and competing risk regression, respectively. Results We enrolled 106 subjects, of whom 96 had usable samples. We identified three sub-phenotypes, dubbed CHOP ARDS Transcriptomic Subtypes (CATS) 1, 2, and 3. CATS-1 subjects (n = 31) demonstrated persistent hypoxemia, had ten subjects (32%) with immunocompromising conditions, and 32% mortality. CATS-2 subjects (n = 29) had more immunocompromising diagnoses (48%), rapidly resolving hypoxemia, and 24% mortality. CATS-3 subjects (n = 36) had the fewest comorbidities and also had rapidly resolving hypoxemia and 8% mortality. The CATS-3 subtype was associated with lower mortality (OR 0.18, 95% CI 0.04–0.86) and higher probability of extubation (subdistribution HR 2.39, 95% CI 1.32–4.32), relative to CATS-1 after adjustment for confounders. Conclusions We identified three sub-phenotypes of pediatric ARDS using whole blood transcriptomics. The sub-phenotypes had divergent clinical characteristics and prognoses. Further studies should validate these findings and investigate mechanisms underlying differences between sub-phenotypes.


Introduction
Acute respiratory distress syndrome (ARDS) is characterized by acute onset of bilateral pulmonary edema and hypoxemia not fully explained by cardiac dysfunction [1,2]. Primarily defined for adults, ARDS affects 45,000 children in the United States annually [3], representing 10% of mechanically ventilated children in pediatric intensive care units (PICUs) [4], with a mortality rate of 20% in the United States and 30% worldwide [5,6]. There are no specific pharmacological therapies for adult or pediatric ARDS despite several trials, and supportive care with lung-protective ventilation [7] and fluid restriction [8] remains the mainstay of treatment.
ARDS is heterogeneous, with patients having distinct comorbidities and inciting etiologies. This heterogeneity has contributed to negative trial results, as therapies effective in some patients are ineffective in others [9]. Methods to reduce heterogeneity, including sub-phenotyping using protein and mRNA biomarkers, have been proposed for improving patient selection for future clinical trials [10]. Extensive work in adult ARDS has demonstrated differential response to positive end-expiratory pressure [11], conservative fluid management [12], and simvastatin [13] depending on subtypes defined, in part, by protein biomarkers. By contrast, the presence of subtypes in pediatric ARDS is largely unexplored [14].
Whole blood transcriptomics has led to significant insights into the heterogeneity of adult [15,16] and pediatric sepsis [17,18]. Unsupervised clustering has identified sepsis subtypes with differential biology, and potentially differential response to therapy [19]. Few gene expression studies have been performed in adult ARDS [20], and none in pediatrics. The aim of the present study was to identify sub-phenotypes of pediatric ARDS using unsupervised clustering on whole blood transcriptomics, hypothesizing that ≥ 2 subtypes would be identified.

Procedures
Clinical data were recorded prospectively. Blood was collected within 24 h of ARDS onset (the time at which all Berlin criteria were fulfilled) in PAXgene RNA tubes (BD Biosciences, San Jose, CA), kept overnight at room temperature (up to 24 h), and then stored at −20 °C for batched analysis. After ensuring RNA integrity, we generated gene expression profiles using the Human Gene 2.1 ST Array (Affymetrix, Santa Clara, CA) and the GeneTitan instrument. Microarray data were background-corrected and quantile-normalized using robust multi-array average for downstream analyses [23]. Data were uploaded to the Gene Expression Omnibus (GSE147902).
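As an illustration of the quantile-normalization step only, the following minimal Python sketch forces every array (column) to share the same empirical distribution by replacing each value with the mean of the values holding the same rank across arrays. It is not the study's pipeline: robust multi-array average also includes background correction and probe-set summarization, which are not shown, and the function name and toy intensities are hypothetical. Ties are broken by position rather than averaged, which is a simplification.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list per array/sample)."""
    n = len(columns[0])
    # Sort each column, then average across columns at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[i] for sc in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        # Indices of this column ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # value at this rank becomes the rank mean
        normalized.append(out)
    return normalized


# Toy intensities for two arrays and three probes (not study data).
toy_arrays = [[5, 2, 3], [4, 1, 6]]
normalized_arrays = quantile_normalize(toy_arrays)
print(normalized_arrays)
```

After normalization every column contains the same set of values, and only their order within each column differs.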

Outcomes
The objective of this study was to identify sub-phenotypes of pediatric ARDS and assess the association of these subtypes with clinical variables, PICU mortality, and ventilator-free days (VFDs) at 28 days. Only invasive ventilation was counted, with the day of ARDS onset as the first day. Ventilator duration ended at liberation from invasive ventilation for > 24 h. Patients requiring re-intubation > 24 h after extubation had the additional days counted towards total ventilator days. VFDs were determined by subtracting total ventilator days from 28 in survivors. Patients with total ventilator days ≥ 28 and all PICU non-survivors were assigned VFD = 0.
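The VFD definition above can be written as a short function. This is a minimal sketch under the stated rules; the function name, argument names, and example patients are hypothetical and are not taken from the study data.

```python
def ventilator_free_days(vent_episode_days, died_in_picu, horizon=28):
    """Ventilator-free days at `horizon` days.

    vent_episode_days: days of invasive ventilation per episode; episodes
    after re-intubation are added to the total.
    """
    total_vent_days = sum(vent_episode_days)
    # Non-survivors and patients ventilated for the whole window get 0.
    if died_in_picu or total_vent_days >= horizon:
        return 0
    return horizon - total_vent_days


# Hypothetical patients (not study data).
print(ventilator_free_days([6], died_in_picu=False))     # single episode, survivor
print(ventilator_free_days([4, 3], died_in_picu=False))  # re-intubated survivor
print(ventilator_free_days([5], died_in_picu=True))      # PICU non-survivor
```

Assigning zero to non-survivors keeps a short ventilator course that ends in death from being scored as a favorable outcome.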

Statistical analysis
For sub-phenotype discovery, we analyzed gene expression using k-means clustering, restricting the analysis to 31,136 annotated genes (as of July 2019). We chose the optimal number of clusters k using the gap statistic and its 95% confidence interval (CI). First, we computed the gap statistic and 95% CI for k = 1–10, considering clusters with overlapping confidence intervals as having similar performance (Additional file 1: Fig. 1). We then chose the k with the maximal gap statistic that yielded > 10 subjects per cluster (~10% of the entire cohort). Clustering was performed solely based on gene expression, blinded to clinical characteristics and outcomes. For pathway analysis of the identified sub-phenotypes, probes were filtered for expression values ≥ 10 in ≥ 10 samples, and differentially expressed genes (DEGs) for each subtype were determined using DESeq2 [27] with biomaRt [28]. Two-fold upregulated and downregulated DEGs were analyzed in Ingenuity Pathway Analysis (IPA) [29] and ToppGene [30] to identify predicted upstream regulators, Gene Ontology terms, and key pathways. Pathways with q value < 0.1 are presented in the Supplement.
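The cluster-number selection described above can be sketched in Python as follows. This is an illustration under explicit assumptions, not the study's code: it uses a plain Lloyd's k-means with farthest-first initialization, a uniform reference distribution over the bounding box of the data, and a simple "largest gap with more than `min_size` subjects in every cluster" rule. The comparison of overlapping 95% CIs is omitted, and all function names and the toy data are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def kmeans(points, k, rng, n_iter=30):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Farthest-first initialization: pick the point farthest from all centers so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iter):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = assign(points, centers)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


def gap_statistic(points, k, rng, n_refs=10):
    """Gap(k) = mean(log W_ref) - log W_obs, with uniform reference data."""
    labels, wss = kmeans(points, k, rng)
    bounds = [(min(col), max(col)) for col in zip(*points)]
    ref_logs = []
    for _ in range(n_refs):
        ref = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in points]
        ref_logs.append(math.log(kmeans(ref, k, rng)[1]))
    return sum(ref_logs) / n_refs - math.log(wss), labels


def choose_k(points, k_values, min_size, rng):
    """k with the largest gap among those whose smallest cluster exceeds min_size."""
    best = None
    for k in k_values:
        gap, labels = gap_statistic(points, k, rng)
        smallest = min(labels.count(c) for c in range(k))
        if smallest > min_size and (best is None or gap > best[1]):
            best = (k, gap, labels)
    return best


# Toy data (not study data): three well-separated 2-D groups of 20 points each.
rng = random.Random(0)
toy = [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)]
       for cx, cy in [(0, 0), (20, 0), (0, 20)] for _ in range(20)]
best_k, best_gap, best_labels = choose_k(toy, range(1, 7), min_size=10, rng=rng)
print("chosen k:", best_k)
```

The minimum cluster size acts as a guard against solutions that score well on the gap statistic only by isolating a handful of outlying samples.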
Sub-phenotypes were assessed for association with clinical characteristics using non-parametric statistics. Categorical data were compared using the Fisher exact test. We tested the association between sub-phenotypes and mortality and VFDs using logistic and competing risk regression [31], respectively, adjusting for (individually and together) immunocompromised status and PRISM III score. We reasoned that these two variables plausibly contributed to both the identity of the sub-phenotypes and outcomes, as they are associated with circulating immune cell gene expression and pulmonary and non-pulmonary severity of illness. Thus, immunocompromised status and PRISM III represent potential confounders of the association between subtypes and outcomes. Separately, we tested the association between sub-phenotypes and outcome adjusting for predicted mortality based on a recent pediatric ARDS-specific mortality prediction score [32]. Additionally, we repeated the above regressions while also adjusting for absolute neutrophil count (ANC) and absolute lymphocyte count (ALC) in order to assess whether associations between sub-phenotypes and outcomes were driven by lymphocyte subset proportions. Due to the limited number of deaths in the cohort, we restricted the number of confounders in all models to minimize bias and variance. Analyses were performed in Stata 14.2/SE (StataCorp, LP, College Station, TX) and R 3.0.1 (www.r-project.org). Heatmaps were generated with pheatmap and gridExtra in R.
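To illustrate the categorical comparison, the sketch below computes a two-sided Fisher exact p-value and an unadjusted odds ratio for PICU mortality in CATS-3 versus CATS-1. The death counts (3 of 36 and 10 of 31) are an assumption inferred from the rounded percentages in the abstract, not values reported directly, and the unadjusted odds ratio is not the adjusted estimate from the regression models. The function and variable names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so tables tied with the observed one are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Assumed counts inferred from rounded percentages (8% of 36, 32% of 31).
cats3_died, cats3_survived = 3, 33
cats1_died, cats1_survived = 10, 21

odds_ratio = (cats3_died * cats1_survived) / (cats3_survived * cats1_died)
p_value = fisher_exact_two_sided(cats3_died, cats3_survived, cats1_died, cats1_survived)
print(f"unadjusted OR = {odds_ratio:.2f}, Fisher exact p = {p_value:.3f}")
```

The exact test is preferred over a chi-square approximation here because one cell holds only a few events.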
To understand the biology of the sub-phenotypes, we analyzed the association between sub-phenotype and total leukocytes, ANC, and ALC (Additional file 1: Table 1). All leukocyte metrics were associated with CATS subtypes, with modest overall effect sizes (η²) between 5.4 and 11.2%. We performed analyses assessing for upstream regulators, Gene Ontology terms, and key pathways. CATS-1 was enriched for adaptive immune and T cell pathways. CATS-2 was enriched for complement pathways. CATS-3 showed upregulation of G-protein receptor signaling and olfactory pathways. Regulator analysis demonstrated significant inflammatory cytokine regulation of CATS-1 pathways.
Fig. 2 Heatmap of over- and under-expressed functional pathways and regulators using Gene Ontology (GO) and Ingenuity Pathway Analysis (IPA). The scale for A to D represents −log10(q value) for upregulated and log10(q value) for downregulated terms. The color scale represents the activation/inhibition score for E and F.
In unadjusted analysis, CATS-3 had better survival and more VFDs than the other subtypes (Table 1, Fig. 3). After adjustment for PRISM III and immunocompromised status […] immunocompromised status attenuated it. Results were unchanged when also adjusting for ANC or ALC. We found similar results when we adjusted for the probability of death based on a published prediction model (Additional file 1: Table 2). The association of CATS-3 with better outcomes was not completely explained by fewer immunocompromised subjects in CATS-3, as an analysis restricted to immunocompetent subjects had point estimates confirming the association with lower mortality and greater VFDs in CATS-3 (Additional file 1: Table 3), although not all analyses reached statistical significance with the reduced sample size.

Discussion
We identified three sub-phenotypes of pediatric ARDS with distinct biologic pathways and prognoses using whole blood transcriptomics within 24 h of ARDS onset. The sub-phenotypes demonstrated some overlap of traditional clinical characteristics of ARDS severity, with immunocompromised status, stem cell transplant, and severe hypoxemia seen at differing proportions across all subtypes. Transcriptomic sub-phenotypes may provide insight into molecular mechanisms underlying pediatric ARDS heterogeneity, particularly when combined with clinical characteristics.
ARDS heterogeneity has contributed to the paucity of therapies, and sub-classification into subtypes has been proposed as a way to address this. ARDS has been divided into direct or indirect [33–35], infectious or non-infectious [36,37], focal versus non-focal [38], and on the basis of biomarkers [11,33]. A recent trial attempted predictive enrichment by stratifying treatment arm based on radiographic classification of focal or non-focal ARDS [39]. A limitation of this approach in this trial was the imprecision of the clinical designation of focal versus non-focal ARDS, with 21% of subjects misclassified. Thus, while clinical variables such as risk factors and comorbidities can inform heterogeneity, these terms remain imprecise.
Biomarker- and transcriptomic-based sub-phenotyping may offer some advantages, including greater insight into pathophysiology. Re-analyses of adult ARDS trials have identified hyper- and hypo-inflammatory sub-phenotypes characterized, in part, by differential levels of inflammatory biomarkers [11–13] and gene expression [40]. These findings in adults, and our results in pediatrics, demonstrate the utility of transcriptomics to uncover mechanisms underlying subtypes. Indeed, transcriptomics offers higher-dimensional analysis relative to protein biomarkers, which potentially allows for better discrimination of sub-phenotypes.
We have previously demonstrated that infectious and non-infectious ARDS have different predictors of mortality [37]. CATS sub-phenotypes did not stratify according to either direct/indirect or infectious/non-infectious classifications. This may reflect the imprecision of clinical subtyping, different underlying biology between clinical characterization and peripheral gene expression, or low power. However, clinical characteristics may potentially serve as one level of sub-classification which can be improved upon with the addition of transcriptomics. Full realization of this requires more rapid turnaround for biologic-based sub-phenotyping, as clinical categorization is immediately applicable at bedside.
CATS sub-phenotypes revealed mechanisms which were not immediately apparent. CATS-1, for example, was enriched in adaptive immunity, which could be related to its relatively higher ALC. CATS-1 also demonstrated persistent hypoxemia, which is potentially related to signaling associated with adaptive immunity or to the types of organisms which may have caused the ARDS. CATS-2, which had nearly half of its subjects immunocompromised, was enriched in complement-related pathways, consistent with an emerging role for this pathway with stem cell transplant patients [41]. CATS-3 had suppression of adaptive immune and T cell receptor pathways. The sub-phenotypes also demonstrated prognostic utility, with CATS-3 subjects demonstrating improved survival and VFDs in unadjusted and adjusted analyses.
There are few trials in pediatric ARDS, and management is largely extrapolated from adults. The identification of sub-phenotypes with divergent biology forms the premise for targeted treatment. Subtypes with differential upregulation of innate and adaptive immunity offer intriguing opportunities for predictive enrichment in future trials of immunomodulatory therapies. Transcriptomics also allows insight into the mechanisms underlying the broader condition of ARDS, as well as the pathophysiology underlying different subtypes. ARDS has long been considered a disease of predominantly neutrophil infiltration [42,43]. However, leukocyte populations and pathways other than innate immune hyperinflammation contribute to ARDS pathogenesis, which can potentially be dissected via transcriptomics [44,45].
Given the heterogeneity of ARDS, transcriptomic differences between the CATS sub-phenotypes may simply reflect differences in underlying risk factors, limiting their utility for predictive enrichment. However, the molecular basis for the heterogeneity of risk factors is also poorly elucidated. Pathway enrichment of the CATS sub-phenotypes provides insights into the different immune pathways implicated in early ARDS. Whether this can assist with predictive enrichment remains to be demonstrated. However, given the differences in mortality rate, these sub-phenotypes may also have a role in prognostic enrichment.
We performed microarray rather than direct RNA sequencing (RNA-seq). While RNA-seq provides greater dynamic range and is superior at identifying low-abundance transcripts, whole blood presents unique challenges. Up to 70% of the mRNA in a blood total RNA sample can be globin mRNA, with the remaining total RNA composed of > 90% ribosomal RNA (rRNA). Neither globin mRNA nor rRNA sequences contribute high-value information, and unlike in hybridization techniques, over-representation of non-informative sequences consumes reagents and requires greater sequencing depth to yield useful information. Globin- and rRNA-depletion techniques are available [46,47]; however, depletion techniques reduce the amount of RNA (particularly from leukopenic subjects) and potentially introduce artifact. Since microarrays are based on hybridization, over-abundance of globin or rRNA is less problematic, and so microarray was chosen for this study.
Notably, every whole blood transcriptomic sub-phenotyping study to date has used microarray [15,16,48]. However, as RNA-seq technology improves and achieves better performance in whole blood, future transcriptomic studies may benefit from the improved coverage of direct sequencing technologies.
Our study has several strengths. We prospectively collected blood within 24 h of ARDS onset and generated expression profiles in > 90% of samples. Detailed clinical data were collected and correlated with sub-phenotypes. However, our study has important limitations. Subjects were recruited from a single center, which may limit generalizability, although demographics and severity of ARDS are comparable to other published cohorts [6,49–51]. We did not use the recent Pediatric Acute Lung Injury Consensus Conference (PALICC) definition of pediatric ARDS [52], which allows unilateral infiltrates and has a specific SpO2-based severity stratification. Cohorts defined using PALICC may differ from ours in important ways which limit generalizability. Our sample size was small and samples were collected only at ARDS onset, limiting our ability to fully characterize the subtypes, assess their temporal stability, and detect associations with outcomes. Our small sample size and low mortality rate precluded adjustment for multiple potential confounders. We sampled blood, which, while accessible, may not best reflect the transcriptome most relevant for ARDS. Alveolar sampling is uncommon in pediatrics, and impractical for most clinical trial purposes. A future goal will be to reduce the number of transcripts required to discriminate between sub-phenotypes and operationalize a subtyping strategy. We did not include an external control population to assess up- or down-regulation of pathways relative to a non-ARDS cohort. Most importantly, our study lacks a validation cohort to assess the robustness of the CATS sub-phenotypes.
This is the first transcriptomic study of pediatric ARDS, and validation cohorts with mRNA collection are lacking. Future studies of pediatric ARDS with transcriptomics are needed to assess the reproducibility of the CATS sub-phenotypes. Development of a reduced gene signature would simplify this process, and is the focus of current work. Future cohorts should have parallel efforts correlating transcriptomics with plasma biomarkers, as a protein biomarker-based signature would likely prove faster, cheaper, and less labor-intensive. Biomarkers could also delineate mechanisms underlying the sub-phenotypes, as well as facilitate comparisons with adult sub-phenotypes, which have largely been defined using plasma proteins [11–13]. Re-analyses of adult ARDS trials have suggested differential treatment response based on subtype. To reproduce this in children, future trials in pediatric ARDS should collect both plasma for proteins and whole blood mRNA for transcriptomics and test treatment response by sub-phenotype, as differences between adult and pediatric ARDS do not necessarily allow for translation of adult trial data to children.

Conclusions
We identified three sub-phenotypes of pediatric ARDS using whole blood transcriptomics. The subtypes had differing clinical characteristics and divergent prognoses. Further studies should validate these findings and investigate mechanisms underlying differences between sub-phenotypes. Our results are the first steps towards reducing heterogeneity and designing trials of targeted, precision therapies in pediatric ARDS.