Introduction Unsupervised clustering has been used to identify different subtypes of asthma. The choice of variables to input into the clustering algorithm is an important consideration. Most previous studies selected variables based on expert advice, whilst others used dimensionality reduction techniques such as principal component analysis (PCA). We aimed to compare the results of unsupervised clustering using raw variables with those obtained using variables transformed by dimensionality reduction techniques.
Methods We performed our analysis on 613 asthmatics aged 6–23 years from Ankara, Turkey. We conducted extensive phenotyping and recorded 49 variables including demographic data, sensitisation, lung function, medication, peripheral eosinophilia, and markers of asthma severity. We performed hierarchical clustering (HC) using: (1) all variables; and (2) variables transformed using dimensionality reduction techniques.
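The hierarchical clustering step can be illustrated with a minimal sketch. It assumes z-score standardisation, Euclidean distance and average linkage, none of which is stated in this abstract, and it runs on a few hypothetical rows rather than the study data. The function names `standardise` and `agglomerative` are illustrative only. For the PCA arm, the rows would be replaced by component scores before clustering.

```python
from itertools import combinations
from math import dist


def standardise(rows):
    """Z-score each column so variables on different scales contribute equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def agglomerative(rows, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per row."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between the members of two clusters.
        return sum(dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(rows)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


# Hypothetical patients: [age of onset in years, FEV1 % predicted].
toy_rows = [[2, 95], [3, 92], [2.5, 97], [12, 70], [13, 68], [11, 72]]
toy_labels = agglomerative(standardise(toy_rows), n_clusters=2)
print(toy_labels)
```

This naive version recomputes every linkage at each merge, so it is only suitable for small illustrations; a dedicated library would normally be used for a cohort of several hundred patients.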
Results PCA revealed five components describing atopy and variations in asthma severity, which were then used to infer cluster assignment. The optimal HC solution identified five clusters in both the PCA-transformed and the raw untransformed data. However, the clusters were not identical between the two analyses. Both identified mild asthma with good lung function, severe atopic asthma, and late-onset mild atopic asthma, but the overlap between children assigned to these three clusters in the two HC analyses was modest. Clustering without PCA identified early-onset severe atopic asthma and late-onset atopic asthma with high BMI, whilst HC with PCA identified early-onset non-atopic mild asthma in females. Across both methods, four features characterised the clusters: age of onset, atopy, asthma attacks, and asthma severity. Using only these four features, we identified early-onset atopic mild asthma, early-onset non-atopic mild asthma, severe asthma, late-onset asthma, and exacerbation-prone asthma. Cluster stability was markedly higher when clustering on these four features alone.
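One common way to quantify the overlap between two cluster assignments of the same children is the adjusted Rand index (ARI). The abstract does not state which agreement measure was used, so the sketch below is an assumption offered for illustration, and the label lists are hypothetical rather than study results. An ARI of 1 means identical partitions regardless of how the clusters are numbered, and values near 0 mean agreement no better than chance.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions agree trivially

    # Pairs of items placed together in both partitions (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in a single cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical assignments for eight children from two clustering runs.
without_pca = [0, 0, 0, 1, 1, 2, 2, 2]
with_pca = [1, 1, 0, 0, 0, 2, 2, 2]
print(round(adjusted_rand_index(without_pca, with_pca), 3))
```

Because the index is invariant to relabelling, clusters from the two analyses do not need to be matched by name before they are compared.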
Conclusion Different methodologies applied to the same dataset identified different clusters of asthma. Four features (age of onset, atopy, asthma attacks, and asthma severity) characterised the clusters across both approaches. We propose that these four features could be more useful in identifying asthma endotypes.