
Chemometrics models for overcoming high between subject variability: applications in clinical metabolic profiling studies

  • Original Article
  • Metabolomics

Abstract

In human metabolic profiling studies, between-subject variability is often the dominant source of variation and can mask the classifications of clinical interest. Conventional models such as principal component analysis (PCA) are usually ineffective in such situations, so a model that can reveal the underlying pattern hidden behind high between-subject variability is highly desirable. In this study we used as test cases two clinical metabolomics data sets in which such variability had been observed, and we demonstrate that an appropriate choice of chemometrics model can help to overcome high between-subject variability. The two data sets represent two different types of experimental design. The first was obtained from a small-scale study of volatile organic compounds (VOCs) collected from chronic wounds using a skin patch device and analysed by thermal desorption–gas chromatography–mass spectrometry. Five patients were recruited, and for each patient three sites were sampled in triplicate: healthy skin, the boundary of the lesion and the top of the lesion. The aim was to discriminate these three types of sample based on their VOC profiles. The second data set came from a much larger study involving 35 healthy subjects, 47 patients with chronic obstructive pulmonary disease (COPD) and 33 patients with asthma. The VOCs in each subject's breath were collected using a mask device and also analysed by GC–MS, with the aim of discriminating the three groups of subjects based on their breath VOC profiles. Multilevel simultaneous component analysis, multilevel partial least squares for discriminant analysis, ANOVA-PCA, and a novel simplified ANOVA-PCA model, which we have named ANOVA-Mean Centre (ANOVA-MC), were applied to both data sets. These models gave significantly improved results compared with conventional PCA. We also present a novel validation procedure to statistically verify the results obtained from these models.
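The abstract does not give the model equations, so the following is only a minimal sketch of the general multilevel idea these methods share: split the data into each subject's mean profile (between-subject part) and the deviations from it (within-subject part), then run PCA on the within-subject part. It is not the authors' ANOVA-MC implementation. It assumes a balanced repeated-measures design like the wound study (5 patients × 3 sites × 3 replicates) and uses synthetic data; the helper names `split_between_within`, `pca_scores` and `eta_squared` are hypothetical, and eta squared is used here purely as an illustrative separation measure, not as the validation procedure described in the paper.

```python
import numpy as np


def split_between_within(X, subject):
    """Split X (samples x variables) into between- and within-subject parts.

    Uses the identity X = overall_mean + X_b + X_w, where X_b holds each
    subject's mean profile (minus the overall mean) and X_w holds the
    deviations of each sample from its own subject's mean.
    """
    X = np.asarray(X, dtype=float)
    subject = np.asarray(subject)
    overall_mean = X.mean(axis=0)
    X_b = np.empty_like(X)
    for s in np.unique(subject):
        rows = subject == s
        X_b[rows] = X[rows].mean(axis=0) - overall_mean
    X_w = X - overall_mean - X_b
    return X_w, X_b


def pca_scores(X, n_components=2):
    """PCA via SVD of the column-centred matrix; returns scores and loadings."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components], Vt[:n_components].T


def eta_squared(scores, groups):
    """Fraction of the variance of a score vector explained by group labels."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    grand = scores.mean()
    ss_total = np.sum((scores - grand) ** 2)
    ss_between = sum(
        np.sum(groups == g) * (scores[groups == g].mean() - grand) ** 2
        for g in np.unique(groups)
    )
    return ss_between / ss_total


# Synthetic data (assumption) mimicking the wound design: 5 subjects x 3 sites
# x 3 replicates. Subject offsets are deliberately much larger than site effects.
rng = np.random.default_rng(0)
n_subj, n_site, n_rep, n_var = 5, 3, 3, 20
subject = np.repeat(np.arange(n_subj), n_site * n_rep)
site = np.tile(np.repeat(np.arange(n_site), n_rep), n_subj)
subject_offset = rng.normal(0.0, 5.0, size=(n_subj, n_var))
site_effect = rng.normal(0.0, 1.5, size=(n_site, n_var))
noise = rng.normal(0.0, 0.2, size=(subject.size, n_var))
X = 50.0 + subject_offset[subject] + site_effect[site] + noise

X_w, X_b = split_between_within(X, subject)
raw_scores, _ = pca_scores(X)
within_scores, _ = pca_scores(X_w)

# Share of PC1 score variance explained by the site labels, before and after
# removing the subject means
print("raw PCA, PC1 eta^2 for site:           ", round(eta_squared(raw_scores[:, 0], site), 3))
print("within-subject PCA, PC1 eta^2 for site:", round(eta_squared(within_scores[:, 0], site), 3))
```

Under these assumptions, the site labels should explain a much larger share of the PC1 variance once the subject means are removed. The centring step presupposes that each subject contributes samples to several classes, as in the wound study; it does not apply directly when each subject belongs to a single class, which appears to be the situation in the breath study.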





Acknowledgments

We thank Dr. Maria Basanta and Dr. Baharudin Ibrahim for providing the breath VOCs data; Dr. Alexi Thomas, Dr. Svetlana Riazanskaia and Dr. William Cheung for providing the skin VOCs data.

Author information

Correspondence to Yun Xu.

Electronic supplementary material

Supplementary material 1 (DOCX 36 kb)

About this article

Cite this article

Xu, Y., Fowler, S.J., Bayat, A. et al. Chemometrics models for overcoming high between subject variability: applications in clinical metabolic profiling studies. Metabolomics 10, 375–385 (2014). https://doi.org/10.1007/s11306-013-0616-8
