Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer

J Pathol. 2013 Sep;231(1):63-76. doi: 10.1002/path.4212. Epub 2013 Jul 8.

Abstract

The recognition that colorectal cancer (CRC) is a heterogeneous disease in terms of clinical behaviour and response to therapy translates into an urgent need for robust molecular disease subclassifiers that can explain this heterogeneity beyond current parameters (MSI, KRAS, BRAF). Attempts to fill this gap are emerging. The Cancer Genome Atlas (TCGA) reported two main CRC groups, based on the incidence and spectrum of mutated genes, and another paper reported a subgroup defined by an EMT expression signature. We performed a prior-free analysis of CRC heterogeneity on 1113 CRC gene expression profiles and compared our findings with established molecular determinants and with clinical, histopathological and survival data. Unsupervised clustering based on gene modules allowed us to distinguish at least five different gene expression subtypes of CRC, which we call surface crypt-like, lower crypt-like, CIMP-H-like, mesenchymal and mixed. A gene set enrichment analysis combined with a literature search of gene module members identified distinct biological motifs in different subtypes. The subtypes, which were not derived based on outcome, nonetheless showed differences in prognosis. Known gene copy number variations and mutations in key cancer-associated genes differed between subtypes, but the subtypes provided molecular information beyond that contained in these variables. Morphological features differed significantly between subtypes. The objective existence of the subtypes and their clinical and molecular characteristics were validated in an independent set of 720 CRC expression profiles. Our subtypes provide a novel perspective on the heterogeneity of CRC. The proposed subtypes should be further explored retrospectively on existing clinical trial datasets and, when sufficiently robust, be prospectively assessed for clinical relevance in terms of prognosis and treatment response predictive capacity.
Original microarray data were uploaded to the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) under Accession Nos E-MTAB-990 and E-MTAB-1026.

Keywords: colorectal cancer; gene expression; histopathology; molecular heterogeneity.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Adenocarcinoma / genetics*
  • Adenocarcinoma / mortality
  • Adenocarcinoma / pathology
  • Colorectal Neoplasms / genetics*
  • Colorectal Neoplasms / mortality
  • Colorectal Neoplasms / pathology
  • Female
  • Gene Dosage
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic / physiology*
  • Genetic Heterogeneity*
  • Humans
  • Kaplan-Meier Estimate
  • Loss of Heterozygosity
  • Male
  • Mutation
  • Neoplasm Proteins / genetics
  • Oligonucleotide Array Sequence Analysis
  • Prognosis

Substances

  • Neoplasm Proteins