Gene expression profiling: good housekeeping and a clean message
R C Chambers
Correspondence to: Dr R C Chambers, Centre for Respiratory Research, University College London, London WC1E 6JJ, UK

Microarray technology offers us the means of monitoring gene expression on a scale which was hard to envisage only a few years ago.

There is no doubt that gene expression studies based on evaluating mRNA levels for single or multiple genes of interest in human lung biopsy tissue have had a major impact on our understanding of the molecular mechanisms underlying respiratory disease. The recent advent of microarray technology has added further impetus to the central paradigm that mRNA quantification in lung tissue can shed light on pathogenesis and identify new targets for therapeutic intervention. This technology is now so advanced that it allows the parallel monitoring of entire genomes using microarrays with a surface area equivalent to just a few square centimetres and as little as 5 μg RNA starting material.

Since its first application in the mid 1990s,1 microarray technology has been applied to all aspects of biomedical research with over 60 papers in respiratory research alone. It has been successfully used for the classification and molecular diagnosis of lung cancer,2 the identification of potential target genes for therapeutic intervention in idiopathic pulmonary fibrosis,3 mechanistic studies in animal models of asthma4 and pulmonary fibrosis,5 and for profiling lung development.6 Global expression profiling of cellular responses in vitro has provided new insights into the transcriptional programs involved in cytokine signalling,7 growth arrest and apoptosis,8 and it is already enabling us to understand the operation of functional gene networks.


Although a number of microarray platforms have been developed, microarrays come in two basic formats. Complementary DNA (cDNA) arrays usually contain polymerase chain reaction (PCR) products generated from cDNA libraries or clone collections, spotted onto glass slides or nylon membranes. Expression values are based on the competitive hybridisation of two samples being directly compared following the incorporation of two fluorescent dyes (Cy3 and Cy5) on a single array. In contrast, oligonucleotide arrays (for example, Affymetrix GeneChips) contain relatively short sequences (25-mers in the case of GeneChips) synthesised onto silicon wafers in situ by photolithography or arrayed as pre-synthesised oligonucleotides onto glass slides. The final target consists of biotin labelled cRNA and each sample is hybridised to a separate array. Hybridisation is detected by staining with a streptavidin-phycoerythrin conjugate followed by confocal fluorescence laser scanning. The advantage of oligonucleotide arrays is that they contain multiple validated probe sequences for each gene, together with mismatch control sequences to allow correction for non-specific hybridisation signals. In contrast, cDNA arrays usually consist of user defined probe sequences but allow a much greater degree of flexibility and are generally cheaper as slide printing can be performed in house.
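In the two colour cDNA format described above, each spot ultimately yields a log ratio of the two dye signals, with the Cy3 channel serving as the reference. A minimal sketch of that calculation, using entirely hypothetical background subtracted intensities (the gene names and values are illustrative only, not real data):

```python
import math

# Hypothetical background-subtracted spot intensities for one two-colour array:
# Cy5 = test sample, Cy3 = reference sample (values invented for illustration)
spots = {
    "GAPDH":  {"cy5": 5200.0, "cy3": 5100.0},
    "COL1A1": {"cy5": 9800.0, "cy3": 1200.0},
}

def log2_ratio(cy5, cy3):
    """Relative expression of the Cy5-labelled sample over the Cy3 reference."""
    return math.log2(cy5 / cy3)

for gene, s in spots.items():
    print(gene, round(log2_ratio(s["cy5"], s["cy3"]), 2))
```

A log2 ratio near zero (as for the GAPDH spot here) indicates similar abundance in both samples, while a ratio of +3 corresponds to roughly eight-fold higher expression in the Cy5-labelled sample.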


Managing and mining the huge amount of data generated by microarray experiments remains a major challenge for most users. Although this side of microarray analysis is still considered a major bottleneck, help is at hand via a plethora of online data mining, clustering, and analysis tools. In fact, most of the best tools are available to academic users as freeware upon request. A detailed description of these tools is beyond the scope of this editorial. However, Gene Express, a new column edited by Naftali Kaminski and hosted by the ATS website, is a valuable resource aimed specifically at lung researchers and an excellent route to other sites of interest.

Despite its growing use in both academia and industry, microarray experiments are still considered by many as nothing more than a sophisticated fishing trip. This is because microarray analysis challenges the traditional hypothesis driven method of investigation and shifts the emphasis towards hypothesis generation. Investigators are then faced with what is probably the greatest challenge—namely, the extraction of biological meaning from microarray data and the prioritisation of candidate genes for follow up. Faced with hundreds of possibilities, it is not surprising that investigators have, in the past, tended to focus on genes they recognise and can integrate into a reasonable hypothesis regarding their likely role in the disease process. Fortunately, the need to address these limitations of microarray analysis is fuelling the rapid development of novel computational tools, including unbiased scoring methods for identifying the most meaningful and informative genes in microarray experiments. One such tool has recently been applied to great effect to funnel and prioritise candidate genes for follow up in expression studies of human lung biopsy material from patients with pulmonary fibrosis.3 Used in combination with computational tools which allow the visualisation of gene expression data on maps representing biological pathways (for example, GenMAPP) and with programs based on artificial neural networks which can be trained to recognise signature expression profiles,9 these tools are likely to significantly accelerate our understanding of the molecular basis of disease.


Although microarray technology is improving rapidly and confidence in the data generated is growing, validation of microarray expression trends using a second readout remains a critical requirement. This is especially important if the sample size is too small to allow rigorous statistical analysis. For this purpose, the real time fluorescence based reverse transcriptase polymerase chain reaction (RT-PCR) is generally the method of choice. However, in this issue of Thorax, Glare et al10 revisit one of the most stubborn problems associated with all RT-PCR based methods—namely, the choice of a reference gene with which to normalise the signals obtained, so as to allow legitimate comparison between samples and eliminate differences of non-biological origin. One of the most common approaches is to normalise against a housekeeping gene because its mRNA levels are thought to remain constant. Using competitive RT-PCR, Glare et al provide compelling evidence that mRNA levels of two of the most commonly used housekeeping genes in asthmatic airways—glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and β-actin—are, in fact, highly variable and therefore unsuitable for normalising the expression levels of potential genes of interest. This study is a welcome addition to a growing body of evidence that mRNA levels of a number of traditional housekeeping genes are not invariable under a variety of experimental and pathological conditions.11 For samples obtained in vivo the evidence is now so strong that their use should either be discontinued or be viewed as valid only when appropriate experiments have confirmed that their expression is indeed constant under the experimental conditions of the study.
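The consequence of an unstable reference gene is easy to see in the standard comparative Ct (ΔΔCt) calculation widely used for relative quantification by real time RT-PCR. In the sketch below, all Ct values are invented for illustration; a one cycle shift in the reference gene alone is enough to halve the apparent fold change:

```python
# Comparative Ct (delta-delta Ct) relative quantification.
# Fold change = 2^-(ddCt), where dCt = Ct(target) - Ct(reference).
# All Ct values below are hypothetical, chosen only to illustrate the point.

def fold_change(ct_target_sample, ct_ref_sample, ct_target_ctrl, ct_ref_ctrl):
    d_ct_sample = ct_target_sample - ct_ref_sample   # normalise sample to reference
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl         # normalise control to reference
    return 2 ** -(d_ct_sample - d_ct_ctrl)

# Stable reference: target gene is induced 4-fold in the disease sample.
stable = fold_change(24.0, 18.0, 26.0, 18.0)

# Same target Ct values, but the reference gene itself drops by 1 Ct
# (i.e. its mRNA roughly doubles) in the disease sample.
unstable = fold_change(24.0, 17.0, 26.0, 18.0)

print(stable, unstable)  # 4.0 vs 2.0: the induction is underestimated two-fold
```

This is exactly why a reference gene whose own expression varies between asthmatic and control airways, as Glare et al show for GAPDH and β-actin, can silently distort every downstream comparison.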

So what are the alternatives for normalising gene expression data? There are no ideal solutions but, for conventional gene expression studies, the use of total cellular RNA has been proposed as one of the least unreliable methods for data normalisation.12 Although the use of total RNA levels for normalising expression data derived from patient material still has to be fully validated, recent technical advances in RNA quantification—including the RiboGreen RNA quantification assay and the Agilent Bioanalyzer, which allows RNA quantity and quality assessment in a single step—are likely to prove very useful for studies of human biopsy material with very low RNA yields. Another alternative is to use ribosomal RNA (rRNA), which makes up the bulk of total RNA. Despite reservations regarding changes in expression levels and potential imbalances in rRNA and mRNA fractions between different samples, 18S rRNA has recently been validated for normalising expression levels by quantitative RT-PCR analysis under a number of experimental conditions and is demonstrably more reliable than normalising to housekeeping genes.11,13 While considering the issue of normalisation, it is also worth pointing out that, regardless of the platform used, uncertainties relating to the use of housekeeping genes for signal normalisation are also relevant to microarray experiments. For oligonucleotide based arrays, the most commonly used approach is to scale or normalise the output data using a transcriptome equivalent strategy (or global normalisation) in order to derive an average intensity for each array, with the assumption that the total sum of all transcripts present is similar between samples.
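The transcriptome equivalent (global) scaling strategy mentioned above can be sketched in a few lines: each array is rescaled so that its mean intensity matches a common target, on the assumption that overall transcript abundance is comparable between samples. The intensities and target mean below are hypothetical:

```python
# Sketch of global scaling for array data: rescale every array to a common
# mean intensity. Assumes total transcript abundance is similar across samples.
# All intensity values and the target mean are hypothetical.

def global_scale(arrays, target_mean=500.0):
    """Return each array rescaled so its mean intensity equals target_mean."""
    scaled = []
    for values in arrays:
        factor = target_mean / (sum(values) / len(values))
        scaled.append([v * factor for v in values])
    return scaled

# Two hypothetical chips with a two-fold difference in overall signal:
chips = [[100.0, 300.0, 800.0],
         [200.0, 600.0, 1600.0]]

for chip in global_scale(chips):
    print([round(v, 1) for v in chip])  # both chips: [125.0, 375.0, 1000.0]
```

After scaling, the two-fold global intensity difference between the chips disappears while the relative pattern within each chip is preserved, which is precisely the point: genuine per-gene differences survive, overall labelling and hybridisation differences do not.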

Finally, it is also worth remembering that gene expression studies measure mRNA levels and no more. Since most genes are also highly regulated at the post-transcriptional stage, changes in mRNA levels may not necessarily reflect changes at the protein level. In addition, interpreting expression studies in disease versus control tissue is often confounded by the very dramatic differences in cell populations present within the two types of tissue. Genes which appear to be highly differentially expressed may therefore reflect changes in the cellular composition of the tissue rather than changes in gene expression per se. Additional analysis by conventional immunohistochemistry and/or in situ hybridisation therefore becomes essential when analysing whole biopsy tissue. Similarly, important changes in gene expression may be masked because of dilution of the message. This may be particularly problematic when dealing with biopsy tissue where the disease is confined to a small number of cells within the sample. Recent advances in RNA amplification technology14 and laser capture microdissection (LCM) to sample individual cell populations within a biopsy sample are proving particularly useful for addressing these potential problems.2

In conclusion, we now have the means of monitoring gene expression on a scale which was hard to envisage only 5 years ago. The integration of this technology with rapidly evolving computational tools and public domain data repositories, in combination with appropriate post-microarray validation experiments, is likely to have a major impact on our understanding of complex human disease processes in the future.
