
  • Guidelines

Expression profiling — best practices for data generation and interpretation in clinical trials

Abstract

Microarrays are routinely used to assess mRNA transcript levels on a genome-wide scale. As their use and acceptance increase, attention is turning to appropriate methods of data generation and interpretation, and in particular to the question of which data analysis methods are best. Such 'best practices' are needed because microarrays, in particular Affymetrix oligonucleotide arrays, are becoming increasingly important in human clinical trials, both for differential diagnosis and for monitoring pharmacological efficacy. Here, representatives from high-volume microarray core centres consider the current status of 'best practices', focusing on the broadly used Affymetrix oligonucleotide arrays.


Figure 1: Sample processing and microarray interpretation of Affymetrix GeneChips.
Figure 2: Dense time-series data with adequate replicates can provide robust visual interpretation of data.



Acknowledgements

The authors thank their respective funding agencies, particularly the Department of Defense and the Doris Duke Charitable Foundation CSDA, whose larger collaborative funding initiatives make systematic, large-scale studies of the bioinformatics and biostatistics of genome-wide data sets possible. The authors also thank S. Hilmer, A. DeBiase and G. Miyada for their critique of the manuscript.

Author information


Additional information

Corresponding author: ehoffman@cnmcresearch.org

Eric P. Hoffman is at the Research Center for Genetic Medicine, Children's National Medical Center, Washington DC 20010, USA. e-mail: ehoffman@cnmcresearch.org

Tarif Awad, John Palma, Teresa Webster, Earl Hubbell and Janet A. Warrington are at Affymetrix, Santa Clara, California 95051, USA. e-mails: tarif_awad@affymetrix.com; john_palma@affymetrix.com; teresa_webster@affymetrix.com; earl_hubbell@affymetrix.com; janet_warrington@affymetrix.com

Avrum Spira is at The Pulmonary Center, Boston University Medical Center and the Bioinformatics Program, Boston University, Boston, Massachusetts 02118, USA. e-mail: aspira@lung.bumc.bu.edu

George Wright is at the Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA. e-mail: wrightge@mail.nih.gov

Jonathan Buckley and Tim Triche are at the Children's Hospital, University of California, Los Angeles, California 90089, USA. e-mails: buckley@hsc.usc.edu; triche@hsc.usc.edu

Ron Davis, Robert Tibshirani and Wenzhong Xiao are at Stanford University, Palo Alto, California 94303, USA. e-mails: dbowe@stanford.edu; tibs@stat.stanford.edu; wzxiao@pmgm2.stanford.edu

Wendell Jones is at Expression Analysis Inc., Durham, North Carolina 27713, USA. e-mail: wjones@expressionanalysis.com

Ron Tompkins is at Harvard University, Boston, Massachusetts 02115, USA. e-mail: rtompkins@partners.org

Mike West is at the Institute of Statistics and Decision Sciences, Duke University, Durham, North Carolina 27708, USA. e-mail: mw@stat.duke.edu

Related links


DATABASES

LocusLink

GAPDH

got1

FURTHER INFORMATION

Affymetrix Developers' Network

Affymetrix Technical Note

Agilent Technologies

ArrayExpress microarray database

The Children's National Medical Center Microarray Center

HOPGENE Program for Genomic Applications

MGED Data Transformation and Normalization Working Group

MGED Society

NETAFFX web site

NIGMS Glue Grant

Glossary

A-, B- AND C-SERIES ARRAYS

A series of human, rat and mouse Affymetrix arrays released in 2003, in which the A array contained the best-characterized genes, and the B and C arrays contained less well-defined expressed sequence tags. As of 2004, all probe sets have been condensed onto a single microarray per species that covers the entire genome.

CROSS-SECTIONAL DESIGN

The use of different subjects in an experimental and control group or groups. The statistical analysis compares the median and variation within each group relative to the other groups.

FEATURE

Typically one element (spot) on a microarray. In spotted cDNA or oligonucleotide arrays, features correspond to genes or transcripts; in Affymetrix arrays, there are typically 22 elements per probe set and often multiple probe sets per gene, so a feature might refer to a single oligonucleotide, a probe pair or a probe set, or a gene with multiple probe sets. In bioinformatics it is most often synonymous with a gene.

FLUORESCENCE-ACTIVATED CELL SORTING

(FACS). A method whereby dissociated and individual living cells are sorted, in a liquid stream, according to the intensity of fluorescence that they emit as they pass through a laser beam.

FLUOROPHORE

A small molecule, or a part of a larger molecule, that can be excited by light to emit fluorescence.

ISCHAEMIA

The loss of blood supply, and hence oxygenation, to a tissue or cells.

LASER CAPTURE MICRODISSECTION

A technique in which individual cells, or regions of tissue, are excised from a histological preparation, using specially equipped microscopes, and isolated for further study.

LONGITUDINAL DESIGN

The use of multiple samples from the same subject. With this design, each subject serves as their own control, eliminating confounding inter-individual variations at baseline; paired t-tests are used to interpret the data.
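As an illustration of the paired test mentioned above, the following is a minimal sketch that computes the paired t statistic for one gene measured twice in the same subjects. The function name and the expression values are hypothetical, and only the test statistic and degrees of freedom are returned; a p-value would still have to be looked up from a t distribution.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired measurements.

    Each subject contributes one 'before' and one 'after' value, so the
    test is carried out on the within-subject differences.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical log2 signal of one probe set in four subjects
baseline = [7.1, 6.8, 7.4, 7.0]
treated = [7.9, 7.5, 8.0, 7.9]
t_value, dof = paired_t_statistic(baseline, treated)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

Because the statistic depends only on the differences, a stable offset between subjects at baseline does not affect it, which is the property the definition describes.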

NEGATIVE CELL ISOLATION

The use of antibodies or other reagents to remove all unwanted cells from a mixed population of cells. In this method, the desired cells are not exposed to bound antibodies, thereby avoiding potential activation or other molecular alteration in the desired cells.

PENALTY WEIGHT

In Affymetrix arrays, hybridization to the 'mismatch' probe of a probe pair might or might not be considered as a form of measurement of noise or background, and can be factored into the signal seen with the paired 'perfect match' as a penalty weight.

PHOTOLITHOGRAPHY

The process of using light to either etch or activate regions of a surface (substrate). This method is used in microelectronics to create integrated circuits and processors.

REAL-TIME PCR

The quantification of the amount of PCR product during each cycle of a PCR reaction. The product concentration, as a function of cycle number, provides a good estimation of the relative quantity of the mRNA being tested.
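One common way to turn per-cycle measurements into a relative quantity is to compare threshold cycles (Ct) between a target and a reference transcript. The sketch below assumes, beyond what the definition states, that product doubles every cycle and that a reference gene is measured in the same samples; the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript in a sample versus a control.

    Assumes perfect doubling per cycle, so a difference of one cycle in
    threshold corresponds to a two-fold difference in starting template.
    """
    # Normalise the target to the reference gene within each condition
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # A lower normalised Ct in the sample means more starting template
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the sample once the reference gene is taken into account.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold)  # 4.0
```

If amplification efficiency is noticeably below doubling, the base of 2 overstates the fold change, so this form is best read as an approximation.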

RESECTION

Surgical removal of tissue, most commonly used for removing tumorous masses from surrounding tissue.

S1 NUCLEASE PROTECTION

An experimental method for determining mRNA transcript concentration in a tissue or cell RNA sample. It involves using labelled DNA probes that bind the RNA, with overhanging non-hybridized tails of the probe then being digested by the S1 nuclease. This creates a smaller labelled DNA probe that is indicative of the abundance of the mRNA being tested.

SURVIVAL DATA ANALYSIS

A battery of statistical methods applied to data when mortality is often the only, or best, measured outcome.
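A widely used member of this family is the Kaplan-Meier estimate of the survival curve, which is not named in the definition above and is shown here only as an example. The sketch assumes each subject has a follow-up time and a flag saying whether death was observed or the subject was censored; all names and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each observed death time.

    times  -- follow-up time for each subject
    events -- True if death was observed at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects who share this follow-up time
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months; the second subject is censored
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
# [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Censored subjects never lower the curve directly; they only shrink the number at risk for later death times.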

TIME-SERIES STUDY

The use of a series of samples taken at defined time points after a defined stimulus. In mice and rats, the samples at different time points are usually from different animals. In humans, time-series studies are necessarily longitudinal to avoid additional confounding noise.

TUKEY'S BI-WEIGHT ESTIMATOR

Many statistical tests require underlying definitions that are assumed to be valid (for example, tumour versus non-tumour), and require data that show a normal distribution. Microarray data, and the clinical information underlying the definition of samples, are often less exact, with genes or samples often behaving as statistical outliers. Tukey's bi-weight estimator belongs to the class of M-estimators, which are less sensitive to outliers and perform more gracefully when underlying assumptions are inexact.
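To make the outlier behaviour concrete, here is a minimal sketch of a one-step Tukey bi-weight average. The tuning constant c = 5 and the small epsilon that guards against division by zero are assumptions chosen for illustration, and the data are hypothetical.

```python
import statistics


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey bi-weight estimate of the centre of `values`.

    Points are down-weighted according to their distance from the median,
    measured in units of the median absolute deviation (MAD); points
    further than c * MAD away receive zero weight.
    """
    values = list(values)
    centre = statistics.median(values)
    mad = statistics.median([abs(x - centre) for x in values])
    scale = c * mad + epsilon  # epsilon avoids division by zero
    weighted_sum = 0.0
    weight_total = 0.0
    for x in values:
        u = (x - centre) / scale
        weight = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted_sum += weight * x
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical probe-level signals with one gross outlier
signals = [1, 2, 3, 4, 100]
print(statistics.mean(signals))   # 22: pulled towards the outlier
print(tukey_biweight(signals))    # stays close to the bulk of the data
```

In this example the value 100 lies far beyond c * MAD from the median, so it contributes nothing to the estimate, whereas it dominates the ordinary mean.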

WILCOXON'S SIGNED RANK

A statistical test that investigates the population median of paired differences. It is well suited for microarray work as it treats each gene as an independent variable and does not require normal distributions of the data.
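The ranking step behind this test can be sketched as follows. The code computes only the two rank sums, assuming zero differences are dropped and tied magnitudes share the average rank; the function name and data are hypothetical, and converting the sums to a p-value is left out.

```python
def wilcoxon_signed_rank(x, y):
    """Return (w_plus, w_minus), the rank sums of positive and negative
    paired differences y - x.

    Zero differences are discarded; tied absolute differences receive
    the average of the ranks they span.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(x, y) if b != a]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Find the run of equal absolute differences starting at i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return w_plus, w_minus


# Hypothetical paired signals for one gene in five subjects
before = [10, 12, 9, 11, 8]
after = [12, 11, 12, 15, 8]
print(wilcoxon_signed_rank(before, after))  # (9.0, 1.0)
```

Only the ordering of the differences enters the statistic, which is why the test makes no assumption of normally distributed data.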


About this article

Cite this article

The Tumor Analysis Best Practices Working Group. Expression profiling — best practices for data generation and interpretation in clinical trials. Nat Rev Genet 5, 229–237 (2004). https://doi.org/10.1038/nrg1297


  • DOI: https://doi.org/10.1038/nrg1297
