Sequencing of the human genome and introduction of clinical next-generation sequencing enable discovery of all DNA variants carried by an individual. Variants may be solely responsible for disease, may contribute to disease, or may have no influence on the development of disease. Interpreting the effect of these variants upon disease is a major challenge for medicine. Although the process is still evolving, certain methods are useful in discriminating the effect of variants upon phenotype. These methods have been employed to the greatest extent in Mendelian disorders where deleterious changes in one gene can cause disease. Here, we briefly review the relative merits of these methods, with emphasis on using a comprehensive approach modelled after the analysis of variants that cause cystic fibrosis.
- Mutation Analysis
- Genetic Variation
- Population Genetics
Greater understanding of the influence of variation in our genome upon health and disease will help usher in the era of individualised medicine. The aetiology of common diseases is complex, in that multiple genetic and environmental risk factors combine to cause a distinct phenotype. Genome-wide association studies have identified DNA variants in numerous locations that confer risk for common pulmonary disorders, such as asthma and COPD. The mechanism by which variants cause common diseases is generally unknown. However, rare families manifesting a common disease inherited in a Mendelian fashion have facilitated the identification of genes bearing variants of high functional impact. Examples include variants in BMPR2 that cause pulmonary arterial hypertension and variants in the promoter of MUC5B in patients with pulmonary fibrosis.1,2 Thus, Mendelian, or so-called ‘single gene’, disorders provide an unparalleled opportunity to find genes that have been substantially modified by a variant in their DNA sequence. Unfortunately, the genome contains many variants that occur in or near genes, only some of which change gene function sufficiently to cause disease. Some variants alter function in a manner that produces mild or incomplete forms of disease, while other variants cause no discernible change in phenotype. Furthermore, a variant may be an innocent hitchhiker with a pathogenic variant elsewhere in the same gene, or it may combine with other variants in the same gene to cause disease. Thus, assessment of the disease liability of a variant requires an understanding of its effect upon gene, cellular and organ function, and the genetic context in which it occurs. Elucidating the pathologic potential of variants and their relative contribution to phenotype will provide critical insight into disease mechanisms and opportunities for intervention.
What approaches can be used to interpret the consequences of genetic variants? If the actual frequency with which a variant occurs in those affected and unaffected by the disease were known, one could calculate the likelihood that a variant causes disease when present (ie, penetrance). Unfortunately, the frequency of variants across broad populations is not known; therefore, other lines of evidence must be employed to determine the extent to which a variant causes disease. The usability of each method varies across diseases with different modes of inheritance, but each approach has a degree of usefulness for all single gene disorders. Each of these methods has potential shortcomings, so a strategy combining multiple modalities is proposed (figure 1), as recently demonstrated in a study of the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) gene.3
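The penetrance calculation mentioned above can be illustrated with Bayes' rule. The following is a minimal sketch rather than a validated method: the function name and its three inputs are hypothetical, and it additionally assumes that the population prevalence of the disease is known, a quantity not discussed in the text.

```python
def penetrance(freq_in_affected, freq_in_unaffected, prevalence):
    """Estimate P(disease | variant present) using Bayes' rule.

    All three inputs are illustrative assumptions:
      freq_in_affected   -- fraction of affected individuals carrying the variant
      freq_in_unaffected -- fraction of unaffected individuals carrying the variant
      prevalence         -- fraction of the population that is affected
    """
    # Carriers who are affected, as a fraction of the whole population
    carriers_affected = freq_in_affected * prevalence
    # Carriers who are unaffected, as a fraction of the whole population
    carriers_unaffected = freq_in_unaffected * (1.0 - prevalence)

    total_carriers = carriers_affected + carriers_unaffected
    if total_carriers == 0:
        raise ValueError("variant was not observed in either group")

    # Share of all carriers who have the disease
    return carriers_affected / total_carriers
```

A variant seen only in affected individuals yields a penetrance of 1, whereas a variant that is equally common in both groups yields a value equal to the disease prevalence, which is the expected result for a neutral variant.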
Segregation analysis of a family pedigree with individuals in multiple generations that have genotype and phenotype information can be used to establish whether variants segregate with disease within a family; that is, variants should be found in affected family members, but not in unaffected family members. The expected segregation of a variant that causes disease varies for different inheritance patterns. Variants found in an affected individual that occurred de novo in a gene previously associated with the disease (spontaneous changes that are not present in either unaffected parent) are highly likely to be disease-causing. Spontaneous mutations are most often observed as causes of disease with dominant, codominant, or X-linked inheritance patterns. Pedigree analysis indicating a lack of segregation with disease (such as identifying a variant in some, but not all, affected family members) is strong evidence that a given variant is not disease-causing. If the analysis of segregation in a pedigree is consistent with the known mode of inheritance, this supports, but does not prove, that the variant causes the disease. In this case, or in the case that an informative pedigree is not available, further evaluation of the variant is required.
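The segregation logic described above can be sketched in a few lines of code. This is a deliberately simplified illustration that assumes a fully penetrant dominant disorder; the `Relative` record, the function name and the returned labels are all hypothetical, and a real analysis would need to account for the mode of inheritance, reduced penetrance and phenocopies.

```python
from dataclasses import dataclass


@dataclass
class Relative:
    name: str
    affected: bool          # meets the diagnostic criteria for the disease
    carries_variant: bool   # genotype result for the variant under study


def segregation_summary(pedigree):
    """Summarise segregation under an assumed fully penetrant dominant model."""
    affected = [m for m in pedigree if m.affected]
    unaffected = [m for m in pedigree if not m.affected]

    if not affected:
        return "uninformative"  # no affected relatives were genotyped

    # Variant missing from an affected relative: strong evidence against causality
    if any(not m.carries_variant for m in affected):
        return "does_not_segregate"

    # Variant present in an unaffected relative: not expected under this model
    if any(m.carries_variant for m in unaffected):
        return "unaffected_carrier"

    # Supportive of causality, but not conclusive
    return "consistent"


family = [
    Relative("I-1", affected=True, carries_variant=True),
    Relative("II-1", affected=True, carries_variant=True),
    Relative("II-2", affected=False, carries_variant=False),
]
print(segregation_summary(family))
```

Even a "consistent" result only supports causality; as noted above, further evaluation of the variant is still required.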
Clinical presentation can be quantified and standardised to establish the parameters of the genotype-phenotype relationship. Specific phenotype information is needed to ensure all individuals presumed to have the disease meet uniform diagnostic criteria. Ideal measures, whether dichotomous (eg, a positive or negative methacholine challenge) or continuous (eg, pulmonary artery pressure), are traits that are known to be highly influenced by the gene under study. For example, sweat chloride concentration quantifies the effect of CFTR variants to a greater degree than lung function, because sweat chloride correlates more closely with CFTR function than lung function does.3
Functional assessment establishes the biologic plausibility for how a change in gene function results in disease. Variants can disrupt gene function by altering efficiency of transcription, RNA splicing, protein processing or function. Loss of function is generally observed in recessive disorders or for dominant disorders caused by the loss of one of two working copies of the gene (ie, haploinsufficiency). In loss-of-function disorders, there is consensus that almost all variants resulting in a premature termination codon (eg, nonsense, frameshift) are highly likely to be pathogenic.4 Variants that cause the substitution of an amino acid require experimental determination that function is lost. The same method of experimental determination can be used for dominant disorders caused by a gain-of-function effect of the protein product. In these assessments, a threshold that determines the level of function necessary for disease must be established through testing of previously well-characterised variants or extrapolated from other research (eg, animal models). Because functional studies require time and resources, considerable effort has been expended in the development of bioinformatic predictors that use protein structure information and/or common ancestral sequences to make an ‘in silico’ prediction of the effect of a variation.5 Computational methods have been widely employed experimentally, but are not yet a substitute for direct experimental evaluation.
A penetrance analysis can be performed to assess whether a variant does not cause disease. Essentially, one searches for the presence of a variant in a ‘healthy’ gene. For dominant disorders, genes in individuals confirmed to not have disease from the general population can be studied. For recessive disorders, two deleterious variants need to be present to cause disease. Thus, obligate heterozygotes who carry a deleterious gene and a ‘healthy’ gene that was not passed to the offspring (such as the confirmed unaffected parents of affected offspring) can be informative, as any variant seen in the ‘healthy’ gene can be presumed to not cause disease. Highly rare variants pose a major challenge for penetrance assessment, as very large numbers of unaffected individuals are needed for analysis.
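The difficulty posed by rare variants can be made concrete with a short calculation. The sketch below rests on a simplifying assumption that is not taken from the text: each ‘healthy’ gene copy examined carries the variant independently with probability equal to its allele frequency. The function names are hypothetical.

```python
import math


def prob_variant_seen(allele_frequency, n_chromosomes):
    """Probability of observing the variant at least once among
    n_chromosomes independently sampled 'healthy' gene copies."""
    return 1.0 - (1.0 - allele_frequency) ** n_chromosomes


def chromosomes_needed(allele_frequency, target_probability=0.95):
    """Smallest number of 'healthy' gene copies that gives at least
    target_probability of observing the variant once or more."""
    if not 0.0 < allele_frequency < 1.0:
        raise ValueError("allele_frequency must lie strictly between 0 and 1")
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must lie strictly between 0 and 1")
    # Solve 1 - (1 - q)^n >= p for n, then round up to a whole chromosome
    return math.ceil(
        math.log(1.0 - target_probability) / math.log(1.0 - allele_frequency)
    )


print(chromosomes_needed(0.001))
```

Under these assumptions, a variant with an allele frequency of 1 in 1000 requires on the order of 3000 ‘healthy’ gene copies for a 95% chance of being seen even once, and the requirement grows roughly tenfold for each tenfold decrease in frequency.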
For all modes of inheritance, variants that meet segregation, clinical and functional criteria can be considered disease-causing. All diseases, even those associated with Mendelian inheritance, are composed of traits that are subject to modification from other genes and from the environment. Therefore, it is not unexpected that some variants not meeting criteria may be neither pathogenic nor neutral, but capable of causing disease under certain circumstances (ie, variably penetrant), or capable of causing a partial form of the disease. Alternatively, some variants will be indeterminate due to absent or inconclusive evidence. Both the usability and the challenges of widespread variant interpretation are illustrated by a recent publication in Thorax that examined individuals with single organ system manifestations of cystic fibrosis (CF) within the spectrum of CFTR dysfunction.6 As expected, only a minor fraction of individuals with incomplete CF carried two CF-causing variants. Although it may be desirable to have discrete categories of pathogenic versus neutral variants, for some disorders such as atypical forms of CF, clear demarcation between the two groups may not exist.
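One way to picture how the separate lines of evidence might be combined is as a small decision rule. The sketch below is an illustrative assumption, not the scheme shown in figure 1 or used in the CFTR study: the category names, the function signature and the handling of conflicting evidence are all hypothetical. Each input is `True`, `False`, or `None` when no evidence is available.

```python
from enum import Enum


class Call(Enum):
    DISEASE_CAUSING = "disease-causing"
    NOT_DISEASE_CAUSING = "not disease-causing"
    VARYING_CONSEQUENCE = "varying consequence"
    INDETERMINATE = "indeterminate"


def classify(segregates, clinical, functional, seen_in_healthy_gene):
    """Combine evidence into a single call (hypothetical rule set).

    segregates, clinical, functional -- whether each criterion is met
    seen_in_healthy_gene             -- result of the penetrance analysis
    """
    criteria = (segregates, clinical, functional)
    evidence_against = segregates is False or seen_in_healthy_gene is True
    evidence_for = any(c is True for c in criteria)

    # All three criteria met and nothing argues against causality
    if all(c is True for c in criteria) and not evidence_against:
        return Call.DISEASE_CAUSING
    # Only evidence against causality is available
    if evidence_against and not evidence_for:
        return Call.NOT_DISEASE_CAUSING
    # Conflicting evidence: possibly variably penetrant or partial disease
    if evidence_against and evidence_for:
        return Call.VARYING_CONSEQUENCE
    # Nothing argues against, but at least one criterion is unassessed
    if any(c is None for c in criteria):
        return Call.INDETERMINATE
    # Everything assessed, criteria only partly met
    return Call.VARYING_CONSEQUENCE


print(classify(True, True, True, False).value)
```

The intermediate categories are the point of the exercise: a rule that forces every variant into ‘pathogenic’ or ‘neutral’ would hide exactly the variably penetrant and indeterminate cases discussed above.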
As genetic analysis becomes less expensive and more efficient, there will be greater opportunity to examine the genetic contributions to disease. As noted, determining the consequence of individual variants is not straightforward, even for Mendelian disorders such as CF.3 Multiple modalities are needed to examine the effects of any given variant, but this divining is a necessary step towards widespread use of genetic information in diagnosis and prognosis, as well as towards implementing therapeutics based on DNA variation.
The authors would like to thank Karen Siklosi Raraigh for her review of the manuscript.
Contributors PRS and GRC both wrote the article.
Competing interests None.
Provenance and peer review Commissioned; internally peer reviewed.