Both common and rare variants contribute to the genetic architecture of pulmonary fibrosis. Genome-wide association studies have identified common variants, defined as those with a minor allele frequency of >5%, that are linked to pulmonary fibrosis. The most widely replicated variant (rs35705950) lies in the promoter region of the MUC5B gene and has been strongly associated with idiopathic pulmonary fibrosis (IPF) and familial interstitial pneumonia (FIP) across multiple cohorts. However, many more common variants have been associated with disease risk, and in aggregate they account for approximately one-third of the risk of IPF. Moreover, several of these common variants appear to have prognostic potential. Next generation sequencing technologies have facilitated the identification of rare variants, and recent whole exome sequencing studies have linked pathogenic rare variants in multiple new genes to FIP. Compared with common variants, rare variants have lower population allele frequencies and larger effect sizes. Genes harbouring rare variants linked to pulmonary fibrosis fall into two pathways: telomere maintenance and surfactant metabolism. Heterozygous rare variants in telomere-related genes co-segregate with adult-onset pulmonary fibrosis with incomplete penetrance, reduce protein function and are associated with short telomere lengths. Despite poor genotype-phenotype correlations, lung fibrosis associated with pathogenic rare variants in different telomere genes is progressive and displays similar survival characteristics. In contrast, many of the heterozygous rare variants in the surfactant genes are predicted to cause a toxic gain of function through protein misfolding and increased endoplasmic reticulum (ER) stress.
Evidence of both telomere shortening and increased ER stress has been found in patients with sporadic IPF, suggesting that the mechanisms identified from rare variant studies in individual patients and families apply to a wider spectrum of patients. The ability to sequence large cohorts rapidly has the potential to further our understanding of the relative contributions of common and rare variants to the pathogenesis of pulmonary fibrosis. The UK 100,000 Genomes Project will provide opportunities to interrogate both common and rare variants and to investigate how these biological signals can provide diagnostic and prognostic information in the era of stratified medicine.
- Idiopathic pulmonary fibrosis
- Interstitial Fibrosis
Interstitial lung diseases (ILDs) are a diverse set of diseases affecting the tissue surrounding the alveoli. They can be caused by environmental exposures, drug reactions and connective tissue diseases, and are subcategorised according to their radiographic, histopathological and clinical characteristics. A subset of ILDs is characterised by both interstitial abnormalities and irreversible pulmonary fibrosis. Fibrosing ILDs not linked to an underlying systemic disorder or identifiable environmental insult are known as fibrosing idiopathic interstitial pneumonias (IIPs), the most common of which is idiopathic pulmonary fibrosis (IPF). IPF is characterised by relentless scarring of the lungs and a poor prognosis, with a median survival from diagnosis of 3 years,1 ,2 few medical therapies and no cure other than lung transplantation.3 ,4 It has long been observed that pulmonary fibrosis sometimes clusters in families, in which case the disease is called familial interstitial pneumonia (FIP). Recent studies have deepened our appreciation of the role of inherited risk in the development of IPF, FIP and IIP, leading to new hypotheses regarding disease pathogenesis.
Both common and rare genetic variants contribute to the genetic architecture of pulmonary fibrosis. Advances in next generation sequencing technologies and genome-wide studies have facilitated the discovery of genetic variants that influence disease risk. The ability to sequence large numbers of individuals has the potential to advance our understanding of the relative contributions of common and rare variants to the pathogenesis of pulmonary fibrosis. Here we review current understanding of genetic risk in pulmonary fibrosis and its clinical implications in the era of stratified medicine.
Common variants in fibrosing IIP
Although numerous rare variants have been associated with risk of FIP, there is evidence that common variants (defined as those with a minor allele frequency (MAF) >5%) also play a role in FIP risk (figure 1). Notably, the common variants associated with FIP risk are the same as those associated with sporadic IPF risk and, in aggregate, account for one-third of the risk of developing familial or sporadic IIP. This concordance strongly suggests a shared genetic aetiology for sporadic and familial pulmonary fibrosis. However, these variants are far more frequent in the population than pulmonary fibrosis itself. Therefore, while they provide important insights into genetic risk for familial and sporadic IPF, much remains to be understood about the gene-by-gene and gene-by-environment interactions that underlie their incomplete penetrance.
MUC5B promoter polymorphism
In 2011, Seibold et al used genome-wide linkage analysis followed by targeted sequencing in FIP and sporadic IPF cases to show that a single nucleotide polymorphism (SNP), rs35705950 (a G>T transversion), located on the short arm of chromosome 11 within the promoter region of the MUC5B gene, a highly conserved regulatory region, is associated with both FIP and IPF. The rs35705950 variant is also associated with significantly higher MUC5B expression in healthy lung tissue (37.4-fold higher in those carrying the T allele), and MUC5B expression was more than 14-fold higher in fibrotic lung tissue than in unaffected controls irrespective of genotype.5 For FIP, the ORs for disease among individuals heterozygous and homozygous for the minor allele were 6.8 (95% CI 3.9 to 12.0) and 20.8 (95% CI 3.8 to 113.7), respectively (genotypic association test, p=3.7×10⁻¹²).5 Similarly, for sporadic IPF the ORs for heterozygous and homozygous individuals were 9.0 (95% CI 6.2 to 13.1) and 21.8 (95% CI 5.1 to 93.5), respectively (genotypic association test, p=4.6×10⁻³¹).5 MUC5B encodes Mucin 5B, a glycosylated macromolecular mucus component secreted by airway epithelial cells. It is the major gel-forming mucin and contributes to the lubricating properties of saliva, lung mucus and cervical mucus; recent studies indicate that it is also important to innate immune function.6 MUC5B has been localised to the honeycomb cyst, a characteristic structure seen in IPF,7 strengthening the case for a pathological role for MUC5B in lung fibrosis.
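To make the genotype-specific odds ratios above concrete, the Python sketch below computes ORs for heterozygotes (GT) and homozygotes (TT), each relative to the GG reference genotype, from a 2×3 case-control table, with a 95% CI by Woolf's log method. The counts and the function name are hypothetical and are not the Seibold et al data; published analyses may have used other estimators or covariate adjustment.

```python
import math

def genotypic_odds_ratios(cases, controls, z=1.96):
    """Genotype-specific odds ratios versus the reference genotype.

    cases, controls: counts ordered as (GG, GT, TT); GG is the reference.
    Returns {"GT": (OR, lower, upper), "TT": (OR, lower, upper)} using a
    Woolf (log-scale) confidence interval. All cells must be non-zero.
    """
    ref_case, ref_control = cases[0], controls[0]
    results = {}
    for label, a, c in zip(("GT", "TT"), cases[1:], controls[1:]):
        or_hat = (a * ref_control) / (c * ref_case)
        se_log = math.sqrt(1 / a + 1 / c + 1 / ref_case + 1 / ref_control)
        lower = math.exp(math.log(or_hat) - z * se_log)
        upper = math.exp(math.log(or_hat) + z * se_log)
        results[label] = (or_hat, lower, upper)
    return results

# Hypothetical counts, for illustration only (not data from any cited study).
cases = (100, 80, 20)      # GG, GT, TT among cases
controls = (400, 80, 5)    # GG, GT, TT among controls
for genotype, (or_hat, lo, hi) in genotypic_odds_ratios(cases, controls).items():
    print(f"{genotype}: OR {or_hat:.1f} (95% CI {lo:.1f} to {hi:.1f})")
```

With these made-up counts the sketch gives ORs of 4.0 for GT and 16.0 for TT. The much wider interval for TT comes from the small number of homozygous controls, the same pattern seen in the wide CIs reported for homozygotes above.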
The association between the rs35705950 minor allele and IPF has been validated in seven independent cohorts;8–14 it remains the strongest and most replicated single genetic risk factor for IPF. Adding to its credibility as a genetic risk factor specific to IIPs, rs35705950 has been examined in cohorts of subjects with ILD secondary to systemic sclerosis,11 ,12 asbestosis, sarcoidosis,11 acute respiratory distress syndrome (ARDS), COPD and asthma,15 and these investigations have shown no association between genotype and these other pulmonary disorders.
Furthermore, rs35705950 is likely to be a significant risk factor for pulmonary fibrosis in some populations beyond the non-Hispanic white (NHW) population in which it was originally described: in a Mexican cohort the variant was a risk factor for IPF (OR=7.36, p=0.0001), yet it was rare in a Korean IPF cohort.16 Among Asians the MAF of rs35705950 is low, but in Japanese and Chinese studies subjects with IPF had a significantly higher frequency of the risk allele than controls.13 ,17 The rs35705950 MAF across ethnic groups appears to mirror disease prevalence, since NHWs have been observed to be at higher risk of pulmonary fibrosis than Hispanics, Asians and Africans.18 Indeed, the MUC5B promoter polymorphism is absent in sub-Saharan African populations, in which pulmonary fibrosis is thought to be rare.19 Thus, although the importance of rs35705950 in familial and sporadic pulmonary fibrosis was discovered in the NHW population, the variant appears to matter in other groups as well, and broader study of this common variant across ethnic groups will be important for understanding its role in disease pathogenesis. Given the marked differences in prevalence by ancestry, it is possible that the gain-of-function MUC5B promoter variant confers a beneficial biological effect, perhaps improved host defence, during childhood or the childbearing years, leading to positive selection in the NHW population.
The 2011 study by Seibold et al5 was also the first to implicate secreted airway mucins in fibrogenesis, suggesting that the airway epithelium could play an important role in disease development. It also identified a specific common genetic variant, rs35705950, that could be used to identify individuals at high risk of developing disease.5 However, the variant's MAF of 9.1% in the control cohort (vs 33.8% in FIP and 37.5% in sporadic IPF) suggests that, despite the substantial risk it confers, the variant alone may not be sufficient to cause a disease as rare as FIP or IPF. Indeed, other large cohort studies, such as those examining the Framingham Heart Study population, have shown that among NHWs, 19% of those without any evidence of interstitial lung abnormalities (ILAs) or fibrosis carry at least one T allele (MAF of 9.7%).20 Therefore, gene-by-gene or gene-by-environment interactions may be critical in translating increased risk into the development of fibrosis;21 these potential interactions are an area of active investigation.
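The carrier figures above can be checked against a simple expectation. Assuming Hardy-Weinberg equilibrium (an assumption of this sketch, not a claim made by the cited studies), a minor allele frequency q implies that a fraction 1 − (1 − q)² of individuals carry at least one copy. The Python sketch below applies this to the two control MAFs quoted in the text; the function names are illustrative.

```python
def hwe_genotype_freqs(maf):
    """Expected (homozygous major, heterozygous, homozygous minor) frequencies
    under Hardy-Weinberg equilibrium for a biallelic SNP."""
    p = 1.0 - maf
    return p * p, 2 * p * maf, maf * maf

def carrier_frequency(maf):
    """Expected fraction carrying at least one minor allele (het + hom minor)."""
    _, het, hom_minor = hwe_genotype_freqs(maf)
    return het + hom_minor

# Control MAFs for rs35705950 quoted in the text (9.1% and 9.7%).
for maf in (0.091, 0.097):
    print(f"MAF {maf:.1%}: expected carriers {carrier_frequency(maf):.1%}")
# MAF 9.1%: expected carriers 17.4%
# MAF 9.7%: expected carriers 18.5%
```

The expected 18.5% is close to the 19% carrier frequency observed in unaffected Framingham participants, which illustrates why a variant carried by roughly one in five unaffected NHW adults cannot by itself account for a rare disease.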
Follow-up studies have shown that the same variant is associated with radiographic evidence of ILAs, which are hypothesised to be an early form of disease.22 ,23 For each copy of the rs35705950 minor allele, the odds of ILAs (which included ground-glass or reticular abnormalities, centrilobular nodularity, non-emphysematous cysts, honeycombing and traction bronchiectasis) were 2.8 times higher.20 This is notable because ILAs progress over time and have been linked to increased mortality, specifically from respiratory causes.22 ,23 Intriguingly, the Framingham Heart Study investigations have also shown that 2% of NHWs over 50 years of age have definite radiographic evidence of pulmonary fibrosis, a higher prevalence than previously reported for the disease.20 ,24 These associations between rs35705950, FIP, IPF and ILAs are provocative, as they suggest that common variants could be used alone or in concert with other genetic or demographic parameters to guide screening, earlier diagnosis and early intervention.20–23 ,25 ,26
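A per-copy odds ratio is usually interpreted under a log-additive (multiplicative) model, in which each additional allele multiplies the odds by the same factor. The Python sketch below assumes such a model and a hypothetical baseline ILA prevalence of 5% in non-carriers (not a figure from the cited studies). It shows that an OR of 2.8 per copy implies an OR of about 7.8 for two copies, and that an odds ratio overstates the corresponding risk ratio when the outcome is not rare.

```python
def odds_ratio_for_copies(per_allele_or, copies):
    """OR versus zero copies under a log-additive (multiplicative) model."""
    return per_allele_or ** copies

def probability_from_odds(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)

PER_ALLELE_OR = 2.8          # per-copy OR for ILAs quoted in the text
BASELINE_PREVALENCE = 0.05   # hypothetical ILA prevalence in non-carriers

baseline_odds = BASELINE_PREVALENCE / (1 - BASELINE_PREVALENCE)
for copies in (0, 1, 2):
    or_k = odds_ratio_for_copies(PER_ALLELE_OR, copies)
    risk = probability_from_odds(baseline_odds * or_k)
    print(f"{copies} copies: OR {or_k:.2f}, risk {risk:.1%}, "
          f"risk ratio {risk / BASELINE_PREVALENCE:.2f}")
```

Under these assumptions one copy raises the modelled risk from 5.0% to about 12.8%, a risk ratio of roughly 2.6 rather than 2.8; the gap between OR and risk ratio widens as baseline prevalence rises.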
Other common genetic variants and fibrosing IIP
Although rs35705950 is the most robustly replicated common genetic variant associated with pulmonary fibrosis (familial or sporadic), other common variants have been found to be associated with disease utilising genome-wide approaches.
A 2008 genome-wide association study (GWAS) identified a common variant in TERT, rs2736100, associated with IPF in a Japanese cohort.27 In 2013, Fingerlin et al8 published a case-control GWAS of 1616 individuals with fibrotic IIP and 4683 control subjects, followed by a replication study of 876 cases and 1890 controls. The study included individuals from FIP families (one per family, n=390 in the discovery cohort) in addition to sporadic cases of IIP. This large GWAS confirmed disease associations with TERT (chromosome 5p15), MUC5B (11p15) and the 3q26 region near TERC, and it also identified seven new loci associated with fibrotic IIP: FAM13A (4q22), DSP (6p24), OBFC1 (10q24), ATP11A (13q34), DPP9 (19p13) and regions on chromosomes 7q22 and 15q14–15.8 Excluding MUC5B, these loci account for approximately one-third of disease risk.8 Although these genes have disparate reported biological functions, in aggregate they implicate host defence (MUC5B, ATP11A), cell-cell adhesion (DSP, DPP9) and DNA repair (TERT, TERC, OBFC1) pathways in disease pathogenesis.8 ,21 ,25 ,28
As with the MUC5B promoter polymorphism rs35705950, the ORs for loci identified in the Fingerlin et al study did not differ in post hoc analyses comparing familial with sporadic cases, suggesting that, with respect to the risk conferred by common variants, FIP is similar to sporadic IIP.8 ,21 However, many of the individual variants identified in this GWAS had MAFs approaching 50% even in the control cohorts (eg, rs2076295 in the DSP locus on chromosome 6 had a MAF of 44% in controls); given the rarity of fibrotic IIP, the path from individual common variants to disease may involve gene-by-gene or gene-by-environment interactions that have yet to be elucidated.
Another GWAS performed in subjects with sporadic IPF confirmed the association of the MUC5B promoter polymorphism with IPF and identified new disease-associated variants, including variants in Toll-interacting protein (TOLLIP) and signal peptidase-like 2C (SPPL2C).10 The same study showed that rs5743890 in TOLLIP was associated with increased IPF mortality.10 ,29
Common variants and disease outcomes
The ability of genotype to predict phenotype is an important potential clinical application of the association between common variants and disease. With respect to the MUC5B promoter polymorphism, retrospective analysis of large clinical trial datasets showed that sporadic IPF subjects carrying the risk allele (T) had improved 2-year survival compared with non-carriers, both in single-variable proportional hazards models and in models adjusting for age, sex, lung function and treatment status; including the rs35705950 genotype significantly improved the accuracy of outcome models.30 This survival advantage for rs35705950 carriers was observed in two independent cohorts (INSPIRE and a cohort from the University of Chicago) and supports the hypothesis that the T allele at rs35705950 defines at least two phenotypes of IPF.30 Given the importance of rs35705950 in FIP as well as sporadic IPF, this variant may have similar prognostic power in FIP, although this has not been formally studied.
Similarly, the TOLLIP variant that Noth and colleagues found to be associated with IPF was also associated with differential survival. Notably, the rs5743890 minor allele (G), although associated with decreased risk of disease overall, was associated with increased mortality in analyses restricted to subjects with disease.10 A retrospective analysis of the PANTHER-IPF clinical trial examined TOLLIP and MUC5B risk variants in the trial cohorts: among subjects who received N-acetylcysteine (NAC), those with the TT genotype at rs3750920 (TOLLIP) had a decreased risk of the trial's composite endpoint of death, transplantation, hospitalisation or a decline in forced vital capacity (FVC) of more than 10%, whereas those with the CC genotype had an increased risk.29 While NAC has not been shown to be effective in patients with IPF overall, it is possible that a subset of patients defined by TOLLIP genotype could respond to the drug while others decline in response to it, highlighting the importance of pharmacogenetics in this disease and the need for genotype-stratified prospective clinical trials to confirm this finding.29 ,31
The relationship between common variants, disease risk and survival may be more complicated than initially hypothesised, since validated risk variants (eg, rs35705950) are strongly associated with disease risk but also with better clinical outcomes among patients with IPF.30 Similarly, as Noth and colleagues described, the rs5743890 variant appears protective in terms of disease risk yet is associated with increased mortality among subjects with disease.10 It may be that such common variants predispose individuals to clinically milder forms of disease. These findings suggest that ‘IPF’ may itself be a heterogeneous phenotype that could be defined more granularly by genetic markers and risk variants, implying an important future role for genetics-based personalisation of care.
Therefore, further prospective study of the relationship between genotype and therapeutic response will be critical to tailoring individual patient treatment in the future. Currently, although the studies described point to intriguing genotype-phenotype relationships, clinical use of these findings remains uncertain, especially as gene-gene interactions are poorly understood in terms of how they relate to disease phenotype and therapeutic response. At this time, therapeutic decision-making should be done on the basis of published, prospective, randomised, controlled trials and well-described risks and benefits, which have been studied independently of genotype.3 ,4
Future clinical trials should take into account genotype-phenotype relationships, especially given the findings described by Oldham et al29 and Peljto et al,30 since both these retrospective studies suggest that primary outcomes (ie, mortality and therapeutic response such as lung function decrement) can be influenced by common variant genotypes.
Rare variants in fibrosing interstitial pneumonia
For decades, physicians have observed that pulmonary fibrosis can segregate in families in an autosomal dominant pattern of inheritance, affecting individuals across multiple generations. In some cases, a rare variant can be identified that co-segregates with disease and predicts overt or subclinical disease. These families represent a monogenic form of FIP, attributable to a rare variant in a single gene. In contrast with the common variants discussed above, these variants are generally absent or infrequent in control populations (MAF <0.1%) and are termed ‘rare’ or, alternatively, ‘novel’, ‘singleton’ or ‘ultra-rare’ variants, or variants that are ‘private’ to the family or individual in which they were discovered. The rarity of these alleles is consistent with the population frequency of FIP (roughly estimated at ∼1 in 100 000) and with evolutionary theory predicting that deleterious disease alleles should be rare.32 Individual rare variants are more highly penetrant than common variants and generally have larger ORs.33 For many ultra-rare variants, ORs are several orders of magnitude larger than for common variants, or cannot be calculated at all because the variants are extremely rare or absent in control populations.
While association studies can demonstrate genome-wide significance for common variants, this level of significance cannot be achieved for individual rare variants given their low allele frequencies. For rare variant FIP genes, genome-wide significance is instead established either by linkage of an identical-by-descent genomic region to pulmonary fibrosis within large extended families34–36 or by gene burden tests demonstrating an excess of variants in cases versus controls.35 Whereas common variants are usually characterised by a single tagging SNP, the rare variants linked to FIP comprise multiple different genetic changes within a gene, each predicted to cause a deleterious change in the genomic, RNA or protein sequence. Various in silico tools predict variant effects on protein function from alignments of evolutionarily conserved sequences, or predict effects on splicing or post-translational modifications. Further support for the pathogenicity of individual rare variants comes from segregation analysis in families or from examination of protein function in molecular, biochemical or pathophysiological assays.
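A gene burden test collapses all qualifying rare variants in a gene into a single carrier status per individual and asks whether carriers are over-represented among cases. The Python sketch below uses a one-sided Fisher's exact test computed directly from the hypergeometric distribution. It is one simple collapsing approach, assumed here for illustration, and not necessarily the test used in the cited studies; the counts and the function name are hypothetical.

```python
from math import comb

def burden_test_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for an excess of rare-variant carriers in cases.

    Each individual is collapsed to carrier / non-carrier of any qualifying
    rare variant in one gene. Returns P(X >= case_carriers) under the
    hypergeometric null hypothesis of no case-control difference.
    """
    carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    denominator = comb(total, n_cases)
    upper = min(carriers, n_cases)
    # Sum the exact hypergeometric probabilities of the observed count or more.
    tail = sum(comb(carriers, x) * comb(total - carriers, n_cases - x)
               for x in range(case_carriers, upper + 1))
    return tail / denominator

# Hypothetical counts: 12 of 300 cases vs 5 of 3000 controls carry a qualifying variant.
p_value = burden_test_one_sided(12, 300, 5, 3000)
print(f"one-sided p = {p_value:.2e}")
```

Collapsing is what makes the test feasible: no single private variant reaches significance, but pooling many of them in one gene can.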
Rare variants in genes involved with telomere maintenance
Telomeres are repetitive nucleotide sequences (TTAGGG) at the ends of chromosomes that protect the chromosome ends and buffer them against the progressive shortening that occurs with each round of cell replication. Dysfunction of the telomere maintenance machinery results in short telomeres, activation of the DNA damage response and cellular senescence. Multiple genes in the telomere maintenance pathway have been implicated in adult-onset familial pulmonary fibrosis, including those that affect telomerase catalytic activity (TERT, TERC),34 ,37 telomerase biogenesis (DKC1, PARN, NAF1)35 ,38 ,39 or the telomere end (RTEL1, TINF2).35 ,40–42 All pathogenic rare variants reduce protein function, manifested as decreased telomerase catalytic activity,34 ,37 ,43 decreased enzyme processivity,44 aberrant T-circle formation40 or shortened telomere lengths.35 ,38 ,39 ,41 ,42 These heterozygous germline mutations lead to telomere shortening in somatic cells, including blood leukocytes, oral mucosal epithelial cells and lung epithelial cells.45 ,46
Collectively, deleterious rare variants are found more often in TERT than in any other gene associated with pulmonary fibrosis. Distinct novel variants spanning the full length of the gene have been discovered in different patients and families. Pathogenic rare variants in TERT are found in ∼15% of familial cohorts34 ,37 and in ∼3% of sporadic IPF cases.34 ,45 The development of pulmonary fibrosis is age-dependent and uncommon in TERT variant carriers younger than 40 years. Penetrance is incomplete: ∼60% of male and ∼50% of female TERT rare variant carriers over 60 years of age have overt disease.43 Most mutation carriers who develop pulmonary fibrosis report a history of smoking or another fibrogenic exposure, suggesting that the environment modulates risk in genetically susceptible individuals.
FIP and IPF are the most common pulmonary manifestations associated with rare variants in telomere genes,47 although a variety of ILD subtypes of both known and unknown cause,48 as well as an increased risk of emphysema,49 have been described. Heterogeneous diagnoses among family members with identical mutations are common, occurring in up to 80% of kindreds.48 Despite poor genotype-pulmonary phenotype correlations, lung fibrosis associated with rare variants in four different telomere genes (TERT, TERC, RTEL1 and PARN) is progressive and displays similar survival characteristics.48 Asymptomatic TERT rare variant carriers exhibit subclinical signs of pulmonary fibrosis, including increased quantitative tissue volumes on high resolution CT of the chest, reduced diffusion capacity at rest and reduced recruitment of diffusion capacity with exercise.50 Consistent with the incomplete penetrance observed in kindreds,34 ,35 the lag time from subclinical to overt disease is highly variable, even among individuals with the same telomerase mutation.51
Pathogenic variants in telomere-related genes can have additive effects in human patients. Whereas monoallelic mutations are associated with adult-onset pulmonary fibrosis, biallelic mutations in TERT, RTEL1 and PARN are found in rare children and young adults with more severe disease presenting as dyskeratosis congenita (DC) or Hoyeraal-Hreidarsson syndrome, with mucocutaneous manifestations, bone marrow failure, developmental delay, enterocolitis and/or immunodeficiency.52–59 DC-associated phenotypes are also frequently found in adults with these ‘short telomere syndromes’.60 Patients with rare variants in TERC appear to have a higher prevalence of severe bone marrow dysfunction (leukopenia, thrombocytopenia, aplastic anaemia, myelodysplastic syndrome) than those with rare variants in the other telomere genes.48 A personal or family history of short-telomere phenotypes (cytopenias, macrocytosis, liver dysfunction, early greying of hair, predisposition to leukaemia and myelodysplastic syndrome) in conjunction with pulmonary fibrosis often predicts short telomere lengths or a rare variant in a telomere-related gene.50 ,61 ,62
Genetic anticipation, with earlier onset and more severe disease in successive generations, can be seen in families with telomere-related gene mutations and is related to progressive telomere shortening.48 ,63–65 Consistent with these observations, the mean age at ILD diagnosis correlates with telomere length: TERC rare variant carriers, who have shorter telomeres, are diagnosed with pulmonary fibrosis at younger ages than PARN mutation carriers, who have longer telomeres.48
The lung transplantation outcomes of patients with rare variants in telomerase have been described.66–68 The incidence of haematological complications, renal disease and infections in this patient group suggests an increased risk for certain adverse effects which are unmasked by transplantation or post-transplant medication regimens. Pre-transplant bone marrow biopsies may identify those at increased risk for severe haematological complications.61
Short telomere lengths in sporadic IPF
Patients with sporadic, or non-familial, forms of various ILDs also have shorter telomere lengths than would be predicted by their age. Nearly 25% of sporadic IPF patients have age-adjusted telomere lengths of <10th percentile.45 ,46 Telomere length is a prognostic biomarker in sporadic IPF; patients with the shortest telomere lengths have the worst transplant-free survival even after adjusting for age, gender and pulmonary function.69 This finding has been validated in multiple independent cohorts.69 ,70 Thus, the molecular mechanism of telomere shortening which was discovered through genetic studies of rare families is relevant in patients with sporadic disease.
Rare variants in genes involved with surfactant metabolism
Surfactant produced by the type II alveolar epithelial cell is rich in lipids and proteins. The surfactant proteins are translated within the endoplasmic reticulum (ER) and transported to lamellar bodies where they are stored prior to secretion into the alveolar space. Pulmonary surfactant serves two main functions: (1) to reduce surface tension and prevent atelectasis, and (2) to opsonise pathogens and modulate the innate immune response.71 Rare variants in the genes encoding surfactant protein A and C (SFTPA1, SFTPA2 and SFTPC) have been associated with adult-onset pulmonary fibrosis. Biallelic rare variants in the genes encoding surfactant protein B (SFTPB) and the lamellar body transporter of cellular proteins and phospholipids (ABCA3) are more commonly associated with neonatal respiratory distress syndrome and paediatric diffuse parenchymal lung disease. Rare case reports of adults with homozygous ABCA3 mutations have been described.72 ,73
SFTPC heterozygous rare variants have been described in patients presenting from infancy to late adulthood with a range of pulmonary fibrosis subtypes, and the affected FIP kindreds show an autosomal dominant pattern of inheritance. Over 40 different missense, splice site or frameshift mutations within the SFTPC gene have been associated with familial or sporadic disease.74 Most SFTPC mutations are thought to act through a toxic gain of function caused by protein misfolding. The SP-C pre-protein contains a BRICHOS domain that is critical for proper folding and trafficking within the secretory pathway.75 Some rare variants within the BRICHOS domain cause misfolding of pre-SP-C, which accumulates within the ER, activating the unfolded protein response and causing increased ER stress and type II alveolar cell toxicity.75–77 One rare variant described across different cohorts, the missense change Ile72Thr (c.218T>C) within the linker domain, leads to impaired trafficking and disruption of the autophagy pathway.78 ,79 Although SFTPC rare variants are uncommon (1–2%) in most FIP cohorts,80–83 they have been found at a higher incidence (25%) in a Dutch cohort.84
Surfactant protein A is encoded by two closely related genes, SFTPA1 and SFTPA2, located on chromosome 10q22.3.85 Mutations in both genes have been associated with FIP and lung adenocarcinoma. All pathogenic heterozygous rare variants in the SFTPA1 and SFTPA2 genes encode missense mutations located within the carbohydrate recognition domain of the protein.36 ,86 ,87 These mutations result in reduced secretion of mature protein and increased ER stress,36 ,86 ,88 markers of which have been found in alveolar epithelial cells of sporadic IPF patients adjacent to regions of lung fibrosis,89 ,90 suggesting that increased ER stress may lead to a cellular substrate with increased vulnerability to fibrosis after injury. Thus, the pathway identified by studying rare patients and families with surfactant mutations is relevant to a broader IPF patient population.
Implications for personalised medicine
There is ample evidence that both common and rare variants contribute to the genetic architecture of pulmonary fibrosis. Recent advances in genetic sequencing and bioinformatics have made it much easier to detect genetic variants rapidly. The UK's 100,000 Genomes Project aims to sequence genomes from 70 000 individuals in the National Health Service, with a focus on cancer and rare diseases, including IIPs. As this project is completed, hypotheses regarding the relevance of common and rare variants can be tested in this real-life human experiment.
Major challenges lie ahead in determining whether individual sequence data can be translated into clinically relevant information. For common variants, a major challenge will be to determine how these variants can inform an individual patient's risk, since a tagging SNP may be a proxy for a linked causal variant. Alternatively, interplay between one or more common SNPs and environmental triggers may be critical for disease risk or response to therapy. For rare variants, the major challenge will be to determine which are clinically meaningful, given the plethora of rare variants in each human genome.91–93 The American College of Medical Genetics and the European Society of Human Genetics have outlined standards for interpreting genetic variants,94 ,95 using terminology such as pathogenic, likely pathogenic, uncertain significance, likely benign and benign. These designations highlight the uncertainty regarding the clinical implications of individual variants and the need for new scoring systems for predicting pathogenicity. A newly identified rare variant in an individual IPF patient that has never been reported before is likely to be classified as a variant of uncertain significance (VUS). Will the identification of a VUS lead to a ‘genetic purgatory’96 for the patient and the medical team if there is little certainty regarding predictions of risk for the patient or family members? Proving the pathogenicity of an individual rare variant often requires segregation analyses in families and/or demonstration of impaired protein function in functional assays, data that cannot be obtained by sequencing alone.
The ILDs are a diverse collection of disorders, and the genetic studies substantiate this heterogeneity at a molecular level. Studies to date have demonstrated that the inclusion of genetic information leads to stratification of pulmonary fibrosis patients with regard to rate of progression and responsiveness to therapeutics. Given the potential impact of rare and common variants on disease outcome, one must consider the prognostic potential of these sequence changes and their inclusion in interventional trials.
Large population sequencing projects will provide opportunities to describe the penetrance and effect sizes of a spectrum of common and rare variants with regard to subclinical and overt pulmonary phenotypes. With datasets collected from large numbers of well-phenotyped individuals, interactions between common and rare variants, genes and pathways can be interrogated across both healthy and diseased subjects. Thus, the UK's 100,000 Genomes Project will lead to an era of studying individual genotypes in conjunction with classic phenotypes, creating new ways to molecularly characterise pulmonary fibrosis.
SKM and CAN contributed equally.
Funding The U.S. Department of Veterans Affairs (grant no. Merit Award 1I01BX001534 (DAS)) and the National Institutes of Health (grant nos. P01-HL092870 (DAS), R01-HL097163 (DAS), R01HL093096 (CKG), R25-ES025476 (DAS), R33-HL120770 (DAS), T32HL098040 (CAN), UL1TR001105 (CAN) and UH2-HL123442 (DAS)).
Competing interests The authors have in the past received and currently receive funding from the National Institutes of Health.
Provenance and peer review The authors have in the past received and currently receive funding from the National Institutes of Health and the U.S. Department of Veterans Affairs.