Background We aimed to disentangle genetic and environmental causes in lung cancer while considering smoking status.
Methods Four Nordic twin cohorts (43 512 monozygotic (MZ) and 71 895 same sex dizygotic (DZ) twin individuals) had smoking data before cancer diagnosis. We used time-to-event analyses accounting for censoring and competing risk of death to estimate incidence, concordance risk and heritability of liability to develop lung cancer by smoking status.
Results During a median of 28.5 years of follow-up, we recorded 1508 incident lung cancers. Of the 30 MZ and 28 DZ pairs concordant for lung cancer, nearly all were current smokers at baseline and only one concordant pair was seen among never smokers. Among ever smokers, the case-wise concordance of lung cancer, that is the risk before a certain age conditional on lung cancer in the co-twin before that age, was significantly increased compared with the cumulative incidence for both MZ and DZ pairs. This ratio, the relative recurrence risk, significantly decreased by age for MZ but was constant for DZ pairs. Heritability of lung cancer was 0.41 (95% CI 0.26 to 0.56) for currently smoking and 0.37 (95% CI 0.25 to 0.49) for ever smoking pairs. Among smoking discordant pairs, the pairwise HR for lung cancer of the ever smoker twin compared to the never smoker co-twin was 5.4 (95% CI 2.1 to 14.0) in MZ pairs and 5.0 (95% CI 3.2 to 7.9) in DZ pairs.
Conclusions The contribution of familial effects appears to decrease by age. The discordant pair analysis confirms that smoking causes lung cancer.
- Lung Cancer
- Tobacco and the lung
What is the key question?
Is there a significant genetic component to the occurrence of lung cancer and is the genetic influence modified by smoking and age?
What is the bottom line?
The interplay between genes and tobacco smoking in the aetiology of lung cancer has remained controversial, and we disentangle genetic and environmental causes in cancer while taking smoking status into account.
Why read on?
Our study shows that tobacco exposure causes lung cancer even when adjusting for genetic factors. Interactions between genes and environmental exposure in the development of lung cancer are not supported by this study, the largest twin cohort study with the longest follow-up to date. The influence of familial effects appears to decrease with increasing age.
Smoking is the primary cause of lung cancer globally, although several other environmental exposures play a role.1 The estimated heritable genetic contribution to variation in risk of lung cancer overall has been modest in family (heritability estimate of 0.08, reference 2) and twin (0.26 in reference 3 and 0.18 in reference 4) studies. Genome-wide association (GWA) studies further suggest that some gene loci are associated with lung cancer in both smokers and non-smokers, while other variants, such as the functional D398N (rs16969968) variant in CHRNA5, are associated with lung cancer only among smokers.5 ,6 Thus, the heritability of lung cancer may vary as a function of smoking, but the differential effect of smoking on genetic variation underlying development of lung cancer has not been quantified.
To this end, our aim is to estimate the heritability of liability to lung cancer based on the largest twin cohort to date, the Nordic Twin Study of Cancer (NorTwinCan),4 which extends the Lichtenstein (2000)3 study with longer follow-up, new birth cohorts and refined methodology. We also examine whether this heritability is modified by smoking or age.
NorTwinCan includes population-based cohorts from the Danish, Finnish, Norwegian and Swedish twin registries.7 Each twin has an individually unique national registration number, allowing for linkage to the national cancer and mortality registries with complete follow-up, drop-out being only due to death or emigration. Lung cancer occurrence was obtained from the national cancer registries and computed from the baseline when smoking status was determined until the end of follow-up (table 1). In all cohorts, zygosity—monozygotic (MZ) or dizygotic (DZ)—was determined at baseline by validated questionnaire methodology, which classifies more than 95% of twin pairs correctly.3 Twins who have not replied to the questionnaires, as well as a minority providing inconsistent responses, are classified as unknown zygosity (UZ). The ethics committees for each country approved the study.
Given the major role of smoking in the aetiology of lung cancer, our analysis includes twin individuals of known zygosity from the Danish, Finnish, Norwegian and Swedish registries, where data on smoking status were available prior to lung cancer diagnosis. We excluded individuals from opposite-sex DZ pairs as data from them have not been as comprehensively collected. For individuals who reported smoking behaviour on more than one questionnaire, we used the earlier information.
The characteristics of the four national twin cohorts included in the analyses are summarised in table 1. We classified the participants as never smokers, ever smokers (former or current at time of questionnaire) and current smokers based on the survey items used to assess smoking status. Smoking data in the Danish cohort came from the eight questionnaire surveys conducted from 1959 to 2002.8–10 In Finland, smoking data came primarily from the first questionnaire survey in 1975, but some twins who had not replied in 1975 responded to a questionnaire survey in 1981.11 ,12 In the Norwegian cohort, smoking data came from three questionnaire surveys in 1980–1982, 1990–1992 and 1998.13 ,14 In the Swedish cohort, smoking data came from questionnaire surveys in 1961, 1967, 1970 and 1973.15 ,16
We included individuals with histologically confirmed lung cancer. Among those with smoking data, we recorded a total of 1508 incident lung cancers with a mean follow-up time of 25.2 years (21.0 years in lung cancer patients).
After defining cohort-specific dates of entry and follow-up, we accounted for left-truncation from variable initiation of cancer registration and for right-censoring at the end of follow-up and at loss to follow-up due to emigration (<2%). We examined the individual risk of lung cancer diagnosis by age by estimating cumulative lung cancer incidence17 and lifetime risk as the cumulative incidence (the probability of lung cancer) by age 80 years. We modelled potential competing deaths,18 ,19 which allows estimation of lung cancer risk in a twin given the occurrence of other disease in his/her co-twin. We obtained the case-wise concordances by age18 ,19 (see online supplementary material for details) as well as relative recurrence risks in MZ and DZ pairs and the multilocus index.20 ,21
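The cumulative incidence under a competing risk of death can be illustrated with a small non-parametric estimator. The sketch below is a minimal Aalen-Johansen type calculation with delayed entry. It is an assumption about how such an estimate could be computed, not the authors' code (their analyses used R and the mets package). The `Subject` record, its status coding and the function name are hypothetical.

```python
from collections import namedtuple

# Hypothetical record: ages at entry and exit, and the reason for exit.
# status: 0 = censored, 1 = lung cancer, 2 = death without lung cancer
Subject = namedtuple("Subject", "entry exit status")


def cumulative_incidence(subjects, horizon):
    """Estimate P(lung cancer by age `horizon`) with death as a competing risk."""
    event_ages = sorted(
        {s.exit for s in subjects if s.status != 0 and s.exit <= horizon}
    )
    event_free = 1.0  # probability of no event of either type so far
    cif = 0.0         # cumulative incidence of lung cancer
    for age in event_ages:
        # Delayed entry: a subject is at risk only after his or her entry age.
        at_risk = sum(1 for s in subjects if s.entry < age <= s.exit)
        if at_risk == 0:
            continue
        cancers = sum(1 for s in subjects if s.exit == age and s.status == 1)
        deaths = sum(1 for s in subjects if s.exit == age and s.status == 2)
        cif += event_free * cancers / at_risk
        event_free *= 1.0 - (cancers + deaths) / at_risk
    return cif
```

With no censoring before the horizon, the estimate reduces to the plain fraction of subjects diagnosed, which gives a quick sanity check.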
We extended standard biometrical modelling methods to address issues of censoring at follow-up.7 ,22 Results would agree with those obtained from standard models for twin data18 ,23 ,24 if no censoring were present. Quantitative models were analysed to estimate the magnitude of variation explained by genetic and environmental influences18 underlying the liability to develop lung cancer by smoking status. The relative magnitude of genetic influences on variation in liability to lung cancer is thus estimated among pairs in which neither had ever smoked, among pairs where both co-twins are ever (former or current) smokers, and among pairs in which both co-twins are current smokers.
We use information on lung cancer incidence in MZ and DZ pairs to decompose variation into additive genetic effects (A), dominant genetic effects (D), which represent deviations of the heterozygote genotype from the mean of the homozygote genotypes, common environmental effects (C) and individually unique environmental effects (E). Within-pair covariance of liability is expressed as κ var(A)+γ var(D)+var(C), where κ=γ=1 for MZ pairs and κ=1/2 and γ=1/4 for DZ pairs.18 We tested a series of models sequentially to assess the significance of specific parameters. Measurement error is absorbed into E, which is the component of variance that does not contribute to within-pair resemblance. Dominance effects are, typically, biologically implausible in the absence of additive effects. The primary models are thus the ACE and ADE models, as well as their sub-models AE, CE and E. We assessed the fit of the sub-models by the Akaike information criterion.22
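To make the covariance expression concrete, the relations below show how it maps to within-pair correlations when total liability variance is standardised to 1. The lower-case symbols a², d², c² and e² (standardised variance components) and r_MZ, r_DZ (within-pair correlations in liability) are notation introduced here for illustration. The closed-form solutions are simple moment identities under each sub-model, not the censoring-adjusted likelihood fitting used in the study.

```latex
\begin{align*}
  a^2 + d^2 + c^2 + e^2 &= 1, \\
  r_{MZ} &= a^2 + d^2 + c^2, \\
  r_{DZ} &= \tfrac{1}{2}a^2 + \tfrac{1}{4}d^2 + c^2. \\[4pt]
  \text{ACE } (d^2 = 0):\quad
    a^2 &= 2\,(r_{MZ} - r_{DZ}), &
    c^2 &= 2\,r_{DZ} - r_{MZ}, &
    e^2 &= 1 - r_{MZ}. \\
  \text{ADE } (c^2 = 0):\quad
    a^2 &= 4\,r_{DZ} - r_{MZ}, &
    d^2 &= 2\,r_{MZ} - 4\,r_{DZ}, &
    e^2 &= 1 - r_{MZ}.
\end{align*}
```

These identities also show why ACE and ADE are fitted as alternatives: with only two correlations, a², d² and c² cannot all be identified at once.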
We tested for equal thresholds (ie, normal quantiles of prevalence) between MZ and DZ twins, which is equivalent to assuming that the risk of disease does not differ by zygosity. We tested for constant relative recurrence risk (RRR) over age by grouping into 5-year intervals from 65 to 90 years of age for MZ and DZ pairs. To correct for possible bias due to censoring, individuals were assigned inverse probability of censoring weights, that is, the inverse of the probability of remaining uncensored up to the time of follow-up.7 ,18 ,19 ,22 Estimates have not been adjusted for left-truncation, which would cause an upwards bias, because such an adjustment is not yet feasible for this approach.
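The weighting idea can be sketched as follows. This is a minimal individual-level illustration that assumes the probability of remaining uncensored is estimated with a Kaplan-Meier curve for the censoring times. The function name and inputs are hypothetical, and the study's actual weighting scheme may differ in detail.

```python
def ipc_weights(times, censored):
    """Inverse probability of censoring weights.

    times[i]    -- observed follow-up time of subject i
    censored[i] -- True if follow-up ended by censoring, False if an event was observed
    Observed subjects get 1 / G(t-), where G is the Kaplan-Meier estimate of
    remaining uncensored; censored subjects get weight 0.
    Assumes no ties between event times and censoring times.
    """
    n = len(times)
    weights = []
    for i in range(n):
        if censored[i]:
            weights.append(0.0)
            continue
        uncensored_prob = 1.0
        earlier_censorings = sorted(
            {times[j] for j in range(n) if censored[j] and times[j] < times[i]}
        )
        for u in earlier_censorings:
            at_risk = sum(1 for t in times if t >= u)
            n_cens = sum(1 for j in range(n) if censored[j] and times[j] == u)
            uncensored_prob *= 1.0 - n_cens / at_risk
        weights.append(1.0 / uncensored_prob)
    return weights
```

Subjects observed late in follow-up receive larger weights because they also stand in for comparable subjects who were censored earlier.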
For gene and smoking status interaction, the magnitude on the liability scale could not be estimated because there was only one concordant pair among all never–never and never–ever smoking pairs. The presence of genetic interaction with smoking status was therefore investigated by comparing the observed concordance in strata of smoking status with that expected when assuming the same variance components on the liability scale as in ever–ever pairs. The expected concordance takes into account the smoking-status specific cumulative incidence by age, as well as the follow-up time of the specific pairs in the cohort: we computed the probability that randomly selected pairs were concordant, using the dependence parameters of the liability threshold model for the ever–ever pairs. This procedure leads to an approximate test, which we later refer to as the binomial test.
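One way to read this test is as a tail probability for the number of concordant pairs when each pair has its own expected concordance probability. The sketch below assumes independent pairs and takes hypothetical per-pair probabilities as input. It does not reproduce the liability threshold calculation that would supply those probabilities, and the function names are illustrative.

```python
def concordant_count_distribution(pair_probs):
    """Distribution of the number of concordant pairs among independent pairs.

    pair_probs[i] is the (hypothetical) probability that pair i is concordant.
    Returns a list whose k-th entry is P(exactly k concordant pairs).
    """
    dist = [1.0]
    for p in pair_probs:
        updated = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            updated[k] += mass * (1.0 - p)   # this pair is not concordant
            updated[k + 1] += mass * p       # this pair is concordant
        dist = updated
    return dist


def prob_at_most(pair_probs, k):
    """P(number of concordant pairs <= k)."""
    return sum(concordant_count_distribution(pair_probs)[: k + 1])
```

If every pair had the same probability, this reduces to an ordinary binomial tail. The pair-specific version allows for differing follow-up times and cumulative incidences.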
Among pairs in which one twin was a smoker and the other was not, we computed within-pair HRs for the association of smoking with lung cancer using a Cox model with pair-specific baseline hazard functions. Given that MZ pairs share their genomic sequence, an association of smoking with lung cancer risk within such pairs is independent of genetic liability. This hypothesis has historically competed with the hypothesis25 of shared genes underlying both smoking and lung cancer. The statistical programme R was used for all analyses with the package mets (Holst K, Scheike TH. mets: Analysis of Multivariate Event Times, R package version 0.2.8.1).
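The logic of the within-pair comparison can be shown with a simplified calculation. For pairs discordant for a binary exposure, a Cox model with pair-specific baseline hazards uses only pairs in which the co-twin is still at risk when the first twin is diagnosed. If that holds for every counted pair, the estimated HR is the ratio of pairs where the smoker is the case to pairs where the never smoker is the case. The sketch below rests on that assumption, uses a Wald interval on the log scale, and takes hypothetical counts; it is not the mets implementation used in the study.

```python
import math


def paired_hazard_ratio(n_smoker_case, n_never_case, z=1.96):
    """HR and Wald interval from informative smoking-discordant pairs.

    n_smoker_case -- pairs in which the smoking twin was diagnosed first
    n_never_case  -- pairs in which the never smoking twin was diagnosed first
    Both counts must be positive.
    """
    log_hr = math.log(n_smoker_case / n_never_case)
    se = math.sqrt(1.0 / n_smoker_case + 1.0 / n_never_case)
    return (
        math.exp(log_hr),
        math.exp(log_hr - z * se),
        math.exp(log_hr + z * se),
    )


# Hypothetical counts, for illustration only.
hr, lower, upper = paired_hazard_ratio(30, 6)
```

Because each twin is compared only with his or her own co-twin, anything the pair shares, including the full genomic sequence in MZ pairs, cancels out of the comparison.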
Among those with smoking data, we recorded 1508 incident lung cancers among a total of 115 407 twin (43 512 MZ and 71 895 DZ) individuals. Of these, 47% were never smokers (n=54 238), 16% were former smokers (n=18 231) and 37% were current smokers (n=42 938) at baseline. Figure 1 shows the cumulative incidence of lung cancer by smoking status (never, former, current) and sex. The risk of lung cancer diagnosis before 80 years of age is estimated at 0.6% (95% CI 0.5% to 0.7%) among never smokers, 2.0% (1.7% to 2.3%) among former and 5.7% (5.4% to 6.0%) among current smokers adjusting for censoring and competing risk of death. The only sex difference is seen among smokers. There was no difference in risk between MZ and DZ twin individuals.
The numbers of pairs concordant and discordant for lung cancer incidence are presented in table 2 for those with smoking data (n=50 595 pairs with smoking status on both twins) overall and further classified by smoking status.
Among twin pairs where both are ever smokers, the case-wise concordance by age, that is, the risk of lung cancer in a twin before a given age given that his or her co-twin also had lung cancer before that age, is depicted in figure 2 for both MZ and DZ pairs. Figure 2 also gives the cumulative incidence of lung cancer by age in individuals. The case-wise concordance risk was larger in MZ twins than the individual cumulative incidence risk, testing for a difference from the cumulative incidence across the 5-year age intervals (χ2=22.1, df=6, p=0.001). For the DZ twins, we found that the case-wise concordances were borderline significantly different from the cumulative incidence (χ2=13.4, df=6, p=0.04). The estimated case-wise concordance at 90 years of age was 0.20 (0.13 to 0.27) for MZ pairs and 0.13 (0.08 to 0.17) for DZ pairs.
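The reported p values can be checked against the reported test statistics. For an even number of degrees of freedom, the upper tail of the χ2 distribution has a closed form, so the check needs only the standard library. The function name is illustrative, and this is a consistency check rather than the authors' computation.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom exceeds x), for even df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_mz = chi2_upper_tail_even_df(22.1, 6)  # rounds to 0.001
p_dz = chi2_upper_tail_even_df(13.4, 6)  # rounds to 0.04
```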
This excess risk of MZ and DZ pairs of the case-wise concordance relative to the population-based individual cumulative incidence of lung cancer, the RRR (also known as the lambda value), is depicted in figure 3 and demonstrates the presence of familial effects at all ages. The RRR is higher at younger ages: the lung cancer risk is increased 10.2-fold (3.2 to 17.2) at 65 years of age and decreases significantly to a 3.6-fold (2.3 to 4.9) increase at 90 years of age if an MZ co-twin is diagnosed (p=0.04, test for trend). The RRR is suggested to be constant by age for DZ twins (p=0.25, test for trend) (figure 3). Relative risks by age group are provided in supplemental table 1. We tested whether the absolute differences between the MZ and DZ curves at each 5-year interval from 65 to 90 years of age were significantly different, and found they were not (p=0.21). Our results are thus consistent with the hypothesis of rather strong familial influences that do not increase across age. We hypothesise that the genetic part of the familial influence may become weaker by age.
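The quantities discussed here can be written compactly. In the notation below, introduced for illustration, T_1 and T_2 are the ages at lung cancer diagnosis of the two twins in a pair, F(t) is the individual cumulative incidence by age t, C(t) is the case-wise concordance and λ(t) is the RRR. The final line is rough arithmetic on the rounded MZ estimates at 90 years of age and ignores their uncertainty.

```latex
\begin{align*}
  F(t) &= P(T_1 < t), \\
  C(t) &= P(T_1 < t \mid T_2 < t) = \frac{P(T_1 < t,\; T_2 < t)}{F(t)}, \\
  \lambda(t) &= \frac{C(t)}{F(t)} = \frac{P(T_1 < t,\; T_2 < t)}{F(t)^2}, \\[4pt]
  F(90) &\approx \frac{C(90)}{\lambda(90)} = \frac{0.20}{3.6} \approx 0.056 .
\end{align*}
```

A value of λ(t) = 1 corresponds to independence within pairs, so values above 1 at every age indicate familial effects.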
We then examined evidence for genetic factors in the liability to develop lung cancer by smoking status. Among pairs in which neither had ever smoked (7871 MZ pairs and 10 768 DZ pairs), there was one lung cancer concordant MZ pair with 43 MZ and 59 DZ lung cancer discordant pairs. Heritability could not be estimated. However, the dependence in the never–never and never–ever pairs was not significantly different from the dependence among the ever–ever pairs (p=0.28, binomial test of observing more than one concordant pair of lung cancer).
The overall estimate of familial aggregation (genetic variance and shared environment component) for lung cancer liability is 44%, with 38% (5% to 72%) of variability attributed to genetic effects. When adjusted for smoking status, effects of country and sex, variability attributed to genetic effects was 34% (0% to 70%) (table 3). A Wald test of equal MZ and DZ tetrachoric within-pair correlations in liability to develop lung cancer (table 3), adjusted for age, sex, country, smoking and censoring, gave a p value of 0.07.
Among the pairs where both twins are ever (current or former) smokers, the heritability estimates ranged from 28% (0% to 66%) to 37% (25% to 49%), depending on the assumptions of the genetic model (table 4). A pure environmental model did not fit the data. Among current smokers, the heritability was estimated at 29% (0% to 74%) or 41% (26% to 56%), depending on genetic assumptions (table 4).
Finally, for smoking discordant pairs, we examined whether smoking status was associated with future lung cancer. In the ever smoking discordant pairs (3274 MZ pairs and 8350 DZ pairs), 40 MZ pairs were discordant for lung cancer (table 5). Of these 35 cases were among ever smokers (with their non-smoking co-twin being unaffected) and only five in the never smokers (while their smoking co-twin was unaffected), yielding a paired analysis HR of 5.4. Results for DZ pairs and for current smoking versus never smoking discordant pairs are shown in table 5. Most discordant pairs arose from pairs in which the smoker still smoked at baseline. None of the smoking discordant pairs were concordant for lung cancer.
In the largest study of lung cancer in twins to date, we found that genetic effects account for a significant amount of the variation in the liability to develop lung cancer, and the magnitude of this estimate is independent of smoking status. The largest estimate of heritability in the liability to lung cancer was found in pairs where both were current smokers at baseline. Among twin pairs where both twins were never smokers, only one concordant lung cancer pair was seen and a formal estimate of heritability could not be derived. A test of gene by smoking interaction was not significant suggesting that the relative contribution of genetics does not vary by smoking status. Furthermore, testing suggests that the contribution of familial effects does not increase by age. Our pairwise analysis of smoking discordant pairs confirmed that smoking causes lung cancer independent of genetic liability either to smoking or to lung cancer.
Twin pairs discordant for both lung cancer and smoking status at baseline are informative for causal analyses. In the lung cancer and smoking doubly discordant pairs, the pairwise relative risk for lung cancer was 5.4 among ever smokers in MZ pairs. It is of historical interest that after the landmark papers of Doll and Hill26 and Wynder and Graham27 in the early 1950s, the causality of the relationship between smoking and lung cancer was soon challenged by the great statistician Ronald Fisher.25 He pointed out the greater similarity of MZ versus DZ pairs for smoking, and indicated genetics as a potential confounder. MZ pairs discordant for smoking would help to resolve the issue of causality. Following up on prior twin studies of smoking discordant pairs,28 ,29 we can now finally put this issue to rest, an issue debated for many years because of the tobacco industry's prolonged refusal to acknowledge publicly that smoking causes lung cancer.
Smoking is the most important cause of lung cancer. Taking smoking into account permits us to test for the dependence of genetic effects on smoking status. The overall estimate of familial aggregation (genetic variance and shared environment component) for lung cancer liability is 44%, with most variability attributed to genetic effects (38%), higher but still consistent with the estimate of 26% (95% CI 0% to 49%) by Lichtenstein et al,3 also unadjusted for smoking and for censoring but based on a smaller number of affected pairs. We recently reported on the heritability for liability to lung cancer in the entire NorTwinCan data, with an overall estimate of familial aggregation of 42%.4 The present analysis extends these estimates by accounting for the effect of smoking status prior to disease occurrence and examines heritability among the smoking pairs.
In our analysis, adjustment for smoking eliminates the estimates for shared environmental effects. Shared environmental effects (ie, exposure to smokers in the childhood home and among peers in adolescence) are of importance for the initiation of smoking,30 so it is not surprising that adjustment for smoking controls for this source of variation. The highest estimates of heritability and recurrence risks were seen among current smoking pairs. Among never smokers, we cannot estimate the heritability of lung cancer.
Prior family2 and twin3 ,4 studies of lung cancer have demonstrated familial aggregation and provided very modest estimates for the role of genes. The Swedish multi-generational register family study2 estimated the heritability of lung cancer to be 8% (95% CI 5% to 9%), without information on smoking in the families. The American World War II veterans' study31 followed 12 938 male twin pairs for 44 years for mortality. Among pairs with at least one lung cancer death, only 10 of 269 MZ pairs and 21 of 373 DZ pairs were concordant, and no heritability estimate was provided. Smoking information was not used in the analysis, but smoking-related cancers showed less MZ–DZ differences in similarity than other cancers. Despite the large number of pairs in our present study, the final number of concordant pairs with smoking information was limited. Thus, we could not examine heritability of lung cancer risk in relation to time trends in lung cancer or histological subtypes of lung cancer. Nor did we have information on smoking amount, duration or changes in smoking status comprehensively and comparably assessed in all the twin cohorts.
Since detailed smoking information was not available, residual confounding might remain in the heritability estimates, which should be acknowledged as a potential limitation. Because MZ twins who smoke are also more similar than DZ twins in age of smoking initiation, amount smoked and duration of smoking,30 the heritability of lung cancer among smokers may still contain residual effects of genetics on smoking, and thus on lung cancer risk.
The overall genetic contribution to lung cancer as a function of smoking status is relevant for gene discovery. Since 2007, 21 lung cancer genome-wide association (GWA) and genome-wide meta-analysis studies32 (http://www.genome.gov/gwastudies) have found the strongest association with the CHRNA5 functional D398N (rs16969968) variant. The functional changes33 ,34 in nicotinic acetylcholine receptor activity are linked to increased risk for nicotine dependence, higher amount smoked35–38 and higher cotinine levels.39 ,40 Thus, those with a risk allele smoke more, are more tobacco-dependent and are less likely to quit, and therefore are at higher risk of developing lung cancer. However, D398N is not a risk factor for lung cancer in non-smokers, based on a GWA meta-analysis of 14 900 lung cancer cases and 29 485 controls6 and among 56 037 individuals from the HUNT population study in Norway.5 This variant requires exposure to smoking to affect lung cancer risk and thus contributes to the heritability seen among current smokers. In contrast to D398N, associations with other loci found to be significant for lung cancer, such as those in 5p15 (TERT and CLPTM1L genes) and 6p21 (BAG6/BAT3), are found also in non-smokers.6 ,32 The existence of a modest familial liability to lung cancer independent of smoking status was also observed in the analysis of Utah genealogical data.41 An increased risk of lung cancer was seen even in distant relatives; the high proportion of non-smoking lung cancer cases (31%) and a large proportion of missing data on smoking status (which was assessed through the death certificate and not prospectively) call for replication in other populations. A recent large meta-analysis yielded an array-based heritability estimate for lung cancer of 21% (95% CI 14% to 27%).42 This is somewhat smaller than our overall twin estimates, suggesting that much of the genetic liability to lung cancer is attributable to common variants, but other genetic effects may exist. The same study estimated that 24% of the heritability of lung cancer is accounted for by genetic determinants of smoking behaviour.
In conclusion, our study extends earlier studies to examine the heritability in liability to lung cancer by smoking status and age. We find no formal evidence for a gene by environmental exposure interaction in lung cancer; more detailed environmental exposures and larger sample sizes may be required. Given that we have shown a rather strong familial influence, we hypothesise that the genetic component of that familiality weakens with age. Studies of genetic factors and hence molecular mechanisms in cancer would benefit from carefully taking into account known environmental risk factors and identifying the population groups at highest genetic risk using environmental stratification. However, the discordant pair analysis conclusively demonstrates that tobacco exposure causes lung cancer even when adjusting for genetic factors.
We acknowledge the contribution of all the researchers and staff of the twin and cancer registries contributing to the NorTwinCan project.
JH and TK contributed equally
Contributors JH designed the study, contributed to developing the statistical methodology, conducted the data analysis, interpreted the data, and wrote the methods section of the manuscript. TK contributed to the design and wrote the manuscript together with JH and JKaprio (JH and TK contributed equally to this article). KH made central contributions to developing the statistical methodology, and took part in conducting the statistical analysis as well as in revising the manuscript. AS was responsible for quality assurance of the combined data set, conducted the data analysis, and reviewed and commented on the manuscript. EP contributed to quality assurance of the combined data set, and reviewed, commented on and edited the manuscript. JKutschke helped to prepare the Norwegian data. JRH helped in the drafting and providing critical comments on the manuscript. LAM reviewed, commented on and edited the manuscript. KChristensen reviewed, commented on and edited the manuscript. H-OA was involved in initiating, designing and funding the study as well as in interpreting the results and editing the manuscript. TS contributed to statistics and took part in revising the manuscript. JKaprio designed the study, contributed to data interpretation, and wrote the manuscript together with JH and TK. KC helped prepare the Swedish data and provided critical comments on the manuscript.
Funding This work was supported by funding from the Ellison Foundation (PI LAM) and the Nordic Union of Cancer (PI JKaprio). LAM is supported by the Prostate Cancer Foundation. The Finnish Twin Cohort was supported by the Academy of Finland (grant numbers 213506, 129680, 265240, 263278) and Karolinska Institutet Distinguished Professor Award to H-OA (Dnr: 2368/10-221). The Ministry for Higher Education financially supports the Swedish Twin Registry.
Competing interests Tellervo Korhonen and Jaakko Kaprio have consulted for Pfizer on nicotine dependence from 2012 to 2015.
Ethics approval The study was approved by the ethical committees for each country.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The data for the present analysis were compiled with the agreement of the participating twin cohorts and national cancer registries. Requests to access additional data need to be made through the individual national cohorts and registers, which are responsible for the data sets.