Genetic variants affecting cross-sectional lung function in adults show little or no effect on longitudinal lung function decline

Background Genome-wide association studies have identified numerous genetic regions that influence cross-sectional lung function. Longitudinal decline in lung function also includes a heritable component but the genetic determinants have yet to be defined. Objectives We aimed to determine whether regions associated with cross-sectional lung function were also associated with longitudinal decline and to seek novel variants which influence decline. Methods We analysed genome-wide data from 4167 individuals from the Busselton Health Study cohort, who had undergone spirometry (12 695 observations across eight time points). A mixed model was fitted and weighted risk scores were calculated for the joint effect of 26 known regions on baseline and longitudinal changes in FEV1 and FEV1/FVC. Potential additional regions of interest were identified and followed up in two independent cohorts. Results The 26 regions previously associated with cross-sectional lung function jointly showed a strong effect on baseline lung function (p=4.44×10−16 for FEV1/FVC) but no effect on longitudinal decline (p=0.160 for FEV1/FVC). This was replicated in an independent cohort. 39 additional regions of interest (48 variants) were identified; these associations were not replicated in two further cohorts. Conclusions Previously identified genetic variants jointly have a strong effect on cross-sectional lung function in adults but little or no effect on the rate of decline of lung function. It is possible that they influence COPD risk through lung development. Although no genetic variants have yet been associated with lung function decline at stringent genome-wide significance, longitudinal change in lung function is heritable suggesting that there is scope for future discoveries.


INTRODUCTION
Reduction of FEV 1 relative to FVC defines COPD, one of the leading causes of death worldwide. Measures of lung function are also important predictors of morbidity and mortality in the general population. [1][2][3] While environmental factors, particularly smoking, impact lung function, genetic variation is also a major determinant. 4 Genome-wide association studies (GWAS) to date have identified numerous regions associated with lung function measured at a single point in time (ie, cross-sectional lung function). [5][6][7][8][9] The lung function attained at a given time point in adulthood will be influenced by factors that affect either the development of lung function in earlier life or the rate of subsequent decline in lung function or both. Both cross-sectional lung function and longitudinal change in lung function are heritable. Although heritability estimates for longitudinal decline in lung function range between 10% and 39%, 10 the individual genetic determinants have yet to be defined. Identifying the responsible genes could provide a promising route for intervention in COPD, since this is typically diagnosed well after lung function has reached its peak and modifying its further decline could prove to be a feasible therapeutic option.
The objectives of our study were as follows: (1) to examine the association with longitudinal change for those regions previously identified as significantly associated with cross-sectional FEV 1 or FEV 1 /FVC and (2) to seek novel variants which reach genome-wide significance for association with longitudinal change in lung function, using a cohort with multiple lung function measurements over an extended period of up to 40 years and an imputation panel which provides dense coverage of both common and low-frequency variants.

Discovery data source and study population
The Busselton Health Study (BHS) is a longitudinal health survey that began in 1966 in the town of Busselton in the southwestern region of Western Australia. In 1994/1995, a crosssectional community follow-up study was undertaken where blood was taken for DNA extraction. A sample of 1468 individuals with European ancestry were genotyped using the Illumina 610-Quad BeadChip (BHS1) and subsequent genotyping was carried out on an independent group of 3407 individuals with European ancestry using Illumina 660W-Quad (BHS2). Spirometric measures of FEV 1 and FVC were assessed. From 1966 to 1978 (five surveys), FEV 1 and FVC were measured using a McDermott dry spirometer (Pneumoconiosis Research Unit, Penarth, UK) and recorded as the highest values from three measurements ( provided that two recordings were within 10% of each other). Wedge spirometers (Vitalograph, Buckingham, UK) were used in the 1981 survey and pneumotachograph spirometers (Welch Allyn, Skaneateles Falls, New York, USA) were used in 1994/1995. 11 13 From 1981 onwards, spirometric measurements met American Thoracic Society guidelines (52% of the total number of measurements). 14 15 Demographic details and mean lung function measurements for this and replication cohorts are shown in table 1. Observations prior to age 25 were excluded from both discovery and replication analyses. 16 Genotype quality control and imputation Genotype data from BHS1 and BHS2 samples were merged and quality control (QC) was undertaken for both variant quality and sample quality. Prior to imputation, variants were excluded if they had a call rate <95%, deviated from Hardy-Weinberg equilibrium ( p<10 −6 ) or had a minor allele frequency (MAF) of <1% (100 240 variants). Individuals were excluded if their call rate was <95%, if their submitted gender and gender inferred by genotype were inconsistent, if they were a duplicate or if they were an outlier for heterozygosity (169 individuals). Principal components analysis was used to exclude individuals of non-European ancestry (those at least four SDs away from the mean for at least one of the first two principal components) (27 individuals). Analysis of identity by descent (IBD) was compared with the reported pedigree; any inconsistencies were reviewed and individuals were excluded as appropriate (40 individuals; see online supplementary material for further details). QC was undertaken using PLINK V. 1 22 Variants where imputation quality (r 2 ) was <0.3 or MAF <1% were excluded.

Phenotype QC and analysis
Data on age and smoking were checked for consistency over time. Where possible, missing data on height or smoking at one time point were imputed to be the same as that at the nearest time point (407 data items imputed in total). Otherwise, observations with missing data on lung function, age or height or where FEV 1 was greater than FVC were excluded from the analysis. After phenotype and genotype datasets were merged, and after samples failing genotype QC were excluded, 12 863 observations for 4170 individuals and up to eight time points were available prior to further phenotype QC.
A time variable was created for each observation, determined by the difference between the age of the individual at that observation and their age at the first time point for which data were available on that individual. Family was defined based on IBD, such that all individuals in a family were related (IBD >0.2) with at least one other person in the family. A full model with age, age 2 , height, height 2 , sex and time as fixed effects and an intercept varying per individual, an intercept varying per family and a slope for time varying per individual as random effects was fitted to the data for each trait (FEV 1 , FEV 1 /FVC and FVC). Additionally, another four models were fitted, excluding the random intercept varying per family, the random slope for time varying per individual and age 2 and height 2 one at a time and comparing with the full model using Akaike information criterion and Bayesian information criterion. The final model included all the terms in the full model, except for age 2 as this did not improve the fit of the model (see online supplementary table S1).
Outlying observations (with residuals more than four SDs away from the mean) for any of the three traits analysed, FEV 1 , FEV 1 / FVC and FVC, were then excluded from the analysis (168 observations and three individuals excluded). This resulted in 12 695 observations for 4167 individuals and up to eight time points included in the final analysis. The mean number of measurements per participant was three (see online supplementary figure S1) and the mean length of follow-up was 15.5 years.

Genetic risk scores
Previously published studies identified variants in 26 regions significantly associated with cross-sectional FEV 1 and/or FEV 1 / FVC. [5][6][7][8] We compared the effect size estimates for these 26 variants in BHS with the previously published effect size estimates. A weighted risk score was calculated for the joint effect of these 26 regions on baseline FEV 1 and FEV 1 /FVC as well as on change over time in both traits. The single-nucleotide polymorphisms (SNPs) included in this analysis are shown in the online supplementary table S2.
Unbiased (winner's curse-free) effect sizes, as calculated previously but excluding any data from BHS, 8 were used as weights for the 26 variants in the risk score calculations (weights, range 0.2-2.57, provided in online supplementary table S3). The number of risk alleles for each variant was multiplied by its corresponding weight and then summed across variants in order to obtain the risk score for each individual and to create the risk score variable. This risk score variable was then added to the final model described above, together with the risk score by time interaction, in order to obtain both the effect on baseline and on change over time.

Genome-wide association analysis and selection of variants for follow-up analysis
The effect of individual genetic variants (which included SNPs and indels) on the rate of decline in FEV 1 , FEV1/FVC and FVC was tested by adding a variant and variant-by-time interaction into the final phenotypic model described above. Using this model, genome-wide association analyses were undertaken for 29 798 550 variants assuming an additive genetic effect. Genomic control was subsequently applied (genomic inflation factors for slope change were 1.04, 1.03 and 1.06 for FEV 1 , FEV 1 /FVC and FVC, respectively). 23 Figure 1 illustrates our criteria for selection of variants for follow-up analysis. We identified variants potentially associated ( p<5×10 −6 taken as suggestive of association) with longitudinal FEV 1 , FEV 1 /FVC or FVC. We defined regions of association around the most strongly associated variant (sentinel variant) ±500 kb. We examined region plots to assess support from neighbouring variants and cluster plots for the closest genotyped variant to the sentinel variant in order to rule out associations driven by genotyping errors. Regions were selected for follow-up if they had at least one variant (either the top variant or a proxy in linkage disequilibrium with r 2 >0.2) with imputation quality >0.7 and p value <5×10 −5 .
Within each region, we selected variants for follow-up analysis as follows: (1) the sentinel variant, (2) a second variant in the region with imputation quality >0.7 and p value <5×10 −5 where the sentinel variant had imputation quality <0.7 and (3) for regions in which variants showed nominal interaction with smoking ( p <0.05 for interaction term based on a Z-test between ever-smokers and never-smokers), the most significant variant in the group (ever-smokers or never-smokers) where the sentinel variant's effect size estimate was largest.
Sensitivity analyses were also undertaken for the regions which met the criteria for follow-up analysis to assess whether their effect could be mediated through smoking behaviour. The same model was fitted, with an additional term for smoking status (ever-smoked or never-smoked), and effect sizes were compared.
Estimations of power were also obtained for the discovery of new signals and for detection of associations with the 26 variants previously reported to be associated with cross-sectional lung function (see online supplementary materials).

Replication
Follow-up analyses for the 39 new regions potentially associated with longitudinal change were undertaken in the Copenhagen City Heart Study (CCHS), a prospective study of a random sample of the Danish general population, aged ≥20 years, drawn using the Danish Civil Registration System (n=9016) who had lung function measurements at up to three time points between 1976 and 1994, 24 with a further measurement in 2001-03. 25 Follow-up was also undertaken in the Lung Health Study (LHS), a North American cohort of smokers with mild airflow limitation who had annual lung function measurements for 5 years (n=3502). 26 Risk score analyses with the previously reported 26 variants were also undertaken in CCHS. 5-8 QC procedures (including inspection of cluster plots) were applied and the same model was fitted as for BHS, but without adjustment for relatedness, given that there were no related individuals. Demographic details and mean lung function measurements for CCHS and LHS are shown in table 1.
All risk score and association analyses were undertaken in R using the package 'lme4' (Bates D, Maechler M, Bolker B, et al; http://CRAN.R-project.org/package=lme4).

Known regions: calculation of risk scores and replication
Comparison of effect sizes for cross-sectional lung function and longitudinal change and weighted risk score analyses showed that 26 variants previously identified as associated with crosssectional lung function were not significantly associated with longitudinal change in our cohort.
There was a strong correlation between estimated SNP effects on baseline lung function in BHS and in published estimates [6][7][8] for both FEV 1 (r=0.76) and FEV 1 /FVC (r=0.78). However, estimated SNP effects on change in FEV 1 and FEV 1 /FVC were weakly correlated with published estimates for the respective cross-sectional trait (figure 2 and online supplementary figure  S2).
A weighted risk score was calculated for the joint effect of these 26 regions on baseline FEV 1 and FEV 1 /FVC as well as change over time in both traits. This showed a strong effect on baseline FEV 1 ( p=9.75×10 −12 ) and FEV 1 /FVC (p=4.44×10 −16 ) but no effect was observed on change over time for either FEV 1 ( p=0.409) or FEV 1 /FVC (p=0.160) (table 2).
In silico data were available for eight known variants in CCHS and genotyping was undertaken for the remaining 18 variants. All of these passed QC procedures. They showed a joint effect similar to that seen in BHS: a strong association with baseline measurement ( p=4.45×10 −10 ) but no association with change over time in FEV 1 /FVC ( p=0.302) and strong association with baseline measurement ( p=8.92×10 −7 ) but only borderline association with change over time for FEV 1 (p=0.030) (table 2).

New signals: discovery and replication
Genome-wide analysis identified 56 independent regions which were associated with decline in FEV 1 , FEV 1 /FVC and/or FVC at a significance threshold of p<5x10 −6 . Thirty-nine of these regions (48 variants) were selected for follow-up based on the criteria described under 'Methods': 11 regions (13 variants  (rs6502247) showed an effect of 3.93 mL/year on decline in FEV 1 ( p=3.23×10 −9 ). Effect size estimates for these variants were not attenuated after adjusting for smoking status and no variant-by-smoking interaction met a Bonferroni-corrected threshold for the number of regions tested ( p<1.3×10 −3 ).
De novo genotyping of all 48 variants was undertaken in CCHS. Where genotyping failed, tags were identified, if possible. In total, 34 variants in 30 independent regions passed QC procedures (including the inspection of cluster plots) and 14 failed. The analysis included 9016 individuals with up to four time points (25 796 observations). None of these 34 variants showed replicated association with change in lung function (using a significance threshold of p<0.0016, representing a Bonferroni correction for 30 independent regions with α=0.05) (see online supplementary table S4).
Thirty-one variants (in 26 of the 39 regions) for which in silico data were available were also followed up in LHS. None of these SNPs showed significant association with lung function decline in LHS (using a significance threshold of p<0.0019, representing a Bonferroni correction for 26 independent regions with α=0.05) (see online supplementary table S5). Results from both CCHS and LHS are shown in the online supplementary materials. In total, 21 variants were analysed in both follow-up studies, of these, 14 had the same direction of effect as BHS after meta-analysing results from CCHS and LHS ( p=0.095).
For sentinel variants in regions previously reported to show suggestive evidence of association with longitudinal lung function (though none met genome-wide significance and replicated in previous papers), we assessed association with FEV 1 , FVC and/or FEV 1 /FVC in our data. [26][27][28] Of these 51 variants,10 showed nominal evidence of association with at least one of the lung function traits either in the whole cohort or in one of the smoking subgroups. However, none were significant after correction for multiple testing (using a threshold of p<9.8×10 −4 ) (see online supplementary table S6).

DISCUSSION
We analysed the joint effect of regions previously associated with cross-sectional lung function on longitudinal change in the general population, and undertook a GWAS to identify new signals associated with longitudinal change. Regions previously identified as significantly associated with crosssectional FEV 1 and/or FEV 1 /FVC 5-8 were jointly strongly associated with baseline measurements in both discovery (BHS) and replication (CCHS) cohorts. However, although many of these variants have previously shown association with COPD risk (TNS1, RARB, FAM13A, GSTCD, HHIP, HTR4, ADAM19, AGER, GPR126, C10orf11, THSD4), [29][30][31][32][33] we have shown that they are not associated with change in FEV 1 or FEV 1 /FVC over time in our cohorts. We identified novel variants associated with decline in lung function in BHS which did not replicate in either a general population cohort (CCHS) or a cohort of smokers with mild lung function impairment (LHS).
A key strength of our study design is the improved coverage of both common and low-frequency variants achieved through imputation in the 1000 Genomes Project reference panel, 20 compared with previous studies of longitudinal lung function which have used HapMap reference panels. 26 27 34 Another strength is the high number of lung function measurements in BHS: up to eight measurements over a range of up to 40 years. This is longer than other published studies, including most of the individual studies in a meta-analysis by Tang et al, 34 which included a portion of our dataset. Our study also adds value as it is the first to calculate risk scores for longitudinal lung function. Nevertheless, such a long follow-up period brings some challenges. Spirometry equipment changed over time and the earlier surveys in BHS were performed before protocols for standardisation of spirometry were published. 11 14 15 In addition, repeated measurements over a number of years could train participants in optimal technique and underestimate decline in lung function. The detection of known associations with crosssectional lung function provides some reassurance regarding the extent of any potential measurement error in lung function measurement.
The biggest challenge we faced was the availability of large sample sizes for well-characterised longitudinal measures of lung function. The sample size, in combination with possible measurement error, may have limited our ability to identify individual variants which reach stringent genome-wide significant levels and replicate. However, our study will have had much greater power to detect longitudinal effects of aggregate risk scores comprised of the 26 variants previously reported to be associated with cross-sectional lung function. We did not examine risk scores for FVC, as at the time of the analyses there were no published associations with FVC, and our focus was on the determinants of obstructive lung disease. It is possible therefore that there are genetic associations with change in FVC over time which we did not identify. We did not undertake analyses stratified by sex, for reasons of power. However, effect estimates were adjusted for sex. We did not adjust for smoking in the primary analysis, since the analysis also had potential to highlight novel signals for smoking behaviour. We undertook sensitivity analyses to assess whether any top signals could be mediated by smoking behaviour. The signals were not attenuated after adjustment for smoking. However, this was based on binary smoking status and there remains potential for incomplete adjustment.
Our study-the first to our knowledge to examine risk scores for change in lung function over time-complements existing studies which have sought individual SNP associations, [26][27][28] including a large meta-analysis of longitudinal lung function. 34 The meta-analysis by Tang et al 34 (concurrent with our study) incorporated data from 27 349 individuals from 14 populationbased cohorts (including a subset of 1009 individuals from BHS) and identified evidence for two novel regions associated with rate of change in FEV 1 . However, these did not replicate in two further cohort studies (including LHS). The authors noted that the number of lung function measurements and duration of follow-up varied considerably between studies included in the discovery phase and may have affected their ability to detect associations. The failure to replicate novel associations may relate to lack of power or, given that the larger replication cohort in their study (LHS) was composed of smokers with mild COPD, may indicate that the genetic determinants of lung function decline in those with COPD differ from those in healthy individuals. 34 Similarly, a GWAS examining change in FEV 1 % predicted (in a cohort of smokers with mild lung function impairment from LHS) identified two regions reaching genome-wide significance which did not replicate in four general population cohorts or in a cohort with moderate-to-severe COPD. The authors suggested that regions which modify the effect of cigarette smoke on lung function decline (or determine rate of decline at different stages of COPD) may be distinct from those which influence lung function decline in the general population. 26 A small number of studies have begun to explore interaction between genetic variants and smoking in relation to lung function decline, though these have not identified any significant associations which also replicated. 35 36 The suggestion that variants which determine lung function decline may show heterogeneity across different groups is further supported by findings from an earlier GWAS which identified suggestive evidence that different regions were associated with lung function decline in asthmatic and non-asthmatic individuals. Only the signal in non-asthmatic individuals (rs9316500 in DLEU7) showed evidence of replication (at p<0.05) in largely population-based cohorts, but it did not reach genome-wide significance in discovery, replication or twostage meta-analysis. 27 None of the top variants reported in these papers reached significance after correction for multiple testing in our data.
Our findings are also consistent with recent work by Lange et al 37 which identified two distinct trajectories of FEV 1 in people who developed COPD. In their study, approximately half of those diagnosed with COPD by the end of follow-up started with normal lung function in early adulthood (mean age 40) and then showed rapid decline, whereas the remaining half started with low FEV 1 in early adulthood followed by a relatively normal rate of decline. This suggests that rapid lung function decline in later life is not necessary for development of COPD. We hypothesise that the known genetic variants examined in this paper may exert much of their effect in earlier life. Of the 26 regions examined in this paper, 19 have previously been shown to have directions of effect on lung function in children (aged 7-9 years) consistent with that in adults. 8 An additional study has also shown evidence of association with lung function as early as 5-14 weeks of age for variants in 4 of the 26 loci. 38 However, further large GWAS of lung development are required to test this hypothesis.
These findings emphasise the continuing public health importance of focusing on the key environmental determinants of lung function decline, particularly smoking. Nevertheless, genetic determinants of decline may remain a therapeutic target in the half of people with COPD for whom accelerated decline is important in pathogenesis. 37 A potential strategy to identify these would be to focus on older cohorts and adjust for the effect of all known variants which affect cross-sectional lung function. Large sample sizes will also be the key to help confirm or refute our findings. However, as the largest existing meta-analysis identified significant challenges posed by phenotypic heterogeneity, ensuring comparability of the participating studies' approach to measuring longitudinal change must also be a high priority. An alternative approach would be to study longitudinal lung function in large, more homogeneous populations.
In summary, regions previously identified as significantly associated with cross-sectional FEV 1 and/or FEV 1 /FVC 5-8 were jointly strongly associated with baseline measurements in both discovery and replication cohorts but were not associated with change in FEV 1 or FEV 1 /FVC over time. The present study and others to date have identified no regions associated with lung function decline which reach genome-wide significance and replicate in independent cohorts. Genetic variants identified to date that influence cross-sectional lung function, while still relevant in predicting the risk of COPD, appear to have Betas provided correspond to the risk allele. Risk allele here is defined as the allele associated with decreased lung function in BHS. Variants are given in order of Chr and position. β, per-allele change in FEV 1 , FVC or FEV 1 /FVC; Chr, chromosome.
little or no effect on the rate of change in adult lung function over time in our study.