Genetic and observational evidence supports a causal role of sex hormones on the development of asthma

Introduction Males have a higher prevalence of asthma in childhood, whereas females have a higher prevalence in adolescence and adulthood. The ‘adolescent switch’ observed between sexes during puberty has been hypothesised to be due to fluctuating sex hormones. Robust evidence of the involvement of sex hormones in asthma could lead to development of therapeutic interventions.

Methods We combine observational evidence using longitudinal data on sex hormone-binding globulin (SHBG), total and bioavailable testosterone and asthma from a subset of males (n=512) in the Avon Longitudinal Study of Parents and Children, and genetic evidence of SHBG and asthma using two-sample Mendelian randomisation (MR), a method of causal inference. We meta-analysed two-sample MR results across two large data sets, the Trans-National Asthma Genetics Consortium genome-wide association study of asthma and UK Biobank (over 460 000 individuals combined).

Results Observational evidence indicated weak evidence of a protective effect of increased circulating testosterone on asthma in males in adolescence, but no strong pattern of association with SHBG. Genetic evidence using two-sample MR indicated a protective effect of increased SHBG, with an OR for asthma of 0.86 (95% CI 0.74 to 1.00) for the inverse-variance weighted approach and an OR of 0.83 (95% CI 0.72 to 0.96) for the weighted median estimator, per unit increase in natural log SHBG. A sex-stratified sensitivity analysis suggested the protective effect of SHBG was mostly evident in females.

Conclusion We report the first suggestive evidence of a protective effect of genetically elevated SHBG on asthma, which may provide a biological explanation behind the observed asthma sex discordance. Further work is required to disentangle the downstream effects of SHBG on asthma and the molecular pathways involved.

Sex hormones
SHBG and total testosterone were measured in peripheral blood in a subset of 513 males (512 singletons) in ALSPAC (3). Enzyme-linked immunosorbent assays were used to measure plasma concentrations of SHBG and total testosterone in blood samples (lithium heparin plasma) using commercially available kits. Male total testosterone measures were standardized by time of venepuncture (since testosterone displays a circadian rhythm) using multilevel modelling to predict testosterone at a standard time of day, as described previously (3). Separate models were fitted for each time-point because the effect of time of venepuncture varied by age. Age was included as a continuous variable in these models. The time-corrected values of total testosterone were used in all analyses.
Measures of bioavailable testosterone were derived from measures of total testosterone (not corrected for time of venepuncture or exact age) and SHBG as previously described by Khairullah et al. (3). Briefly, the equation "Total testosterone = Free testosterone + SHBG-bound testosterone + albumin-bound testosterone" was used, reliant on measures of SHBG assayed from the same samples and estimated concentrations of albumin-bound testosterone from a previously described reference sample (4). The algorithm has been previously reviewed and shown strong correlations with assayed measures of bioavailable testosterone (5).
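As an illustration of how the partition equation above can be solved, the following minimal sketch uses a standard mass-action (law of mass action) calculation. It is not the algorithm of Khairullah et al. (3): the association constants, the fixed albumin concentration and the function name `partition_testosterone` are assumptions chosen for illustration only, and the source instead relied on albumin-bound estimates from a reference sample.

```python
import math

# ASSUMED illustrative constants (not taken from the source).
K_SHBG = 1.0e9    # association constant of SHBG for testosterone, L/mol
K_ALB = 3.6e4     # association constant of albumin for testosterone, L/mol
ALBUMIN = 6.2e-4  # fixed albumin concentration, mol/L (roughly 43 g/L)


def partition_testosterone(total_t_nmol, shbg_nmol):
    """Split total testosterone (nmol/L) into free, albumin-bound and
    SHBG-bound fractions so that the three parts sum to the total."""
    tt = total_t_nmol * 1e-9   # convert nmol/L to mol/L
    shbg = shbg_nmol * 1e-9
    n = 1.0 + K_ALB * ALBUMIN  # free plus albumin-bound, per unit free
    # Mass balance gives a quadratic in free testosterone:
    # n*K_SHBG*FT^2 + (n + K_SHBG*(SHBG - TT))*FT - TT = 0
    a = n * K_SHBG
    b = n + K_SHBG * (shbg - tt)
    free = (-b + math.sqrt(b * b + 4.0 * a * tt)) / (2.0 * a)
    albumin_bound = K_ALB * ALBUMIN * free
    shbg_bound = K_SHBG * free * shbg / (1.0 + K_SHBG * free)
    return {
        "free": free * 1e9,
        "albumin_bound": albumin_bound * 1e9,
        "shbg_bound": shbg_bound * 1e9,
        # Bioavailable testosterone = free + albumin-bound.
        "bioavailable": (free + albumin_bound) * 1e9,
    }
```

Under these assumed constants, a higher SHBG concentration at the same total testosterone yields a lower bioavailable fraction, which is the property the derived measure is meant to capture.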

Genotypes
Genetic data for the ALSPAC children were generated by Sample Logistics and Genotyping Facilities at the Wellcome Trust Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe with the Illumina HumanHap550-quad array and the Illumina GenomeStudio calling algorithm. SNPs with more than 5% missingness, a Hardy-Weinberg equilibrium P-value lower than 10⁻⁶ or a minor allele frequency of less than 1% were removed during quality control (QC). Samples with indeterminate X chromosome heterozygosity or extreme autosomal heterozygosity were also excluded. SNP imputation was carried out against the 1000 Genomes Project database (www.1000genomes.org). The first 20 principal components (PCs) were calculated using 1.1 million HapMap3 tag SNPs.
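The SNP-level QC thresholds above amount to a simple per-variant filter. The sketch below applies them to a few made-up variants; the function name, the record layout and the example values are hypothetical, and real pipelines would normally run such filters in dedicated genetics software rather than plain Python.

```python
def passes_snp_qc(missing_rate, hwe_p, maf,
                  max_missing=0.05, min_hwe_p=1e-6, min_maf=0.01):
    """Return True if a SNP survives the three QC thresholds in the text:
    missingness of at most 5%, Hardy-Weinberg P-value of at least 1e-6,
    and minor allele frequency of at least 1%."""
    return (missing_rate <= max_missing
            and hwe_p >= min_hwe_p
            and maf >= min_maf)


# Hypothetical variants, for illustration only.
example_snps = [
    {"id": "snp_a", "missing_rate": 0.01, "hwe_p": 0.40, "maf": 0.20},
    {"id": "snp_b", "missing_rate": 0.08, "hwe_p": 0.40, "maf": 0.20},  # too much missingness
    {"id": "snp_c", "missing_rate": 0.01, "hwe_p": 1e-9, "maf": 0.20},  # fails HWE
    {"id": "snp_d", "missing_rate": 0.01, "hwe_p": 0.40, "maf": 0.004},  # too rare
]

kept = [s["id"] for s in example_snps
        if passes_snp_qc(s["missing_rate"], s["hwe_p"], s["maf"])]
```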

Respiratory phenotypes
Asthma data were extracted from questionnaires completed by the mothers of the study participants at 10.7, 13.1 and 13.8 years of age, and by the study participants themselves at 16.5 and 22.9 years of age. Both the mothers and the study participants were asked the same question about asthma in the last 12 months, except at 22.9 years, when study participants were asked only about ever asthma, wheeze in the last 12 months and whether they had taken asthma medications in the last 12 months (6). For this time point, a measure of current asthma was derived from the responses, with current asthma defined as a 'yes' answer to ever asthma and a 'yes' answer to either wheeze symptoms in the last 12 months or asthma medications in the last 12 months. The reference group consisted of those who had responded 'no' to all three questions. Additionally, data on wheeze in the last 12 months were extracted from questionnaires completed by the mothers of the study children at an average age of 10.6, 13 and 13.8 years, and from the study children themselves at 16.5
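The derivation of current asthma at 22.9 years described above can be written as a small rule. This is a sketch under stated assumptions: the function name is hypothetical, responses are taken to be complete yes/no answers coded as booleans, and participants who fit neither the case nor the reference definition are returned as `None` (how the study handled them is not stated here).

```python
def current_asthma_at_22(ever_asthma, wheeze_12m, meds_12m):
    """Derive current asthma from three yes/no questionnaire items.

    Returns 1 for a case, 0 for the reference group, and None when the
    response pattern matches neither definition.
    """
    # Case: ever asthma AND (recent wheeze OR recent asthma medication).
    if ever_asthma and (wheeze_12m or meds_12m):
        return 1
    # Reference group: 'no' to all three questions.
    if not ever_asthma and not wheeze_12m and not meds_12m:
        return 0
    # Everything else, e.g. recent wheeze without ever asthma.
    return None
```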

Genotypes
For UK Biobank, genotypes were assayed using two different arrays, the Affymetrix UK BiLEVE Axiom or Affymetrix UK Biobank Axiom array. Imputation of genetic variants from the Haplotype Reference Consortium 3 (HRC) was also carried out. Individuals who were reported as outliers based on either genotype missingness rate or heterozygosity were excluded. Individuals whose sex inferred from the genotypes did not match their self-reported sex and individuals who demonstrated sex chromosome aneuploidy were also excluded. Finally, individuals whose ancestry was not European or who demonstrated relatedness to other study participants in UK Biobank based on kinship coefficients were removed.

GWAS of SHBG
A previous large-scale GWAS of circulating SHBG has been conducted by Coviello et al. in 2012 (7). The study was conducted in 21,791 individuals (9,390 women and 12,401 men) from 10 cohorts and validated in 7,046 individuals (4,509 women and 2,537 men) from a further 6 collections of data. Mean age for the discovery cohorts was 19-74 years, whereas for the replication cohorts it was 32-75 years. The study reported 12 genetic variants (SNPs) associated with SHBG. These variants were mapped to SHBG (rs12150660), PRMT6 (rs17496332), GCKR (rs780093), ZBTB10 (rs440837), JMJD1C (rs7910927), SLCO1B1 (rs4149056), NR2F2 (rs8023580), ZNF652 (rs2411984), TDGF3 (rs1573036), as well as 2 conditional SNPs, LHCGR (rs10454142) and BAIAP2L1 (rs3779195), and one sex-specific SNP, UGT2B15 (rs293428). Conditional analysis at the SHBG gene locus identified 4 independent signals at the genome-wide significance threshold (rs12150660, rs6258, rs1641537 and rs1625895). The lead SNP of the SHBG locus (rs12150660) was estimated to account for ∼7.8% and ∼3.3% of the variation in circulating SHBG in men and women, respectively, assuming 50% heritability. For all analyses, summary statistics from the combined discovery and replication cohorts were used, except for the SNPs from the SHBG locus conditional analyses, where summary statistics were only available from the discovery cohort (replication was not attempted).

Statistical analyses
Observational analyses

Path analysis
Path analysis (a type of multiple regression analysis) describes associations that are hypothesized to be causal. Path diagrams represent plausible causal effects between variables whilst taking the temporal relationships of exposures, outcomes and confounders into consideration, with sex hormones considered the exposure and asthma reports the outcome. Each measure of asthma after the initial measurement, and each exposure measure (SHBG or testosterone), is assumed to cause the next, in a chain of causation. Compared to serial cross-sectional regression models, path analysis introduces additional assumptions about the hypothesised causal paths and the temporal sequence of measures, and therefore describes dependencies between the variables of the dataset. Path analysis has two main analytical advantages over classical regression analysis. Firstly, it is possible to test whether the associations between dependent and independent variables are the same across the time-points (something we did not test in the end, as the results indicated no evidence of association at any of the five time-points). Secondly, path analysis may have increased statistical power compared to classical regression if there are incomplete data and a full information maximum likelihood (FIML) approach is used. However, since we used multiple imputation to fill in missing responses in all variables, this advantage does not directly apply to our analyses.
All path analysis models were adjusted for maternal confounders: maternal smoking during pregnancy (never, temporary, throughout pregnancy), maternal education (university degree, A-levels, O-levels or lower), parity (nulliparous, multiple pregnancies), gestational age, maternal age at birth and participant's age at the time of asthma measurement (as a time-varying covariate), at each of the five analysed time-points.

Multiple imputation
Multiple imputation was used to fill in missing values in the exposure, outcome and covariates, in both the cross-sectional regressions and the path analysis. The ICE procedure in Stata 14.2 (8)(9)(10) was used to perform the imputation. Due to non-normality in some of the sex hormone measures, predictive mean matching (an approach that relaxes normality assumptions in the imputed measures) was used for imputation of SHBG and testosterone values. Additionally, in order to increase imputation efficiency, auxiliary wheeze variables were included in the imputation model: responses to questions about wheeze in the last 12 months from questionnaires completed by the mothers of the study children at 10.6, 13 and 13.8 years, and from the study children themselves at 16.5 and 18.6 years. All maternal confounders as well as participants' age at each time-point were included in the imputation model. Each path analysis model was then fitted to 100 imputed datasets and estimates were combined using Rubin's rules (11) to obtain an overall effect size and standard error. Results using the imputed datasets were compared to the complete case analysis.
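For readers unfamiliar with Rubin's rules, the pooling step can be sketched as follows. The function name and the example numbers are hypothetical; the study itself performed this step in Stata, and the sketch only shows the standard formulas for a single scalar estimate (for example a log odds ratio) across m imputed datasets.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine per-imputation estimates and standard errors.

    Pooled estimate = mean of the estimates.
    Total variance  = within-imputation variance
                      + (1 + 1/m) * between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates with matching SEs")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log odds ratios from three imputed datasets.
pooled_est, pooled_se = pool_rubin([0.1, 0.2, 0.3], [0.1, 0.1, 0.1])
```

The pooled standard error is larger than any single-imputation standard error whenever the estimates differ across imputations, reflecting the extra uncertainty due to the missing data.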

Mendelian randomization
For the inverse-variance weighted (IVW) approach, which was used as the primary analysis, fixed-effects estimations were used when using 3 SNPs or fewer and random-effects for 3 SNPs or more.

Table notes (table bodies not recovered):
… to the number of individuals with complete data in the analysis of specified hormone (before multiple imputation). Percent (%) missing refers to the fraction of individuals missing some data (either asthma measurements, hormone measurements or covariates, where covariates include any previous measurement of asthma and hormone) at each time-point, that were subject to multiple imputation.
† OR for asthma per standard deviation (SD) increase in either SHBG, TT or BT.
… (12)
† Effect estimate, standard error and p-value from the combined discovery plus follow-up analysis by Coviello et al. (7) for the whole sample, except for the independent SNP analysis where only the discovery sample was used.
‡ Main GWAS analysis by Coviello et al. or independent SNPs analysis of the SHBG gene region.

… .117) suggested some heterogeneity and horizontal pleiotropy in IV combination B for females. A sensitivity analysis using sex-specific SNP-SHBG effects from the sex-stratified GWAS (7) did not indicate any substantial differences in observed effects when compared to the whole-sample GWAS effects (data not shown).
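The IVW estimator named at the start of this section can be sketched from summary statistics as below. This is a generic illustration rather than the authors' code: the function names are hypothetical, the weights use the common first-order approximation (ignoring uncertainty in the SNP-exposure effects), and the random-effects option is assumed to be the multiplicative form that inflates the standard error when heterogeneity is present.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y, random_effects=False):
    """Inverse-variance weighted MR estimate from per-SNP summary data.

    beta_x: SNP-exposure effects (e.g. on natural log SHBG)
    beta_y: SNP-outcome effects (e.g. log odds of asthma)
    se_y:   standard errors of the SNP-outcome effects
    """
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]        # Wald ratios
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y)]
    total_w = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_w
    se = math.sqrt(1.0 / total_w)                               # fixed-effects SE
    if random_effects and len(ratios) > 1:
        # Cochran's Q; inflate the SE only if there is over-dispersion.
        q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
        se *= math.sqrt(max(1.0, q / (len(ratios) - 1)))
    return estimate, se


def odds_ratio_ci(estimate, se, z=1.96):
    """Convert a log odds ratio and SE to an OR with a 95% CI."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))
```

With binary outcomes the pooled estimate is a log odds ratio per unit of the exposure, so `odds_ratio_ci` gives figures in the same form as the ORs and 95% CIs reported for SHBG.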