Genetically raised serum bilirubin levels and lung cancer: a cohort study and Mendelian randomisation using UK Biobank

Background Moderately raised serum bilirubin levels are associated with lower rates of lung cancer, particularly among smokers. It is not known whether these relationships reflect antioxidant properties or residual confounding.

Objective This study aimed to investigate potential causal relationships between serum total bilirubin and lung cancer incidence using one-sample Mendelian randomisation (MR) and UK Biobank.

Methods We instrumented serum total bilirubin level using two variants (rs887829 and rs4149056) that together explain ~40% of population-level variability and are linked to mild hereditary hyperbilirubinaemia. Lung cancer events occurring after recruitment were identified from national cancer registries. Observational and genetically instrumented incidence rate ratios (IRRs) and rate differences per 10 000 person-years (PYs) by smoking status were estimated.

Results We included 377 294 participants (median bilirubin 8.1 μmol/L (IQR 6.4–10.4)) and 2002 lung cancer events in the MR analysis. Each 5 μmol/L increase in observed bilirubin levels was associated with a 1.2/10 000 PY decrease (95% CI 0.7 to 1.8) in lung cancer incidence. The corresponding MR estimate was a decrease of 0.8/10 000 PY (95% CI 0.1 to 1.4). The strongest associations were in current smokers, where a 5 μmol/L increase in observed bilirubin levels was associated with a decrease in lung cancer incidence of 10.2/10 000 PY (95% CI 5.5 to 15.0) and an MR estimate of 6.4/10 000 PY (95% CI 1.4 to 11.5). For heavy smokers (≥20/day), the MR estimate was an incidence decrease of 23.1/10 000 PY (95% CI 7.3 to 38.9). There was no association in never smokers and no mediation by respiratory function.

Conclusion Genetically raised serum bilirubin, common across human populations, may protect people exposed to high levels of smoke oxidants against lung cancers.


Data source
Participants attended 22 centres, with locations selected to ensure representation of people from different socioeconomic, ethnic and urban-rural backgrounds. This ongoing study collects data from questionnaires, sample assays, physical measures, genome-wide genotyping and follow-up for a wide range of health-related outcomes, some of which are linked to national registers and electronic health records. Genome-wide genotype data were available from two microarrays: the Affymetrix UK Biobank Axiom® array for most participants and the Applied Biosystems™ UK BiLEVE Axiom™ Array by Affymetrix for a smaller subset (n=49,950) 1. Details on the quality control and imputation of SNPs, indels and structural variants are reported elsewhere 1.

Observational associations - further details
For bilirubin and the time scale (age), we explored non-linear relationships by applying cubic spline interpolation using Harrell's default percentiles and selecting the transformation that minimised the Akaike and Bayesian information criteria (AIC/BIC) 2. We applied a user-written programme for data visualisation 3. Serum bilirubin data are slightly right-skewed, and we also checked for non-linear relationships following log-transformation. For both the observed and genetically predicted bilirubin levels, we checked for proportionality of associations with age by testing interaction terms. All continuous covariates were parameterised as linear in the regression models, and Wald tests were used to calculate p-values for categorical variables and spline transformations.
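The selection step described above can be illustrated with a minimal sketch. The candidate models, log-likelihoods, parameter counts and sample size below are hypothetical placeholders, not values from this study, and the choice of n in the BIC (number of observations rather than number of events) is an assumption.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2*logL."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*logL."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical candidates: (label, maximised log-likelihood, number of parameters).
candidates = [
    ("linear", -1520.4, 2),
    ("3-knot spline", -1512.1, 3),
    ("4-knot spline", -1511.8, 4),
    ("5-knot spline", -1511.6, 5),
]
n_obs = 5000  # hypothetical sample size

def select(models, n):
    """Return the labels of the models minimising AIC and BIC respectively."""
    best_aic = min(models, key=lambda m: aic(m[1], m[2]))[0]
    best_bic = min(models, key=lambda m: bic(m[1], m[2], n))[0]
    return best_aic, best_bic

best_aic, best_bic = select(candidates, n_obs)
print("lowest AIC:", best_aic)
print("lowest BIC:", best_bic)
```

Because the BIC penalises each extra parameter by ln(n) rather than 2, it tends to favour fewer knots than the AIC in large samples, so the two criteria need not agree.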

Genetically instrumented associations - further details
We combined the effects of the two SNPs on bilirubin levels to estimate the incidence rate ratios (IRRs) for lung cancer per 5 μmol/L increase in genetically predicted bilirubin using one-sample MR and the two-stage predictor substitution (2SPS) method 4. In brief, bilirubin levels were regressed against the two SNPs to give the fitted "unconfounded" bilirubin levels. We modelled the SNPs as three-level categories to capture non-additive relationships with serum bilirubin. These fitted values were then used as the exposure in a Poisson model of lung cancer incidence. Robust standard errors were calculated to account for the added uncertainty of using previously fitted values as the exposure in the second stage of the regression 4.
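The two stages can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the code used for the analysis: the allele frequencies, genotype effects, confounder, baseline rate and follow-up time are all invented, covariate adjustment is omitted, and the sandwich variance estimator is one common choice of robust standard error that may differ from the one the authors applied.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def dummies(g1, g2):
    """Intercept plus three-level (non-additive) coding of each SNP."""
    return [1.0, float(g1 == 1), float(g1 == 2), float(g2 == 1), float(g2 == 2)]

def stage1_fitted(snps, bilirubin):
    """Stage 1: least-squares regression of bilirubin on SNP categories."""
    xs = [dummies(*g) for g in snps]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, bilirubin)) for i in range(k)]
    beta = solve(xtx, xty)
    return [sum(bj * v for bj, v in zip(beta, x)) for x in xs]

def stage2_poisson(x, events, pyears, iters=25):
    """Stage 2: Poisson model log(rate) = a + b*x with a person-years offset,
    fitted by Newton-Raphson. Returns (b, sandwich standard error of b)."""
    xbar = sum(x) / len(x)
    xc = [v - xbar for v in x]  # centring leaves the slope unchanged
    a, b = math.log(sum(events) / sum(pyears)), 0.0
    for step in range(iters + 1):
        mu = [t * math.exp(a + b * v) for v, t in zip(xc, pyears)]
        i00 = sum(mu)
        i01 = sum(m * v for m, v in zip(mu, xc))
        i11 = sum(m * v * v for m, v in zip(mu, xc))
        det = i00 * i11 - i01 * i01
        if step == iters:
            break  # keep mu and the information matrix at the final estimate
        s0 = sum(y - m for y, m in zip(events, mu))
        s1 = sum((y - m) * v for y, m, v in zip(events, mu, xc))
        a += (i11 * s0 - i01 * s1) / det
        b += (i00 * s1 - i01 * s0) / det
    # Sandwich variance: inverse information * sum of squared scores * inverse information.
    sq = [(y - m) ** 2 for y, m in zip(events, mu)]
    m00 = sum(sq)
    m01 = sum(q * v for q, v in zip(sq, xc))
    m11 = sum(q * v * v for q, v in zip(sq, xc))
    r0, r1 = -i01 / det, i00 / det
    return b, math.sqrt(r0 * r0 * m00 + 2 * r0 * r1 * m01 + r1 * r1 * m11)

def poisson_draw(rng, lam):
    """Knuth's algorithm; adequate for the small means simulated here."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate(n=30000, true_irr_per5=0.7, seed=1):
    """Hypothetical cohort: two SNPs, an unmeasured confounder u, 7 years of follow-up."""
    rng = random.Random(seed)
    geno = lambda maf: sum(rng.random() < maf for _ in range(2))
    snps, bili, events, pyears = [], [], [], []
    for _ in range(n):
        g1, g2 = geno(0.3), geno(0.15)
        u = rng.gauss(0, 1)
        level = 8 + (0, 1, 6)[g1] + (0, 0.5, 1.5)[g2] + u + rng.gauss(0, 2)
        rate = 0.02 * math.exp(math.log(true_irr_per5) * (level - 8) / 5 + 0.5 * u)
        snps.append((g1, g2))
        bili.append(level)
        pyears.append(7.0)
        events.append(poisson_draw(rng, rate * 7.0))
    return snps, bili, events, pyears

snps, bili, events, pyears = simulate()
fitted = stage1_fitted(snps, bili)
b, se = stage2_poisson(fitted, events, pyears)
irr = math.exp(5 * b)
lo, hi = math.exp(5 * (b - 1.96 * se)), math.exp(5 * (b + 1.96 * se))
print(f"IRR per 5 umol/L: {irr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

In the simulation the confounder raises both bilirubin and the event rate, so a naive regression on observed bilirubin would be biased, whereas the genotype-predicted values are independent of the confounder by construction.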
We examined whether other factors associated with lung cancer (FEV1/COPD/emphysema and family history of lung cancer) were intervening/mediating variables in the relationship between bilirubin and lung cancer. The method of spirometry at baseline is reported in detail elsewhere 5, and we used the maximum value of the measures meeting the assessor's acceptability criteria. We estimated the observational relationship between bilirubin and baseline FEV1 using linear regression. We identified and excluded outlier values of bilirubin and FEV1 using a multivariate approach (the blocked adaptive computationally efficient outlier nominators algorithm), with a 15% threshold of the chi-squared distribution used to separate outliers from non-outliers 6.
We used a similar approach, the two-stage least squares (2SLS) method, to estimate the causal cross-sectional relationship between bilirubin and FEV1 4. FEV1 was missing for approximately 25% of participants, and these values were missing not at random with respect to other risk factors. We used inverse probability weighting in an attempt to reduce the impact of any selection bias, with each participant weighted by the inverse of their estimated probability of providing an acceptable FEV1 reading. Probability weights were calculated using a logistic regression in which missing FEV1 was the outcome and the covariates included age, gender, height, weight, smoking status, lung cancer events, genotypes and bilirubin levels. Because FEV1 was missing for an even higher proportion of smokers (around 50%) once the ERS/ATS criteria for FEV1 reproducibility were applied, this analysis was not done.
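The weighting idea can be illustrated with a simplified sketch. As an assumption for brevity, the probability of providing an acceptable reading is estimated here from observed proportions within a single stratifying variable (smoking status) rather than from the multivariable logistic regression described above; all numbers are simulated and hypothetical.

```python
import random

def ipw_weights(strata, observed):
    """Weight each observed participant by 1 / P(observed | stratum);
    participants without a reading get weight 0."""
    n_all, n_obs = {}, {}
    for s, o in zip(strata, observed):
        n_all[s] = n_all.get(s, 0) + 1
        n_obs[s] = n_obs.get(s, 0) + int(o)
    return [n_all[s] / n_obs[s] if o else 0.0 for s, o in zip(strata, observed)]

# Hypothetical cohort in which smokers have lower FEV1 and more missing readings.
rng = random.Random(0)
strata, fev1, observed = [], [], []
for _ in range(20000):
    smoker = rng.random() < 0.3
    strata.append("smoker" if smoker else "non-smoker")
    fev1.append(rng.gauss(2.4 if smoker else 3.0, 0.5))
    observed.append(rng.random() < (0.5 if smoker else 0.8))

w = ipw_weights(strata, observed)
true_mean = sum(fev1) / len(fev1)
naive_mean = sum(f for f, o in zip(fev1, observed) if o) / sum(observed)
weighted_mean = sum(wi * f for wi, f in zip(w, fev1)) / sum(w)

print(f"full-cohort mean FEV1:  {true_mean:.3f}")
print(f"complete-case mean:     {naive_mean:.3f}")
print(f"IPW-weighted mean:      {weighted_mean:.3f}")
```

The complete-case mean over-represents non-smokers, while the weighted mean restores each stratum to its share of the full cohort. Weighting of this kind only corrects for missingness that is explained by the variables used to build the weights.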
Recent use of respiratory medication was self-reported by participants at baseline and included treatments for asthma, hay fever, emphysema, chronic bronchitis, COPD, cystic fibrosis, alpha-1 antitrypsin deficiency, sarcoidosis, bronchiectasis, idiopathic pulmonary fibrosis, fibrosing alveolitis/unspecified alveolitis, silicosis, asbestosis and tuberculosis.
These medications could affect FEV1 readings, so we assessed the impact of excluding participants who reported taking these drugs.

Interactions with other variables
Other environmental sources of oxidants include passive smoking at home or in the workplace and air pollution. As a supplemental analysis, we examined whether there were interactions between these variables, serum bilirubin and lung cancer risk. Only participants who reported not smoking regularly had data available on smoking outside of the home.

Other sensitivity/supplemental analyses
Other potential confounding variables for the observational relationships with bilirubin included passive smoking, occupational exposure to smoke, antioxidant supplements (vitamin C, vitamin E and β-carotene), social deprivation, air pollution (NO2 and PM2.5), and liver blood tests (alkaline phosphatase, alanine aminotransferase and gamma glutamyl transferase). We examined the effect of adjustment for these additional variables in a subsample with complete data on all covariates. We also adjusted for the microarray identity (UK BiLEVE) under the caveat that this could introduce collider bias for respiratory outcomes.
Unconjugated bilirubin is the specific endogenous substrate for the UGT1A1 enzyme.
Direct/conjugated bilirubin was recorded for a subset of participants (n=306,070), which means that by subtraction (serum total bilirubin minus direct bilirubin = indirect bilirubin) we could also estimate the causal relationships for indirect/unconjugated bilirubin. These estimates could be more precise than those using total serum bilirubin, which will also capture increases in the conjugated fraction due to common diseases.
BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance placed on this supplemental material which has been supplied by the author(s).
Serum total bilirubin has been associated with a range of other age-related diseases. We therefore ran supplemental analyses with mortality from any cause and cancer mortality as the outcomes, under the caveat that we expected weaker associations due to the inclusion of events unrelated to oxidant exposure. Complete mortality data were available up to 31st January 2018 for England and Wales and 30th October 2016 for Scotland.
Finally, we checked the effect of restricting the MR analyses to the UGT1A1 rs887829 variant.

Supplementary results
Including additional covariates (occupational smoke exposure, household smoke exposure, antioxidant supplements, waist circumference, air pollution (NO2 and PM2.5), liver enzymes) also had no meaningful impact on the observational or causal estimates for any outcomes (Figure S1). Excluding participants on respiratory medication from the analyses of FEV1 had no impact on the estimates. We found no relationships between observed or genetically predicted bilirubin and the negative control cancers, though incidence rates by smoking status need to be interpreted with caution because smokers may die from lung cancer before they can develop these cancers (Table S4). We reran the analyses using unconjugated bilirubin instead of total bilirubin, but this did not improve the precision of causal estimates (Table S5).
Excluding rs4149056 from the MR analysis had no real impact and changed the IRRs by ≤0.01. There was no strong evidence of interaction with other variables that influence exposure to oxidants, though these variables had a much weaker effect on lung cancer relative to cigarette smoking. Genetically raised bilirubin was weakly associated with lower rates of lung cancer in first degree relatives of smokers and slightly higher prevalence of self-reported COPD/emphysema at baseline (Table S6).
We found non-linear relationships (four-knot cubic spline transformation) between serum total bilirubin levels and mortality from any cause, with much higher rates at very low bilirubin levels (Figure S2). Unlike for lung cancer, there was limited evidence of multiplicative interactions with smoking status, and low bilirubin was associated with excess mortality for never and former smokers. For participants with a history of regular smoking, the associations were also non-linear and broadly similar for women and men (Figure S2). There was an uptick in rates at higher levels of bilirubin for current smokers (Figure S2 A & C), but no such trend was apparent after adjusting for smoking intensity and duration (pack-years) (Figure S2 B & D). There were negative associations between genetically predicted serum total bilirubin levels and mortality that were stronger in regular smokers (Table S7). There was also an association with cancer mortality in participants with a history of smoking regularly that remained after excluding lung cancer deaths from the analysis (IRR 0.96, 95% CI 0.92 to 0.99; p=0.027), suggesting a role for bilirubin in other cancers related to smoking (data not shown).

*Mid-point value of serum bilirubin quintiles plus 17 μmol/L for assessing the rates above the bilirubin level often used to diagnose Gilbert's syndrome.
**Holding age, gender, calendar year, height, weight, ethnicity (first 40 principal components) and recruitment centre at observed values for the full dataset.
***Holding age, gender, calendar year, ethnicity (first 40 principal components) and recruitment centre at observed values for the full dataset.
Estimates derived using a one-sample MR approach and the two-stage predictor substitution (2SPS) method.
**Adjusted for pack-years in overall analysis of regular smokers. Participants currently smoking less than 1 cigarette per day at recruitment are excluded from the smoking sub-categories but included in the overall analysis of regular smokers if they had formerly smoked one or more per day and it was possible to calculate pack-years.
, recruitment centre and smoking status. Participants preferring not to report smoking status were excluded due to low events.
**Adjusted for pack-years in overall analysis of regular smokers. Participants currently smoking less than 1 cigarette per day at recruitment are excluded from the smoking sub-categories but included in the overall analysis of regular smokers if they had formerly smoked one or more per day and it was possible to calculate pack-years.