
Original research
Genetically raised serum bilirubin levels and lung cancer: a cohort study and Mendelian randomisation using UK Biobank
Laura Jane Horsfall (1), Stephen Burgess (2, 3), Ian Hall (4), Irwin Nazareth (1)
  1. Department of Primary Care & Population Health, UCL, London, UK
  2. MRC Biostatistics Unit, University of Cambridge Institute of Public Health, Cambridge, Cambridgeshire, UK
  3. BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, Cambridge University, Cambridge, Cambridgeshire, UK
  4. Division of Respiratory Medicine, University of Nottingham, Nottingham, Nottinghamshire, UK
Correspondence to Dr Laura Jane Horsfall, Department of Primary Care & Population Health, UCL, London, London NW3 2PF, UK; laura.horsfall{at}


Background Moderately raised serum bilirubin levels are associated with lower rates of lung cancer, particularly among smokers. It is not known whether these relationships reflect antioxidant properties or residual confounding.

Objective This study aimed to investigate potential causal relationships between serum total bilirubin and lung cancer incidence using one-sample Mendelian randomisation (MR) and UK Biobank.

Methods We instrumented serum total bilirubin level using two variants (rs887829 and rs4149056) that together explain ~40% of population-level variability and are linked to mild hereditary hyperbilirubinaemia. Lung cancer events occurring after recruitment were identified from national cancer registries. Observational and genetically instrumented incidence rate ratios (IRRs) and rate differences per 10 000 person-years (PYs) by smoking status were estimated.

Results We included 377 294 participants (median bilirubin 8.1 μmol/L (IQR 6.4–10.4)) and 2002 lung cancer events in the MR analysis. Each 5 μmol/L increase in observed bilirubin levels was associated with a decrease in lung cancer incidence of 1.2/10 000 PY (95% CI 0.7 to 1.8). The corresponding MR estimate was a decrease of 0.8/10 000 PY (95% CI 0.1 to 1.4). The strongest associations were in current smokers, in whom a 5 μmol/L increase in observed bilirubin levels was associated with a decrease in lung cancer incidence of 10.2/10 000 PY (95% CI 5.5 to 15.0); the corresponding MR estimate was 6.4/10 000 PY (95% CI 1.4 to 11.5). For heavy smokers (≥20/day), the MR estimate was an incidence decrease of 23.1/10 000 PY (95% CI 7.3 to 38.9). There was no association in never smokers and no mediation by respiratory function.

Conclusion Genetically raised serum bilirubin, common across human populations, may protect people exposed to high levels of smoke oxidants against lung cancers.

  • lung cancer
  • oxidative stress
  • smoking cessation
  • tobacco and the lung

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:


Key messages

What is the key question?

  • Is there evidence that moderately raised serum bilirubin, a purported antioxidant, can protect against lung cancer and does the relationship differ according to smoking status?

What is the bottom line?

  • Current and former cigarette smokers genetically predisposed to raised levels of serum total bilirubin have lower rates of lung cancer, with the association strongest in current heavy smokers. There was no relationship between genetically instrumented or observed bilirubin levels and lung cancer in never smokers.

Why read on?

  • This is the first study to support a potential causal association between serum bilirubin and the incidence of lung cancer in cigarette smokers.


Lung cancer is the leading cause of cancer death globally owing to its high incidence and poor prognosis.1 Risk assessment for lung cancer in primary care settings relies on self-reported smoking status, which can be difficult to quantify accurately. Identifying simple blood biomarkers for lung cancer, comparable to cholesterol in cardiovascular disease, could improve the cost-effectiveness of new screening programmes2 and potentially lead to new treatments for oxidative stress-mediated diseases.

Following the natural death of red blood cells, a series of reactions initiated by the haem oxygenase enzymes one and two (HMOX1/HMOX2) generates the yellow pigment bilirubin. Initially, bilirubin is water insoluble; it is transported in the blood serum to the liver, where the enzyme uridine diphosphate-glucuronosyltransferase 1–1 (UGT1A1) converts it to a soluble (conjugated) form for elimination. Bilirubin levels are frequently tested in primary healthcare settings to assess liver function and guide prescribing decisions for drugs including statins. Various lines of evidence, such as the high frequency of UGT1A1 alleles causing mild hereditary hyperbilirubinaemia (Gilbert’s syndrome; OMIM#143500), have led to speculation that moderately raised levels may have a physiological benefit for humans.3–5 Investigations using blood serum, tissue cultures and animal models suggest that bilirubin has potent antioxidant properties that may help protect respiratory tissues against oxidative stress.5–8 For example, the enzymes that generate bilirubin (biliverdin reductase, HMOX1) are found at their highest concentrations in respiratory tissues,9 and a recent network analysis of the effect of cigarette smoke on the lung tissue of mice identified dramatic increases in bilirubin production, suggesting a protective role.10 Population-based cohort studies have also linked moderately raised serum bilirubin levels with lower rates of respiratory diseases and lung cancer, with the strongest associations in cigarette smokers.11–14 Although these observations are consistent with an endogenous antioxidant function for bilirubin, the relationships with lung cancer could reflect residual confounding or reverse causation. For instance, liver diseases, infections, acute events such as heart attacks, certain drugs and environmental exposures can all influence bilirubin levels.11 15 16

Mendelian randomisation (MR) is an approach that, with important assumptions, could address certain limitations of observational studies of bilirubin and provide more robust support of a causal mechanism.17 First, the random allocation of alleles during gamete formation and conception should balance observed and unobserved confounders across genetic groups. Second, genotype status is unaffected by disease processes and reverse causation is unlikely. Finally, genotyping is less prone to random error and regression dilution bias. Here, we present a large-scale MR examining the relationship between genetically raised serum bilirubin and lung cancer in adults participating in UK Biobank. We also report the cross-sectional associations between bilirubin and respiratory function as a potential mediator of the relationship with lung cancer.


Data source

The UK Biobank Resource is a prospective cohort study of over 500 000 participants aged 40–69 years and recruited between 2006 and 2010 from different regions of the UK.18 Further information on UK Biobank is provided in the online supplementary methods and at the website (

Study design

First, to verify previous studies, we analysed the observational relationship between serum total bilirubin levels measured at baseline and incident lung cancer. We also examined the cross-sectional relationship between observed serum bilirubin and FEV1. Second, using an MR approach, we estimated the potential causal relationships between bilirubin levels and these outcomes using individual-level data. The study protocol was approved by UK Biobank in July 2018 (ID: 5167) and adequacy of sample size was checked using online tools (

Inclusion/exclusion criteria

After excluding people no longer wishing to participate, the total number of participants available for analysis was 502 527. We applied several genetic exclusions including those recommended by UK Biobank, outliers for genotype missingness or excess heterozygosity, sex aneuploidy and sex discordance (n=2200). We used a published algorithm to retain unrelated participants19 (n=39 642) and finally restricted the sample to ‘white British’ participants using a combination of self-reported ethnic identity and the results of an existing principal components analysis available in the dataset (n=88 341).20 Participants entered the cohort on the date they attended the research centre and were censored at the earliest date of lung cancer diagnosis, loss to follow-up, death or end of the follow-up period. The most recent date of complete monitoring for incident cancers at the time of analysis was 31 March 2016 for England and Wales and 31 October 2015 for Scotland. Participants with a history of lung cancer at the time of recruitment were excluded from the primary analysis of lung cancer incidence (n=527).
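The follow-up rules above can be sketched in code. This is a minimal illustration, not the authors' Stata code: the function names are hypothetical, a year is assumed to be 365.25 days, and Wales is assumed to share the England censoring date, as the text states. It measures time on study only, whereas the paper's Poisson models used age as the time scale.

```python
from datetime import date
from typing import Optional

# Registry completeness dates quoted in the text; Wales is grouped with England.
REGISTRY_END = {
    "England": date(2016, 3, 31),
    "Wales": date(2016, 3, 31),
    "Scotland": date(2015, 10, 31),
}


def exit_date(nation: str,
              diagnosis: Optional[date] = None,
              death: Optional[date] = None,
              lost: Optional[date] = None) -> date:
    """Earliest of lung cancer diagnosis, death, loss to follow-up or registry end."""
    candidates = [REGISTRY_END[nation]]
    candidates += [d for d in (diagnosis, death, lost) if d is not None]
    return min(candidates)


def person_years(entry: date, nation: str, **events: Optional[date]) -> float:
    """Follow-up from the assessment-centre visit to exit, in years of 365.25 days."""
    return max((exit_date(nation, **events) - entry).days, 0) / 365.25


def is_event(nation: str, diagnosis: Optional[date] = None, **other: Optional[date]) -> bool:
    """True only when a lung cancer diagnosis is what ends follow-up."""
    return diagnosis is not None and exit_date(nation, diagnosis, **other) == diagnosis


# Hypothetical participant recruited in Scotland and never diagnosed.
print(round(person_years(date(2008, 6, 1), "Scotland"), 2))  # 7.41
```

A diagnosis dated after the registry end date is not counted as an event, because follow-up has already been censored by then.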


Blood samples were collected at baseline from all participants, and serum total bilirubin was measured by a photometric colour test (Beckman Coulter AU5800). Genome-wide association studies (GWAS) have repeatedly identified UGT1A1 at 2q37 as the major locus underlying serum bilirubin levels across human populations.21–23 A GWAS of British subjects found that the single nucleotide polymorphism (SNP) rs887829 explained 39%–43% of the variation in bilirubin levels.24 This SNP is in almost complete linkage disequilibrium (r²=0.99) with the functional repeat-length polymorphism of the UGT1A1 promoter region underlying Gilbert’s syndrome.3 25 Bilirubin GWAS have consistently replicated another signal at 12p12.22 25 26 The nonsynonymous SNP rs4149056 (Val174Ala) in Solute Carrier Organic Anion Transporter Family Member 1B1 (SLCO1B1) is a lead causal SNP in this region; SLCO1B1 encodes a transporter that takes up bilirubin from the blood into the liver. Both rs887829 and rs4149056 were included on the genotyping microarrays, and we used these variants to instrument bilirubin levels. Missing genotypes were replaced with imputed data where available.20 Based on known substrates for these enzymes and searches of online GWAS repositories, we did not expect any important confounding of the genetic relationships by other mechanisms (pleiotropy).


The primary outcome is incident lung cancer recorded following study recruitment. Prevalent and incident cancer diagnoses in UK Biobank are provided by The Health & Social Care Information Centre for participants residing in England and Wales, and by the NHS Central Register for participants residing in Scotland. These national cancer registries obtain information from a range of sources, including hospitals, treatment centres, hospices and nursing homes, private hospitals, general practices, death certificates and Hospital Episode Statistics. The underlying (primary) cause of death is also provided from central registers. Diagnoses and causes of death are coded using the International Classification of Diseases (ICD), 9th and 10th revisions. Malignant neoplasms of the trachea, bronchus and lung (ICD-10: C33–C34) are the cancers for which smoking has the strongest pathophysiological role and the highest attributable risk (>70%), and we selected these cancers as our primary outcome.27 Self-reported cancer diagnoses are also available and were used in addition to ICD codes to identify participants’ cancer history.
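A minimal sketch of how C33–C34 outcome codes could be picked out of registry records follows. The function name is hypothetical, and the regular expression assumes codes arrive either with or without a dot before the fourth character; ICD-9 records would need a separate mapping that is not shown.

```python
import re

# C33 = trachea; C34 = bronchus and lung. Accepts "C34", "C341" and "C34.1".
LUNG_ICD10 = re.compile(r"C3[34](?:\.?\d)?")


def is_lung_cancer_icd10(code: str) -> bool:
    """True if an ICD-10 code falls within C33-C34 (case and whitespace tolerant)."""
    return LUNG_ICD10.fullmatch(code.strip().upper()) is not None


codes = ["C341", "C33", "c34.9", "C32", "C349X", "D381"]
print([c for c in codes if is_lung_cancer_icd10(c)])  # ['C341', 'C33', 'c34.9']
```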

Other risk factors for lung cancer, including FEV1, a family history of lung cancer and comorbid COPD or emphysema, are potentially on the causal pathway between bilirubin antioxidant activity and lung cancer. We therefore examined these separately as secondary outcomes in a series of cross-sectional analyses (online supplementary methods). Oxidative stress is thought to have a pathophysiological role in many age-related diseases, so in further supplemental analyses we also explored relationships with all-cause mortality and cancer mortality.


Basic characteristics and known predictors of lung cancer were included in analyses to improve the precision of the estimates.17 28 Depending on the outcome, these included age, calendar year, genetic sex, recruitment centre, height, weight and self-reported smoking status. We also included the top 40 principal genetic components to account for any remaining population substructure. Other variables we examined in supplemental analyses of a subsample with complete data on all covariates included passive smoking, occupational exposure to smoke, antioxidant supplements (vitamin C, vitamin E and β-carotene), social deprivation, air pollution (NO2 and PM2.5), waist circumference and liver blood tests (alkaline phosphatase, alanine aminotransferase and gamma-glutamyl transferase).


Studies have reported negative associations with bilirubin and related UGT1A1 genotypes that are strongest in current and heavy smokers, which is consistent with endogenous antioxidant properties.12 29 30 We therefore examined interactions with self-reported smoking behaviour, categorising participants as never, former or current smokers. To examine smoking intensity more closely, we further divided participants who reported currently or previously smoking tobacco on most or all days into ever light-to-moderate smokers (1–19 cigarettes per day) and heavy smokers (20 or more cigarettes per day). Pack-years of smoking had previously been calculated for participants who smoked tobacco on most or all days and also reported their smoking duration.
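The smoking-intensity cut-points above, together with the usual pack-year definition, could be coded as follows. This is an illustrative sketch with hypothetical function names; UK Biobank supplied pack-years ready-made, and whether its derivation matches this conventional formula exactly is an assumption.

```python
from typing import Optional


def smoking_intensity(cigs_per_day: Optional[float]) -> Optional[str]:
    """Intensity group for ever-regular smokers, using the cut-points in the text."""
    if cigs_per_day is None or cigs_per_day < 1:
        return None  # not classifiable (e.g. never smoker or missing)
    return "heavy" if cigs_per_day >= 20 else "light-moderate"


def pack_years(cigs_per_day: float, years_smoked: float) -> float:
    """Conventional pack-years: packs of 20 cigarettes per day times years smoked."""
    return cigs_per_day / 20 * years_smoked


print(smoking_intensity(15), smoking_intensity(25), pack_years(25, 30))
# light-moderate heavy 37.5
```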

Statistical analyses

Observational relationships

Serum bilirubin levels were divided into sex-specific quintile categories to describe the relationships with other covariates. We used multivariable Poisson regression with age as the time scale to estimate lung cancer incidence rate ratios (IRRs), with robust SEs, per 5 μmol/L increase in serum total bilirubin (entered as a continuous variable). We also explored non-linear relationships with serum bilirubin and age by applying cubic spline interpolation and selecting the transformation that minimised the Akaike and Bayesian information criteria (AIC/BIC). The goodness of fit to the Poisson distribution was checked using the deviance statistic and by fitting negative binomial models and comparing outputs. We ran an overall analysis and then added multiplicative interaction terms to derive the smoking-specific IRRs. However, multiplicative interaction terms on the ratio scale are difficult to interpret for non-linear models such as Poisson regression. Moreover, the risk of lung cancer varies dramatically by smoking status, so estimates on the incidence rate (additive) scale also aid interpretation. We therefore calculated the margins of response as adjusted incidence rates and rate differences at different levels of bilirubin (1–30 μmol/L) while holding all other variables at their observed values. SEs for marginal effects were calculated using the delta method. Further details on methods and on the various sensitivity and cross-sectional analyses of potential mediators are reported in the online supplementary methods.
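The two effect scales described above can be illustrated with a short sketch. It rescales a per-unit log-rate coefficient to an IRR per 5 μmol/L, and computes a predictive-margins rate difference with a delta-method CI from a log-linear rate model. All names and numbers are hypothetical. For brevity the sketch propagates only the uncertainty in the intercept and the bilirubin coefficient; full implementations typically use the complete covariance matrix of all coefficients.

```python
import math

Z = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def irr_per_increment(beta, se, increment=5.0):
    """IRR and 95% CI for an `increment` rise in an exposure entered linearly,
    given its per-unit log-rate coefficient `beta` and standard error `se`."""
    return tuple(math.exp(increment * b) for b in (beta, beta - Z * se, beta + Z * se))


def marginal_rate_difference(intercept, beta, cov, other_lp, x0, x1, per=10_000):
    """Average predicted rate at exposure x1 minus that at x0, per `per` person-years.

    `other_lp` holds each participant's linear-predictor contribution from the
    other covariates (kept at observed values). Only the uncertainty in
    (intercept, beta) is propagated via the delta method; `cov` is their 2x2
    covariance matrix.
    """
    n = len(other_lp)
    r0 = [math.exp(intercept + c + beta * x0) for c in other_lp]
    r1 = [math.exp(intercept + c + beta * x1) for c in other_lp]
    rd = sum(b - a for a, b in zip(r0, r1)) / n
    # Partial derivatives of the averaged difference w.r.t. intercept and beta.
    d_int = rd
    d_beta = sum(x1 * b - x0 * a for a, b in zip(r0, r1)) / n
    var = (d_int ** 2 * cov[0][0] + 2 * d_int * d_beta * cov[0][1]
           + d_beta ** 2 * cov[1][1])
    se = math.sqrt(var)
    return rd * per, (rd - Z * se) * per, (rd + Z * se) * per


# Hypothetical coefficients, not estimates from the paper.
print(irr_per_increment(beta=-0.03, se=0.01))
print(marginal_rate_difference(math.log(8e-4), -0.03, [[0.01, 0.0], [0.0, 1e-4]],
                               other_lp=[-0.4, 0.0, 0.5], x0=5.0, x1=10.0))
```

Averaging predictions over participants' observed covariates, rather than plugging in a single "typical" participant, is what makes the result a marginal (population-averaged) rate difference.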

Genetically instrumented relationships

We calculated the crude lung cancer incidence rates across genotypes stratified by smoking status. For the UGT1A1 variant (rs887829), we estimated the relationship between homozygosity for the T allele linked to Gilbert’s syndrome and lung cancer on the rate ratio scale and the incidence rate difference scale. We then combined the effects of rs887829 and rs4149056 on bilirubin levels to estimate the IRRs for lung cancer per 5 μmol/L increase in genetically predicted bilirubin using one-sample MR and the two-stage predictor substitution (2SPS) method.17 Further details are reported in the online supplementary methods.
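The logic of two-stage predictor substitution can be shown on simulated data. This sketch assumes NumPy is available; the allele frequencies, effect sizes, follow-up time and unmeasured confounder are all invented for illustration, covariates are omitted, and the Poisson model is fitted by Newton-Raphson rather than by the Stata routines the authors used. Stage 1 regresses bilirubin on the two allele counts; stage 2 regresses lung cancer counts on the stage-1 predictions with a log person-time offset.

```python
import numpy as np


def poisson_fit(y, x, person_time, n_iter=25):
    """Poisson regression of counts on one covariate with a log(person_time)
    offset, fitted by Newton-Raphson. Returns (intercept, slope); the covariate
    is centred internally, which changes the intercept but not the slope."""
    xc = x - x.mean()
    X = np.column_stack([np.ones_like(xc), xc])
    offset = np.log(person_time)
    b = np.array([np.log(y.sum() / person_time.sum()), 0.0])
    for _ in range(n_iter):
        mu = np.exp(X @ b + offset)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * mu[:, None])
        b = b + np.linalg.solve(hess, grad)
    return b


def two_stage_predictor_substitution(y, exposure, genotypes, person_time):
    # Stage 1: OLS of the exposure on the genotypes (plus an intercept).
    G = np.column_stack([np.ones(len(exposure)), genotypes])
    coef, *_ = np.linalg.lstsq(G, exposure, rcond=None)
    predicted = G @ coef
    # Stage 2: Poisson regression of the outcome on the predicted exposure.
    return poisson_fit(y, predicted, person_time)[1]


rng = np.random.default_rng(2024)
n = 200_000
g1 = rng.binomial(2, 0.3, n)    # hypothetical UGT1A1-like allele count
g2 = rng.binomial(2, 0.15, n)   # hypothetical SLCO1B1-like allele count
u = rng.normal(size=n)          # unmeasured confounder
bili = 7 + 3 * g1 + 1 * g2 + 1.5 * u + rng.normal(scale=2, size=n)
t = np.full(n, 7.0)             # years of follow-up
true_beta = -0.03               # simulated log-IRR per 1 umol/L
y = rng.poisson(t * 0.02 * np.exp(true_beta * bili - 0.5 * u))

beta_naive = poisson_fit(y, bili, t)[1]
beta_2sps = two_stage_predictor_substitution(y, bili, np.column_stack([g1, g2]), t)
print(f"naive IRR per 5 umol/L: {np.exp(5 * beta_naive):.3f}")
print(f"2SPS IRR per 5 umol/L:  {np.exp(5 * beta_2sps):.3f} "
      f"(simulated truth {np.exp(5 * true_beta):.3f})")
```

In this simulation the confounder raises bilirubin and lowers risk, so the naive estimate overstates the protective effect, whereas the 2SPS estimate should sit close to the simulated truth. Second-stage standard errors from a fit like this ignore the uncertainty in stage 1; bootstrapping both stages is a common remedy.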

All statistical analyses were conducted using Stata V.16.1 (Stata Corporation, College Station, Texas, USA) except for the exclusion of relatives, which was done using R (V.3.5.1).


After excluding outlier values (n=414), serum total bilirubin levels were available for 357 802 participants, with a median value of 8.1 μmol/L (IQR 6.4–10.4) (table 1). There were 2.5 million person-years of follow-up, 1917 incident cases of lung cancer diagnosed after recruitment and 15 532 deaths from any cause; 768 participants were lost to follow-up for various reasons, including leaving the UK. Men and women with low serum bilirubin were heavier, more likely to smoke and more likely to live in socially deprived areas (table 1). A diagnosis of lung cancer or COPD/emphysema prior to recruitment was twice as common in the lowest bilirubin quintile as in the highest (table 1).
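As a rough check on the figures above, the crude lung cancer rate and an approximate 95% CI can be computed from the event count and person-time. The function is a hypothetical helper, and because person-time is reported only as about 2.5 million years, the result is approximate.

```python
import math


def crude_rate(events, person_years, per=10_000, z=1.959963984540054):
    """Crude rate per `per` person-years with an approximate 95% CI on the log
    scale, using SE(log rate) = 1 / sqrt(events)."""
    rate = events / person_years * per
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


rate, lo, hi = crude_rate(1917, 2.5e6)
print(f"{rate:.1f} per 10 000 PY (95% CI {lo:.1f} to {hi:.1f})")
```

This gives roughly 7.7 lung cancers per 10 000 PY across the observational sample, before any adjustment.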

Table 1

Baseline characteristics of UK Biobank participants by sex-specific quintiles of serum bilirubin levels

There was a disagreement between the Akaike and Bayesian information criteria on the functional form for bilirubin levels and we have therefore reported the estimates assuming a linear relationship with lung cancer (BIC) and a three-knot cubic spline transformation (AIC) (figure 1A and table 2). Assuming a linear relationship, each 5 μmol/L increase in adjusted serum bilirubin was associated with a 1.2 per 10 000 PY decrease in lung cancer (95% CI 0.7 to 1.8) (table 2 and online supplementary table S1). The strongest relationships were in current heavy smokers with an estimated decrease in lung cancer of 18.4 per 10 000 PY (95% CI 3.4 to 33) per 5 μmol/L increase in serum bilirubin (table 2, figure 1A and online supplementary table S1). The predicted incidence using the cubic-spline transformation and adjusting for confounders shows that the negative association for smokers is steepest at lower bilirubin levels (figure 1A). The independent associations between serum bilirubin and lung cancer remained after restricting the analysis to participants with a history of smoking at least one cigarette per day and adjusting for pack-years of smoking (table 2 and online supplementary table S1). Based on the predictive margins, the incidence of lung cancer in current smokers with a bilirubin level >17 μmol/L (often used to diagnose Gilbert’s syndrome) is around 35%–50% lower relative to a similar group of smokers in the lowest bilirubin quintile (figure 1 and online supplementary table S3). The AIC and BIC favoured a three-knot cubic spline transformation of bilirubin levels for participants with a history of smoking regularly (figure 1B). Including further covariates for a subset with complete data (1666 events) slightly attenuated the relationships (online supplementary figure S1). Serum bilirubin levels contributed to all models with spline interpolation (Wald test; p<0.0001).
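The three-knot spline can be built with one standard parameterisation, Harrell's restricted cubic spline, which is zero below the first knot and linear beyond the last. Knots are placed at the 10th, 50th and 90th percentiles, as in the figure 1 caption. This is a sketch with hypothetical names; whether it reproduces the exact scaling used in the authors' Stata code is not confirmed, and Python's percentile definition may differ slightly from Stata's.

```python
import statistics


def rcs_knots(values):
    """10th, 50th and 90th percentiles of `values` (statistics.quantiles,
    default 'exclusive' method)."""
    deciles = statistics.quantiles(values, n=10)
    return deciles[0], deciles[4], deciles[8]


def rcs_basis(x, knots):
    """Three-knot restricted cubic spline: returns (x, s1), where s1 is zero below
    the first knot and linear above the last (Harrell's form, scaled by
    (t3 - t1)**2)."""
    t1, t2, t3 = knots

    def cube_pos(v):
        return max(v, 0.0) ** 3

    s1 = (cube_pos(x - t1)
          - cube_pos(x - t2) * (t3 - t1) / (t3 - t2)
          + cube_pos(x - t3) * (t2 - t1) / (t3 - t2)) / (t3 - t1) ** 2
    return x, s1


print(rcs_knots(list(range(1, 100))))  # (10.0, 50.0, 90.0)
print([round(rcs_basis(x, (5.0, 8.0, 14.0))[1], 3) for x in (4, 10, 16, 18, 20)])
# [0.0, 1.395, 7.0, 9.0, 11.0]: equal steps beyond the last knot
```

In a model, bilirubin enters as the pair (x, s1(x)); a Wald test that both coefficients are zero is one way to test whether bilirubin contributes at all.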

Figure 1

Margins of response by smoking status as adjusted lung cancer incidence rates with 95% CIs (shaded) at different levels of observed bilirubin (panels A and B) and genetically predicted bilirubin (panels C and D) while holding all other variables at their observed values. Variables included age, gender, calendar year, ethnicity (first 40 principal components) and recruitment centre. Predictions for observed bilirubin also include height and weight. Non-linear relationships were captured using cubic spline transformation with three knots placed at the 10th, 50th and 90th percentiles of bilirubin levels. Values with 95% CIs are reported in online supplementary table S3.

Table 2

Observational relationships between mean serum total bilirubin and lung cancer incidence by smoking status in white British unrelated participants in UK Biobank

The genetic analysis included 377 294 participants and 2002 lung cancer events. We confirmed that the selected SNPs were strongly associated with bilirubin levels: the UGT1A1 rs887829 variant explained 37% of the variability (F statistic=57 913), and its effect on bilirubin levels was non-additive (table 3). In contrast to observed bilirubin levels (table 1), UGT1A1 rs887829 status was not associated with the selected covariates (online supplementary table S2). Across most smoking strata, the genotypes associated with the highest average bilirubin levels were associated with the lowest rates of lung cancer, particularly for current smokers (table 3, figure 2). In current smokers, the incidence of lung cancer in participants with the rs887829 TT genotype linked to Gilbert’s syndrome was estimated to be 11.1 per 10 000 PYs lower (95% CI 4.1 to 18.1) than in those with the other genotypes; the association was even stronger in current heavy smokers, at 38.7 per 10 000 PYs lower (95% CI 20.6 to 56.9) (table 4).
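A crude version of the genotype comparison (the rate in TT carriers minus the rate in the other genotypes) can be sketched with a Poisson-based Wald CI. The counts are hypothetical and the helper name is illustrative; the estimates in table 4 come from the authors' models, and this crude calculation ignores covariates.

```python
import math

Z = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def crude_rate_difference(d1, py1, d0, py0, per=10_000):
    """Crude rate in group 1 minus group 0 (per `per` person-years) with a Wald
    95% CI, treating each event count as Poisson so Var(d / py) = d / py**2."""
    rd = d1 / py1 - d0 / py0
    se = math.sqrt(d1 / py1 ** 2 + d0 / py0 ** 2)
    return rd * per, (rd - Z * se) * per, (rd + Z * se) * per


# Hypothetical counts for illustration only (not taken from table 4).
print(crude_rate_difference(d1=20, py1=40_000, d0=250, py0=300_000))
```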

Figure 2

Nelson-Aalen estimate of the cumulative hazard function for the UGT1A1 rs887829 genotype and lung cancer by smoking status.

Table 3

Genotype frequency and association with serum bilirubin levels and crude lung cancer incidence rates for SNPs used in genetic instrumentation

Table 4

Lung cancer incidence by UGT1A1 rs887829 genotype and smoking status reported on the ratio and rate difference scales

The associations between genetically predicted bilirubin and lung cancer using the two SNPs were similar to, although weaker than, the observational relationships with levels measured at baseline (figure 1, table 5, online supplementary tables S1 and S3). The strongest associations were in current heavy smokers, with each 5 μmol/L increase in genetically predicted serum bilirubin associated with a 23.1 per 10 000 PY decrease in incidence (95% CI 7.3 to 38.9). Adjusting for covariates made no material difference to the MR estimates (online supplementary table S1).

Table 5

Associations between genetically predicted serum total bilirubin and lung cancer incidence by smoking status in white British unrelated participants of UK Biobank

We found weak cross-sectional relationships between raised serum bilirubin measured at baseline and higher baseline FEV1 after adjustment for covariates (table 6, figure 3 and online supplementary figure S1). A three-knot cubic spline transformation was the best fit to the data, with a stronger positive relationship at lower values of bilirubin (figure 3). Genetically increased bilirubin was associated with a slightly lower FEV1 for most smoking categories (table 6). Missing FEV1 data were most common in smokers and in people preferring not to declare their smoking status (table 6). Adding inverse probability weights to the regression models to try to account for selection bias reduced the strength of the relationship across most smoking strata (table 6).
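A simple form of inverse probability weighting for missing FEV1 estimates the probability of having a measurement within each stratum and weights each observed value by its inverse, so under-measured groups count for more. The authors' weighting model is described in the supplementary methods; this sketch assumes stratum-specific proportions only, with hypothetical data and names, whereas in practice the probability is usually modelled with logistic regression on several covariates.

```python
from collections import Counter


def ipw_weights(strata, observed):
    """Weight = 1 / P(observed | stratum), with P estimated as the observed
    fraction in each stratum; unobserved records get weight 0."""
    totals = Counter(strata)
    seen = Counter(s for s, o in zip(strata, observed) if o)
    return [totals[s] / seen[s] if o else 0.0 for s, o in zip(strata, observed)]


def weighted_mean(values, weights):
    """Weighted mean over records with positive weight (others may hold None)."""
    pairs = [(v, w) for v, w in zip(values, weights) if w > 0]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


# Hypothetical data: FEV1 (litres) is missing for most current smokers.
strata = ["never", "never", "current", "current", "current", "current"]
observed = [True, True, True, False, False, False]
fev1 = [3.0, 3.2, 2.4, None, None, None]
w = ipw_weights(strata, observed)
print(w)                                 # [1.0, 1.0, 4.0, 0.0, 0.0, 0.0]
print(round(weighted_mean(fev1, w), 3))  # 2.633, versus 2.867 unweighted
```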

Figure 3

Margins of response by smoking status for FEV1 with 95% CIs (shaded) at different levels of bilirubin while holding all other variables at their observed values. Predictions are for age, gender, calendar year, ethnicity (first 40 principal components), height, weight and recruitment centre. Non-linear relationships were captured using cubic spline transformation with three knots placed at the 10th, 50th and 90th percentiles of bilirubin levels.

Table 6

Observational and genetically instrumented relationships between bilirubin and FEV1 overall and by smoking status

Genetically higher bilirubin was associated with slightly lower odds of a family history of lung cancer in smokers and slightly higher odds of COPD overall (online supplementary table S6). No observational or causal relationships were evident for the negative control cancer outcome (online supplementary table S5). There were non-linear relationships between observed serum total bilirubin and mortality from any cause (online supplementary figure S2), with bilirubin levels below 7 μmol/L associated with substantially raised rates. There was evidence that genetically increased bilirubin was associated with lower all-cause mortality (IRR 0.95 (95% CI 0.91 to 1.0); p=0.034) and cancer mortality (IRR 0.92 (95% CI 0.87 to 0.98); p=0.0075) in participants with a history of smoking regularly (online supplementary table S7). The association with cancer mortality weakened after excluding lung cancer mortality (IRR 0.96 (95% CI 0.92 to 0.99); p=0.027). The remaining sensitivity and supplemental analyses had no meaningful impact on the overall conclusions (supplementary results).


In the present study, we confirm earlier reports of a negative observational relationship between serum total bilirubin and lung cancer that remains after adjustment for smoking pack-years and many other variables. We also report, for the first time, that smokers with genetically raised serum bilirubin levels have lower rates of lung cancer and that these relationships are strongest in current heavy smokers. These findings are potentially consistent with a second-line antioxidant function, in which raised serum bilirubin levels are beneficial once the first-line antioxidant defences of the epithelial lining fluid of the lungs are depleted.

Strengths and limitations

The main advantages of the present study are the large sample size, the use of a powerful and specific genetic instrument, and the longitudinal analysis for lung cancer. The limitations include the short length of follow-up and the use of self-report for smoking status and some other variables. Adjustment for variables such as diet, blood counts and cholesterol levels may have further attenuated the observational relationships with our primary and supplementary outcomes. The lowest predicted level for bilirubin was 7 μmol/L, so we were unable to determine whether the non-linear relationships seen for observed bilirubin, with much higher disease/mortality rates at very low levels of bilirubin, were potentially causal. The older age at study recruitment and low rates of participation could also be problematic in terms of selection bias. UK Biobank participants tend to be taller and leaner, and fewer of them smoke, relative to the UK population.31 Rates of smoking-related diseases are therefore reported to be much lower. For example, the prevalence of self-reported COPD in UK Biobank is 0.1% in middle-aged men versus 1% in the Health Survey for England, and lung cancer rates are around 50% lower than in the general population.31 Among smokers, heavy smokers are over-represented relative to the general population, meaning UK Biobank could be enriched for ‘resistant’ smokers.
Selection bias is a recognised problem for UK Biobank and is potentially a problem for our study if people with genetically low bilirubin are less likely to be recruited into UK Biobank due to death or poor health.32 In contrast to our findings for lung function, two other studies, including a British cohort followed since birth and at lower risk of selection bias, reported improved lung function for people with alleles linked to Gilbert’s syndrome.29 33 Considering also the direction of our MR estimates for COPD and a family history of lung cancer, it seems plausible that selection bias could have diluted any relationship between genetically raised bilirubin and improved respiratory outcomes. The increase in the MR effect estimates for both lung cancer and family history of lung cancer as smoking levels increased is reassuring against chance findings, although some of the smoking-specific strata have few events and replication in similarly large cohorts is desirable. Finally, horizontal pleiotropy, whereby another substrate is the real causal agent, is a possibility, particularly given the position of UGT1A1 in a gene complex expressing nine other isoforms with different and overlapping substrates for glucuronidation. Known phenotypes for the low-activity SNPs we selected are toxicity to irinotecan chemotherapy (UGT1A1) and to statin therapy (SLCO1B1), but these seem unlikely to explain our results. We found no other reductions in substrate elimination that could explain the observed relationships with lung cancer.24

Comparison with other studies

Several large cohorts have reported observational relationships between bilirubin and lung cancer. A recent global metabolomic profiling study of ‘Caucasian’ lung cancer cases and controls identified bilirubin, from 403 known metabolites, as one of the only consistently significant biomarkers.30 The same study validated this finding in a large cohort of 425 660 Taiwanese adults. It reported significant effect modification by smoking status: for every 0.1 mg/dL (1.71 µmol/L in SI units) increase in bilirubin, the risks of lung cancer incidence and mortality in male smokers decreased by 5% and 6%, respectively (both p<0.001). There was no association in women, possibly because of fewer smokers and cancer events. A similarly large cohort of British adults (n=504 206) reported an 8% decrease in lung cancer risk for every 0.1 mg/dL (1.71 µmol/L) increase in serum bilirubin levels in men (p<0.001).11 The relationship was consistent in men and women, and no effect modification by smoking was detected (p for interaction >0.05). A third recent study reported a negative relationship with lung cancer, with rates 50% lower in the highest serum bilirubin quintile (>11 µmol/L), but this was not statistically significant.13 We have now shown that these relationships with serum bilirubin remain in regular smokers after adjusting for additional variables not always accounted for in the earlier analyses, including pack-years, passive smoking, occupational smoke exposure and air quality. To our knowledge, there are no other large-scale prospective studies of UGT1A1 variation and lung cancer specifically.
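Because the earlier cohorts report effects per 0.1 mg/dL while this study uses 5 μmol/L, a rough conversion helps when comparing them. This sketch assumes a log-linear exposure-response and treats the reported 5% decrease as a ratio of 0.95; the function name is hypothetical, and given the non-linearity seen in figure 1, the result is indicative only.

```python
import math

UMOL_PER_MG_DL = 17.1  # bilirubin: 1 mg/dL = 17.1 umol/L, so 0.1 mg/dL = 1.71 umol/L


def rescale_ratio(ratio, from_increment, to_increment=5.0):
    """Re-express a rate or hazard ratio per `from_increment` umol/L as a ratio
    per `to_increment` umol/L, assuming a log-linear exposure-response."""
    return math.exp(math.log(ratio) * to_increment / from_increment)


per_5 = rescale_ratio(0.95, 0.1 * UMOL_PER_MG_DL)
print(round(per_5, 3))  # 0.861
```

On this basis, 5% lower risk per 0.1 mg/dL corresponds to roughly 14% lower risk per 5 μmol/L.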
Several studies have examined the relationship between genetically predicted bilirubin and other clinical outcomes, with mixed results that possibly reflect differences in the pathophysiological role of oxidative stress, smoking case-mix, age at recruitment, low power for rare outcomes and study design.34 35 Genome-wide association studies of lung cancer have not identified the UGT1A1 locus, although the number of smokers required to reach the stringent threshold for genome-wide significance exceeds that available in most cohorts.

Future directions

Our findings support further studies on the utility of serum bilirubin as a low-cost biomarker for lung cancer risk stratification.12 20 Accurate risk stratification is key to the clinical and cost-effectiveness of the low-dose CT screening programmes being adopted in the USA and piloted in the UK.2 Given the high frequency of lung cancer and the high cost of false-positive CT screening results for both patient and provider, even a small improvement in risk prediction could have a meaningful impact and is worth exploring. Further preclinical work could examine whether competitive inhibitors of the UGT1A1 enzyme are legitimate drug targets.

Gilbert’s syndrome is a common condition characterised by moderately elevated levels of unconjugated bilirubin and intermittent episodes of jaundice. Ten per cent of people with European or East Asian ancestry and 25% of people of equatorial African descent have at least two functional variants of UGT1A1 associated with the condition, although not all will meet the bilirubin threshold for diagnosing Gilbert’s syndrome.3 36 The high frequency of different UGT1A1 variants causing mild hereditary hyperbilirubinaemia has led to speculation about balancing natural selection, whereby the physiological benefits of hyperbilirubinaemia are countered by its neurotoxic impact on infants. Under this scenario, functional alleles reach a high frequency but never become fixed.3 Our results suggest that, for UK Biobank participants self-reporting as current smokers and with a phenotype consistent with Gilbert’s syndrome (serum total bilirubin >17 μmol/L), rates of lung cancer are 35%–50% lower and mortality 20%–40% lower than in a similar group of smokers with bilirubin levels <5 μmol/L. Protection against smoke oxidants could have been advantageous for early humans once fire was discovered and widely used for lighting, warmth and cooking in poorly ventilated caves or dwellings.37 However, protection against infectious diseases is strongly implicated as the primary driver of existing examples of balancing selection, and raised serum bilirubin does seem to inhibit infectious agents including group B streptococcus,38 hepatitis C39 and malaria parasites.40 As more events accrue in UK Biobank and the data become fully linked with primary healthcare records, it will be feasible to investigate a role for serum bilirubin in other infectious and non-communicable diseases.


We have shown that adult smokers in UK Biobank with genetically raised bilirubin have lower rates of lung cancer, supporting endogenous antioxidant properties. Further work is required to establish whether bilirubin is a useful low-cost biomarker for improving risk prediction and a legitimate therapeutic target for disease prevention.


We thank the participants of UK Biobank for making this research possible.




  • Contributors LJH, IN and SB contributed to the study design. LJH conducted statistical analyses with input and code checking from SB. LJH wrote the initial draft of the manuscript. All authors participated in the data interpretation and contributed to the final draft of the manuscript with intellectual importance.

  • Funding LJH is supported by a Wellcome Trust Fellowship (Grant Number 209207/Z/17/Z). SB is supported by Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (Grant Number 204623/Z/16/Z) and the UK National Institute for Health Research Cambridge Biomedical Research Centre. IH is supported by an NIHR Senior Investigator award.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval UK Biobank received ethics approval from the National Health Service National Research Ethics Service (Ref 11/NW/0382).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Details about UK Biobank and how to access the resource can be found at the following:
