
Genetic-informed proteome-wide scan reveals potential causal plasma proteins for idiopathic pulmonary fibrosis
  1. Jiahao Zhu1,
  2. Houpu Liu1,
  3. Rui Gao1,
  4. Ruicheng Gong1,
  5. Jing Wang1,
  6. Dan Zhou2,3,
  7. Min Yu4,
  8. Yingjun Li1
  1. Department of Epidemiology and Health Statistics, School of Public Health, Hangzhou Medical College, Hangzhou, China
  2. Department of Big Data in Health Science, School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
  3. Vanderbilt University Medical Center, Nashville, Tennessee, USA
  4. Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, China
  1. Correspondence to Professor Min Yu, Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, People's Republic of China; myu{at}cdc.zj.cn; Dr Yingjun Li, Department of Epidemiology and Health Statistics, School of Public Health, Hangzhou Medical College, Hangzhou, People's Republic of China; 2016034036{at}hmc.edu.cn

Abstract

Idiopathic pulmonary fibrosis (IPF) is a lethal lung disease for which there are no reliable biomarkers or disease-modifying drugs. Here, we integrated human genomics and proteomics to investigate the causal associations between 2769 plasma proteins and IPF. Our Mendelian randomisation analysis identified nine proteins associated with IPF, of which three (FUT3, ADAM15 and USP28) were colocalised. ADAM15 emerged as the top candidate, supported by expression quantitative trait locus analysis in both blood and lung tissue. These findings provide novel insights into the aetiology of IPF and offer translational opportunities in response to the clinical challenges of this devastating disease.

  • Idiopathic pulmonary fibrosis

Data availability statement

The GWAS summary statistics from the International IPF Genetics Consortium and Global Biobank Meta-analysis Initiative are available at https://github.com/genomicsITER/PFgenetics and https://www.globalbiobankmeta.org, respectively. The GWAS summary statistics for 2,941 Olink assays and 4,907 SomaScan assays are available at http://ukb-ppp.gwas.eu and https://www.decode.com/summarydata, respectively. The single-cell RNA sequencing data of human lung tissue are available in the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under accession code GSE227136.


Introduction

Idiopathic pulmonary fibrosis (IPF) is an irreversible lung disease characterised by progressive parenchymal scarring, leading to worsening respiratory symptoms and premature mortality. Despite substantial progress in our understanding of the pathogenesis of IPF, the need for molecular biomarkers that can predict IPF risk, and for disease-modifying agents, remains unmet.1

Circulating proteins play pivotal roles in a multitude of biological processes and serve as primary sources of biomarkers and therapeutic targets. While several proteins, such as MMP7, have been linked to IPF, none are currently integrated into clinical practice.1 Indeed, most studies examining proteins and IPF are observational and retrospective in design, posing challenges for causal inference.

Combining human genomics with high-throughput, population-scale proteomics could help to bridge the gap between association and causation. Recent plasma proteogenomic studies have identified over 1900 independent cis-acting protein quantitative trait loci (cis-pQTLs) located near the genes encoding the corresponding proteins.2 3 Because alleles are randomly allocated at conception, such variants can serve as instruments that are largely protected from the confounding and reverse causation that limit observational studies. These extensive genetic associations offer a unique opportunity to elucidate the causal effects of thousands of proteins on complex human diseases through Mendelian randomisation (MR) and colocalisation.
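The core MR idea can be made concrete with the single-instrument Wald ratio: if a cis-pQTL shifts a protein's level by β_zx and the log odds of IPF by β_zy, the implied causal effect of the protein on IPF is β_zy/β_zx. The Python sketch below is illustrative only and is not the analysis code used in this study; the function name and input values are hypothetical, and the standard error uses a common second-order delta-method approximation that assumes the two GWASs do not share participants.

```python
import math


def wald_ratio(beta_zx, se_zx, beta_zy, se_zy):
    """Hypothetical helper: single-variant MR (Wald ratio) estimate.

    beta_zx, se_zx: effect of the variant on the exposure (e.g. protein level, SD units).
    beta_zy, se_zy: effect of the variant on the outcome (e.g. log odds of IPF).
    Returns (estimate, standard error) per unit change in the exposure.
    """
    estimate = beta_zy / beta_zx
    # Second-order delta-method SE; assumes independent (non-overlapping) GWAS samples.
    se = math.sqrt(se_zy**2 / beta_zx**2 + beta_zy**2 * se_zx**2 / beta_zx**4)
    return estimate, se


# Illustrative (made-up) summary statistics for one cis-pQTL.
est, se = wald_ratio(beta_zx=0.5, se_zx=0.02, beta_zy=-0.1, se_zy=0.03)
print(f"log OR per SD of protein: {est:.3f} (SE {se:.3f})")
```

A negative estimate here would correspond to higher genetically predicted protein levels being associated with lower IPF risk.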

Here, we report an integrative analysis of human genomics and plasma proteomics to investigate the causal associations between 2769 plasma proteins and the risk of IPF.

Methods

Our study design is illustrated in figure 1. Briefly, we integrated cis-pQTL summary statistics from two recently published plasma proteome genome-wide association studies (GWASs) by Sun et al (34 557 white British individuals)2 and Ferkingstad et al (35 559 Icelanders).3 We obtained GWAS summary statistics for IPF from two large meta-analyses in individuals of European ancestry: the International IPF Genetics Consortium (4125 cases and 20 464 controls)4 for the discovery phase, and the Global Biobank Meta-analysis Initiative (6257 cases and 947 616 controls)5 for external replication. More details of these datasets can be found in online supplemental methods.


Figure 1

Flowchart of the study design. eQTL, expression quantitative trait loci; FDR, false discovery rate; HEIDI, heterogeneity in dependent instruments; IPF, idiopathic pulmonary fibrosis; MHC, major histocompatibility complex; PP4, posterior probability for the same shared causal variant; pQTL, protein quantitative trait loci; scRNA-seq, single-cell RNA sequencing; SMR, summary-data-based Mendelian randomisation.

To assess the causal association between plasma protein levels and IPF risk, we implemented the summary-data-based MR (SMR) test using the top associated cis-pQTL for each protein as the instrumental variable.6 The Benjamini-Hochberg method was applied to adjust for multiple testing across 2769 proteins, with the false discovery rate (FDR) controlled at α=0.05. Proteins showing a consistent direction of effect in the replication analysis were further evaluated with colocalisation analyses. The heterogeneity in dependent instruments (HEIDI) test, which leverages multiple variants in the cis region, was used to distinguish pleiotropy from linkage.6 Additionally, we performed Bayesian colocalisation analysis to assess whether two association signals (in this case, a protein and IPF) are consistent with a shared causal variant.7 Strong evidence of colocalisation was defined as p>0.05 in the HEIDI test and a posterior probability for the same shared causal variant (PP4) >0.80 in Bayesian colocalisation.
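The SMR statistic and the FDR step can be sketched as follows. The sketch assumes the large-sample approximation given by the SMR method's authors, in which z_zx and z_zy are the z statistics of the top cis-pQTL for the protein and for IPF, T_SMR = z_zx²·z_zy²/(z_zx² + z_zy²) is compared with a χ² distribution with 1 df, and the point estimate is β_zy/β_zx. The Benjamini-Hochberg adjustment is the standard step-up procedure. This is a hedged illustration rather than the SMR software itself: the HEIDI test, which needs LD between multiple variants, is not reproduced, and the protein names and summary statistics are made up.

```python
import math


def smr_test(beta_zx, se_zx, beta_zy, se_zy):
    """Approximate SMR test for one protein using its top cis-pQTL."""
    z_zx = beta_zx / se_zx
    z_zy = beta_zy / se_zy
    b_xy = beta_zy / beta_zx  # same point estimate as the Wald ratio
    t_smr = (z_zx**2 * z_zy**2) / (z_zx**2 + z_zy**2)
    # Upper tail of a chi-squared distribution with 1 df: P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2))
    return b_xy, t_smr, p_value


def benjamini_hochberg(p_values):
    """Return BH-adjusted p values (FDR q values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p value so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical proteins: (beta_zx, se_zx, beta_zy, se_zy).
proteins = {
    "PROT_A": (0.40, 0.02, 0.12, 0.03),
    "PROT_B": (0.25, 0.02, 0.02, 0.03),
    "PROT_C": (0.60, 0.03, -0.15, 0.04),
}
results = {name: smr_test(*stats) for name, stats in proteins.items()}
q_values = benjamini_hochberg([r[2] for r in results.values()])
for (name, (b_xy, _, p)), q in zip(results.items(), q_values):
    flag = "FDR<0.05" if q < 0.05 else ""
    print(f"{name}: b_xy={b_xy:.3f}, p={p:.2e}, q={q:.2e} {flag}")
```

Because T_SMR is bounded above by the smaller of z_zx² and z_zy², a very strong pQTL cannot compensate for a weak association with IPF.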

Complementary analyses were performed for proteins supported by both MR and colocalisation. We used Steiger filtering to determine whether each cis-pQTL explained more variance in plasma protein levels than in IPF. Reverse-direction MR was carried out by constructing a polygenic instrument for IPF to assess its potential causal influence on plasma protein levels. Additionally, we searched databases to assess whether the identified cis-pQTLs had pleiotropic associations and whether the candidate causal genes could serve as drug targets. To evaluate the identified protein-coding genes at the transcriptional level, we retrieved tissue-specific cis-acting expression QTLs (cis-eQTLs) in human peripheral blood and lung tissue and compared their associations with IPF. For candidate genes with significant cis-eQTLs in lung tissue, we further explored cell-type-specific expression profiles using single-cell RNA sequencing (scRNA-seq) data. Detailed methods are described in online supplemental methods. This study adheres to the Strengthening the Reporting of Observational Studies in Epidemiology using Mendelian Randomisation (STROBE-MR) reporting guidelines (online supplemental table 1). A glossary of key terms used in our study is provided in online supplemental table 2.
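A simplified version of the Steiger directionality check is sketched below. It assumes that each variant-trait correlation can be approximated from the GWAS z statistic and sample size, r² ≈ z²/(z² + N − 2), and compares the two correlations with Fisher's z transformation as for independent samples. For a binary outcome such as IPF, dedicated MR software typically converts log odds ratios to liability-scale variance explained instead, so this observed-scale shortcut is an assumption made for illustration; the function names and z statistics are hypothetical, and only the sample sizes are taken from the datasets described above.

```python
import math


def r_from_z(z, n):
    """Approximate |correlation| between a variant and a trait from its GWAS z statistic."""
    return math.sqrt(z**2 / (z**2 + n - 2))


def steiger_direction(z_exposure, n_exposure, z_outcome, n_outcome):
    """Steiger-style check that the variant explains more variance in the exposure.

    Compares two independent correlations with Fisher's z transformation.
    Returns (direction_is_exposure_to_outcome, two_sided_p).
    """
    r_x = r_from_z(z_exposure, n_exposure)
    r_y = r_from_z(z_outcome, n_outcome)
    se = math.sqrt(1 / (n_exposure - 3) + 1 / (n_outcome - 3))
    z_diff = (math.atanh(r_x) - math.atanh(r_y)) / se
    p_value = math.erfc(abs(z_diff) / math.sqrt(2))  # two-sided normal p value
    return r_x**2 > r_y**2, p_value


# Hypothetical z statistics; sample sizes as reported for the discovery datasets.
correct_direction, p = steiger_direction(
    z_exposure=20.0, n_exposure=34_557,
    z_outcome=5.0, n_outcome=4_125 + 20_464,
)
print(correct_direction, f"{p:.1e}")
```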

Results

Following instrument selection, 2769 independent cis-pQTLs were included in the SMR test. The F-statistics for the top associated cis-pQTLs ranged from 16 to 134 173, and the variance in plasma protein levels explained by these variants ranged from 0.05% to 81.66%. After correction for multiple testing, genetically predicted plasma levels of nine proteins were significantly associated with IPF risk (PFDR<0.05) (figure 2A; see online supplemental table 3 for details). Eight of these proteins showed a concordant direction of effect in the replication IPF dataset (figure 2B). Among these, three proteins (FUT3, ADAM15 and USP28) passed the HEIDI test (p>0.05) and had strong support for Bayesian colocalisation (PP4>0.80) (figure 2C and online supplemental figure 1). Several previously reported biomarkers for IPF were also covered by our SMR analysis,8 but none showed reliable causal effects on IPF (figure 2B).
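The F-statistic and the variance explained are linked by the standard first-stage formula F = (N − k − 1)/k × R²/(1 − R²), where N is the pQTL sample size and k the number of instruments (k = 1 for a single top cis-pQTL); F > 10 is a widely used rule of thumb against weak-instrument bias. The helper names below are hypothetical, and per-protein sample sizes in the source GWASs may differ from the single N used in the example, so the printed values are not intended to reproduce the reported ranges.

```python
def f_statistic(r2, n, k=1):
    """First-stage F statistic for k instruments explaining a fraction r2 of exposure variance."""
    return (n - k - 1) / k * r2 / (1 - r2)


def r2_from_f(f, n, k=1):
    """Invert f_statistic to recover the variance explained."""
    return k * f / (k * f + n - k - 1)


# Example: a single cis-pQTL in a hypothetical sample of 34 557 participants.
n = 34_557
for f in (10, 16, 1_000):
    print(f"F={f:>5}: R2={100 * r2_from_f(f, n):.3f}%")
```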

Figure 2

SMR and colocalisation results. (A) Volcano plot for the causal associations between 2769 plasma proteins and the risk of IPF in the discovery SMR analysis. Each point represents a protein, with the estimate from the SMR test on the x-axis and the −log10(p value) on the y-axis. Red points denote significant positive associations with IPF risk, blue points significant negative associations and grey points non-significant associations. Proteins with an FDR-corrected p value <0.05 are labelled with the gene name. (B) Forest plot for the nine proteins identified in the SMR test with an FDR-corrected p value <0.05 and for 17 previously reported IPF protein biomarkers. Estimates on the x-axis were obtained from the SMR test using IPF summary statistics from the International IPF Genetics Consortium (discovery) and the Global Biobank Meta-analysis Initiative (replication). P values for both the discovery and replication analyses are nominal (not FDR corrected). (C) Regional association plots and colocalisation results for the three proteins (FUT3, ADAM15 and USP28) that showed strong evidence of colocalisation (PP4>0.80 and PHEIDI>0.05). Each point represents a variant, with chromosomal position on the x-axis (within 500 kb of the top variant for each candidate protein) and the −log10(p value) on the y-axis. Variants are coloured according to linkage disequilibrium with the top variant. Blue lines depict the recombination rate, and gene locations are displayed at the bottom of the plot. FDR, false discovery rate; HEIDI, heterogeneity in dependent instruments; IPF, idiopathic pulmonary fibrosis; PP4, posterior probability for the same shared causal variant; SMR, summary-data-based Mendelian randomisation.
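The LD colouring in regional plots of this kind is usually based on r² between each variant and the index variant, estimated in a reference panel. A common approximation, assumed here, is the squared Pearson correlation of unphased allele dosages; with phased haplotypes, haplotype-based r² would be used instead. The function name and genotype vectors are invented for illustration and do not describe the reference panel used in this study.

```python
def dosage_r2(g1, g2):
    """Squared Pearson correlation between allele dosages (0/1/2) of two variants."""
    if len(g1) != len(g2) or len(g1) < 2:
        raise ValueError("need two equal-length genotype vectors")
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        raise ValueError("monomorphic variant: r2 undefined")
    return cov**2 / (var1 * var2)


# Hypothetical dosages for six reference-panel individuals.
top_variant = [0, 1, 2, 0, 1, 2]
nearby_variant = [0, 0, 1, 1, 2, 2]
print(round(dosage_r2(top_variant, nearby_variant), 3))  # 0.25
```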

For the three colocalised proteins, Steiger filtering and reverse-direction MR yielded no reliable evidence of reverse causality (p>0.070) (figure 3A). Database cross-referencing showed that the top cis-pQTLs had pleiotropic associations with several other traits (figure 3B). However, subsequent MR analysis indicated that none of these traits was significantly associated with IPF after correction for multiple testing (PFDR>0.220). Regarding druggability, FUT3 was listed as a clinical trial target (of Rebmab-100) for the treatment of fallopian tube cancer, whereas no druggability information was available for the other two candidate genes. Significant cis-eQTLs for ADAM15 were identified in both peripheral blood and lung tissue, and these cis-eQTLs were also significantly associated with IPF (figure 3C). Notably, the top plasma cis-pQTL for ADAM15 (rs11589479) was also the top blood cis-eQTL, whereas a distinct top cis-eQTL (rs6427128) was observed in lung tissue. No significant cis-eQTLs were identified for FUT3 or USP28 in either tissue. Moreover, scRNA-seq analysis showed that ADAM15 was expressed mainly in endothelial cells in the lung (figure 3D). The proportion of endothelial cells was notably lower in IPF than in normal lung tissue (figure 3E). Within endothelial subpopulations, ADAM15 expression in venule cells tended to be lower in IPF than in normal lungs (average log2 fold change=−1.54; p=0.011) (figure 3F).
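The subject-level comparison behind figure 3F can be sketched as a pseudo-bulk Wilcoxon rank-sum test: cells are first averaged within each subject so that each subject contributes one value, and the per-subject means are then compared between IPF and control lungs. The sketch below assumes a small number of subjects, which makes an exact permutation p value (with mid-ranks for ties) feasible; the function names and cell-level records are hypothetical and are not drawn from GSE227136.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def pseudobulk(cells):
    """cells: iterable of (subject_id, group, expression) tuples for one cell type."""
    per_subject = defaultdict(list)
    group_of = {}
    for subject, group, expr in cells:
        per_subject[subject].append(expr)
        group_of[subject] = group
    out = defaultdict(list)
    for subject, values in per_subject.items():
        out[group_of[subject]].append(mean(values))  # one value per subject
    return dict(out)


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def exact_rank_sum_test(x, y):
    """Rank-sum W for x and an exact two-sided p value by full enumeration."""
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    n_x = len(x)
    w_obs = sum(ranks[:n_x])
    expected = n_x * (len(pooled) + 1) / 2
    dist = abs(w_obs - expected)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= dist - 1e-12:
            hits += 1
    return w_obs, hits / total


cells = [
    ("ipf1", "IPF", 0.25), ("ipf1", "IPF", 0.75), ("ipf2", "IPF", 0.25),
    ("ipf3", "IPF", 0.5), ("ctl1", "Control", 1.0), ("ctl1", "Control", 1.5),
    ("ctl2", "Control", 1.0), ("ctl3", "Control", 1.5),
]
groups = pseudobulk(cells)
w, p = exact_rank_sum_test(groups["IPF"], groups["Control"])
print(groups, w, p)
```

Averaging first treats the subject, not the cell, as the unit of replication, which avoids inflating significance through thousands of correlated cells from the same donor.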

Figure 3

Complementary analysis results for the three putative causal proteins (FUT3, ADAM15 and USP28). (A) Forest plot for the reverse-direction MR analysis of the effect of IPF on plasma protein levels. Estimates on the x-axis were obtained from the primary IVW method and from the pleiotropy-robust weighted median and MR-Egger methods. (B) Database search results for pleiotropic associations of pQTLs (left panel) and MR results for the effects of the identified traits on IPF (right panel). Pleiotropic associations and their p values were identified by searching the GWAS Catalog and PhenoScanner. MR estimates on the x-axis were obtained from the primary IVW method and from the pleiotropy-robust weighted median and MR-Egger methods. (C) Regional association plot for IPF with the top ADAM15 cis-eQTLs in peripheral blood (rs11589479) and lung tissue (rs6427128). Each point represents a variant, with chromosomal position on the x-axis (within 500 kb of the top variant for ADAM15) and the −log10(p value) on the y-axis. Variants are coloured according to linkage disequilibrium with the index variant. rs11589479 (purple) and rs6427128 (green) are in linkage disequilibrium (r2=0.582). Blue lines depict the recombination rate, and gene locations are displayed at the bottom of the plot. (D) Cell-type-specific ADAM15 expression in lung tissue. In the left panel, each point represents a cell in UMAP space, and clusters from graph-based clustering are labelled in different colours. In the upper right panel, all cells (unstratified) are coloured according to normalised read counts. In the lower right panel, cells are stratified by disease status (IPF vs control). (E) Comparison of cell type proportions between IPF and control lungs. (F) Comparison of ADAM15 expression between IPF and control lung endothelial cells. The y-axis represents the average normalised read count per subject. Differential expression analysis was performed using the Wilcoxon rank-sum test, treating the average ADAM15 expression per subject as representative for each cell type. eQTL, expression quantitative trait loci; IPF, idiopathic pulmonary fibrosis; IVW, inverse-variance weighted; MR, Mendelian randomisation; PMID, PubMed identifier; PheWAS, phenome-wide association study; UMAP, uniform manifold approximation and projection.
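The IVW and MR-Egger estimators named in this legend can be written in a few lines. The sketch below assumes per-variant summary statistics (effect on the exposure bx, effect on the outcome by and its standard error se_y); it returns a fixed-effect IVW estimate with its standard error, and only MR-Egger point estimates (slope and intercept) from a weighted regression after orienting variants so that bx is positive. Published MR software often uses multiplicative random-effects IVW standard errors and also reports Egger standard errors; the weighted median is not shown. Function names and input values are hypothetical.

```python
import math


def ivw(bx, by, se_y):
    """Fixed-effect inverse-variance weighted MR estimate and its SE."""
    w = [1 / s**2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    den = sum(wi * x**2 for wi, x in zip(w, bx))
    return num / den, math.sqrt(1 / den)


def mr_egger(bx, by, se_y):
    """MR-Egger point estimates: weighted regression of by on bx with an intercept.

    Variants are first oriented so that each effect on the exposure is positive.
    Returns (slope, intercept); a non-zero intercept suggests directional pleiotropy.
    """
    x = [abs(b) for b in bx]
    y = [b if b_x >= 0 else -b for b, b_x in zip(by, bx)]
    w = [1 / s**2 for s in se_y]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi**2 for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx**2)
    intercept = (sy - slope * sx) / sw
    return slope, intercept


# Hypothetical instruments: by = 0.1 + 0.5 * bx after orientation.
bx = [0.10, -0.20, 0.30, 0.40]
by = [0.15, -0.20, 0.25, 0.30]
se_y = [0.05] * 4
slope, intercept = mr_egger(bx, by, se_y)
beta_ivw, se_ivw = ivw(bx, by, se_y)
print(f"Egger slope={slope:.3f}, intercept={intercept:.3f}; IVW={beta_ivw:.3f} (SE {se_ivw:.3f})")
```

In this toy example the non-zero Egger intercept pulls the IVW estimate (which forces the fit through the origin) away from the Egger slope, which is the pattern a pleiotropy-robust comparison is meant to expose.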

Discussion

Leveraging the largest available human genomics and plasma proteomics resources, our comprehensive analysis pinpointed three proteins (FUT3, ADAM15 and USP28) that may causally influence IPF, with ADAM15 emerging as the most likely candidate.

Our findings corroborate the results of a previous MR study that reported an inverse association between genetically predicted circulating FUT3 levels and IPF risk.9 Additionally, we identified two novel putative causal proteins, of which ADAM15 was the most notable candidate, supported by eQTL analysis. ADAM15, a member of the ADAM (a disintegrin and metalloproteinase) protein family, has been implicated in endothelial permeability and lung inflammation.10 In vitro, physiological shear stress induced KLF2-mediated ADAM15 expression, which contributed to the survival of human pulmonary microvascular endothelial cells.11 ADAM15-deficient mice exposed to cigarette smoke developed more severe emphysema, small airway fibrosis and lung inflammation than wild-type mice.12 Accumulating evidence suggests an important role for vascular abnormalities in IPF, including abnormal spatial distribution of endothelial cells, dysregulation of endothelial protective pathways and an increased frequency of vascular and metabolic comorbidities.13 How ADAM15 influences IPF remains poorly understood, but it is plausible that ADAM15 deficiency increases endothelial damage, impairs endothelial repair and promotes pulmonary vascular inflammation, thereby contributing to the development of fibrosis.

Our study also assessed several previously identified protein biomarkers for IPF. The absence of MR evidence for causal effects of these established biomarkers is consistent with the current situation, in which most of them lack robust assay validation or replication in independent prospective cohorts and are rarely used in clinical practice.8 A biomarker can track disease activity without causing disease, so a null MR result does not rule out prognostic value. Indeed, these known biomarkers are more useful for defining prognosis once IPF has developed than for predicting the risk of developing IPF.

Through database searches, we noted potential pleiotropic effects of the cis-pQTLs. For example, the top cis-pQTL for FUT3, rs708686, was also associated with plasma levels of CA19-9, a previously described biomarker for IPF. Nevertheless, our MR analysis provided little evidence that these traits causally affect IPF. Because cis-pQTLs lie close to the genes encoding the proteins they regulate, they are thought to have strong biological relevance, and the observed pleiotropic associations are therefore more likely to reflect vertical pleiotropy, in which a variant's effect on protein levels in turn affects other traits. Vertical pleiotropy does not violate the assumptions of MR, whereas horizontal pleiotropy, in which a variant affects the outcome through pathways independent of the protein, would.

Two-thirds of the proteins identified in the MR analysis (six of nine) lacked strong evidence of colocalisation, suggesting that a substantial proportion of MR findings may be driven by linkage disequilibrium (LD). For instance, the most prominent IPF signal, rs35705950 at MUC5B, is in weak LD with the top cis-pQTLs for BRSK2 (rs7932863; r2=0.021) and MUC2 (rs12416873; r2=0.017); because rs35705950 has a very strong effect on IPF risk, even weak LD may be enough to produce an apparent association between these cis-pQTLs and IPF. Of note, a negative colocalisation result does not necessarily imply that the target gene is invalid; it may also arise from other complex situations, such as inadequate power of the IPF GWAS or the presence of multiple causal variants.14
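To show why colocalisation can fail even for a genuine target, the sketch below re-implements the approximate Bayes factor calculation popularised by the R coloc package (coloc.abf), which assumes at most one causal variant per trait in the region. The priors p1 = p2 = 1e-4 and p12 = 1e-5 are that package's defaults, and prior effect-size SDs of 0.15 (standardised quantitative trait) and 0.2 (case-control log odds) follow its conventions; the priors actually used in this study are described in the online supplement and may differ. Function names and summary statistics in the example are hypothetical.

```python
import math


def log_abf(beta, se, prior_sd):
    """Wakefield log approximate Bayes factor for one variant."""
    v = se**2
    r = prior_sd**2 / (prior_sd**2 + v)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z**2)


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_diff_exp(a, b):
    """log(exp(a) - exp(b)) for a > b."""
    return a + math.log1p(-math.exp(b - a))


def coloc_abf(trait1, trait2, prior_sd1=0.15, prior_sd2=0.2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities PP0..PP4 for one region.

    trait1, trait2: lists of (beta, se) for the same variants in the same order.
    """
    l1 = [log_abf(b, s, prior_sd1) for b, s in trait1]
    l2 = [log_abf(b, s, prior_sd2) for b, s in trait2]
    l12 = [a + b for a, b in zip(l1, l2)]
    sum1, sum2, sum12 = log_sum_exp(l1), log_sum_exp(l2), log_sum_exp(l12)
    log_h = [
        0.0,                  # H0: no association with either trait
        math.log(p1) + sum1,  # H1: association with trait 1 only
        math.log(p2) + sum2,  # H2: association with trait 2 only
        # H3: both traits associated, but with distinct causal variants
        math.log(p1) + math.log(p2) + log_diff_exp(sum1 + sum2, sum12),
        math.log(p12) + sum12,  # H4: one shared causal variant
    ]
    total = log_sum_exp(log_h)
    return [math.exp(h - total) for h in log_h]


# Hypothetical region of five variants; se fixed per trait, z statistics invented.
se_protein, se_ipf = 0.02, 0.05
z_protein = [0.5, 1.0, 8.0, 1.0, 0.5]
z_ipf = [0.3, 1.2, 6.0, 0.8, 0.2]
pp = coloc_abf([(z * se_protein, se_protein) for z in z_protein],
               [(z * se_ipf, se_ipf) for z in z_ipf])
print("PP0-PP4:", [round(x, 3) for x in pp])
```

If a region actually contains two causal variants for IPF, the single-variant assumption can shift posterior mass towards H3 (distinct variants), which is one way a genuine target can fail the PP4>0.80 threshold.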

One important limitation of this work is that protein levels are known to differ across cell types and states. While our study estimated the role of proteins measured in plasma, the relevance of these proteins in other tissues, particularly the lung, could not be assessed. Another potential limitation is that our analysis may have overlooked important proteins associated with IPF, because cis-pQTLs do not capture the entire proteome. Moreover, restricting our analysis to individuals of European ancestry, although necessary to mitigate population stratification bias, may limit the generalisability of our findings to other ancestry groups. Ongoing expansions in the scale, diversity and availability of pQTL and IPF GWAS data will therefore be crucial for precisely mapping causal proteins for IPF.

In summary, our study identified three potential causal proteins associated with IPF risk. These findings offer novel insights into the pathophysiology of IPF and hold potential translational relevance by prioritising targets for the development of biomarkers, diagnostics and medicines.


Ethics statements

Patient consent for publication

Acknowledgments

The authors thank the investigators of the proteomic and transcriptomic GWASs, the International IPF Genetics Consortium and the Global Biobank Meta-analysis Initiative for sharing the full summary statistics.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors JZ contributed to the analysis and interpretation of the data and drafting of the manuscript. HL contributed to data analysis and visualisation. DZ contributed to methodology and interpretation of data. YL contributed to conceptualisation and data acquisition. All authors participated in revisions and approved the final version. YL is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

  • Funding This study was supported by the Key Discipline of Zhejiang Province in Public Health and Preventative Medicine (First Class, Category A), Hangzhou Medical College; Natural Science Foundation of Zhejiang Province (No. LTGY23H260009), China; and Zhejiang Province Key Science and Technology Plan for Medicine and Health (No. WKJ-ZJ-2333).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
