Serum proteomic profiling of rheumatoid arthritis–interstitial lung disease with a comparison to idiopathic pulmonary fibrosis
  1. Xiaoping Wu1,
  2. Yunju Jeong2,3,
  3. Sergio Poli de Frías4,
  4. Imaani Easthausen5,
  5. Katherine Hoffman5,
  6. Clara Oromendia5,
  7. Shahrad Taheri6,
  8. Anthony J Esposito2,7,
  9. Luisa Quesada Arias4,
  10. Ehab A Ayaub2,
  11. Rie Maurer8,
  12. Ritu R Gill9,
  13. Hiroto Hatabu3,10,
  14. Mizuki Nishino3,10,
  15. Michelle L Frits11,
  16. Christine K Iannaccone11,
  17. Michael E Weinblatt3,11,
  18. Nancy A Shadick3,11,
  19. Paul F Dellaripa3,11,
  20. Augustine M K Choi1,
  21. Edy Y Kim2,3,
  22. Ivan O Rosas12,
  23. Fernando J Martinez1,
  24. Tracy J Doyle2,3
  1. 1 Department of Medicine, Division of Pulmonary and Critical Care Medicine, Weill Cornell Medicine, New York, New York, USA
  2. 2 Department of Medicine, Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  3. 3 Harvard Medical School, Boston, Massachusetts, USA
  4. 4 Department of Medicine, Mount Sinai Medical Center, Miami Beach, Florida, USA
  5. 5 Department of Population Health Science, Division of Biostatistics, Weill Cornell Medicine, New York, New York, USA
  6. 6 Department of Medicine, Weill Cornell Medicine–Qatar, Ar-Rayyan, Qatar
  7. 7 Department of Medicine, Division of Pulmonary and Critical Care Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  8. 8 Center for Clinical Investigation, Brigham and Women's Hospital, Boston, Massachusetts, USA
  9. 9 Department of Radiology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
  10. 10 Department of Radiology, Brigham and Women's Hospital, Boston, Massachusetts, USA
  11. 11 Department of Medicine, Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, Massachusetts, USA
  12. 12 Department of Medicine, Division of Pulmonary, Critical Care and Sleep Medicine, Baylor College of Medicine, Houston, Texas, USA
  1. Correspondence to Dr Tracy J Doyle, Department of Medicine, Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA 02115; tjdoyle{at}

Abstract

Although interstitial lung disease (ILD) causes significant morbidity and mortality in rheumatoid arthritis (RA), it is difficult to predict the development or progression of ILD, emphasising the need for improved discovery through minimally invasive diagnostic tests. Aptamer-based proteomic profiling was used to assess 1321 proteins from 159 patients with rheumatoid arthritis with interstitial lung disease (RA-ILD), RA without ILD, idiopathic pulmonary fibrosis and healthy controls. Differential expression and gene set enrichment analyses revealed molecular signatures that are strongly associated with the presence and severity of RA-ILD and provided insight into unexplored pathways of disease. These warrant further study as non-invasive diagnostic tools and future therapeutic targets.

  • Interstitial Fibrosis
  • Rheumatoid lung disease


Introduction

Interstitial lung disease (ILD) is a frequent and severe lung manifestation of rheumatoid arthritis (RA) with 5-year mortality estimated at 35%.1 Although the spectrum of rheumatoid arthritis with interstitial lung disease (RA-ILD) is present in approximately 40% of patients with RA, it frequently goes unrecognised until the advanced stages, emphasising the need for improved discovery through non-invasive diagnostic tests.2 Much of our knowledge regarding the pathogenesis of ILD is derived from research in idiopathic pulmonary fibrosis (IPF),3 but unique immunological features and heterogeneity in disease phenotype in RA require a dedicated inquiry into this distinct patient population.

Large-scale omics platforms4 can accelerate the discovery process by simultaneously analysing thousands of molecular markers and mapping groups of proteins to common biological pathways.5 We conducted the first proteomic analysis in RA-ILD, examining differential protein expression, heatmap clustering and gene set enrichment analysis to highlight new pathways of disease biology to develop as future diagnostic and therapeutic targets.

Materials and methods

Subjects with RA with ILD (RA-ILD, n=39) and RA without ILD (RA-noILD, n=36) were included in this study, along with subjects with IPF (n=42) and healthy controls (HCs, n=42) matched by age, gender and smoking history to RA-ILD. Quantification of 1321 proteins from serum was performed by SOMAscan assay,4 and protein expression levels were compared between clinical phenotypes of interest using linear regression with empirical Bayes SEs. Differential expression was assessed by Benjamini-Hochberg corrected p value after adjustment for gender, age and smoking status. The top two proteins increased in RA-ILD compared with RA-noILD were subsequently analysed with ELISA. Clustering patterns were explored using two-way unsupervised hierarchical clustering with Pearson’s correlation distance metric. Associations between lung function measures and proteins of interest were examined using adjusted linear regression models. Lastly, gene set enrichment analysis (GSEA)5 identified the most significant functional groups by normalised enrichment score (NES), which were visualised in a network map.
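
The Pearson correlation distance used for the hierarchical clustering can be illustrated with a minimal sketch. It assumes the common definition, distance = 1 − r, which the text does not state explicitly; the function names and the example profiles are hypothetical and are not taken from the study's analysis code or data.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson r for two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return 1.0 - cov / sqrt(var_x * var_y)


def distance_matrix(profiles):
    """Pairwise distances between named profiles (dict of name -> values)."""
    names = sorted(profiles)
    return {
        (a, b): pearson_distance(profiles[a], profiles[b])
        for a in names
        for b in names
    }


# Hypothetical expression profiles for three subjects (illustration only).
example = {
    "subject_1": [1.0, 2.0, 3.0, 4.0],
    "subject_2": [2.0, 4.0, 6.0, 8.0],
    "subject_3": [4.0, 3.0, 2.0, 1.0],
}
matrix = distance_matrix(example)
```

Under this definition, perfectly correlated profiles have distance 0 and perfectly anti-correlated profiles have distance 2. A matrix of this kind is the usual input to an agglomerative clustering routine, applied once across subjects and once across proteins for a two-way clustering.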

Results

Demographic and clinical characteristics are summarised in table 1 (RA-ILD and RA-noILD) and online supplemental table S1 (IPF and HC). SOMAscan revealed 234 proteins that were differentially expressed between RA-ILD and RA-noILD (figure 1A and online supplemental table S2A) and 98 between IPF and HC (figure 1B and online supplemental table S2B). Of 25 proteins that were differentially expressed in both the RA-ILD (to RA-noILD) and IPF (to HC) comparisons (bolded in online supplemental tables S2A and S2B), 16 proteins (64%) were overexpressed and 5 (20%) were underexpressed in both RA-ILD and IPF, while 4 proteins (16%) were overexpressed in RA-ILD but not IPF. In direct comparison of patients with RA-ILD and IPF, five proteins were differentially expressed (online supplemental table S2C).

Figure 1

Distinct peripheral blood proteomic profiles of rheumatoid arthritis interstitial lung disease and idiopathic pulmonary fibrosis. Volcano plot shows expected mean difference in SD units (x-axis) and significance level (y-axis) of peripheral blood proteins measured by SOMAscan assay. Blue dots indicate proteins that are underexpressed and red dots indicate proteins that are overexpressed in (A) RA-ILD compared with RA-noILD or (B) IPF compared with HC. The top 24 are labelled by Uniprot ID. Differential expression is defined as Benjamini-Hochberg corrected p value of <0.05 after adjustment for gender, age ≤65 and smoking status (ever vs never). HC, healthy control; IPF, idiopathic pulmonary fibrosis; RA-ILD, rheumatoid arthritis with interstitial lung disease; RA-noILD, rheumatoid arthritis without interstitial lung disease.
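
The Benjamini-Hochberg correction named in the caption can be sketched as follows. This is a generic, minimal implementation of the standard step-up adjustment, not the authors' analysis code; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, so that each adjusted value is
    # the minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values (illustration only, not study data).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```

With a threshold of 0.05 applied to the adjusted values, the expected proportion of false discoveries among the proteins called differentially expressed is held at or below 5%, which matters when more than a thousand proteins are tested at once.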

Table 1

Clinical characteristics of RA-ILD and RA-noILD

Hierarchical clustering of proteins distinguished RA-ILD and RA-noILD subjects (online supplemental figure S1). Of the top five significant proteins with differential expression between RA-ILD and RA-noILD (online supplemental table S2A), paired immunoglobulin-like type 2 receptor-associated neural protein (PIANP) (mean difference 1.34, fold change 2.53, p<0.00001) and secretory leukocyte peptidase inhibitor (SLPI) (mean difference 1.36, fold change 2.57, p<0.00001) were associated with per cent predicted carbon monoxide diffusing capacity independent of gender, age and smoking; associations with per cent predicted forced vital capacity and per cent predicted total lung capacity were not observed (online supplemental table S3). ELISAs for PIANP (21.4 ng/mL vs 16.9 ng/mL, p=0.070) and SLPI (61.4 ng/mL vs 44.8 ng/mL, p<0.001) were performed on a subset of RA-ILD (n=29) versus RA-noILD (n=28) samples (online supplemental figure S2).
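
The mean differences and fold changes reported above for PIANP and SLPI are numerically consistent with the relationship fold change = 2^(mean difference), as would be expected if the differences were computed on a log2 scale. That scale is an inference from the numbers and is not stated in the text; the short check below, with a hypothetical helper name, makes the arithmetic explicit.

```python
def fold_change_from_log2(mean_difference):
    """Convert a difference on a log2 scale to a linear fold change."""
    return 2.0 ** mean_difference


# Mean differences as reported in the text for the two proteins.
reported = {"PIANP": 1.34, "SLPI": 1.36}
for name, diff in reported.items():
    print(f"{name}: fold change {fold_change_from_log2(diff):.2f}")
# prints:
# PIANP: fold change 2.53
# SLPI: fold change 2.57
```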

Figure 2

Gene set enrichment and network analysis of peripheral blood proteome in rheumatoid arthritis–interstitial lung disease. (A) The top 10 gene sets with greatest enrichment in RA-ILD compared with RA-noILD are shown with adjusted p value, number of core genes (count) and NES. (B) Network map depicts connections between functional groups associated with RA-ILD compared with RA-noILD. The three non-redundant functional groups identified by GSEA with the strongest enrichment for RA-ILD in each subontology are shown: cell–cell signalling (NES 2.56, adjusted p value of 0.0077), extracellular matrix (NES 2.61, adjusted p value of 0.0077) and G protein-coupled receptor binding (NES 2.70, adjusted p value of 0.0077). Darker colour indicates a higher fold-change expression level. Node size indicates the number of genes in the leading edge subset of the functional group. Edge thickness indicates the strength of correlation between the connected proteins. GSEA, gene set enrichment analysis; NES, normalised enrichment score; RA-ILD, rheumatoid arthritis with interstitial lung disease; RA-noILD, rheumatoid arthritis with no interstitial lung disease.

GSEA demonstrated 50 significantly enriched functional groups in RA-ILD compared with RA-noILD (top 10 represented in figure 2A and all in online supplemental table S4). Some of the strongest associations with RA-ILD included signaling receptor binding (NES 3.06, p=0.0077), extracellular matrix (NES 2.61, p=0.0077) and negative regulation of proteolysis (NES 2.62, p=0.0113). The network map highlighted the functional groups extracellular matrix, cell–cell signalling and G protein-coupled receptor binding, with leading edge interactions through CCL18, IL-17A, FGF4 and FGF7 (figure 2B). Although radiological subtype comparisons yielded no individual differentially expressed proteins, GSEA identified 39 enriched functional groups in comparisons of non-fibrotic RA-ILD versus fibrotic RA-ILD (online supplemental table S5A, S5B), and 188 enriched in non-fibrotic RA-ILD versus IPF (online supplemental table S6). GSEA yielded 65 enriched functional groups in IPF versus HC (online supplemental table S7) and no significant functional groups for comparisons of RA-ILD versus IPF or fibrotic RA-ILD versus IPF.
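
The idea behind the enrichment scores reported above can be sketched with a simplified, unweighted running-sum statistic. This is not the full GSEA procedure: the published method weights hits by their ranking statistic and normalises the score against permutations to obtain the NES, and neither step is reproduced here. The function name and the protein identifiers are hypothetical.

```python
def enrichment_score(ranked_ids, gene_set):
    """Unweighted running-sum enrichment score for a ranked list.

    ranked_ids: unique identifiers ordered from most to least
                overexpressed.
    gene_set:   identifiers belonging to the functional group.
    Returns the running-sum value with the largest absolute deviation
    from zero (positive when members cluster near the top).
    """
    members = set(gene_set) & set(ranked_ids)
    n_hit = len(members)
    n_miss = len(ranked_ids) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one member and one non-member")
    running = 0.0
    best = 0.0
    for identifier in ranked_ids:
        # Step up on a set member, step down otherwise; the two step
        # sizes are chosen so the walk ends at zero.
        if identifier in members:
            running += 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking of six proteins (illustration only).
ranking = ["P1", "P2", "P3", "P4", "P5", "P6"]
top_heavy = enrichment_score(ranking, {"P1", "P2"})
bottom_heavy = enrichment_score(ranking, {"P5", "P6"})
```

In this toy ranking, a set whose members sit at the top of the list reaches the maximum score, and a set whose members sit at the bottom reaches the minimum. The members encountered before the running sum peaks correspond to what the text calls the leading edge subset.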

Discussion

Through proteomic profiling, we identified molecular signatures strongly associated with the presence and severity of RA-ILD and provided insight into unexplored disease pathways. The 234 proteins differentially expressed between RA-ILD and RA-noILD span a wide range of functions including cell signalling and regulation of apoptosis. They include familiar proteins like matrix metalloproteinases (MMPs) that are involved in IPF6 and erosive disease in RA,7 but also novel proteins correlated with ILD severity. PIANP displays moderate expression in respiratory epithelial cells and appears to be necessary for regulation of tissue damage in response to neutrophil-mediated inflammation.8 SLPI, one of the major defences against destruction of lung tissue by neutrophil elastase, is overexpressed in scleroderma-related ILD9 and may signify a compensatory response to inflammatory lung injury.

GSEA revealed enrichment of proteins involved in cell–cell signalling and extracellular matrix in RA-ILD compared with RA-noILD. The leading-edge proteins driving these gene sets in RA-ILD included several proteins implicated in pulmonary fibrosis: cytokines (CCL1810 and interleukin-1711), chemokines (CXCL12 and CCL5), FGF family members12 (FGF4 and FGF7) and the S-type lectin galectin-3 (LGALS3),13 which plays a role in regulating myofibroblast proliferation, fibrogenesis and tissue repair. Based on encouraging data from both murine models of pulmonary fibrosis and human subjects, some of the aforementioned proteins have become new therapeutic targets for IPF (ClinicalTrials.gov identifier NCT03832946) or RA-ILD (NCT05246293 and NCT04311567).

Lastly, while the focus of this study was RA-ILD, we observed with great interest how the peripheral blood proteome signature of RA-ILD compares with that of its most closely related ILD, IPF. While the vast majority of differentially expressed proteins (286 of 307, 93%) did not overlap between the comparisons of RA-ILD and IPF with their respective controls, the 16 overlapping proteins overexpressed in both RA-ILD and IPF may reflect shared disease mechanisms. Among the overlapping proteins, intercellular adhesion molecule 5 (ICAM5), spondin-1 (SPON1) and roundabout homolog 2 (ROBO2) have been linked to disease severity in the IPF-PRO cohort.14
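
The figure of 286 of 307 can be reproduced from the counts given in the Results under one reading, shown below as a short check. The assumption, which the text does not spell out, is that only the 21 proteins changed in the same direction in both comparisons (16 overexpressed and 5 underexpressed) are counted as overlapping, while the 4 proteins overexpressed in RA-ILD but not IPF are not.

```python
# Counts as reported in the Results.
ra_ild_vs_ra_noild = 234   # differentially expressed, RA-ILD vs RA-noILD
ipf_vs_hc = 98             # differentially expressed, IPF vs HC
shared = 25                # present in both comparisons
concordant = 16 + 5        # same direction in both (assumed definition of overlap)

# Distinct proteins across the two comparisons.
distinct = ra_ild_vs_ra_noild + ipf_vs_hc - shared
non_overlapping = distinct - concordant
share = non_overlapping / distinct

print(distinct, non_overlapping, f"{share:.0%}")
# prints: 307 286 93%
```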

No individual proteins were differentially expressed in direct comparison of the radiological subtypes of RA-ILD by proteomic analysis. However, GSEA detected several distinct functional groups in the comparison of fibrotic and non-fibrotic RA-ILD. Fibrotic RA-ILD was enriched for the functional groups extracellular matrix and receptor signalling. The leading-edge subsets driving this enrichment included proteins, such as MMP7 and galectin-3, that were also enriched in the comparison of all RA-ILD with RA-noILD. This result suggests that there are differences between the pathogenesis of fibrotic and non-fibrotic ILD subtypes, which may have implications for clinical trials of antifibrotic or anti-inflammatory treatments for RA-ILD.

Our study has several limitations. (1) The limited number of subjects in each group, coupled with the heterogeneity between groups and the missing data inherent in retrospective cohorts, restricts our ability to perform subgroup analyses and to understand the impact of disease-modifying therapies, the timing of ILD development and other RA disease metrics on the measured outcomes. (2) Although the single-centre design and lack of an independent validation cohort limit our ability to draw conclusions regarding use of the differential protein signature as a diagnostic biomarker, the main goal of our study was to generate hypotheses by surveying a broad network of protein markers. (3) The inclusion of subjects with clinically indicated CT scans, as well as a predominance of the nonspecific interstitial pneumonia pattern, may affect the generalisability of our results. (4) Clinical data and biological samples were not obtained at the same time, so ILD could have progressed, and individuals without ILD could have developed ILD, in the interval. (5) Lastly, while antibody-based techniques predominated in earlier proteomic research, the SOMAscan platform has been shown to have high reproducibility, specificity and strong concordance with genomic studies.4 15 16

To our knowledge, this is the first study to apply a proteomic analysis of peripheral blood to patients with RA-ILD, a novel approach that can provide valuable diagnostic and therapeutic targets through a non-invasive evaluation.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and was approved by Brigham and Women’s Hospital (IRB 2010P002840). The participants gave informed consent to participate before taking part.


We offer our sincere thanks to the patients with rheumatoid arthritis who participated and to the staff of BRASS and the Arthritis Center at Brigham and Women’s Hospital for their efforts in this study.



  • XW and YJ contributed equally.

  • Contributors XW, TJD, FJM, AMKC and IOR contributed to the conception and design of the study. XW, TJD, YJ, EYK, RM, SPdF, CO, KH, and IE were involved in the analysis and interpretation of data. All authors participated in drafting the work or critically revising it and provided approval of the manuscript to be submitted for publication.

  • Funding This work was supported by the National Institutes of Health (grant numbers K23 HL119558 and R03 HL148484) and Stony Wold-Herbert Fund Fellowship Grant (no grant number applicable). The Brigham Rheumatoid Arthritis Sequential Study is funded by grants from Bristol-Myers Squibb, Amgen, Crescendo Bioscience and Sanofi/Regeneron. The authors would like to thank the Clinical Research and Proteomics Cores at Weill Cornell Medicine–Qatar, supported by the Biomedical Research Program funded by Qatar Foundation. The funders had no role in study design, data collection, analysis, decision to publish or preparation of the manuscript. The content is solely the responsibility of the authors and does not necessarily represent the official views of Harvard University, Weill Cornell Medicine, their affiliated academic healthcare centers or the National Institutes of Health.

  • Competing interests The authors have reported the following conflicts of interest, all outside the submitted work: ST reports medical advisory group membership of Novo Nordisk and board membership at Droobi Health, Qatar. RG receives grant support from Canon Medical Systems. HH reports grants from Canon Medical Systems and Konica Minolta, and personal fees from Mitsubishi Chemical Co and Canon Medical Systems Inc. MN reports grants from AstraZeneca, Daiichi Sankyo, Canon Medical Systems, Merck investigator studies program; personal fees from Daiichi Sankyo and AstraZeneca. MEW receives research support from Amgen, Bristol Myers Squibb and Eli Lilly and consultation fees from AbbVie, Aclaris, Amgen, Arena, Bayer, Bristol Myers Squibb, Corvitas, Eqrx, Genosco, GSK, Gilead, Horizon, Johnson & Johnson, Kiniksa, Lilly, Novartis, Pfizer, Rami Therapeutics, R Pharma, Roche, Sanofi, Scipher, Sci Rhom, Set Point and Tremeau. He holds stock/stock options of CanFite, Inmedix, Vorso and Scipher. NAS reports grants and other support from Bristol-Myers Squibb, grants from Mallinckrodt, Sanofi, Crescendo Biosciences, Lilly and Amgen. PFD reports grants from Bristol-Myers Squibb and Genentech, and other support from Boehringer Ingelheim. AMKC is a cofounder and equity stock holder for Proterris, which develops therapeutic uses for carbon monoxide, and also has a use patent on CO and a patent in chronic obstructive pulmonary disease. EYK is a member of the steering committees for and receives no financial remuneration from NCT04409834 (Prevention of arteriovenous thrombotic events in critically ill COVID-19 patients, TIMI group) and REMAP-CAP ACE2 renin–angiotensin system modulation domain, and receives unrelated research funding from Bayer AG, Roche Pharma Research and Early Development, Windtree Therapeutics, the US National Institutes of Health, the US Agency for International Development, the American Heart Association, American Lung Association and the Bell Family Fund. IOR reports grants from Genentech. FJM reports personal fees, non-financial support and other support from AstraZeneca, other support from Afferent/Merck, personal fees, non-financial support and other support from Boehringer Ingelheim, other support from Bristol Myers Squibb, other support from Chiesi, personal fees and non-financial support from the Canadian Respiratory Society, personal fees and non-financial support from CME Outfitters, personal fees and non-financial support from CSL Behring, personal fees from Dartmouth University, personal fees from France Foundation, personal fees from Gala, personal fees and non-financial support from Genentech, grants, personal fees, non-financial support and other support from GlaxoSmithKline, personal fees and non-financial support from Inova Fairfax, personal fees and non-financial support from MD Magazine, personal fees and non-financial support from NYP Methodist Hospital Brooklyn, personal fees and non-financial support from Miller Communications, personal fees and non-financial support from National Association for Continuing Education/Integritas, other support from Nitto, personal fees and non-financial support from Novartis, personal fees from New York University, personal fees and non-financial support from Patara/Respivant, personal fees from Pearl, personal fees and non-financial support from Peer View, personal fees from Physicians Education Resource, personal fees from ProMedior, personal fees and non-financial support from Rare Diseases Healthcare Communications, personal fees from Rockpointe Communications, personal fees and non-financial support from Sanofi/Regeneron, other support from Biogen, personal fees and non-financial support from Sunovion, personal fees and non-financial support from Teva, other support from twoXAR, personal fees from University of Birmingham Alabama, personal fees from UpToDate, non-financial support from Veracyte, personal fees from Vindico, personal fees and non-financial support from WebMD/MedScape, non-financial support and other support from Zambon, non-financial support from ProTerrix Bio, and personal fees from IQVIA, Raziel, AbbVie and Verona. TJD has received grant support from Bristol Myers Squibb, consulting fees from Boehringer Ingelheim and L.E.K. consulting, and has been part of a clinical trial funded by Genentech. The remaining authors have reported no conflicts of interest.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
