Rheumatoid arthritis and idiopathic pulmonary fibrosis: a bidirectional Mendelian randomisation study

Background: A usual interstitial pneumonia (UIP) pattern of lung injury is a key feature of idiopathic pulmonary fibrosis (IPF) and is also observed in up to 40% of individuals with rheumatoid arthritis (RA)-associated interstitial lung disease (RA-ILD). The RA-UIP phenotype could result from either a causal relationship of RA on UIP or vice versa, or from a simple co-occurrence of RA and IPF due to shared demographic, genetic or environmental risk factors.
Methods: We used two-sample bidirectional Mendelian randomisation (MR) to test the hypothesis of a causal effect of RA on UIP and of UIP on RA, using variants from genome-wide association studies (GWAS) of RA (separately for seropositive (18 019 cases and 991 604 controls) and seronegative (8515 cases and 1 015 471 controls) RA) and of IPF (4125 cases and 20 464 controls) as genetic instruments. Sensitivity analyses were conducted to assess the robustness of the results to violations of the MR assumptions.
Findings: IPF showed a significant causal effect on seropositive RA, with developing IPF increasing the risk of seropositive RA (OR=1.06, 95% CI: 1.04 to 1.08, p<0.001) which was robust under all models. For the MR in the other direction, seropositive RA showed a significant protective effect on IPF (OR=0.93; 95% CI: 0.87 to 0.99; p=0.032), but the effect was not significant when sensitivity analyses were applied. This was likely because of bias due to exclusion of patients with RA from among the cases in the IPF GWAS, or possibly because our genetic instruments did not fully capture the effect of the complex human leucocyte antigen region, the strongest RA genetic risk factor.
Interpretation: Our findings support the hypothesis that RA-UIP may be due to a cause–effect relationship between UIP and RA, rather than due to a coincidental occurrence of IPF in patients with RA. The significant causal effect of IPF on seropositive RA suggests that pathomechanisms involved in the development of UIP may promote RA, and this may help inform future guidelines on screening for ILD in patients with RA.

UIP is also the histopathological pattern of idiopathic pulmonary fibrosis (IPF), a progressive and fatal scarring disease of the lungs. RA-UIP and IPF share several clinical features such as a male sex predominance, older age at onset (around the 6th and 7th decade, respectively), indistinguishable patterns of ILD on HRCT, a poor prognosis (8)(9)(10)(11)(12)(13) and a similar magnitude of response to anti-fibrotic therapy (14,15).
RA-UIP and IPF also share risk factors, including smoking and the MUC5B rs35705950 genetic variant (T risk allele), suggesting common pathogenic pathways (16)(17)(18). Indeed, the association between RA-ILD and MUC5B rs35705950 is restricted to the RA-UIP subtype of RA-ILD with a similar magnitude and direction to that reported in IPF (16,19). Of note, MUC5B rs35705950 was not found to contribute to the risk of RA without ILD (16). However, provocatively, the restricted association of MUC5B rs35705950 with RA and a UIP pattern of injury (but not NSIP) raises the hypothesis that RA-UIP might in fact be a coincidental occurrence of IPF in individuals who also have RA, rather than UIP being a direct consequence of RA (20).
Observational studies cannot provide strong evidence on causal relationships, nor on the direction of causation, as they are vulnerable to confounding and reverse causation. Mendelian randomisation (MR) is a statistical approach that can infer causal relationships between two traits, and the direction of causality, using genetic variants as instrumental variables (IVs). MR can be considered a "natural" randomised controlled trial, as the genetic variants an individual holds are randomly assigned at conception and do not vary during their lifetime, and are therefore not subject to confounding or reverse causation. For an MR study to be valid, three key assumptions about the IVs should hold: 1) they should be associated with the exposure (risk factor) of interest, 2) they should not be associated with confounders of the exposure and outcome relationship, and 3) they should not be associated with the outcome other than through the exposure (i.e., no horizontal pleiotropy). MR results can be biased by horizontal pleiotropy, but there are methods available to detect and allow for pleiotropy.
We hypothesised that the RA-UIP phenotype was a simple co-occurrence of RA and IPF, rather than the result of either a causal relationship of RA on UIP or vice versa. To test this, we undertook an MR analysis to estimate the causal relationship between RA and IPF. We used a two-sample approach where we derived causal estimates from separate studies of RA and IPF. Our analysis was bidirectional, testing both for a causal effect of RA on IPF, and for a causal effect of IPF on RA. We undertook separate analyses for seropositive and seronegative (for Rheumatoid Factor (RF) and/or anti-citrullinated protein/peptide antibody (ACPA)) RA.

Methods
For our bidirectional MR analysis, we used a two-sample approach where summary statistics (i.e. effect estimates and standard errors) for the gene-exposure ("G-X") and gene-outcome ("G-Y") associations were obtained from separate studies. For the MR of the effect of RA on IPF, G-X refers to genetic associations with RA and G-Y to genetic associations with IPF, and vice versa for the MR of IPF on RA (Figure 1).

Study populations
Genetic association estimates for seropositive and seronegative RA were taken from a published study of RA (21) that reported separate genome-wide association studies (GWAS) of seropositive (18,019 cases and 991,604 controls) and seronegative (8,515 cases and 1,015,471 controls) RA. One hundred and thirty-five autosomal single nucleotide polymorphisms (SNPs) that were associated with RA (either seropositive, seronegative or both) in previous GWAS of European ancestry or multiancestry were selected as IVs for RA (22). These IVs were used for testing the causal effect of RA, as the exposure, on IPF, as the outcome.
Genetic association estimates for IPF were obtained from a previously published GWAS comprising 4,125 IPF cases and 20,464 controls (23). Nineteen common SNPs reported as being genome-wide significantly associated with IPF were selected as IVs for IPF (23). These IVs were used for testing the causal effect of IPF, as the exposure, on RA, as the outcome.
SNPs were excluded from the analyses if they were not present in the relevant outcome dataset and no suitable proxy (linkage disequilibrium r² > 0.8) could be found. For palindromic SNPs (i.e., A/T, C/G SNPs), non-palindromic proxies (r² > 0.8) were selected. For correlated SNPs (r² > 0.01, determined using 1000 Genomes Project EUR population using LDlink (24)), the SNP with the least significant association with the exposure was excluded.
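The instrument-cleaning rules above can be sketched in code. The following Python snippet is only a minimal illustration and is not the authors' pipeline (which used R packages and LDlink). The function names, the dictionary inputs and the greedy pair-by-pair pruning order are assumptions made for the example.

```python
# Illustrative sketch only; function names and data layout are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(allele1, allele2):
    """Return True for A/T or C/G SNPs, whose two alleles are each other's
    complement and therefore look identical on both DNA strands."""
    return COMPLEMENT.get(allele1.upper()) == allele2.upper()


def prune_correlated(pvalues, r2_pairs, threshold=0.01):
    """Greedy pruning of correlated instruments.

    pvalues  : dict mapping SNP id -> p-value for association with the exposure
    r2_pairs : dict mapping (snp_a, snp_b) -> pairwise LD r^2
    For each pair with r^2 above the threshold, the SNP with the larger
    (less significant) p-value is dropped.
    """
    kept = set(pvalues)
    for (snp_a, snp_b), r2 in r2_pairs.items():
        if r2 > threshold and snp_a in kept and snp_b in kept:
            kept.discard(snp_a if pvalues[snp_a] > pvalues[snp_b] else snp_b)
    return kept
```

A palindromic SNP flagged by `is_palindromic` would then be swapped for a non-palindromic proxy, a step that needs an external LD reference and is not shown here.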
As the use of "weak" instruments can bias the results of MR, SNPs with an F-statistic < 10 were excluded, where the F-statistic represents a measure of instrument strength (25,26).
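The text does not state how the F-statistic was calculated. A common approximation for a single SNP from summary statistics is F ≈ (beta/SE)², and the Python sketch below filters instruments under that assumption. The input layout and the example values are made up for illustration.

```python
def f_statistic(beta, se):
    """Approximate per-SNP F-statistic from summary statistics: (beta / SE)^2.
    This approximation is an assumption, not a formula given in the text."""
    return (beta / se) ** 2


def strong_instruments(snps, threshold=10.0):
    """Keep SNPs with F >= threshold. Each SNP is a dict holding 'beta' and
    'se' for its association with the exposure."""
    return [s for s in snps if f_statistic(s["beta"], s["se"]) >= threshold]


# Made-up numbers for illustration (not taken from the study).
example = [
    {"snp": "snp_strong", "beta": 0.10, "se": 0.02},  # F is about 25
    {"snp": "snp_weak", "beta": 0.03, "se": 0.02},    # F is about 2.25
]
kept = [s["snp"] for s in strong_instruments(example)]
```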

Statistical analyses
The inverse-variance weighted (27) fixed effect (IVW-FE) method is a fixed-effect meta-analysis of MR estimates across SNPs, where SNP-specific MR estimates are obtained using the Wald estimator (G-Y/G-X). This was used for the primary analysis for the MR in both directions, as it is the most powerful MR method in the absence of pleiotropy.
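As a rough illustration of the Wald estimator and its fixed-effect pooling, the Python sketch below computes per-SNP ratios and an IVW-FE estimate from summary statistics. It assumes first-order standard errors for the ratios (SE of G-Y divided by the absolute G-X estimate). The study itself used the "MendelianRandomization" R package, so the function names and example numbers here are hypothetical.

```python
import math


def wald_ratios(bx, by, se_y):
    """Per-SNP Wald ratios (G-Y / G-X) with first-order standard errors."""
    ratios = [y / x for x, y in zip(bx, by)]
    ses = [s / abs(x) for x, s in zip(bx, se_y)]
    return ratios, ses


def ivw_fixed(bx, by, se_y):
    """Fixed-effect inverse-variance weighted meta-analysis of Wald ratios.
    Returns the pooled log odds ratio and its standard error."""
    ratios, ses = wald_ratios(bx, by, se_y)
    weights = [1.0 / s ** 2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


def to_odds_ratio(estimate, se, z=1.96):
    """Convert a log odds ratio and its SE to an OR with a 95% CI."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))


# Made-up summary statistics on the log-odds scale, for illustration only.
bx = [0.10, 0.20, 0.30]    # G-X: SNP effects on the exposure
by = [0.05, 0.10, 0.15]    # G-Y: SNP effects on the outcome
se_y = [0.01, 0.01, 0.01]  # standard errors of the G-Y estimates
estimate, se = ivw_fixed(bx, by, se_y)
odds_ratio, ci_low, ci_high = to_odds_ratio(estimate, se)
```

Because every SNP in this toy example has the same ratio, the pooled estimate equals that ratio. With real data the ratios differ and the weights decide how much each SNP contributes.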
To investigate the presence and magnitude of pleiotropy, Cochran's Q statistic and the I² statistic were used, respectively. Individual variant contributions to Cochran's Q heterogeneity statistic were plotted to identify pleiotropic SNPs (28). In the presence of pleiotropy, a series of sensitivity analyses were conducted to account for it: the inverse-variance weighted random-effect (IVW-RE) method, MR-PRESSO (MR Pleiotropy RESidual Sum and Outlier) (29), weighted median (30), weighted mode-based estimation (31) and MR-Egger (32). These different methods perform better in different scenarios as they make different assumptions about the nature of the underlying pleiotropy.
IVW-RE is an inverse-variance weighted method where the fixed-effect meta-analysis model of IVW-FE is substituted by a random-effects model to allow for heterogeneity (as a proxy for pleiotropy). In particular, this method allows for balanced pleiotropy (random effects have a mean of zero), in the presence of which the point estimate is equivalent to the IVW-FE point estimate, but IVW-RE will have wider 95% CIs.
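The heterogeneity statistics and the relationship between IVW-FE and IVW-RE can be sketched as follows. This Python example assumes a multiplicative random-effects model, in which the fixed-effect standard error is inflated by the square root of Q divided by its degrees of freedom (never deflated). That is one common formulation and may not match the authors' exact software settings. Names and numbers are hypothetical.

```python
import math


def ivw_with_heterogeneity(bx, by, se_y):
    """IVW estimate with Cochran's Q, I^2 and fixed/random-effect SEs."""
    if len(bx) < 2:
        raise ValueError("at least two SNPs are needed to assess heterogeneity")
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]  # 1 / SE(ratio)^2
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    # Cochran's Q: weighted squared deviations of SNP ratios from the pooled estimate.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    # I^2: share of the variation attributed to heterogeneity, floored at zero.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    se_fixed = math.sqrt(1.0 / total)
    # Multiplicative random effects: same point estimate, wider SE when Q > df.
    se_random = se_fixed * math.sqrt(max(1.0, q / df))
    return {"estimate": estimate, "q": q, "df": df, "i2": i2,
            "se_fixed": se_fixed, "se_random": se_random}


# Made-up, deliberately heterogeneous data (ratios of 0, 0.5 and 1).
result = ivw_with_heterogeneity(
    bx=[0.10, 0.10, 0.10],
    by=[0.00, 0.05, 0.10],
    se_y=[0.01, 0.01, 0.01],
)
```

The p-value for Q, which comes from a chi-squared distribution with `df` degrees of freedom, is omitted to keep the sketch within the standard library.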
MR-Egger allows all SNPs to have pleiotropic effects; however, the pleiotropic effects should be independent of the G-X associations. The method is affected by outliers, particularly when the G-X estimates are similar across different SNPs, which in turn can result in low power to detect a causal effect. When the variation in the strength of the instruments is limited, MR-Egger is susceptible to dilution bias, which biases the MR results towards the null. The I² of a meta-analysis of G-X estimates (I²_GX) can be used to assess this, with lower values suggesting stronger dilution. Ideally, the I²_GX measure should be >90%; when this is not the case, MR-Egger should be performed using simulation extrapolation (SIMEX) to correct for the dilution bias (33).
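A minimal sketch of the MR-Egger point estimates and the I²_GX diagnostic is given below in Python. It assumes the usual formulation: a weighted regression of G-Y on G-X with an intercept, weights of 1/SE(G-Y)², and SNPs oriented so that all G-X estimates are positive. Standard errors and the SIMEX correction are not shown, and all names and numbers are hypothetical.

```python
def mr_egger(bx, by, se_y):
    """Weighted linear regression of G-Y on G-X with an intercept.
    Returns (intercept, slope): the intercept reflects average directional
    pleiotropy and the slope is the causal estimate."""
    xs, ys = [], []
    for x, y in zip(bx, by):
        sign = 1.0 if x >= 0 else -1.0  # orient each SNP so G-X is positive
        xs.append(sign * x)
        ys.append(sign * y)
    ws = [1.0 / s ** 2 for s in se_y]
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx  # undefined when all G-X estimates are identical
    intercept = y_bar - slope * x_bar
    return intercept, slope


def i2_gx(bx, se_x):
    """I^2 of a fixed-effect meta-analysis of the oriented G-X estimates."""
    xs = [abs(x) for x in bx]
    ws = [1.0 / s ** 2 for s in se_x]
    mean = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
    q = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs))
    df = len(xs) - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0


# Made-up data in which G-Y = 0.02 + 0.5 * G-X after orientation.
bx = [0.10, -0.20, 0.30, 0.40]
by = [0.07, -0.12, 0.17, 0.22]
intercept, slope = mr_egger(bx, by, se_y=[0.01] * 4)
dilution_check = i2_gx(bx, se_x=[0.01] * 4)
```

The division by `sxx` makes the dependence on instrument-strength variation explicit: when the G-X estimates barely differ, `sxx` is close to zero and the slope becomes unstable.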
The weighted median method makes weaker assumptions about valid IVs, as it only assumes that at least half of the variants are valid instruments. This method is robust to outliers and is not as affected by the presence of a small number of pleiotropic variants as the IVW and MR-Egger methods.
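The robustness of the weighted median to an outlying SNP can be illustrated with a short Python sketch. It assumes inverse-variance weights and linear interpolation of the weighted empirical distribution of the Wald ratios at the 50th percentile. Standard errors (usually obtained by bootstrapping) are omitted, and the data are made up.

```python
def weighted_median(bx, by, se_y):
    """Weighted median of per-SNP Wald ratios (point estimate only)."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]  # 1 / SE(ratio)^2
    order = sorted(range(len(ratios)), key=lambda j: ratios[j])
    r = [ratios[j] for j in order]
    w = [weights[j] for j in order]
    total = sum(w)
    # Cumulative standardised weight up to the midpoint of each SNP's weight.
    cum, running = [], 0.0
    for wj in w:
        running += wj
        cum.append((running - 0.5 * wj) / total)
    if cum[0] >= 0.5:
        return r[0]
    # Interpolate between the two ratios that straddle the 50% point.
    for k in range(1, len(r)):
        if cum[k] >= 0.5:
            frac = (0.5 - cum[k - 1]) / (cum[k] - cum[k - 1])
            return r[k - 1] + frac * (r[k] - r[k - 1])
    return r[-1]


# Made-up data: the third SNP has an outlying ratio (5.0 versus 0.1 and 0.5).
bx = [0.10, 0.10, 0.10]
by = [0.01, 0.05, 0.50]
se_y = [0.01, 0.01, 0.01]
median_estimate = weighted_median(bx, by, se_y)

# IVW estimate on the same data for comparison (equal weights here).
ivw_estimate = sum(y / x for x, y in zip(bx, by)) / len(bx)
```

In this toy example the IVW estimate is pulled towards the outlier, whereas the weighted median stays at the middle ratio.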
The weighted mode method is also robust to outliers and it makes even weaker assumptions, only assuming that the largest (weighted) contribution of similar SNP-specific MR estimates comes from valid instruments.
MR-PRESSO can be used to identify and remove possible pleiotropic SNPs which have been detected in the MR analysis as outliers. However, the outlier test requires that at least 50% of the genetic variants are valid IVs.
We also performed a leave-one-out analysis, in which each IV was excluded in turn and the analysis repeated, to determine whether any of the causal estimates were heavily influenced by individual instruments.
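The leave-one-out procedure can be sketched in a few lines of Python. The example below re-estimates an IVW point estimate with each SNP removed in turn. The study used the "TwoSampleMR" R package for this step, so the function names, SNP labels and numbers here are purely hypothetical.

```python
def ivw_estimate(bx, by, se_y):
    """Inverse-variance weighted point estimate from summary statistics."""
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def leave_one_out(snps, bx, by, se_y):
    """Return a dict mapping each SNP to the IVW estimate obtained when
    that SNP is excluded from the analysis."""
    results = {}
    for i, snp in enumerate(snps):
        keep = [j for j in range(len(snps)) if j != i]
        results[snp] = ivw_estimate(
            [bx[j] for j in keep],
            [by[j] for j in keep],
            [se_y[j] for j in keep],
        )
    return results


# Made-up data: snp_d has a much larger ratio than the other three SNPs.
snps = ["snp_a", "snp_b", "snp_c", "snp_d"]
bx = [0.10, 0.10, 0.10, 0.10]
by = [0.05, 0.05, 0.05, 0.50]
se_y = [0.01, 0.01, 0.01, 0.01]
full_estimate = ivw_estimate(bx, by, se_y)
loo = leave_one_out(snps, bx, by, se_y)
```

A SNP whose exclusion shifts the estimate markedly, as `snp_d` does here, would be flagged as highly influential.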
All analyses were performed using packages in R (version 4.1.0), specifically "MendelianRandomization" (for IVW-FE, IVW-RE, MR-Egger, weighted mode and weighted median), "MRPRESSO" (for MR-PRESSO), "simex" (for MR-Egger with SIMEX extension) and "TwoSampleMR" (to harmonise the RA and IPF summary data and to perform the leave-one-out analyses). Estimates for Cochran's Q test and I² were obtained using IVW-FE analysis.

Selection of genetic instruments
Of the 135 SNPs initially selected as IVs for RA, two were associated with seronegative RA only, one was associated with seronegative and combined RA, 93 were associated with combined RA and 39 were associated with seropositive RA. For the seropositive analysis we selected IVs associated with seropositive RA or combined RA and for the seronegative analysis IVs associated with seronegative RA or combined RA. In total, 70 were strong instruments (F-statistic ≥10) for seropositive RA (Supplementary Table S1) and 16 were strong instruments for seronegative RA (Supplementary Table S2). For IPF, all 19 association signals reported by Allen et al 2022 were strong instruments for IPF (Supplementary Tables S3 and S4).

Causal estimate for seropositive RA on IPF
The primary IVW-FE analysis gave a nominally significant result for a protective causal effect of seropositive RA on IPF (odds ratio (OR) 0.93; 95% confidence interval (CI) 0.87-0.99; p=0.032) (Figure 2a and Supplementary Table S5a). Although there was statistically significant evidence of pleiotropy (Q test p-value = 2×10⁻⁴, and MR-PRESSO global test p-value = 2×10⁻⁴), this was of moderate magnitude (I² = 41.3%, 95% CI = 22%-56%) and no SNPs were highlighted as outliers when using MR-PRESSO or when plotting individual contributions to Cochran's Q heterogeneity (Supplementary Figure S1). Moreover, significant estimates of a protective causal effect of seropositive RA on IPF were also obtained using the weighted median, weighted mode and MR-Egger analyses, and in the leave-one-out analysis, no exclusions resulted in a change in the direction of effect (Supplementary Figure S2).
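For readers who want to relate the reported OR, CI and p-value to one another, the Python sketch below back-calculates a standard error and a two-sided p-value from the reported figures. It assumes the CI is symmetric on the log-odds scale with a normal quantile of 1.96. Because the reported OR and CI are rounded, the recovered p-value can only approximate the published one.

```python
import math


def se_from_ci(ci_low, ci_high, z=1.96):
    """Standard error of a log odds ratio recovered from a 95% CI
    given on the odds-ratio scale."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)


def two_sided_p(odds_ratio, se):
    """Two-sided p-value from a normal approximation to the Wald z-statistic."""
    z_stat = math.log(odds_ratio) / se
    return math.erfc(abs(z_stat) / math.sqrt(2.0))


# Values reported above for the IVW-FE analysis of seropositive RA on IPF.
se = se_from_ci(0.87, 0.99)
p = two_sided_p(0.93, se)
```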

Causal estimate for seronegative RA on IPF
Whilst the IVW-FE point estimate was similar to that of seropositive RA, the confidence intervals were very wide and the result was non-significant (95% CI 0.82-1.11, p=0.556) (Figure 2b and Supplementary Table S5a). The SNP rs6910071 was identified as an outlier by MR-PRESSO and there was statistically significant evidence of pleiotropy (Q test p-value = 0.002, and MR-PRESSO global test p-value = 0.0087, I² = 57.8%, I² 95% CI = 27%-76%). Even after removing the SNP rs6910071 (positioned in the HLA region of chromosome 6) from the MR analysis (leave-one-out analysis), the results remained null (Supplementary Figure S3).
In the leave-one-out analyses, no individual variant exclusions substantially attenuated the original result (Supplementary Figure S4).

Causal estimate for IPF on seronegative RA
None of the analyses suggested a significant causal effect of IPF on seronegative RA (Figure 2d and Supplementary Table S5b) although all point estimates were greater than one, consistent with the seropositive analysis. There was no evidence of pleiotropy, MR-PRESSO did not detect any outliers and the leave-one-out analysis did not suggest that the result was influenced by any single IV (Supplementary Figure S5).

Discussion
This bidirectional two-sample MR study did not support the hypothesis that RA-UIP is a coincidental occurrence of IPF in patients with RA, and instead provides evidence of a causal effect of IPF on the development of seropositive RA. There was also evidence for a protective causal effect of seropositive RA on development of IPF, albeit this was considerably weaker.
The rationale for a causal effect of RA on UIP has been driven by the temporal relationship between the two conditions, with RA often being diagnosed before pulmonary fibrosis. However, several arguments could suggest a causal effect of IPF on RA. The loss of immune tolerance that occurs when the lungs are chronically damaged may suggest that IPF could be a risk factor for RA, as suggested by RA developing after IPF diagnosis (34). An independent study, the Multi-Ethnic Study of Atherosclerosis, measured RA-related auto-antibodies and obtained cardiac CT scans which were assessed for subclinical ILD. This study demonstrated an association between elevated RF, ACPA and subclinical ILD, suggesting autoantibody production and pulmonary inflammation develop prior to clinical RA (35). However, as it was unknown if participants had RA, this may represent an association of antibodies with subclinical ILD. An assessment of patients with ILD who did not fulfil the classification criteria for RA revealed that a third of patients who were ACPA-positive developed RA within three years of their ILD diagnosis (36). Furthermore, there is a significant proportion of patients with RA (27% to 48%) for whom the diagnosis of ILD precedes or occurs at the same time as the onset of RA (5,37,38). Of note, a majority of individuals who developed ILD prior to RA were found to have a radiological UIP pattern (37). Finally, it has been shown that IgA-ACPA are elevated in up to 25% of patients with IPF and associated with changes in pathology (i.e., lymphoid aggregates) also seen in established RA-UIP (39).
The mechanism by which pulmonary fibrosis may promote RA is likely via breaching immune tolerance against citrullinated peptides. Indeed, several indirect arguments have led to the hypothesis of a mucosal origin of seropositive RA (40), positioning the lung as the site of initiation of the loss of tolerance against citrullinated peptides: 1) most of the environmental risk factors in seropositive RA are inhaled (smoking, silica exposure), 2) in patients with early ACPA positivity (up to 15 years before the onset of the first joint manifestations) (41), the IgA isotype predominates (42), 3) peptidyl arginine deiminase 2 (PAD2), an enzyme responsible for citrullination, is present in lung tissue, alongside local production of ACPA (43)(44)(45), and 4) shared citrullinated peptide targets exist in the lungs and joints of RA patients (46). These data suggest the possibility that in a subset of patients, the citrullinated protein targets of ACPA are lung-specific, leading to lung injury and fibrosis and, through a broadening of the ACPA repertoire, eventual synovitis and clinical RA (47,48).
Mucosal inflammation has long been considered the source of ACPA associated with seropositive RA, particularly IgA isotypes, and lymphoid follicles are common in both IPF and RA (40,45). It is thought that chronic infection or changes in the microbiome can promote protein citrullination via chronic inflammation and NETosis (49,50). Indeed, the predominant microbiota in both IPF and RA have been found to belong to the phylum Firmicutes (51)(52)(53)(54).
A fundamental principle of clinical management is to treat the underlying cause of any disease. Therefore, understanding the direction of causality between two associated conditions is crucial. RA has been considered causal for UIP for many years and has guided therapeutic decisions such as prioritising the use of immunomodulatory therapy in patients with RA-UIP. Our data would suggest re-evaluating the therapeutic paradigm for treating RA-UIP. Indeed, the first randomised, double-blind, placebo-controlled trial dedicated to patients with RA-ILD identified that pirfenidone had a greater effect on slowing the decline of FVC in patients with a UIP pattern on HRCT (14). Secondly, our findings raise the intriguing hypothesis that early identification and treatment of UIP in patients with RA may offer a novel strategy for managing RA. Historically, rheumatologists have been reluctant to screen asymptomatic patients with RA for ILD at the point of diagnosis and these results would suggest that this approach should be reconsidered. Lastly, our findings also suggest that pulmonologists should carefully follow the outcome of patients with IPF, paying attention to the appearance of RA-specific autoimmunity as well as articular manifestations.
Whilst MR provides a framework to assess causality by using genetic instruments to remove the effects of confounders and reverse causation, it does have limitations. Possible pleiotropy was detected in most of our MR analyses, although the consistency of the results across different methods allowing for pleiotropy suggests robustness of our findings. The limited number of IVs and sample size of the source GWAS that were used to derive the causal estimates for IPF and seronegative RA, compared with the seropositive RA GWAS, may have impacted power to detect a causal relationship. Our hypothesis motivating this study was that RA-UIP is actually a co-occurrence of RA and IPF in the same patient, and so we used SNP IVs derived from IPF GWAS to model this.
Having detected a causal relationship between IPF and seropositive RA, it would be relevant to understand whether the causal effect was due to mechanisms promoting the UIP pattern of lung damage. However, no GWAS specifically for UIP exists and so the IPF SNP IVs are the best proxies for UIP IVs at this time. As the presence of UIP in the RA cases included in the RA source GWAS cannot be excluded, for the MR on the effect of RA on IPF, it is possible that some of the RA SNP IVs ("G-X") could be specifically associated with RA-UIP. These SNPs might also be associated with IPF (UIP) in which case they could introduce a bias towards a causal effect of RA on IPF; however, this was not seen. As for the MR on the effect of IPF on RA, it is also possible that RA-UIP cases in the RA source GWAS might lead to an association with the IPF SNP IVs ("G-Y") thereby introducing a bias towards a causal effect of IPF on RA. However, our leave-one-out analyses show that excluding the MUC5B SNP rs35705950, which is known to be associated with RA-UIP (16) with an effect that is similar in magnitude to the effect on IPF, did not change the causal effect estimate (Supplementary Figure S4). Whilst the direction and significance of the point estimates support a causal effect of IPF on seropositive RA, the magnitude of the effect should be interpreted with caution (55).
In spite of these limitations, our data suggest that patho-mechanisms involved in the development of UIP may promote RA. This has implications for the management of patients with RA-UIP. In addition, the causal effect of IPF on the development of seropositive RA would provide additional support for screening for ILD in patients with RA, especially in subgroups of patients identified as being at high-risk for pulmonary fibrosis (56,57). Whilst we found no support for a causal effect of RA on IPF, the opposite finding of a significant protective effect of RA against development of IPF was unexpected and requires further investigation.

Conflicts of Interest
LVW reports funding from GSK, Pfizer, Orion Pharma and Genentech, outside of the submitted work. LVW reports consultancy for Galapagos and Boehringer-Ingelheim.
RB reports funding from Roche and Boehringer-Ingelheim and personal fees for meeting attendance/travel, speaking fees or consulting fees from Sanofi, Roche and Boehringer-Ingelheim outside the submitted work.
PD reports funding from Bristol Myers Squibb, Pfizer, Galapagos and Chugai and personal fees for advisory board participation, speaking fees or consulting fees from Boehringer-Ingelheim, Bristol Myers Squibb, Janssen, Abbvie, Pfizer, Novartis and Galapagos outside the submitted work.
JKQ reports funding from MRC, HDR UK, GlaxoSmithKline, Bayer, BI, asthma+lung, Chiesi and AstraZeneca and consultancy fees from GlaxoSmithKline, Boehringer Ingelheim, AstraZeneca, Chiesi, Teva, Insmed and Bayer, outside the submitted work.
medRxiv preprint doi: https://doi.org/10.1101/2022.09.27.22280286; this version posted September 27, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.