Table 1

Summary information on the publicly available datasets that were included in this study, as well as summary statistics for all individuals whose data were included in the analysis.

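| Stage | Discovery | Discovery | Discovery | Discovery | Discovery | Discovery | Validation | Validation | Validation |
|---|---|---|---|---|---|---|---|---|---|
| GEO accession number | GSE38958 | GSE38958 | GSE33566 | GSE33566 | GSE93606 | GSE93606 | GSE132607 | GSE27957 | GSE28042 |
| Reference | Huang et al³⁴ | Huang et al³⁴ | Yang et al³⁵ | Yang et al³⁵ | Molyneaux et al³⁶ | Molyneaux et al³⁶ | * | ¹¹† | ¹¹† |
| Disease status | IPF | Control | IPF | Control | IPF | Control | IPF | IPF | IPF |
| Sample size | 70 | 45 | 93 | 30 | 57 | 20 | 74 | 45 | 75 |
| Age (years, SD) | 68.2 (7.2) | 69.3 (9.3) | 67.2 (11.4) | 62.4 (14.3) | 67.4 (8.0) | 66.0 (10.6) | 66.6 (7.6) | 67.1 (8.2) | 68.9 (8.1) |
| Sex (% male) | 82.6% | 60.0% | 65.6% | 46.7% | 66.7% | 60.0% | 70.3% | 88.9% | 69.3% |
| Ancestry (% European) | 82.8% | 71.1% | Unknown | Unknown | Unknown | Unknown | 94.6% | 82.2% | 97.3% |
| FVC % predicted (SD) | 62.4 (15.0) | Unknown | 62.0 (28.8) | Unknown | 72.2 (20.3) | Unknown | 69.7 (18.4) | 60.6 (14.3) | 65.4 (16.7) |
| DLCO % predicted (SD) | 43.3 (18.7) | Unknown | 52.1 (27.9) | Unknown | 39.2 (14.1) | Unknown | 45.6 (15.4) | 43.4 (17.7) | 48.9 (18.6) |
| Mortality (%) | Unknown | Unknown | Unknown | Unknown | 40.4% | Unknown | Unknown | 37.8% | 32.0% |
| MUC5B genotype (% GG) | Unknown | Unknown | 28.0% | 53.8% | 40.0% | Unknown | 18.8% | Unknown | Unknown |
| MUC5B genotype (% GT) | Unknown | Unknown | 66.0% | 42.3% | 50.0% | Unknown | 78.1% | Unknown | Unknown |
| MUC5B genotype (% TT) | Unknown | Unknown | 6.0% | 3.8% | 10.0% | Unknown | 3.1% | Unknown | Unknown |
| Immunosuppressive therapy (%) | Unknown | Unknown | 0.0% | Unknown | 0.0% | Unknown | Unknown | 4.4% | 14.7% |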
  • *As of March 2022, the dataset with GEO accession number GSE132607 had not been associated with any published study.

  • †The datasets with GEO accession numbers GSE27957 and GSE28042 originated from the same study,¹¹ where the data in GSE27957 were used in discovery and the data in GSE28042 were used as independent validation data.

  • DLCO, diffusing capacity of lung for carbon monoxide; GEO, Gene Expression Omnibus; MUC5B genotype, genotype for the MUC5B promoter polymorphism rs35705950.