

Original article
Ethnicity and mycobacterial lineage as determinants of tuberculosis disease phenotype
  1. Manish Pareek1,2,3,
  2. Jason Evans4,
  3. John Innes5,
  4. Grace Smith4,
  5. Suzie Hingley-Wilson1,
  6. Kathryn E Lougheed6,
  7. Saranya Sridhar1,
  8. Martin Dedicoat5,
  9. Peter Hawkey4,7,
  10. Ajit Lalvani1
  1. Tuberculosis Research Unit, National Heart and Lung Institute, Imperial College London, London, UK
  2. Department of Infectious Disease Epidemiology, Imperial College London, London, UK
  3. Department of Infection, Immunity and Inflammation, University of Leicester, Leicester, UK
  4. Health Protection Agency Midlands Laboratory, Heart of England NHS Foundation Trust, Bordesley Green East, Birmingham, UK
  5. Department of Infection and Tropical Medicine, Heart of England NHS Foundation Trust, Bordesley Green East, Birmingham, UK
  6. Department of Integrative Cell Biology, Division of Cell & Molecular Biology, Imperial College London, London, UK
  7. School of Immunity and Infection, The Medical School, University of Birmingham, Edgbaston, Birmingham, UK
  1. Correspondence to Prof Ajit Lalvani, Tuberculosis Research Unit, National Heart and Lung Institute, Imperial College London, London W21PG, UK; a.lalvani{at}


Background Emerging evidence suggests that Mycobacterium tuberculosis (Mtb) lineage and host ethnicity can determine tuberculosis (TB) clinical disease patterns but their relative importance and interaction are unknown.

Methods We evaluated prospectively collected TB surveillance and Mtb strain typing data in an ethnically heterogeneous UK population. Lineages were assigned using 15-loci mycobacterial interspersed repetitive units containing variable numbers of tandem repeats (MIRU-VNTR) typing and the MIRU-VNTRplus database. Geographical and ethnic associations of the six global Mtb lineages were identified and the influence of lineage and demographic factors on clinical phenotype was assessed using multivariate logistic regression.

Results Data were available for 1070 individuals with active TB, which was pulmonary only, extrapulmonary only and concurrent pulmonary–extrapulmonary in 52.1%, 36.9% and 11.0%, respectively. The most prevalent lineages were Euro-American (43.7%), East African Indian (30.2%), Indo-Oceanic (13.6%) and East Asian (12.2%); these were geo-ethnically restricted with, for example, Indian subcontinent ethnicity inversely associated with the Euro-American lineage (OR 0.23; 95% CI 0.14 to 0.39) and positively associated with the East African Indian lineage (OR 4.04; 95% CI 2.19 to 7.45). Disease phenotype was most strongly associated with ethnicity (OR for extrathoracic disease 21.14 (95% CI 6.08 to 73.48) for Indian subcontinent and 14.05 (95% CI 3.97 to 49.65) for Afro-Caribbean), after adjusting for lineage. With East Asian lineage as the reference category, the Euro-American (OR 0.54; 95% CI 0.32 to 0.91) and East African Indian (OR 0.50; 95% CI 0.29 to 0.86) lineages were negatively associated with extrathoracic disease, compared with pulmonary disease, after adjusting for ethnicity.

Conclusions Ethnicity is a powerful determinant of clinical TB phenotype independently of mycobacterial lineage and the role of ethnicity-associated factors in pathogenesis warrants investigation.

  • Tuberculosis


Key messages

What is the key question?

  • Mycobacterium tuberculosis (Mtb) presents as a spectrum of disease but the relative importance of mycobacterial lineage and host ethnicity in determining clinical phenotype is unknown.

What is the bottom line?

  • Ethnicity is a powerful determinant of clinical tuberculosis phenotype independently of mycobacterial lineage.

Why read on?

  • This large UK-based study provides the first evidence for the predominance of host ethnicity over mycobacterial lineage in determining clinical disease phenotype and highlights the need to investigate ethnicity-associated factors in the pathogenesis of tuberculosis.


Introduction

Worldwide, Mycobacterium tuberculosis (Mtb) continues to cause significant morbidity and mortality with an estimated 8.8 million cases and 1.1 million deaths in 2011.1 Genetic strain typing has identified six major lineages of Mtb which epidemiologically coassociate with distinct geographical regions.2–5 Moreover, lineages can differ in the host response that they induce6 with, for example, the Beijing (East Asian) lineage inducing lower levels of inflammatory cytokines7,8 while exhibiting enhanced intracellular growth.9

Although these in vitro differences in host response might be expected to underlie the well recognised differences in clinical phenotype observed between demographic groups,10–12 the influence of different lineages on clinical patterns of active tuberculosis (TB) remains unclear. A notable exception is the Beijing (East Asian) lineage which, in certain ethnic groups, has been associated with a more aggressive clinical course13 and extrapulmonary forms of TB,14–16 although other studies have failed to document these associations.17–20 More recently, European studies19,21 have suggested that the Central Asian lineage is associated with extrapulmonary disease whilst two Pakistani studies22,23 found the opposite. However, studies conducted to date have, with one recent exception,20 mainly examined homogeneous populations where the relative importance of ethnicity, compared with lineage, on disease phenotype was not independently examined.

Given the recognised association of different lineages and ethnic groups,2–5 and the known link between extrapulmonary disease and non-white ethnic groups,10–12 we reasoned that to determine whether lineages influence clinical patterns of disease it would be necessary to dissect out the relative influence of the lineage itself from ethnicity. We therefore prospectively collated data on host demographic factors, clinical patterns of disease and mycobacterial genotype for culture-confirmed cases of TB in a single-centre UK setting, with a unique ethnic mix, to assess the global phylogeography of the strains in our dataset, identify the demographic factors associated with different lineages and to establish which, if any, factors, including lineage, were associated with distinct clinical patterns of disease.


Methods

Study design and data source

We analysed data from a routine, anonymised, centralised clinical surveillance database maintained as part of usual care for all patients diagnosed with active TB in two closely related metropolitan areas in the UK (Birmingham and Solihull, with a combined population of approximately 1.2 million24 and a TB incidence ranging from 9 to 88 per 100 000 population per year25).

MIRU-VNTR typing

Since mid-2003, all culture-positive isolates received by the Health Protection Agency Midlands Regional Centre for Mycobacteriology and identified as Mtb complex have been routinely analysed by 15-loci mycobacterial interspersed repetitive units containing variable numbers of tandem repeats (MIRU-VNTR) typing (ETR-A, B, C, D, E and MIRU-02, 10, 16, 20, 23, 24, 26, 27, 39, 40). MIRU-VNTR typing analyses the number of repetitive DNA sequences at multiple independent genetic loci.26,27 The values generated are automatically recorded on the clinical surveillance database.
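As a minimal illustration of what a MIRU-VNTR value is, the sketch below stores a profile as a mapping from each of the 15 loci named above to its repeat count and lists the loci at which two profiles differ. The repeat counts and the helper names `make_profile` and `loci_differing` are hypothetical; they are not taken from the study data or from any typing software.

```python
# The 15 loci listed in the text (ETR-A to ETR-E plus ten MIRU loci).
LOCI = (
    "ETR-A", "ETR-B", "ETR-C", "ETR-D", "ETR-E",
    "MIRU-02", "MIRU-10", "MIRU-16", "MIRU-20", "MIRU-23",
    "MIRU-24", "MIRU-26", "MIRU-27", "MIRU-39", "MIRU-40",
)


def make_profile(repeat_counts):
    """Pair each locus with its repeat count (one integer per locus)."""
    if len(repeat_counts) != len(LOCI):
        raise ValueError("expected one repeat count per locus")
    return dict(zip(LOCI, repeat_counts))


def loci_differing(profile_a, profile_b):
    """Return the loci, in order, at which two profiles disagree."""
    return [locus for locus in LOCI if profile_a[locus] != profile_b[locus]]


# Hypothetical repeat counts, for illustration only.
isolate_a = make_profile([4, 2, 4, 2, 3, 2, 3, 3, 2, 5, 1, 5, 3, 2, 3])
isolate_b = make_profile([4, 2, 4, 2, 3, 2, 4, 3, 2, 5, 1, 7, 3, 2, 3])

print(loci_differing(isolate_a, isolate_b))
```

Isolates with identical values at every locus share a MIRU-VNTR profile; the more loci that differ, the less closely related two isolates are likely to be.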

Study population, variables and assignment to lineage

We included all cases of active TB diagnosed between 1 July 2003 and 28 February 2010 with complete information on age, sex, country of birth (local or foreign born), ethnicity, site of disease and MIRU-VNTR value; patients with missing information for one or more of these variables or culture-negative disease were excluded from analysis.
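The inclusion rule above can be sketched as a simple filter. This is an illustrative sketch only: the record layout and the field names (`culture_positive`, `diagnosis_date`, `born_in_uk` and so on) are assumptions made for the example and do not describe the actual surveillance database.

```python
from datetime import date

# Study window stated in the text.
START, END = date(2003, 7, 1), date(2010, 2, 28)

# Hypothetical field names for the variables that had to be complete.
REQUIRED = ("age", "sex", "born_in_uk", "ethnicity", "site_of_disease", "miru_vntr")


def is_eligible(case):
    """Apply the stated inclusion criteria to one case record (a dict)."""
    if not case.get("culture_positive"):
        return False  # culture-negative disease was excluded
    diagnosed = case.get("diagnosis_date")
    if diagnosed is None or not (START <= diagnosed <= END):
        return False  # outside the study window
    # Any missing required variable excludes the case.
    return all(case.get(field) is not None for field in REQUIRED)


# Hypothetical records, for illustration only.
complete = {
    "culture_positive": True, "diagnosis_date": date(2005, 3, 14),
    "age": 34, "sex": "F", "born_in_uk": False,
    "ethnicity": "Indian subcontinent", "site_of_disease": "lymph node",
    "miru_vntr": "424232332515323",
}
missing_ethnicity = dict(complete, ethnicity=None)

print(is_eligible(complete), is_eligible(missing_ethnicity))
```

Testing `is not None` rather than truthiness matters here, because a legitimately recorded value such as `born_in_uk=False` must not be treated as missing.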

Information on year of notification, age, sex, local/foreign born, country of birth if foreign born, ethnicity, time since entry (only applies to foreign-born immigrants and calculated as time between arrival in the UK and diagnosis of active TB), employment status, HIV status, previous history of TB, site of disease, specific organs involved, duration of symptoms before treatment commencement, drug sensitivity and MIRU-VNTR value were retrieved from the database.

We used the MIRU-VNTRplus database28 to assign strains with MIRU-VNTR values to one of six global lineages as defined by Gagneux et al 2: East Asian, Euro-American, East African Indian, Indo-Oceanic, West African-1 and West African-2. Strains which could not be assigned to any lineage and those which were identified as being non-Mtb (such as Mycobacterium bovis) were excluded from analysis.

Ethnicity was defined in line with the Office for National Statistics classification29 (White, Indian subcontinent, Afro-Caribbean, Oriental/Other Asia, Other). We explored the patterns of disease in local-born versus foreign-born ethnic groups by creating a composite variable, ‘location of birth/ethnicity’, which categorised individuals of non-white ethnicity as local born or foreign born.

HIV testing of patients diagnosed with active TB is highly inconsistent in the UK, and HIV status was accordingly recorded inconsistently in our dataset. We therefore classified patients either as HIV positive (if recorded as such) or as negative/unknown (all other individuals).

Definitions of TB phenotypes

We classified all cases of active TB into the following clinical phenotypes: extrathoracic disease only (only extrapulmonary site(s) outside the thoracic cavity); extrapulmonary only (only one or more extrapulmonary site(s) involved but including pleural TB and intrathoracic lymph node TB20); pulmonary only (only lung parenchyma involved); pulmonary and extrapulmonary (pulmonary disease and one or more extrapulmonary sites involved concurrently); and lymph node disease only (intrathoracic or extrathoracic). For comparative analyses we excluded all patients with concurrent pulmonary–extrapulmonary disease, as others10,11,30 have done, to minimise misclassification bias as the dominant disease site is unclear.
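Because these phenotype definitions overlap (an isolated cervical lymph node case is at once extrapulmonary only, extrathoracic only and lymph node only), it can help to see them as a rule that returns every label a case satisfies. The sketch below is one possible reading of the definitions, not the authors' code; the site names are hypothetical, and any site that is neither lung, pleura nor intrathoracic lymph node is assumed to be extrathoracic.

```python
# Hypothetical site vocabulary, for illustration only.
PULMONARY = {"lung"}
INTRATHORACIC_EXTRAPULMONARY = {"pleura", "intrathoracic_lymph_node"}
LYMPH_NODE = {"intrathoracic_lymph_node", "extrathoracic_lymph_node"}


def phenotypes(sites):
    """Return every phenotype label satisfied by a set of involved sites."""
    sites = set(sites)
    if not sites:
        raise ValueError("at least one site of disease is required")

    labels = set()
    has_pulmonary = bool(sites & PULMONARY)
    extrapulmonary_sites = sites - PULMONARY

    if has_pulmonary and not extrapulmonary_sites:
        labels.add("pulmonary only")
    elif has_pulmonary:
        # Concurrent disease: excluded from the comparative analyses.
        labels.add("pulmonary and extrapulmonary")
    else:
        labels.add("extrapulmonary only")
        # Extrathoracic only: no pleural or intrathoracic lymph node disease.
        if not (sites & INTRATHORACIC_EXTRAPULMONARY):
            labels.add("extrathoracic only")
        # Lymph node only: every involved site is a lymph node site.
        if sites <= LYMPH_NODE:
            labels.add("lymph node only")
    return labels


print(sorted(phenotypes({"extrathoracic_lymph_node"})))
print(sorted(phenotypes({"pleura"})))
print(sorted(phenotypes({"lung", "bone"})))
```

Returning a set of labels rather than a single category mirrors the way the Results report extrapulmonary only, extrathoracic only and lymph node only disease as separate, nested outcomes.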

Data analysis

Details of the data analysis are presented in the online supplementary information.


Results

Description of cohort

The study flowchart is presented in figure 1 and demographic and clinical details of the cohort in table 1. Median time between arrival in the UK and development of active TB was 5 years (IQR 2–21). Individuals with MIRU-VNTR values (n=1070) did not differ significantly, in terms of demographic and clinical features, from individuals without MIRU-VNTR values or from individuals with MIRU-VNTR values that could not be assigned to a lineage (table 1).

Table 1

Demographic characteristics of individuals with active tuberculosis compared with those with no mycobacterial interspersed repetitive units containing variable numbers of tandem repeats (MIRU-VNTR) value and when no mycobacterial lineage could be assigned

Figure 1

Flow chart of cases included in data analysis.

Distribution, and temporal trends, of lineages in cohort

In our cohort, Euro-American was the commonest lineage (43.7%), followed by East African Indian (30.2%), Indo-Oceanic (13.6%) and East Asian (Beijing) (12.2%). West African-1 (0.0%) and West African-2 (0.7%) lineages were rarely found in this cohort. The proportion of cases caused by any specific lineage in any given year did not change significantly over the course of the study period (see online supplementary figure S1).

Distribution of lineages by geographical origin and ethnic group

Table 2 and online supplementary figure S2 present the global distribution of lineages stratified by region and country of origin, respectively. In general, lineages were geographically restricted and associated with specific regions. For example, the Euro-American lineage was predominantly found in Europe, the Americas and Africa, which together accounted for 74.7% of all cases caused by this lineage. East African Indian and Indo-Oceanic lineages were mainly found in the Indian subcontinent and East Africa (75.7%) and Indian subcontinent, East Africa and Asia (83.3%) respectively. By contrast, the East Asian lineage appeared more geographically widespread, being found in the Indian subcontinent and East Africa and accounting for 55.6% of cases in the small number of individuals (nine) from East Asia.

Table 2

Distribution of Mycobacterium tuberculosis lineages by different world regions of birth*

Stratification of lineages by ethnic group (see online supplementary table S1) illustrated that, within specific regions, individuals of certain ethnicity were preferentially associated with particular lineages. Individuals of White (OR 4.56; 95% CI 3.08 to 6.75) and Afro-Caribbean (OR 3.21; 95% CI 2.42 to 4.28) ethnicity were significantly more likely to present with disease caused by the Euro-American lineage. By contrast, individuals in the Indian subcontinent group were significantly associated with the East African Indian (OR 5.89; 95% CI 4.30 to 8.06) and Indo-Oceanic lineages (OR 1.69; 95% CI 1.17 to 2.43) whereas the Oriental/Other Asia group was associated with the East Asian lineage (OR 3.29; 95% CI 1.33 to 8.15).

Differences in lineage were also evident when comparing UK-born and foreign-born individuals of non-white ethnicity (see online supplementary table S1). The Euro-American lineage was significantly less common among foreign-born (compared with the UK-born) individuals of Indian subcontinent (OR 0.38; 95% CI 0.25 to 0.58) and Afro-Caribbean (OR 0.40; 95% CI 0.20 to 0.82) ethnicity.
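For readers less familiar with the ORs quoted above, the sketch below shows how an unadjusted OR and an approximate 95% CI can be obtained from a 2×2 table using the log (Woolf) method. This is an assumption made for illustration: the statistical methods are described only in the online supplementary information, so the authors may have used a different procedure. The counts are hypothetical and are not the study data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval.

    a: group of interest, with the lineage
    b: group of interest, without the lineage
    c: comparison group, with the lineage
    d: comparison group, without the lineage
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lower, upper = odds_ratio_ci(60, 40, 30, 70)
print(f"OR {or_:.2f}; 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1, as in the associations reported above, corresponds to a statistically significant association at the 5% level.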

Association of time since arrival and development of active TB for different lineages

Online supplementary figure S3 illustrates, for the foreign-born cohorts, the time between arrival in the UK and the development of active TB for the lineages in our cohort. Patients infected with the East African Indian lineage presented later (median 7 years, IQR 3–29) than the Euro-American lineage (median 5 years, IQR 2–18) (p=0.001).

Demographic factors associated with mycobacterial lineage

Table 3 presents the results of the univariate and multivariate analysis of the demographic and clinical factors associated with the four main lineages in our cohort (East Asian, Euro-American, East African Indian and Indo-Oceanic). On multivariate analysis, foreign birth and, in comparison to White ethnicity, certain non-white ethnic groups were inversely associated with Euro-American lineage (table 3). By contrast, Afro-Caribbean individuals were as likely as White individuals to harbour the Euro-American lineage. Certain ethnic groups were either associated (Indian subcontinent) or inversely associated (Afro-Caribbean) with the East African Indian lineage. Foreign birth and non-white ethnicity were significantly associated with the Indo-Oceanic lineage (table 3).

Table 3

Univariate and multivariate association of demographic, clinical and temporal factors with different Mycobacterium tuberculosis lineages

Factors associated with disease patterns on univariate analysis

Table 4 illustrates the clinical phenotypes of active TB (data for pulmonary only and extrathoracic disease only presented in table 4) stratified by lineage and ethnic group. In general, extrapulmonary only, extrathoracic only and lymph node only disease were more frequent in individuals who were foreign born (p<0.0001 for all three) and of non-white ethnicity (p<0.0001 for all three). UK-born individuals of White ethnicity had a significantly lower proportion of extrapulmonary only (7.7%), extrathoracic only (2.4%) and lymph node only (0.8%) disease compared with individuals of UK-born non-white ethnicity (31.9%, 25.9% and 20.4% respectively) (p<0.0001 for all three) and both these groups had significantly lower proportions of all three extrapulmonary phenotypes compared with individuals of foreign-born non-white ethnicity (51.1%, 46.5% and 35.4%; p<0.0001 for UK-born White and UK-born non-white).

Table 4

Different clinical phenotypes of active tuberculosis stratified by lineage and demographic group

East Asian and Indo-Oceanic lineages were significantly more likely to cause extrapulmonary only, extrathoracic only and lymph node only disease compared with pulmonary only disease (table 4). By contrast, the Euro-American lineage was significantly more likely to cause pulmonary disease and was inversely associated with extrapulmonary only, extrathoracic only and lymph node only disease (table 4).

Factors associated with disease patterns on multivariate analysis

Factors associated with extrathoracic disease on multivariate analysis are presented in table 5. With East Asian as the reference category, the East African Indian and Euro-American lineages were negatively associated with extrathoracic only disease. With Euro-American lineage (the most frequent lineage in this cohort) as the reference category, the East Asian lineage (OR 1.86; 95% CI 1.10 to 3.15) was positively associated with extrathoracic disease but the East African Indian lineage was not (OR 0.94; 95% CI 0.62 to 1.42) (data not shown).
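The two sets of lineage ORs are the same model viewed from different reference categories. Switching the reference from East Asian to Euro-American inverts the OR for that pair, and the CI bounds invert and swap. The ratio of two ORs that share a reference gives the point estimate for their direct comparison, although not its CI, which requires the model covariance. The sketch below applies this to the rounded ORs quoted in the Abstract; the function names are illustrative.

```python
def invert_or(odds_ratio, lower, upper):
    """OR and CI for A versus B, given the OR and CI for B versus A."""
    # Taking reciprocals reverses the order of the bounds.
    return 1 / odds_ratio, 1 / upper, 1 / lower


def ratio_of_ors(or_a_vs_ref, or_b_vs_ref):
    """Point estimate for A versus B from two ORs sharing a reference."""
    return or_a_vs_ref / or_b_vs_ref


# Abstract values, with East Asian as the reference category:
# Euro-American OR 0.54 (95% CI 0.32 to 0.91); East African Indian OR 0.50.
ea_vs_euro = invert_or(0.54, 0.32, 0.91)
eai_vs_euro = ratio_of_ors(0.50, 0.54)

print("East Asian vs Euro-American: %.2f (%.2f to %.2f)" % ea_vs_euro)
print("East African Indian vs Euro-American: %.2f" % eai_vs_euro)
```

The results are close to, but not identical with, the reported 1.86 (1.10 to 3.15) and 0.94, as expected when starting from published ORs rounded to two decimal places.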

Table 5

Univariate and multivariate association of demographic, clinical and temporal factors with extrathoracic tuberculosis for four main lineages (excluding West African-1 and 2)

However, ethnicity was most strongly associated with clinical phenotype, with Afro-Caribbean and Indian subcontinent ethnicities conferring adjusted ORs of 14 and 21, respectively, for extrathoracic disease.

Additional factors independently associated with the different extrapulmonary phenotypes were increasing age, female gender, foreign birth and time since UK entry (table 5).


Discussion

TB can present as a disease spectrum usually involving the lungs but not uncommonly purely extrapulmonary or extrathoracic sites. Although these clinical disease patterns must represent the end result of an interaction between host, environment and mycobacterial lineage, the relative importance of these three factors is unknown and can only be dissected in an ethnically heterogeneous population with detailed demographic, clinical and strain typing data, as presented here.

We found that Mtb lineages mapped with patients’ geographical origins and were associated with distinct ethnic groups, as previously described.2–5 After adjustment for demographics, certain lineages, such as Euro-American and East African Indian, were significantly less likely to cause extrapulmonary TB whereas others, such as East Asian, were associated with extrapulmonary disease. However, ethnicity was the factor most strongly associated with clinical phenotype. This was exemplified by the large population of Indian subcontinent ethnicity who, as previously described,2 ,3 harboured predominantly the East African Indian lineage which was preferentially associated with pulmonary disease; however, ethnicity overrode this association resulting in a predominantly extrathoracic and extrapulmonary pattern of disease. Indeed, with mutual multivariate adjustment, ethnicity in this ethnically diverse UK setting was more strongly associated with clinical phenotype than Mtb lineage, suggesting that the host is a relatively more powerful determinant of disease phenotype than mycobacterial lineage. Although non-white ethnicity has previously been recognised to be associated with extrapulmonary disease, the impact of Mtb lineage was not taken into account. Interpretation of such associations has therefore been difficult, given the association of different lineages with different ethnic groups and patients’ geographical origins.

Only Click and colleagues, in the USA, assessed the association between Mtb lineage and disease phenotype taking ethnicity and geographical origins into account.20 They found that while lineage was clearly associated with disease phenotype only weak associations with ethnicity were observed.20 The reason for the weaker influence of ethnicity compared with our study is unclear but may lie in differences in the ethnicities of non-white ethnic groups in the USA compared with the UK. Specifically, Indian subcontinent ethnicity was not included as an ethnic grouping in the US study whereas in our study it was the largest subgroup and the most strongly associated with extrathoracic disease. This strong association of Indian subcontinent ethnicity with extrathoracic disease, independently of lineage, has broad epidemiological relevance because not only is this ethnic group the most important in terms of national TB burden in the UK,31 but it also accounts for more than a quarter of TB cases globally.1 Although our findings highlight that the host is dominant in shaping clinical disease phenotype, we cannot currently distinguish whether the key determinant is the host per se (ie, host genotype) or whether ethnicity serves as a proxy for another host-related factor.32

Previous studies of different lineages with clinical disease pattern have generated conflicting results. Associations identified in a given ethnically homogeneous population have frequently not been replicable in other populations while studies in ethnically heterogeneous populations have hitherto failed to control for ethnicity, with the exception of Click et al.20

We found that the Euro-American and East African Indian lineages were associated with pulmonary disease whereas Click and colleagues found these lineages were associated with extrapulmonary disease.20 Previous authors have come to conflicting conclusions about the phenotypic associations of the Euro-American19,33 and East African Indian lineages.19,21–23 With the Euro-American lineage as the reference, we found the East Asian lineage was significantly associated, on multivariate analysis, with extrapulmonary disease. The mechanism underlying this propensity to disseminate is not known but may involve the ability of the Beijing (East Asian) lineage to produce a bioactive phenolic glycolipid which inhibits the innate immune system.33–35 As with other lineages, earlier studies have come to conflicting conclusions about the phenotypic association of the East Asian lineage.14–21

The predominant lineages in our study were geographically associated (table 2): Euro-American with Europe/Americas/Africa, East African Indian with the Indian subcontinent/East Africa, Indo-Oceanic with the Indian subcontinent/East Africa/Asia and East Asian with the Indian subcontinent/East Africa/Asia. Our findings are in line with previous studies,2,3,5 although the high proportion of individuals in our cohort from the Indian subcontinent, compared with the small proportion from East and Southeast Asia, may have masked the association of these latter regions with the Indo-Oceanic and East Asian lineages.4,5

Lineages were also associated with distinct ethnic groups, with the Euro-American lineage associated with White and Afro-Caribbean individuals, East African Indian and Indo-Oceanic with Indian subcontinent ethnicity and East Asian with Oriental and Other Asia ethnicity. These findings are consistent with those of Evans et al 4 who assigned ethnicity by name and Gagneux et al 2 who found that US-born individuals of non-white ethnicity (mainly Chinese and Filipino) had the same lineages as individuals born in China and the Philippines. These close associations of lineages with self-reported ethnicity, an accurate marker of ancestry,36 are consistent with coevolution of mycobacteria and their human hosts. Alternatively, these associations could reflect infection with similar lineages following transmission within ethnic communities in the host countries through assortative mixing, after travel to countries of origin or visits from family based overseas.37

Our study had several limitations. One of these was lineage assignment. Although we used 15-loci MIRU-VNTR (the standard DNA fingerprinting method at the time), an online database (MIRU-VNTRplus) and published literature to assign lineages to isolates, a proportion could not be definitively classified into any of the six major lineages, either due to the lower resolution of 15-loci MIRU-VNTR or because the online database did not contain data on enough lineages. However, the demographic and clinical profile of these individuals did not differ from the cohort that was analysed; sensitivity analyses including the non-lineage-assigned isolates did not provide any evidence that these isolates formed a distinct lineage with a separate clinical phenotype. In addition, the 15-loci MIRU-VNTR method was recently used effectively for lineage assignment by Click et al.20 An improved version of MIRU-VNTR typing has subsequently been described that analyses 24 loci and provides greater resolution; an alternative would be to base lineage assignment on large sequence polymorphisms or whole-genome sequencing.

Although national guidelines now state that patients with TB should be tested for HIV,38 this was inconsistently recorded in our dataset as in many national surveillance systems.10 ,11 ,21 ,39 While untreated HIV infection is well recognised to be an important factor in determining an extrapulmonary clinical phenotype,40 its influence on clinical phenotype in our predominantly Indian subcontinent study population is probably very small as TB-HIV coinfection in this segment of the UK population is very low.41 Moreover, there is no association between HIV and clinical disease pattern in the post-antiretroviral therapy era.12 Other factors which may affect clinical presentation include age, iatrogenic immunosuppression and diabetes mellitus.42–44 To ensure unambiguous clinical phenotypes for our regression analyses, we classified cases as exclusively pulmonary and exclusively extrathoracic; the resulting exclusion of individuals with concurrent disease might therefore have resulted in some associations being missed. Future work should explore the factors (including ethnicity) that are associated with different extrathoracic disease subtypes. Whilst our study focused primarily on host factors and mycobacterial lineage, an alternative area of potential relevance, and future study, is the impact of environment on clinical phenotype with, for example, the recency of Mtb transmission potentially playing a role in determining clinical phenotype.

In conclusion, although Mtb lineages, which are geo-ethnically restricted, can be associated with specific clinical phenotypes, host ethnicity appears to be more important in determining the clinical pattern of tuberculous disease, at least in the case of Indian subcontinent ethnicity and extrathoracic and extrapulmonary disease. Future work should integrate host demographic and genotypic data with environmental data for factors associated with TB, such as smoking and vitamin D levels.


This publication made use of the MIRU-VNTRplus database website developed by D Harmsen, S Nieman, P Supply and T Weniger.


  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.



  • Contributors AL, MP, JI and GS conceived of the idea for the study; JI: collected the clinical data as part of routine care; JE, EGS, PH: undertook the MIRU-VNTR typing and maintained the genotyping database; MP: undertook the data analysis; MP, AL: wrote the first draft of the manuscript with subsequent revisions made by all other coauthors. All coauthors had sight of the submitted paper. AL: guarantor for the paper.

  • Competing interests AL is inventor for patents underpinning T-cell-based diagnosis. The ESAT-6/CFP-10 ELISpot was commercialised by an Oxford University spin-out company (Oxford Immunotec Ltd, Abingdon, UK) in which Oxford University and Professor Lalvani have a minority share of equity. MP, JE, JI, EGS, SHW, KL, SS, MD, PH have no conflict of interest.

  • Funding MP is currently a NIHR Academic Clinical Lecturer in Infectious Diseases; his PhD was funded by a Medical Research Council Capacity Building Studentship. AL is a Wellcome Senior Research Fellow in Clinical Science and NIHR Senior Investigator.

  • Ethics approval No patient-specific data or personal identifiers were used in the preparation of this report, which was an analysis of routine data collected as part of service evaluation.

  • Provenance and peer review Not commissioned; externally peer reviewed.
