Main

Bacterial species consist of genetically related strains that have evolved from a common ancestor. These strains belong to the same clonal lineage, and evolve through mutation and horizontal gene transfer events that involve both homologous and non-homologous recombination1. In naturally competent Helicobacter pylori, which has a high rate of import of unusually short pieces of DNA through homologous recombination2, sequence variation is so extensive that, in a chronically infected host who carries more than one strain, so many different subclones emerge that virtually every progeny organism is genetically distinct2. Other species, such as Streptococcus pneumoniae and Neisseria spp., also have such a high rate of recombination that most of the genetic differences observed between different clonal types are thought to be due to recombination rather than mutation3,4. By contrast, Mycobacterium spp. have evolved primarily through mutation, and exhibit only a low level of horizontal gene transfer5,6 (Box 1). However, the contribution of horizontal gene transfer to the evolution of the Mycobacterium tuberculosis complex is not completely understood7.

Modelling the diversification of a uniform bacterial population8 revealed that when recombination is infrequent in a bacterial species, for example, in M. tuberculosis, distinct clonal populations appear within approximately 250,000 generations. However, for organisms in which recombination is much more frequent than mutation, transient, more diffuse clonal clusters emerge, but do not establish themselves as distinct clusters and instead become incorporated into the main cluster through recombination within the population. Only when sequence diversity markedly reduces homologous recombination can distinct clusters emerge and avoid being reabsorbed by recombination8. Consequently, clonal clusters evolve and disappear in this model in the absence of external selection. As homologous recombination involves the replacement of fragments in the bacterial chromosome, which leads to strain diversification, species in which such recombination events occur extremely rarely, such as M. tuberculosis, have a highly clonal population structure, whereas species in which recombination events occur frequently, such as H. pylori, are almost non-clonal9.

Mutation frequencies and mutation spectra also differ among bacterial species, and isolates of H. pylori have a higher spontaneous mutation frequency than most other bacterial species10. Furthermore, within a given species, different isolates can have significantly different spontaneous mutation frequencies11,12. In some cases, hypermutability among clinical isolates (bacteria with a mutator phenotype) has been associated with dysfunctional mismatch-repair systems13. Even though a mutator phenotype can increase adaptability to novel environmental conditions, it can also lead to the accumulation of mutations that affect bacterial fitness. It has been suggested that long-term maintenance of high bacterial mutation rates is driven by rapidly changing selection pressures, such as those imposed by bacteriophages14.

The presence of insertion sequence (IS) elements and other repetitive sequences contributes to duplications, inversions and translocations within single genomes. One interesting feature that distinguishes the pneumococcal genome from 51 other analysed bacterial genomes is the abundance of repeat regions, such as IS elements, which constitute approximately 5% of the pneumococcal genome, and non-IS elements, such as the repetitive Box and RUP DNA-sequence motifs15,16. Box elements are of unknown function, are typically 100–200 bp in length and are located in intergenic regions, whereas RUP repeats are usually 107 bp in length and are related to IS elements15,16. The high density of these repeats suggests they might have a role in intragenomic and intergenomic recombination events.

Clonal evolution and the ecological niche

In addition to mutation and recombination, clonal evolution is also affected by the population size, the size of the gene pool and the genetic content of the ecological niche. S. pneumoniae is primarily a human-specific organism that normally colonizes the nasopharynx of healthy preschool children17,18. The nasopharynx also contains non-virulent streptococci, such as Streptococcus oralis and Streptococcus mitis, as well as other competing organisms, such as Haemophilus influenzae and Moraxella catarrhalis, and could therefore provide genetic information that could facilitate the development of antibiotic resistance17,18. The clonal evolution of S. pneumoniae depends on selective forces in the upper airways, such as competition and collaboration with other pneumococcal strains or strains of other bacterial species to retrieve nutrients for growth, direct negative (for example, bacteriocins and hydrogen peroxide) or positive interactions for growth, as well as antibiotics taken by the host, and clearance responses from the innate and adaptive immune system of the host. A particularly interesting example of pneumococcal competition in the nasopharynx is the phenomenon of pneumococcal fratricide, in which transformation-competent cells are able to kill non-competent cells, thereby allowing released DNA to be retrieved from lysed cells. This process has been shown to considerably increase horizontal gene transfer between different strains of pneumococci, as well as between related commensal streptococci19,20.

Selective pressures, in combination with available DNA from the same or different bacterial species, competence for transformation and mutation-promoting factors all contribute to the rise and fall of pneumococcal clonal lineages (Fig. 1). Furthermore, bacterial properties (for example, those that promote adhesion), environmental ecological factors and host responses all influence clonal diversification. The main factors that affect pneumococcal population structure (for example, clonal evolution) are depicted in Fig. 1.

Figure 1: Bacterial and host factors that influence the evolution of clones.

Although recombination is the most important mechanism for the evolution of Streptococcus pneumoniae, spontaneous mutations also have a role in the clonal evolution of clinical isolates. The availability of DNA from the same or different bacterial species and bacterial factors that influence competence for transformation, as well as mutation-promoting factors, such as the production of reactive oxygen or nitrogen species from the bacteria and/or from infected immune cells, all contribute to the clonal population structure. Other bacterial factors, as well as environmental and host factors, also influence whether clonal diversification or clonal conservation (which leads to a genetically homogenous population) occurs.

Similar to pneumococci, H. pylori has a restricted habitat, but unlike pneumococci thrives in a genetically poorer environment, which acts as a constraint on genetic evolution. Although recombination through transformation is common in H. pylori, genetic diversity must be created before amplification can occur through horizontal gene transfer. This can be achieved by infection with multiple H. pylori strains during childhood. The decreased abundance of H. pylori in the human population, especially in developed countries21,22,23, is likely to decrease the incidence of multiple infections in the same individual, a factor that might seriously limit the genetic evolution and adaptability of this pathogen.

Transmission of both pneumococci and H. pylori requires close contact between individuals. The transmission of pneumococcal strains that belong to different clonal types is facilitated by keeping carriers and non-carriers in confined areas, such as at child-care centres, where the likelihood of transmission is dependent on the number of children who attend the centre, the number of bacteria in the nasopharynx of carriers, the ability of a clonal type to establish itself in a new host and host susceptibility. For H. pylori, transmission primarily occurs from mother to child during the first 1–2 years after birth24.

Other bacterial pathogens have more complex ecological niches than pneumococci and H. pylori, involving both animate and inanimate environments. Staphylococcus aureus, for example, can dwell on a number of human and animal surfaces, such as the skin and nares, and is able to survive in the environment for an appreciable length of time, allowing both nosocomial spread and transmission within the community25 (Box 2).

H. pylori, S. aureus and pneumococci are examples of organisms that colonize humans and, in most cases, can be carried in healthy individuals. Disease is frequently associated with underlying genetic and/or predisposing factors in the host; for example, cancer, HIV-1, cardiovascular diseases26 and influenza virus infection27. Recent data suggest that sensitization by influenza virus infection to subsequent pneumococcal pneumonia is mediated by viral induction of interferon-γ, which decreases the clearing responses of alveolar macrophages28, but it has also been suggested that influenza virus neuraminidase directly promotes pneumococcal adherence and infection29. However, it is not yet known if influenza virus infection of humans increases host susceptibility to all clonal types of pneumococci. Infection with pneumococci and other opportunistic pathogens can also occur in previously healthy individuals30, who can possess particular genetic haplotypes at loci that affect their susceptibility to infection and/or disease. For example, polymorphisms in the cytosolic muropeptide recognition proteins nucleotide-binding oligomerization domain 1 (NOD1) and NOD2 affect the outcome of H. pylori infection31, and specific human leukocyte antigen class II haplotypes affect the severity of group A streptococcal disease32. Likewise, homozygosity for mannose-binding-protein codon variants, which are associated with low production of this innate immune mediator, is associated with a higher frequency of invasive pneumococcal disease33. However, a recent prospective cohort study of individuals from Denmark revealed that the genetic constitution of the human host has only a small role in the development of invasive pneumococcal disease34. Instead, other factors, such as premature birth and the level of crowdedness at child-care centres and in households, play major parts in predisposition towards invasive infection35,36.

Correlation of bacterial properties with disease

Our increased ability to genetically compare clinical isolates from well-defined carrier or patient groups has allowed us to determine specific intraclonal (rapidly evolving through selection or high mutation frequency), clonal (evolving without external selection) or horizontally acquired bacterial traits that can be linked to infectivity or transmissibility, disease likelihood and disease type. Thus, isolates of H. pylori that express specific binding variants of the BabA adhesin can be correlated with the infection of different ethnic groups37. In another well-characterized example of functional adaptation, specific sequence variants of the pilus-associated FimH adhesin of uropathogenic Escherichia coli correlated with adhesion to the uroepithelium and subsequent infection of the lower urinary tract38. Such high-binding FimH variants are selected at a higher rate than housekeeping genes, which allows intraclonal FimH diversification39. Clonally related isolates of Neisseria meningitidis serogroup C were recently shown to carry an insertion sequence at a site that controls capsular expression and were therefore resistant to human serum antibody killing40.

In S. pneumoniae, the capsular polysaccharide is used to categorize pneumococci into at least 91 capsular serotypes41,42,43. Several epidemiological studies have shown that serotype distribution differs depending on the geographical area and the time period studied, as well as on the isolation site of the bacteria (for example, blood, cerebrospinal fluid, ear or nasopharynx) and the disease being considered18,44,45,46,47,48,49. In addition, a number of molecular epidemiological studies have been performed on clinical strains of S. pneumoniae to compare the serotype distribution of nasopharyngeal isolates from healthy carriers and isolates from sterile sites, including blood and/or cerebrospinal fluid44,50,51,52. These studies, which included a meta-analysis, all showed that pneumococci which belong to different capsular serotypes have different odds ratios of causing invasive disease44,50,51. Thus, in these studies, pneumococci of serotypes 1, 4, 5 and 7F were more likely to cause invasive disease and were rarely found in the nasopharynx of healthy carriers. By contrast, isolates of serotypes 3, 6A, 19F and 23F had low invasive-disease potential and were predominantly found in healthy carriers44,50. Similarly, in the Finnish study by Hanage et al.52, serotypes 14, 18C, 19A and 6B had odds ratios of >1 for invasive disease, and were therefore associated with invasive disease, whereas serotypes 6A and 11A had odds ratios of <1. However, isolates with a low invasive-disease potential, such as type 19F isolates, may still cause invasive disease and are so predominant in the carrier population that they may be just as common in invasive disease as isolates that belong to serotypes associated with a higher invasive-disease potential53.
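The odds ratios referred to above compare how often a serotype is recovered from invasive disease with how often it is recovered from carriage. The following is a minimal sketch of that calculation from a 2×2 table. The function name, the isolate counts and the 0.5 continuity correction for empty cells are illustrative assumptions and are not taken from the cited studies, which may have used other estimators.

```python
def invasive_odds_ratio(inv_serotype, inv_other, carr_serotype, carr_other):
    """Odds ratio for one serotype being found in invasive disease vs carriage.

    2x2 table (hypothetical layout):
                  serotype X    all other serotypes
      invasive        a                 b
      carriage        c                 d
    """
    a, b, c, d = inv_serotype, inv_other, carr_serotype, carr_other
    # Assumption: add 0.5 to every cell when any cell is empty
    # (a common continuity correction) to avoid division by zero.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only.
high = invasive_odds_ratio(30, 70, 10, 190)   # over-represented in invasive disease
low = invasive_odds_ratio(5, 95, 40, 160)     # mostly found in carriers
print(f"OR = {high:.2f} (>1), OR = {low:.2f} (<1)")
```

An odds ratio above 1 corresponds to a serotype with high invasive-disease potential, and a value below 1 to one found predominantly in carriers. A serotype with a low odds ratio can still account for many invasive cases if it is very common in carriage.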

By including patient data in an invasive disease study it became apparent that S. pneumoniae serotypes with a higher invasive-disease potential (such as serotypes 1, 4 and 7F) were more likely to cause disease in previously healthy individuals, and therefore act as primary pathogens. This is in contrast to serotypes with low invasive-disease potential, which preferentially caused invasive disease in patients with underlying diseases and therefore act as opportunistic pathogens30. Interestingly, in this study, serotypes associated with the highest invasive-disease potential were associated with the lowest mortality30. This was partially because the healthier group of patients became infected by these serotypes. Mortality was also high in previously healthy individuals, however, at least for serotype 3, which is associated with a low odds ratio for invasive disease30. By contrast, an international study by Alanee et al.54, who studied pneumococcal bacteraemia in ten countries, did not find an association between disease severity or mortality and serotype, and suggested that host factors might be more important than serotype for disease outcome. However, in this study, clonal type (the genetic relatedness of strains within one serotype; discussed below) was not known, which might affect the interpretation of the results. In summary, these and other epidemiological studies reveal a more complex interpretation of bacterial virulence than that obtained through conventional virulence studies in animal models, and bring new parameters into play, such as the potential for invasive disease in healthy and compromised individuals and disease severity.

Clonal analyses of pneumococcal isolates have allowed studies of genetic relatedness between clinical isolates (a comparison of molecular typing schemes is provided in Fig. 2). Such studies have revealed that isolates from serotypes with high invasive-disease potential in Sweden, such as serotypes 1 and 7F, are more clonally related than isolates of serotypes with a lower invasive-disease potential30,51,53,55. Thus, based on multilocus sequence typing (MLST), Swedish isolates of invasive serotypes 1 and 7F seemed to belong mainly to clonal complex (CC) 306 and CC191, respectively (Figs 3, 4). Furthermore, using whole pneumococcus genome-based microarrays (TIGR4 (of serotype 4) and R6 (unencapsulated derivative of D39, a serotype 2 strain)), fewer genetic differences were found among isolates that belonged to sequence type (ST) 306 and ST191 than in other STs tested55. Interestingly, sub-Saharan isolates of serotype 1 belong to an unrelated epidemic CC (ST217)56 that, unlike the current Swedish CC ST306 (discussed above), is associated with severe disease (meningitis) and high mortality30,56.

Figure 2: Common typing methods used to study relationships between pneumococcal isolates.

Serotyping, which is based on the expression of capsular polysaccharides, is the most common method used to study relatedness between pneumococcal isolates. However, serotyping does not provide information about the genetic relatedness between pneumococcal strains, and therefore other molecular epidemiological methods are being used to study bacterial chromosome content. MLST (multilocus sequence typing), which relies on sequencing and comparing seven housekeeping genes, has no more resolving power than the other DNA-based techniques described here, but allows national and international comparisons of strains. In pulsed-field gel electrophoresis (PFGE), the pneumococcal chromosomal DNA is cleaved by a restriction enzyme and then run on a pulsed-field gel, thereby separating large fragments. This method has good discriminatory power, especially when used to investigate outbreaks. Using microarray technology, genes from sequenced pneumococcal genomes are spotted on an array, and the presence or absence of these genes can be studied by hybridizing DNA from pneumococcal isolates of interest to the array. The discriminatory power of microarrays depends on the number of genes investigated. Finally, the method with the most discriminatory power is whole-genome sequencing, which allows all genomic differences between clinical isolates to be identified.

Figure 3: Relationship between pneumococcal serotype, clonal complex (CC) and sequence type (ST) in ST156 and ST306.

Approximately 95% of CC306 and 98% of ST306 isolates are from serotype 1, and few serotype switches have therefore occurred within CC306. By contrast, only 50% of CC156 isolates and 70% of ST156 isolates are of serotype 9V and several serotype switches have occurred; for example, to serotypes 14 and 19F. In total, 30% of serotype 1 isolates belong to CC306 and 50% of CC306 isolates belong to ST306. By contrast, 80% of serotype 9V isolates belong to CC156 and 20% of CC156 isolates belong to ST156. Thus, there are major differences in the genetic relatedness between clinical pneumococcal isolates from different serotypes and clonal types, as determined using MLST (multilocus sequence typing), and this influences the interpretation of typing results. Based on data from the MLST database (see Further information).

Figure 4: eBURST groups in the MLST database.

A population snapshot based on the entire Streptococcus pneumoniae MLST (multilocus sequence typing) database, from May 2008. The 30 largest eBURST groups (see Further information for a link to eBURST on the MLST database) are highlighted. The predicted founders of the clonal complexes (CCs) are shown in blue and the size of the circles reflects the frequency of that sequence type (ST) in the data set. Connected STs differ at only one of the seven loci. An eBURST group (or CC) is defined as a group of STs in a population that share 6–7 alleles with at least one other ST in the group. Isolates from a CC are assumed to have a recent common ancestor. With eBURST it is possible to study strain microevolution based on the entire MLST database127,128.
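The group definition given in this legend (STs that share at least six of seven alleles with at least one other ST in the group) can be sketched as a simple linkage procedure. The code below is an illustration only: the function names and the allelic profiles are hypothetical, and the sketch covers only the grouping step, not the founder prediction or other analyses performed by the eBURST software.

```python
from itertools import combinations


def shared_alleles(profile_a, profile_b):
    """Number of loci at which two MLST allelic profiles carry the same allele."""
    return sum(x == y for x, y in zip(profile_a, profile_b))


def group_sequence_types(profiles, min_shared=6):
    """Group STs so that each ST shares >= min_shared alleles with another ST in its group."""
    parent = {st: st for st in profiles}

    def find(st):
        # Follow parent links to the representative of the group.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    # Link every pair of STs that differ at no more than one of the seven loci.
    for a, b in combinations(profiles, 2):
        if shared_alleles(profiles[a], profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical seven-locus profiles (not real database entries).
profiles = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1, 2),  # single-locus variant of A
    "C": (1, 1, 1, 1, 1, 3, 2),  # single-locus variant of B, double-locus variant of A
    "D": (9, 9, 9, 9, 9, 9, 9),  # unrelated singleton
}
print(group_sequence_types(profiles))
```

In this toy data set, C joins the group of A through B even though A and C differ at two loci, which is how chains of single-locus variants build up a clonal complex.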

It has been debated whether serotype-dependent differences in the likelihood of invasive pneumococcal disease depend only on different capsules that confer varying virulence, or whether the different serotypes are composed of clonal types with different invasive-disease potentials, which would suggest that other bacterial factors are also important for disease outcome. In a Finnish study of children aged ≤2 years it was shown that strains of serotype 14 (ST156) had a higher invasive-disease potential than strains of serotype 9V (ST162), which are derived from the same CC. This suggests that the serotype 14 capsule provides increased invasiveness compared with the serotype 9V capsule52. However, isolates from the ST156 CC can differ by up to 40 genes57, and therefore a direct correlation between virulence and capsular type cannot be drawn.

Despite uncertainty in the human setting, clinical isolates of S. pneumoniae have been shown to possess dramatic differences in colonization and/or invasive disease after intranasal challenge of mice that correlate both with different capsular types and different clonal types of the same capsular type53. In this study, intraclonal differences were also found for two serotype 1 isolates of the same CC (ST228) that did not exhibit the same virulence after intraperitoneal challenge.

Thus, it seems reasonable to conclude that bacterial factors, such as the amount and type of pneumococcal capsule, affect bacterial virulence in humans in combination with other properties that may differ among different clonal lineages or even within single clones.

The diverse pneumococcal genome

Sequence analyses, as well as overall genome comparisons, have shown that S. pneumoniae belongs to a phylogenetic lineage from a large cluster of otherwise commensal streptococci. Included in this cluster are species such as S. mitis and Streptococcus pseudopneumoniae; S. pneumoniae is no more divergent from other members of the cluster than individual lineages of S. mitis are from each other58. Yet the S. pneumoniae lineage is by far the most virulent lineage of the cluster. The gene pool available for pneumococci might therefore be considerably larger than previously anticipated. The pneumococcal genome is highly diverse55,59,60. After complete sequencing of 17 pneumococcal strains, coding sequences from all strains were grouped into 3,170 orthologous gene clusters, of which only 1,454 (46%) were conserved among all 17 strains; the rest were accessory genes that could be present or absent in different isolates61. Using microarrays to study 40 pneumococcal clinical isolates that belonged to 12 serotypes and 33 STs, we found that most of the accessory genes were localized to 39 different regions of diversity, or accessory regions55, that form small gene clusters distributed around the genome15,55,62,63. Unlike the well-characterized pathogenicity islands of Gram-negative bacteria64, pneumococcal accessory regions are not integrated at the sites of tRNA genes, and only some differ significantly in G+C content from the rest of the genome. Some accessory regions might have evolved through phage integration, as in Streptococcus pyogenes, and some are flanked by insertion elements, which is indicative of site-specific recombination events. However, most cannot be distinguished from genes that belong to the core genome, suggesting that they represent a pool of genes that is common to both pneumococci and related streptococci55.
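The distinction between conserved (core) and accessory gene clusters described above amounts to a set comparison across strains. The following is a minimal sketch, in which the function name, strain names and gene-cluster labels are hypothetical; the final line only rechecks the proportion quoted in the text (1,454 of 3,170 clusters).

```python
def split_core_and_accessory(clusters_by_strain):
    """Split orthologous gene clusters into those present in all strains and the rest."""
    gene_sets = [set(clusters) for clusters in clusters_by_strain.values()]
    all_clusters = set().union(*gene_sets)   # every cluster seen in any strain
    core = set.intersection(*gene_sets)      # clusters present in every strain
    accessory = all_clusters - core          # present in some strains, absent in others
    return core, accessory


# Hypothetical presence data for three strains (illustration only).
clusters_by_strain = {
    "strain_1": {"g1", "g2", "g3", "pilus_islet"},
    "strain_2": {"g1", "g2", "g3"},
    "strain_3": {"g1", "g2", "g3", "phage_region"},
}
core, accessory = split_core_and_accessory(clusters_by_strain)
print(sorted(core), sorted(accessory))

# Proportion of conserved clusters reported for the 17 sequenced strains.
print(f"{100 * 1454 / 3170:.0f}% of clusters conserved")
```

As more strains are added, the core set can only stay the same or shrink, whereas the accessory set tends to grow, so the reported proportion depends on how many strains are compared and how diverse they are.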

As expected from the finding of Feil and colleagues65 that diversity within S. pneumoniae is primarily created by recombination, the pattern of accessory regions in individual pneumococcal strains from the same clonal cluster is similar or even identical55. The presence or absence of pneumococcal accessory regions can, for the most part, be explained by homologous recombination within the species, but interspecies recombination probably also occurs. Strains that have been assigned to different clonal clusters based on MLST differ in their pattern of accessory regions, irrespective of whether they belong to the same capsular serotype or not55. Thus, it can be inferred that the distribution and expression pattern of accessory regions will affect virulence and disease outcome for a given isolate. However, there are substantial redundancies among virulence attributes, meaning that the genetic background of the bacteria will decide the impact of a single virulence factor.

Strains belonging to clonal types that are frequently found in the nasopharynx, and are therefore associated with carriage and opportunistic infections, would be expected to harbour accessory regions that allow colonization and growth on mucosal surfaces. One example of such a region is the rlrA islet, which encodes pneumococcal pili66,67. rlrA is present in a restricted number of clonal lineages and, depending on differences in sequence, can be divided into three different clades, with an overall homology of 88–92%68. Pneumococcal pili have been shown to enhance adhesion to human respiratory epithelial cells, promote colonization of the murine upper airways and enhance the inflammatory response during the systemic phase of pneumococcal disease67. Despite the competitive advantage of pili expression, only approximately 20–30% of randomly selected isolates carry the rlrA pilus-encoding islet68,69,70. It is possible that immunogenic pili71 generate immunity in the population, thereby selecting against piliated pneumococcal strains in the community.

A second pilus type was recently discovered in S. pneumoniae72. As for rlrA-encoded pili (pilus islet 1 (PI-1)), PI-2 is present in a restricted number of clonal types that are associated with serotypes 1, 2, 7F, 19A and 19F, which are considered to be emerging serotypes. Furthermore, second-type pili contribute to epithelial adhesion, but their role in pneumonia and invasive disease has not been defined. However, the presence of PI-1 and/or PI-2 in isolates from the highly invasive CC ST306 (serotype 1), ST205 (serotype 4) and ST191 (serotype 7F) suggests that PI-1 and PI-2 pili play a part not only in adhesion but also in lung infection and invasive disease in humans.

Specific CCs of pneumococci can also carry other adhesive islets. For example, the pneumococcal serine-rich repeat protein (PsrP)73 is encoded in a large accessory region that is present in many isolates with high predicted invasiveness, such as ST306 of serotype 1, but is absent in most other isolates73. Pneumococcal accessory regions not only encode adhesins and other potential virulence attributes but also encode genes that are involved in transport and metabolism. Interestingly, a number of genes identified as being necessary for in vivo growth in mice belong to accessory regions that can be absent from clonal types which are capable of causing invasive disease in humans, suggesting significant redundancy in accessory gene functions (B.H-N., C.B., J.D. and S.N., unpublished observations). By comparing pneumococcal clonal types of serotypes 1, 4 and 7F, all of which have high invasive-disease potential, with all other clonal types, we found that only one accessory region is present in the invasive clonal types but missing in isolates of most other clonal types (B.H-N., C.B., J.D. and S.N., unpublished observations). This region, which was identified as important for invasive disease in a signature-tagged mutagenesis screen74 (a high-throughput method that identifies mutants with reduced or increased adaptation to certain environments), encodes a family 1 phospho-β-glucosidase, which suggests that complex-carbohydrate utilization contributes to high invasive-disease potential (B.H-N., C.B., J.D. and S.N., unpublished observations).

Even though accessory regions are only present in specific clonal lineages of S. pneumoniae, their expression and regulation are probably interconnected with regulatory networks that involve housekeeping genes. One recent example from the Tuomanen group75 showed that the rlrA regulatory locus of PI-1 is integrated into transcriptional regulatory networks that control the expression of virulence factors that are important in pneumococcal adhesion and invasion.

In addition to the presence and absence of accessory regions, sequenced isolates can contain genes that although present in all isolates possess considerable sequence variation. Such variation could be important for virulence, as shown by the ability of PspC (also called SpsA, CbpA, PbcA and Hic), a surface protein of S. pneumoniae, to bind secretory immunoglobulin A, C3 and complement factor H, and act as an adhesin76,77. However, whether or not allelic variation of pspC is associated with different clonal types has not yet been shown.

Little is known about the contribution to virulence of naturally occurring mutations in single genes. The pore-forming toxin pneumolysin of S. pneumoniae is thought to be particularly important for virulence78. Yet one of the most successful invasive clones of serotype 1 (ST306) expresses a non-haemolytic pneumolysin79,80. This suggests that a deficiency in active pneumolysin production in the context of the ST306 genetic background is neutral or perhaps even advantageous for human pneumonia, which is frequently associated with serotype 1 infections. Researchers have recently observed an increased incidence of empyemas, which are associated with childhood pneumonia and are preferentially caused by serotype 1 pneumococci both in England and France81,82, but it is unknown whether these more severe pneumonias are caused by serotype 1 isolates that produce a non-haemolytic form of pneumolysin.

Spread of penicillin non-susceptible clones

S. pneumoniae, like Neisseria spp., has evolved decreased susceptibility to penicillin by remodelling the penicillin-binding proteins through a series of horizontal gene transfer events and mutations83,84. The occurrence of penicillin-non-susceptible pneumococci (PNSP) in several different geographical areas, which led to resistance in >50% of the pneumococcal population in some regions, was mainly due to the spread of a limited number of clonal types85,86,87. Such clones appear in countries with both high and low antibiotic consumption. In the few cases in which the proposed susceptible variants of the same clones were studied, the susceptible ancestors also demonstrated high transmission ability in the community. Therefore, when antibiotic resistance evolves in pneumococcal strains that are readily transmitted to and colonize the human nasopharynx, there is a risk that such strains will emerge as international clones and spread globally.

One such PNSP clonal cluster is ST156 (also referred to as Spain9V-3 by the Pneumococcal Molecular Epidemiology Network86 (PMEN; see Further information)). This clonal cluster is associated with several capsular serotypes, such as 9V, 14 and 19F, and has been isolated from humans on four continents57,88. In several countries, including Sweden, ST156 has been the dominant PNSP clone for several years57,88,89. Both PNSP and penicillin-susceptible pneumococcal isolates (from ST162) that belong to the ST156 clonal cluster were found to carry PI-1, the rlrA pilus-encoding islet (Fig. 4; Table 1). Interestingly, the internationally spread Spain6B-2 clone, which caused a dramatic increase of penicillin non-susceptibility in Iceland when it spread among children87, also carries PI-1, but produces pili of a different clade than pili produced by pneumococci of the ST156 clonal cluster. When all PNSP isolates in Sweden were characterized, 70% belonged to piliated clones of the three different clades57, which is higher than the incidence of piliation among randomly collected strains57,68,69,70. To study the in vivo importance of pneumococcal pili for colonization and spread, mice were inoculated intranasally with a low dose of piliated and non-piliated pneumococci of two clinical isolates that differed only in their possession of PI-1, as determined by microarray analysis. In this competition experiment, the piliated clinical isolate 'outcompeted' the non-piliated isolate for colonization57. No correlation has been observed between penicillin non-susceptibility and the second pneumococcal pilus islet, PI-2, even though the globally spreading PNSP clone Taiwan (19F)-14 contains both pilus islets68,72. Available data therefore suggest that penicillin non-susceptibility is built up in clones that already carry the rlrA (PI-1) islet, resulting in PNSP strains that are particularly capable of global dissemination.

Table 1 Predicted founders of the 30 largest eBURST groups*

Competence and clonal evolution

S. pneumoniae is a naturally transformable species. This competence for uptake and incorporation of DNA is induced by the secretion of competence-stimulating peptide (CSP), of which two (CSP1 and CSP2) have been reported to date90,91. CSP interacts with the membrane-bound histidine kinase receptor (ComD), which leads to phosphorylation of the cognate response regulator ComE. In its phosphorylated state, ComE activates 20 so-called early genes, one of which encodes the alternative sigma factor ComX, which in turn directs the transcription of around 60 genes, including those involved in DNA uptake92,93. Microarray analyses revealed that most, if not all, of the genes involved in transformation belong to the core genome (B.H-N., C.B., J.D. and S.N., unpublished observations). Nevertheless, as many as 50% of randomly selected encapsulated clinical isolates were poorly transformable or non-transformable under laboratory conditions after induction with either CSP1 or CSP2 (Refs 90, 94, 95). These strains belonged to several serotypes, and isolates of the same serotype often differed in their competence induction by CSP1 or CSP2 (Ref. 90). We used clonally defined pneumococcal isolates to revisit this issue and found that a number of isolates that could not be transformed after induction by CSP1 and CSP2 belonged to highly successful clonal clusters, such as ST306 of serotype 1, that exhibit epidemic behaviour when causing invasive disease but rarely colonize the nasopharynx96. Furthermore, isolates of ST124 from serotype 14, one of the most prevalent penicillin-susceptible clones in Sweden for a number of years, and isolates of ST180 from serotype 3, a serotype that has one of the highest rates of mortality in humans, had low transformability30,45 (B.H-N., P.B. and S.N., unpublished observations). Clonal clusters associated with low transformability were also rarely found to be PNSP. These results suggest that a subset of non-transformable pneumococcal clonal clusters may not have access to the gene pool that is available for fully transformable pneumococcal isolates, a factor that might affect their evolution through recombination.

Clonal selection after vaccination

The seven-valent pneumococcal conjugate polysaccharide vaccine (PCV7) was licensed for use in 2000 for infants and young children in the United States97. In the United States and in other countries where PCV7 has been introduced, the vaccine has clearly been beneficial: it has significantly reduced the incidence of pneumococcal invasive disease and pneumococcal carriage caused by vaccine serotypes (serotypes 4, 6B, 9V, 14, 19F, 18C and 23F), both in vaccinated children and, owing to herd immunity, in the unvaccinated community97,98,99,100,101,102. However, an increase in non-vaccine serotypes, such as serotypes 3, 15, 19A, 22F and 33F, among children has been noted, and the non-vaccine serotype 19A has become the predominant cause of invasive pneumococcal disease in children in the United States103. In a study from 2003–2004, Beall and colleagues104 showed that CC199 predominated among serotype 19A isolates in children less than 5 years old, representing approximately 70% of invasive serotype 19A isolates. Brueggemann et al.105 found that this increase in serotype 19A was partly due to the prevalence of this genotype prior to introduction of the vaccine and partly due to the emergence of a genotype that had previously only been associated with serotype 4. This novel vaccine-escape strain evolved following a single genetic event in which a 39 kb fragment was transferred from the serotype 19A strain into the type 4 strain, resulting in both a capsular switch and penicillin non-susceptibility; two penicillin-binding proteins were also involved in this recombination event105. In Alaska, an increase was observed in non-vaccine types, predominantly of serotype 19A, that cause invasive disease. Most of the expansion of this serotype was attributed to an increase in one genotype, ST199 (Ref. 101). Furthermore, serotype replacement after vaccine introduction has emerged in invasive pneumococcal disease with serotypes not included in the vaccine; for example, serotypes 3, 7F, 15B/C/F, 19A, 22F, 33F and 38 have been described in various surveillance systems103,106,107,108,109,110,111. In Portugal, the most common non-vaccine invasive serotypes in children after vaccine introduction were types 1, 19A, 7F, 3 and 33F112.

Particularly worrisome has been the resistant nature of some non-vaccine isolates111,113. The incidence of invasive pneumococcal disease owing to penicillin-resistant 19A isolates has increased considerably in the post-vaccine era in the United States. Of 151 penicillin-resistant 19A isolates, 111 (73.5%) belonged to the same CC as the multidrug-resistant Taiwan (19F)-14 described above114. Clonal analysis of emerging 19A acute otitis media isolates that are resistant to all Food and Drug Administration (FDA)-approved antibiotics showed that these isolates belonged to ST2722, a single-locus sequence variant of the globally spread ST156 discussed above111. Other PNSP clones of non-vaccine serotypes have also expanded in the post-vaccine era in the United States; for example, ST558 of serotype 35B115. These emerging PNSP non-vaccine isolates belong to clonal clusters that possess PI-1 (Refs 67, 68). We might therefore be witnessing clonal selection of pre-existing PNSP isolates of non-vaccine serotypes that were minor lineages in the carriage population, but have the capacity to expand, potentially promoted by adhesive properties, such as pili.
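The term 'single-locus sequence variant' used above refers to multilocus sequence typing (MLST), in which each sequence type (ST) is defined by the allele numbers at seven housekeeping loci; two STs are single-locus variants when their profiles differ at exactly one locus. The following Python sketch illustrates that comparison only. The allele numbers are hypothetical placeholders and are not the real profiles of ST156 or ST2722, and the function names are invented for illustration.

```python
from typing import List, Sequence

# The seven housekeeping loci of the pneumococcal MLST scheme.
LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def locus_differences(profile_a: Sequence[int], profile_b: Sequence[int]) -> List[str]:
    """Return the names of the loci at which two allelic profiles differ."""
    if len(profile_a) != len(LOCI) or len(profile_b) != len(LOCI):
        raise ValueError("each profile needs one allele number per locus")
    return [locus for locus, a, b in zip(LOCI, profile_a, profile_b) if a != b]


def is_single_locus_variant(profile_a: Sequence[int], profile_b: Sequence[int]) -> bool:
    """True when the two profiles differ at exactly one of the seven loci."""
    return len(locus_differences(profile_a, profile_b)) == 1


# Hypothetical allele numbers, chosen only to demonstrate the comparison.
hypothetical_founder = (1, 2, 3, 4, 5, 6, 7)
hypothetical_variant = (1, 2, 3, 4, 5, 6, 9)  # differs from the founder at ddl only

if __name__ == "__main__":
    print(locus_differences(hypothetical_founder, hypothetical_variant))
    print(is_single_locus_variant(hypothetical_founder, hypothetical_variant))
```

Real allelic profiles would need to be taken from an MLST database; with them, the same comparison could be applied to any pair of STs.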

We have not found any published report on post-vaccine expansion in the United States of non-vaccine serotypes that are associated with a high invasive-disease potential, such as serotypes 1 and 7F. Furthermore, no capsular switches for these non-vaccine serotypes of high invasive-disease potential have been reported. However, these serotypes are not as common in the United States as they are in some European countries, and therefore vaccination could have different effects on the emergence of more invasive non-vaccine type strains in Europe compared with the United States. In a recent study by Lipsitch et al.116, only one example of capsular switching in which a novel ST was associated with a non-vaccine serotype was noted (a serotype 35F strain of ST124, which is usually associated with serotype 14). As ST124 of serotype 14 represents one of the most successful invasive clones, it will be important to monitor whether its disease-causing capacity is retained after capsular switching to serotype 35F, a serotype that is rarely associated with invasive disease in children116. In this study, other examples of emerging non-vaccine serotype strains could have been selected from an already existing pool of clones with non-vaccine serotype capsules116.

Future perspectives

We are far from understanding the precise nature of clonal success and clonal dynamics for any bacterial species. Recent advances in sequencing technology are likely to lead to extensive whole-genome sequencing of clinical isolates and to comparative genomics combined with in vivo studies of potential virulence markers. Mathematical modelling based on observed data will also give us some clues to the mechanisms behind successful clonal spread within the community, taking into account bacterial, host and environmental factors. These kinds of analyses, in which information from molecular epidemiological studies of clinical isolates can be combined with clinical information about the infected patients, will lead to a new understanding of why some bacterial clones spread successfully whereas others disappear. This will eventually lead to better strategies for the prevention of clonal transmission in our society.

A large number of S. pneumoniae genomes are being sequenced, which will provide information about the so-called pan-genome of this species and on whether this pan-genome is open (that is, whether additional new genes will be discovered with each newly sequenced genome). The pattern of presence or absence of variable genes, polymorphisms in individual genes and gene-expression data need to be combined with clinical, epidemiological and human susceptibility data to fully understand the interplay between clone and host properties that promote invasive disease by an organism that normally behaves as an innocent colonizer of the nasopharynx. Because the pneumococcus is not the only inhabitant of its ecological niche, we need more information about the bacterial population structure and dynamics in the nasopharynx, a task that can be approached by metagenomic analyses.
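The question of whether a pan-genome is open can be pictured as a simple accumulation count: as genomes are added one at a time, how many genes appear that were not seen in any earlier genome? The Python sketch below shows this bookkeeping under strong simplifying assumptions. Each genome is reduced to a set of gene identifiers, the identifiers and the three toy genomes are hypothetical, and a real analysis would also need orthologue clustering and averaging over many orderings of the genomes.

```python
from typing import Iterable, List, Set


def new_genes_per_genome(genomes: Iterable[Set[str]]) -> List[int]:
    """For each genome, in the order given, count the genes not seen in any earlier genome."""
    seen: Set[str] = set()
    counts: List[int] = []
    for genes in genomes:
        novel = set(genes) - seen
        counts.append(len(novel))
        seen |= novel
    return counts


def pan_and_core(genomes: List[Set[str]]) -> tuple:
    """Return (pan-genome, core genome): the union and the intersection of all gene sets."""
    if not genomes:
        return set(), set()
    return set.union(*genomes), set.intersection(*genomes)


# Hypothetical toy genomes; the gene identifiers are placeholders, not real loci.
toy_genomes = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g3", "g4"},
]

if __name__ == "__main__":
    print(new_genes_per_genome(toy_genomes))
    pan, core = pan_and_core(toy_genomes)
    print(sorted(pan), sorted(core))
```

If the number of new genes per added genome keeps falling towards zero, as in this toy data, the pan-genome would be described as closed; if it levels off above zero, new genes continue to be found and the pan-genome is open.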