Emerging Infectious Diseases
[Volume 5 No.2 / March - April 1999]

Research

Rapid Molecular Genetic Subtyping of Serotype M1 Group A Streptococcus
Strains

Nancy Hoe,* Kazumitsu Nakashima,* Diana Grigsby,* Xi Pan,* Shu Jun Dou,*
Steven Naidich,† Marianne Garcia,‡ Emily Kahn,‡ David Bergmire-Sweat,‡ and
James M. Musser*
*Baylor College of Medicine, Houston, Texas, USA; †Naidich Space
Laboratory, Inc., New York, New York, USA; ‡Texas Department of Health,
Austin, Texas, USA

---------------------------------------------------------------------------
      Serotype M1 group A Streptococcus isolates, the most common cause of
      invasive disease in many case series, generally have resisted
      extensive molecular subtyping by standard techniques (e.g.,
      multilocus enzyme electrophoresis, pulsed-field gel
      electrophoresis). We used automated sequencing of the sic gene
      encoding streptococcal inhibitor of complement and of a region
      of the chromosome with direct repeat sequences to unambiguously
      differentiate 30 M1 isolates recovered from 28 patients in
      Texas with invasive disease episodes temporally clustered and
      thought to represent an outbreak. Sequencing of the emm gene
      was less useful for M1 strain differentiation, and restriction
      fragment length polymorphism analysis with IS1548 or IS1562 as
      Southern hybridization probes did not provide epidemiologically
      useful subtyping information. Sequence polymorphism in the
      direct repeat region of the chromosome and IS1548 profiling
      data support the hypothesis that M1 organisms have two main
      evolutionary lineages marked by the presence or absence of the
      speA2 allele encoding streptococcal pyrogenic exotoxin A2.

Molecular genetic approaches that differentiate isolates of a pathogenic
microbial species have revolutionized contemporary epidemiologic
investigations of putative disease outbreaks. The human gram-positive
bacterium group A Streptococcus (GAS) has more than 80 M-protein serotypes,
but isolates expressing the M1 serotype are disproportionately represented
among invasive disease episodes in most case series (1). M1 organisms also
commonly cause pharyngitis. For reasons that are unknown, M1 isolates and
organisms expressing other M serologic types can undergo rapid temporal
variation in disease frequency and severity (1). Serotype M1 isolates have
been studied by several molecular typing approaches, including multilocus
enzyme electrophoresis; pulsed-field gel electrophoresis; rRNA gene
polymorphism typing (ribotyping); random amplified polymorphic DNA
analysis; and sequencing of the genes encoding streptokinase, C5a
peptidase, M protein, hyaluronidase, and pyrogenic exotoxin A, B, and C
(1-5). The common theme of these analyses is that most M1 isolates cultured
from patients with invasive disease episodes are closely allied in overall
chromosomal relationship as a consequence of sharing a recent common
ancestor (1,3,5). Lack of readily detectable chromosomal variation has
limited insights on the molecular origin of new virulent strains, velocity
of strain spread in human populations, and association of genetic subtypes
with certain clinical syndromes, including necrotizing fasciitis and acute
rheumatic fever.

Recently, Akesson et al. (6) identified a GAS extracellular protein made by
M1 strains that inhibits human complement. This streptococcal inhibitor of
complement (Sic) protein is incorporated into the membrane-attack complex
(C5b-C9) and inhibits target cell lysis by an undetermined mechanism.
Analysis of molecular diversity among 16 M1 GAS isolates from patients with
pharyngitis identified seven alleles of the sic gene (7). The high level of
sic polymorphism was unanticipated, given that other methods of molecular
analysis had failed to identify substantial variation among M1 isolates
(1-5). Subsequently, Stockbauer et al. (8) analyzed 165 M1 isolates from
diverse localities, identified 62 alleles, and documented a uniquely high
level of allelic variation in this gene. The molecular features of sic
variation indicated that structural change in Sic is mediated by natural
selection (8). Moreover, study of 70 M1 isolates from two temporally
distinct epidemics of streptococcal infections in the former East Germany
suggested that variation in sic contributed to fluctuations in GAS disease
frequency and severity (8).

The observation that the polymorphism in the sic gene greatly exceeded that
for all other genes examined in serotype M1 isolates suggested that sic
sequencing could be used as a rapid strategy to differentiate organisms
thought to be epidemiologically linked. A recent statistically significant
increase in cases of invasive GAS in Texas presented an opportunity to test
this hypothesis. We also tested whether molecular variation in a region of
the chromosome with multiple direct repeat (DR) nucleotide sequences and
restriction fragment length polymorphism (RFLP) analysis with insertion
elements IS1548 (9) and IS1562 (10) would differentiate M1 isolates.

Brief Overview of the GAS Epidemiology

Statistics gathered by the Texas Department of Health indicated that from
December 1, 1997, through March 5, 1998, 117 invasive episodes of GAS (and
26 deaths) had occurred statewide. Sixty of these cases and 14 deaths were
in central Texas (population 1.4 million). Concern was raised by community
physicians, lay individuals, and the media that an unusually virulent
strain was causing a disease outbreak. (A complete description of the
epidemiology of this outbreak will be presented elsewhere.) For molecular
analysis of the GAS causing recent cases, 100 isolates were sent to the
laboratory of J.M.M. at Baylor College of Medicine, Houston, TX. On
receipt, the bacteria were checked for purity by visual inspection and were
confirmed to contain beta-hemolytic organisms with a colony morphology
consistent with GAS. Chromosomal DNA was isolated as described (5).

Sequence Analysis of emm

To determine whether one or a few unusually virulent strains might account
for most of the invasive episodes, we sequenced the hypervariable part of
the emm gene encoding M-type specificity (5,11). After the sequence data
were edited electronically, they were used to search an emm database
maintained in the laboratory that contains at least one sequence of all
known M-protein serotypes and provisional serotypes (11). The database also
contains 33 emm1 allelic variants identified among serotype M1 organisms
from global sources (1,5,12) (Figure 1).

The most common M type identified was M1 (n = 30 isolates) (Table). Five
emm1 alleles were identified in the 30 M1 isolates, including four
(emm1.13, emm1.18, emm1.19, and emm1.24) not previously described (Figure
1). Twenty-three Texas isolates had allele emm1.0, the most common emm1
allele in M1 isolates globally (5). Three isolates had allele emm1.19, two
organisms had allele emm1.24, and one isolate each had allele emm1.13 and
emm1.18 (Table). Compared with the emm1.0 allele encoding variant M1.0,
each of these alleles is characterized by single nucleotide changes
resulting in single amino acid substitutions in the resulting M1 protein
(Figure 1). The additional 70 isolates were a heterogeneous array of M
types, including M3, M4, M5, M6, M12, M18, and many others. A more detailed
description of the bacteriologic features will be presented elsewhere.

[fig]
Figure 1. Alignment of inferred N-terminal amino acid sequences of 33
alleles of emm1. The region shown represents amino acids 27 through 110
(GenBank accession number X07860). Six of the emm1 alleles were identified
in this study, several were described previously (1,5,12), and others were
from ongoing analysis of emm1 in M1 strains from global sources. Amino acid
residues identical to those encoded by emm1.0 are represented by periods.

Analysis of speA Encoding Pyrogenic Exotoxin A

Because M1 isolates were a prominent cause of the invasive disease
episodes, we sought to determine the extent of genotypic heterogeneity
among the 30 M1 GAS isolates. First, polymerase chain reaction (PCR) was
used to test whether the organisms possessed the speA gene encoding
pyrogenic exotoxin A (scarlet fever toxin) (3,13). Most contemporary M1
isolates cultured from patients with invasive disease have this gene
(1,3-5), but some lack it because speA is bacteriophage encoded (13).
Possession of speA is therefore a variable trait among M1 organisms. All 30
M1 isolates had the speA gene, and sequence analysis of 11 random isolates
found that all had allele speA2 (14). Previous study of the speA gene in
several hundred contemporary M1 strains showed that all organisms had the
speA2 allele (1,14).

 Table. Characteristics of serotype M1 Group A Streptococcus
 isolates analyzed
 -----------------------------------------------------------------
                                     DR(sup c)
 MGAS                sic    emm1     PCR     sequence speA  IS1548
 no.     TDH no.     allele allele   (sup d) type     PCR   type
 (sup a) (sup b)                     (bp)             (sup e)
 -----------------------------------------------------------------
 6151    BE8-776     1.01   1.0      372     4.0     pos    1.0
 6168    BE-98-743   1.01   1.0      306     3.0     pos    1.0
 6184    BE8-873     1.01   1.0      306     NS(sup f) pos  1.0
 6199    BE8-917     1.01   1.19     306     NS      pos    1.0
 6262    BE8-1085    1.01   1.19     306     NS      pos    1.0
 6264    BE8-1087    1.01   1.19     306     3.0     pos    1.0
 6181    BE-98-764   1.02   1.0      240     NS      pos    1.0
 6293    BE8-1339    1.02   1.0      306     NS      pos    1.0
 6294    BE8-1340    1.02   1.0      306     NS      pos    1.4
 6140    BE8-629     1.13   1.0      240     NS      pos    1.0
 6200    BE8-918     1.13   1.0      240     NS      pos    1.0
 6201    BE8-919     1.13   1.0      240     NS      pos    1.0
 6281    BE8-1149    1.13   1.24     306     NS      pos    1.3
 6137    BE8-563     1.32   1.0      306     3.0     pos    1.0
 6148    BE8-773     1.32   1.0      306     NS      pos    1.0
 6249    BE8-929     1.32   1.0      306     NS      pos    1.0
 6172    BE-98-751   1.34   1.0      306     NS      pos    1.0
 5997    BE8-191     1.36   1.0      240     NS      pos    1.0
 6135    BE8-548     1.36   1.0      240     2.2     pos    1.0
 6254    BE8-1021    1.36   1.24     306     NS      pos    1.0
 6189    BE8-88      1.66   1.13     306     NS      pos    1.0
 5999    BE8-208     1.99   1.0      306     3.0     pos    1.0
 6003    BE8-322     1.100  1.0      240     NS      pos    1.0
 6251    BE8-1000    1.100  1.0      240     2.1     pos    1.0
 6006    BE8-369     1.101  1.0      306     3.0     pos    1.0
 6138    BE8-566     1.118  1.0      240     2.2     pos    1.0
 6150    BE8-775     1.119  1.0      306     3.0     pos    1.0
 6154    BE8-792     1.120  1.18     240     2.1     pos    1.0
 6272    BE8-1111    1.179  1.0      306     NS      pos    1.0
 6299    BE8-1380    1.180  1.0      240     2.0     pos    1.0
 2221    NA          1.01   1.0      306     NS      pos    1.0
 5305    NA          1.01   1.0      306     3.0     pos    1.0
 5809    NA          1.01   1.0      305     3.01    pos    1.0
 2139    NA          1.02   1.0      306     3.0     pos    1.0
 2350    NA          1.09   1.0      306     3.0     pos    1.0
 1272    NA          1.35   1.0      306     NS      pos    1.5
 5297    NA          1.121  1.0      240     2.0     pos    1.0
 279     NA          1.08   1.3      570     7.0     neg    1.6
 1632    NA          1.08   1.3      570     7.0     neg    1.6
 1653    NA          1.19   1.3      570     7.0     neg    1.6
 326     NA          1.20   1.3      570     7.0     neg    1.6
 570     NA          1.21   1.3      570     7.0     neg    1.8
 1642    NA          1.24   1.3      504     6.1     neg    1.6
 6708    NA          1.225  1.6      504     6.0     neg    1.7
 (sup g)
 -----------------------------------------------------------------
 (sup a)MGAS, Musser group A Streptococcus strain number. All
 isolates had no known direct epidemiologic connection
 except MGAS 6199, 6264, and 6272 (associated household
 cases); MGAS 6140, 6200, and 6201 (blood and cerebrospinal
 fluid cultures of same patient); and MGAS 6293 and 6294
 (mother-neonate paired isolates).
 (sup b)TDH, Texas Department of Health strain number; NA, not
 applicable (control isolate).
 (sup c)DR, direct repeat.
 (sup d)PCR, polymerase chain reaction.
 (sup e)pos, PCR-positive for speA; neg, PCR-negative for speA.
 The speA gene in MGAS 1272, 6135, 6137, 6138, 6150, 6151,
 6154, 6168, 6251, 6264, 6272, and 6299 was sequenced and
 identified as allele speA2.
 (sup f)NS, not sequenced.
 (sup g)MGAS 6708 is also known as SF370. The genome of this
 organism is being sequenced at the University of Oklahoma.

Sequence Analysis of sic

Recent molecular genetic studies have documented that sic is a uniquely
hypervariable gene among M1 GAS strains (7,8). Our sic database consists of
252 distinct alleles identified by sequence analysis of ~1,200 M1 isolates
from worldwide sources and cultured from patients with a large array of GAS
diseases, including pharyngitis and invasive episodes (7,8; unpub. data).
sic allelic variation has not been identified during in vitro laboratory
passage, nor has variation been detected among strains that are
epidemiologically associated (8). These molecular features suggest that
automated sequencing of sic may be a convenient method for identifying M1
genetic subtypes and inferring epidemiologic relationships in potential
outbreaks. To test this idea, we sequenced the sic gene in the 30 M1
isolates and identified 15 sic alleles that differed from one another by at
least one nucleotide (Figure 2). Seven of the 15 alleles were not found
among the ~1,200 M1 isolates previously characterized for sic variation.
Eight new nucleotide substitutions were identified in eight codons, and one
codon had a new dinucleotide change; these changes would result in nine
amino acid substitutions in the expressed Sic proteins. As observed in
earlier analyses (7,8), the amino-terminal half of the Sic protein had many
insertions and deletions, all in frame (Figure 2).

[fig]
Figure 2. Variation in the sic gene and Sic protein identified in M1 group
A Streptococcus isolates characterized in the study. The figure is a
compilation of variations found in the 15 distinct sic alleles in the
sample. The numbers at the top of the figure refer to the nucleotide
sequence position of a sic allele described in reference 6. Single-letter
amino acid abbreviations are used. SRR, amino-terminal short repeat region;
Roman numeral, short repeats I-V which recur in SRR; R2 and R3, tandem
repeats; MGAS strain, Musser Group A Streptococcus strain number; X,
presence of polymorphism.

RFLP Analysis with Insertion Sequences IS1548 and IS1562

IS1548, a recently described insertion sequence, has been reported to be
polymorphic in copy number and location in the chromosome of group A and
group B streptococci (9). IS1562 is an insertion sequence located in the
Mga regulon between the sic gene and scpA gene encoding C5a peptidase in
some GAS (10). Relatively few GAS strains have been analyzed by RFLP
profiling with these elements, and their ability to differentiate among
isolates expressing the same M type has not been assessed. Since insertion
sequence profiling has helped elucidate transmission dynamics and
evolutionary relationships of Mycobacterium tuberculosis (15), Bordetella
pertussis (16), Streptococcus pneumoniae (17), Escherichia coli (18), and
Salmonella Enteritidis (19), we tested the hypothesis that IS1548 or IS1562
subtyping would provide additional epidemiologically informative data
regarding genetic diversity among M1 isolates.

To determine whether the IS1548 element was present in M1 organisms in our
sample, PCR was performed on genomic DNA from 10 random isolates by using
the oligonucleotides (forward) 5'-TGCCGTTCATCAACTGATTTCAGTGG-3' and
(reverse) 5'-CGACGATAACTGAGGTCTTTTTTAGGAAAT-3' (9). A PCR product of the
anticipated size of ~1 kb was obtained from all organisms, a result
indicating that the isolates had this element or a close relative. The
PCR-amplified fragment was subsequently used as a
probe for RFLP analysis by Southern blotting after EcoNI digestion and
electrophoretic separation of chromosomal DNA fragments. The data were
analyzed with a Bioimage Analyzer system interfaced with a Sun
Sparcstation. Four M1 isolates had the same 6-band IS1548 RFLP pattern,
which was distinct from the 3-band pattern obtained from three random
serotype M3 isolates (Figure 3A). Twenty-eight of the 30 M1 isolates
studied had the same IS1548 pattern (Figure 3B and data not shown). The
IS1548 RFLP patterns of the two other isolates were single-band variants of
the common M1 pattern, both characterized by the addition of one
hybridizing band (Figure 3B). One of the isolates (MGAS 6294) with a
variant IS1548 pattern was recovered from the blood of a neonate born to a
woman with GAS sepsis. The isolate (MGAS 6293) from the blood of the
infected mother had the common IS1548 pattern.

[fig]
Figure 3. Representative IS1548 RFLP fingerprint patterns of M1 isolates.
Panel A is a lane map showing results from analysis of three serotype M3
control isolates and four M1 isolates with different sic alleles. Lane 1,
MGAS5892; lane 2, MGAS6004; lane 3, MGAS6005; lane 4, MGAS5997; lane 5,
MGAS5999; lane 6, MGAS6003; lane 7, MGAS6006. kb, 1-kb DNA ladder. Panel B
is a lane map showing results from analysis of eleven M1 isolates with
eight different sic alleles. Lane 1, MGAS6201; lane 2, MGAS6249; lane 3,
MGAS6251; lane 4, MGAS6254; lane 5, MGAS6262; lane 6, MGAS6264; lane 7,
MGAS6272; lane 8, MGAS6281; lane 9, MGAS6293; lane 10, MGAS6294; lane 11,
MGAS6299. kb, 1-kb DNA ladder. Panel C is a lane map showing results from
analysis of four speA-positive and seven speA-negative M1 isolates. Lane 1,
MGAS2350; lane 2, MGAS2221; lane 3, MGAS2139; lane 4, MGAS1272; lane 5,
MGAS6708; lane 6, MGAS1653; lane 7, MGAS1642; lane 8, MGAS1632; lane 9,
MGAS570; lane 10, MGAS326; lane 11, MGAS279. kb, 1-kb DNA ladder.

To identify other IS1548 RFLP patterns in M1 GAS organisms, we analyzed 14
non-Texas control isolates. These 14 M1 isolates were selected for analysis
because they have been well characterized by several molecular techniques
(5). The isolates also have many different sic alleles and include
representatives of two major genetic subclones of M1 organisms (5). IS1548
profiling of this group identified the common six-band pattern and also
found five organisms with a distinct subtype with four bands (Figure 3C).
All organisms with this profile were speA-negative. Interestingly, MGAS6708
(SF370), the M1 strain whose genome is being sequenced (20), had a unique
five-band IS1548 fingerprint (Figure 3C). The IS1548 profile for this
strain was very similar to the four-copy pattern characteristic of most of
the speA-negative organisms.

We next used PCR to determine whether IS1562 was present in the 30 M1
organisms from Texas and in 11 of the 14 non-Texas isolates by using
oligonucleotide primers 3244 and 3267, as described by Berge et al. (10). A
PCR product of the expected size of ~1 kb was obtained from all isolates.
The ~1-kb fragment was used to reprobe the nylon membranes used for IS1548
RFLP analysis. The results showed that all M1 isolates tested had the
identical or closely similar RFLP characterized by one copy of IS1562 (data
not shown).

PCR and Sequence Analysis of a Polymorphic Direct Repeat (DR) Chromosomal Region

Several years ago Groenen et al. (21) characterized an unusual region of 
the M. tuberculosis chromosome that contains up to approximately 40 copies 
of a 36-bp DR sequence interspersed with unique-sequence spacer regions 35 bp 
to 41 bp in length. Subsequent analysis of this DR region in hundreds of M. 
tuberculosis isolates by a method referred to as spacer oligotyping 
(spoligotyping) has identified large numbers of distinct subtypes of this 
pathogen (22), indicating that the DR region is highly polymorphic, even 
among isolates closely related in overall chromosomal character (23). We 
examined the M1 GAS genome database maintained by the University of Oklahoma 
Advanced Center for Genome Technology and identified a region of the GAS 
chromosome located on contig 208 (database as of February 22, 1999) that 
consists of seven DR elements separated by six unique 30-bp spacer regions. 
This area of the M1 chromosome is referred to as a DR region on the basis of 
its shared structural features with the M. tuberculosis DR region.

To test the hypothesis that the DR region is polymorphic among M1 GAS
isolates, we analyzed the 14 control isolates by PCR with primers that
flank this region (DR003, 5'-GGGCTTTTCAAGACTGAAGTCTAGCTG-3', and DR004,
5'-TCCGACTGCTGGTATTAACCCTCTT-3'). Four sizes of PCR products were
identified (data not shown). Six of seven isolates previously identified as
RFLP type 1a (speA-positive, containing allele emm1.0) had an apparently
identical size PCR product of ~300 bp. A PCR product of ~240 bp was
identified in the remaining isolate. Two sizes of PCR products (~500 bp and
~570 bp) were also identified in the six organisms with RFLP type 1k
(speA-negative, allele emm1.3). Hence, the PCR results indicated that size
variation was present in the GAS DR region in M1 organisms and showed that
isolates of the RFLP types 1a and 1k categories did not share PCR fragment
sizes.
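The size comparison described above comes down to locating the two
flanking primer sites and measuring the distance between them. A minimal
Python sketch of that idea follows. Only the DR003 and DR004 primer
sequences are taken from the text; the template is a hypothetical
placeholder built for illustration (it is not GAS sequence), and the
function names are assumptions.

```python
# Primer sequences as given in the text (5' -> 3').
DR003 = "GGGCTTTTCAAGACTGAAGTCTAGCTG"
DR004 = "TCCGACTGCTGGTATTAACCCTCTT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def product_size(template, forward, reverse):
    """Length in bp of the product bounded by the two primers, or None."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the template
    # strand its site appears as the reverse complement.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return end + len(reverse_site) - start


def toy_template(insert_length):
    """Hypothetical template: both primer sites around a placeholder insert."""
    return ("AAAA" + DR003 + "N" * insert_length
            + reverse_complement(DR004) + "TTTT")


for insert_length in (188, 254):
    template = toy_template(insert_length)
    print(insert_length, product_size(template, DR003, DR004))
```

Because the product includes both primers, its length is the insert length
plus the 27 and 25 nucleotides of DR003 and DR004.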

To examine nucleotide variation in this chromosomal region, we sequenced
the PCR products obtained from 12 of these control M1 isolates, including 5
with the ~240-bp or ~300-bp PCR product and 7 organisms with either the
~500-bp or ~570-bp PCR product. The one organism with the ~240-bp PCR
product, characterized by two identical DR elements and two nonidentical
spacer sequences, is arbitrarily designated DR type 2.0 (Figure 4). Three
of the four organisms with the ~300-bp PCR product had identical DR-region
sequences defined by the presence of three identical DR elements and three
nonidentical spacer sequences (Figure 4B). This molecular arrangement was
designated DR type 3.0 (Figure 4C). The DR region of the fourth isolate
differed from the other three by the absence of 1 base in the second spacer
region and is designated DR type 3.01 (Figure 4C). Consistent with the
difference in PCR fragment size, the sequences of the DR region in the
seven other organisms were distinct from the DR type 3.0 sequence. Five of
these seven isolates had an identical DR-region sequence that was
characterized by seven spacer regions (designated DR type 7.0). Two
organisms lacked one of the spacer regions present in the DR type 7.0
strains; these molecular variants were designated DR types 6.0 and 6.1
(Figure 4C).
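The PCR product sizes listed in the Table rise in near-constant steps,
which is what one would expect if each additional repeat unit contributes
one 36-bp DR element plus one spacer of about 30 bp (lengths from the
Figure 4 legend). The Python sketch below checks that arithmetic. The 66-bp
unit and the choice of the 240-bp product as the baseline are assumptions
made for illustration; they are not values stated by the authors.

```python
DR_LENGTH = 36        # bp, from the Figure 4 legend
SPACER_LENGTH = 30    # bp, assumed (the legend gives 29 bp or 30 bp)
UNIT_LENGTH = DR_LENGTH + SPACER_LENGTH  # one DR element plus one spacer

BASELINE_SPACERS = 2  # DR type 2.x has two spacers
BASELINE_SIZE = 240   # bp, Table value for DR type 2.x


def expected_size(n_spacers):
    """Predicted PCR product size (bp) for a DR type with n_spacers spacers."""
    return BASELINE_SIZE + UNIT_LENGTH * (n_spacers - BASELINE_SPACERS)


# Number of spacers (leading digit of the DR type) -> size in the Table.
TABLE_SIZES = {2: 240, 3: 306, 4: 372, 6: 504, 7: 570}

for n_spacers, observed in TABLE_SIZES.items():
    print(n_spacers, expected_size(n_spacers), observed)
```

Under these assumptions the predicted and tabulated sizes agree for every
DR type. The 305-bp product of DR type 3.01 is one base shorter than the
type 3.0 product, in keeping with the single-base deletion in its second
spacer.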

We next analyzed the 30 M1 Texas isolates by PCR of the DR region and
obtained three PCR fragment sizes: products of ~240 bp (n = 11 isolates),
~300 bp (n = 18 isolates), and ~370 bp (n = 1 isolate). We sequenced the
PCR products from 12 organisms selected to represent an array of DR PCR
fragment sizes and emm and sic alleles. Two additional sequences
(designated DR types 2.1 and 2.2) were identified among the five isolates
with the DR region PCR fragment size of ~240 bp. All six isolates with the
~300-bp PCR product had the identical sequence (DR type 3.0). The one
isolate with the ~370-bp PCR product had a unique sequence (DR type 4.0)
with four spacer regions (Figure 4). The results showed that the DR region
had more molecular variation than emm. However, the level of allelic
variation in sic exceeded that found in either emm or the DR region.

[fig]
Figure 4. Polymorphism identified in the direct repeat (DR) region of
serotype M1 group A Streptococcus. The data were generated by automated DNA
sequencing of polymerase chain reaction products obtained with the
oligonucleotide primers DR003 and DR004 described in the text. (A) The
36-bp sequences of the two related DR and DR' elements. Multiple copies of
the DR element present in different M1 isolates all had the identical
sequence. (B) The 29-bp or 30-bp sequences of the 10 distinct spacer
regions identified in the analysis. (C) Arrangement of the DR elements and
spacer sequences in nine distinct DR allelic variants. The DR types were
given arbitrary designations based in part on the number of DR elements
present. Open or cross-hatched rectangles represent copies of the DR or DR'
elements; arrows represent copies of the spacer region sequences connecting
the DR elements. The numbers above the spacer region sequences refer to the
spacers designated in part B of the figure.

Conclusions

Our data underscore the importance of molecular typing techniques in
rapidly providing information about the epidemiology of GAS infections
(24). The emm sequence data indicated that a heterogeneous array of GAS M
types was present in the sample of 100 GAS isolates; thus, we could rapidly
rule out the notion that the invasive cases had been caused by one or a few
distinct GAS strains. Moreover, molecular analysis of several other
polymorphic loci, including automated DNA sequencing of sic and a
chromosomal region with multiple DR sequences, showed that M1 organisms,
the most abundant serotype in the sample, had substantial levels of genetic
diversity. Of the molecular techniques used in this analysis, sequencing
the sic gene was the most effective for differentiating among M1 isolates
because it identified the most variants. RFLP-based typing with IS1548 and
IS1562 failed to provide extensive, or even adequate, resolving power among
the M1 organisms for epidemiologic purposes. Moreover, the variation in the
IS1548 RFLP profile we detected in two isolates (MGAS 6293 and MGAS 6294)
from a woman with puerperal sepsis and the blood of her newborn child
suggests that IS1548 can be mobile in host-pathogen interactions.
Instability in insertion sequence profiles has also been reported for
IS6110, an element commonly used for molecular subtyping of M. tuberculosis
(25).
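The relative resolving power of the markers can be read directly from the
Table by counting how many distinct types each one yields among the 30
Texas isolates. The Python sketch below does that tally. The values are
transcribed from the Table; the tuple layout and names are illustrative
choices, and allele labels are kept as strings so that labels such as 1.01
and 1.100 remain distinct.

```python
# (MGAS no., sic allele, emm1 allele, DR PCR size in bp, IS1548 type)
TEXAS_ISOLATES = [
    ("6151", "1.01", "1.0", 372, "1.0"), ("6168", "1.01", "1.0", 306, "1.0"),
    ("6184", "1.01", "1.0", 306, "1.0"), ("6199", "1.01", "1.19", 306, "1.0"),
    ("6262", "1.01", "1.19", 306, "1.0"), ("6264", "1.01", "1.19", 306, "1.0"),
    ("6181", "1.02", "1.0", 240, "1.0"), ("6293", "1.02", "1.0", 306, "1.0"),
    ("6294", "1.02", "1.0", 306, "1.4"), ("6140", "1.13", "1.0", 240, "1.0"),
    ("6200", "1.13", "1.0", 240, "1.0"), ("6201", "1.13", "1.0", 240, "1.0"),
    ("6281", "1.13", "1.24", 306, "1.3"), ("6137", "1.32", "1.0", 306, "1.0"),
    ("6148", "1.32", "1.0", 306, "1.0"), ("6249", "1.32", "1.0", 306, "1.0"),
    ("6172", "1.34", "1.0", 306, "1.0"), ("5997", "1.36", "1.0", 240, "1.0"),
    ("6135", "1.36", "1.0", 240, "1.0"), ("6254", "1.36", "1.24", 306, "1.0"),
    ("6189", "1.66", "1.13", 306, "1.0"), ("5999", "1.99", "1.0", 306, "1.0"),
    ("6003", "1.100", "1.0", 240, "1.0"), ("6251", "1.100", "1.0", 240, "1.0"),
    ("6006", "1.101", "1.0", 306, "1.0"), ("6138", "1.118", "1.0", 240, "1.0"),
    ("6150", "1.119", "1.0", 306, "1.0"), ("6154", "1.120", "1.18", 240, "1.0"),
    ("6272", "1.179", "1.0", 306, "1.0"), ("6299", "1.180", "1.0", 240, "1.0"),
]

# Column index of each marker within a row.
MARKERS = {"sic": 1, "emm1": 2, "DR PCR size": 3, "IS1548": 4}


def distinct_types(rows):
    """Count the distinct values observed for each marker."""
    return {name: len({row[index] for row in rows})
            for name, index in MARKERS.items()}


counts = distinct_types(TEXAS_ISOLATES)
for name, n_types in counts.items():
    print(name, n_types)
```

With the Table values this gives 15 sic alleles, 5 emm1 alleles, 3 DR PCR
product sizes, and 3 IS1548 types. The DR count here uses PCR product size
only, because not every isolate had its DR region sequenced.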

Although sequence analysis of emm and the DR region provided some useful
molecular subtyping data for M1 strains, the level of polymorphism at these
loci was less than in sic. A rapid PCR-based subtyping system to index
polymorphism in the DR region could be formulated for M1 GAS that would be
similar to the method available for M. tuberculosis. However, this approach
would be less useful for M1 GAS than M. tuberculosis because in the latter
organism 43 distinct spacer regions have been described. Hence, the number
of polymorphic markers is considerably greater than in M1 GAS, in which
thus far only 13 spacer regions have been found (unpub. data).

Our work, recently reported results (7,8), and unpublished data obtained
from ongoing analysis of sic polymorphism in large samples obtained from
population-based studies demonstrate four emerging themes in the molecular
epidemiology and evolutionary biology of M1 organisms. First, several sic
variants are dispersed over broad geographic areas; some have achieved
intercontinental distribution. For example, M1 strains with the sic1.01
allele have been identified in 14 countries. This allele might be widely
disseminated because it is the ancestral condition in M1 organisms or
otherwise has had a long-standing association with the M1 serotype. Another
plausible hypothesis to explain its widespread dissemination is that
expression of Sic1.01 protein bestows greater fitness than do other Sic
variants. A third possibility is that the Sic1.01 variant marks an M1
subclone with an unusual propensity to survive and spread. In this regard,
we note that virtually all isolates with the sic1.01 allele are
speA-positive. GAS isolates with the speA gene are statistically
overrepresented among organisms recovered from children with pharyngitis
who have not been cured by oral antibiotic therapy (26). Bacterial survival
despite appropriate antibiotic therapy would likely enhance spread of the
organism to new hosts and, hence, assist widespread dispersal. We also note
that speA-positive M1 isolates are internalized efficiently by human
respiratory tract epithelial cells grown in culture (27,28), a process that
could provide access to a protective niche that enhances survival
capability.

A second important theme is that many sic alleles are confined to local
geographic areas (e.g., individual countries or communities). For example,
seven of the sic alleles identified in this study were unique to the Texas
M1 isolates. Several unique sic alleles also were found among organisms
cultured from patients in Mexico (7) and the former East Germany (8).
Because many sic alleles can be readily linked with one another by a single
molecular event such as a nucleotide substitution or one insertion or
deletion, some of the variants likely arise rapidly in local areas. Their
absence in other regions is explained by lack of sufficient elapsed time
required for widespread dispersal. Recent data obtained from study of M1
isolates recovered from population-based surveys in Finland (29), Ontario,
Canada (30), and Atlanta, Georgia (31) strongly support this explanation
(unpub. data).

The third theme is the remarkable polymorphism in the sic gene. Stockbauer
et al. (8) reported that virtually all changes in the sic gene result in
structural changes in the Sic protein and concluded that positive Darwinian
selection is mediating Sic variation. Our study confirmed these
observations. For example, all 10 new nucleotide changes identified would
result in amino acid substitutions in Sic, and all insertions and deletions
were in frame. Moreover, most of the amino acid changes were radical
replacements, that is, those producing charge changes or polar-nonpolar
substitutions. These types of amino acid replacements commonly result in
functional differences in the resulting proteins and are a hallmark of
positive selection (32).
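The two criteria used in this paragraph, radical replacement and in-frame
length change, can be written down as simple checks. The Python sketch
below is one way to do so. The grouping of amino acids into charge and
polarity classes is a common textbook grouping adopted here as an
assumption (for example, histidine is counted as positively charged); it is
not a classification given by the authors, and the substitutions in the
usage lines are hypothetical examples rather than Sic data.

```python
# Assumed side-chain classes (single-letter amino acid codes).
SIDE_CHAIN_CLASSES = {
    "positive": set("KRH"),
    "negative": set("DE"),
    "polar": set("STNQCY"),
    "nonpolar": set("AVLIMFWPG"),
}


def side_chain_class(amino_acid):
    """Return the assumed class name for a single-letter amino acid code."""
    for name, members in SIDE_CHAIN_CLASSES.items():
        if amino_acid in members:
            return name
    raise ValueError("unknown amino acid: " + repr(amino_acid))


def is_radical(old, new):
    """True if the replacement changes charge or polar/nonpolar class."""
    return side_chain_class(old) != side_chain_class(new)


def is_in_frame(indel_length_bp):
    """True if an insertion or deletion preserves the reading frame."""
    return indel_length_bp > 0 and indel_length_bp % 3 == 0


# Hypothetical examples.
print(is_radical("K", "E"))   # charge reversal
print(is_radical("L", "I"))   # both nonpolar
print(is_in_frame(87))        # multiple of 3
```

Any such classification is a simplification; borderline residues (for
example glycine, cysteine, or tyrosine) are assigned differently by
different schemes.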

Last, accumulating data suggest the existence of two genetically divergent
M1 subpopulations, which can be thought of as two evolutionarily distinct
lineages. Our study found that organisms with the speA gene and chromosomal
PFGE type 1a (5) have shorter DR-region sequences and an IS1548 profile
characterized by six hybridizing bands. In contrast, organisms that are
speA-negative usually have PFGE type 1k (5), longer DR sequences, and an
IS1548 fingerprint with four bands. In addition, we will show elsewhere
that the two M1 lineages each have distinct families of sic alleles.
Together, the data indicate that sufficient time has elapsed since a shared
common ancestor for members of the two lineages to have diverged at many
chromosomal loci. The data also indicate that transduction of the speA2
allele between members of the two lineages is apparently rare in natural
populations of GAS (5,14). As more comparative analyses are conducted,
additional genetic differences will probably be identified between isolates
of the two lineages.

In summary, automated sequence analysis of sic and a region of the
chromosome with DR sequences permitted rapid and unambiguous
differentiation among serotype M1 isolates during a period of significant
increase in the number of invasive disease cases. Genetic analysis of these
polymorphic markers permitted us to rapidly rule out the idea that a single
unusually virulent strain of M1 GAS was responsible. The subtyping methods
described in this work will assist other outbreak investigations and
studies designed to understand the molecular basis of temporal variation in
disease frequency and severity of infections caused by M1 GAS isolates.

---------------------------------------------------------------------------

Acknowledgments

      We thank C. Stager, S. Rossman, K. Krause, and C. Baker for
generously providing strains.

      This work was supported by Public Health Service Grant AI-33119 to
J.M.M.

      Dr. Hoe is a research associate in the Institute for the Study of
Human Bacterial Pathogenesis, Baylor College of Medicine. Her main
interests are in the areas of molecular epidemiology and bacterial
pathogenesis.

      Address for correspondence: James M. Musser, Institute for the Study
of Human Bacterial Pathogenesis, Department of Pathology, Baylor College of
Medicine, One Baylor Plaza, Houston, Texas 77030, USA; fax: 713-798-4595;
e-mail: jmusser@bcm.tmc.edu.

References

  1. Musser JM, Krause RM. The revival of group A streptococcal diseases,
     with a commentary on staphylococcal toxic shock syndrome. In: Krause
     RM, editor. Emerging infections. San Diego: Academic Press; 1998. p.
     185-218.
  2. Martin DR, Single LA. Molecular epidemiology of group A streptococcus
     M type 1 infections. J Infect Dis 1993;167:1112-7.
  3. Musser JM, Hauser AR, Kim MH, Schlievert PM, Nelson K, Selander RK.
     Streptococcus pyogenes causing toxic-shock-like syndrome and other
     invasive diseases: clonal diversity and pyrogenic exotoxin expression.
     Proc Natl Acad Sci U S A 1991;88:2668-72.
  4. Norgren M, Norrby A, Holm SE. Genetic diversity in T1M1 group A
     streptococci in relation to clinical outcome of infection. J Infect
     Dis 1992;166:1014-20.
  5. Musser JM, Kapur V, Szeto J, Pan X, Swanson DS, Martin DR. Genetic
     diversity and relationships among Streptococcus pyogenes strains
     expressing serotype M1 protein: recent intercontinental spread of a
     subclone causing episodes of invasive disease. Infect Immun
     1995;63:994-1003.
  6. Akesson P, Sjoholm AG, Bjorck L. Protein SIC, a novel extracellular
     protein of Streptococcus pyogenes interfering with complement
     function. J Biol Chem 1996;271:1081-8.
  7. Perea Mejia LM, Stockbauer KE, Pan X, Cravioto A, Musser JM.
     Characterization of group A Streptococcus strains recovered from
     Mexican children with pharyngitis by automated DNA sequencing of
     virulence-related genes: unexpectedly large variation in the gene
     (sic) encoding a complement inhibiting protein. J Clin Microbiol
     1997;35:3220-4.
  8. Stockbauer KE, Grigsby D, Pan X, Fu Y-X, Perea Mejia LM, Cravioto A,
     et al. Hypervariability generated by natural selection in an
     extracellular complement-inhibiting protein of serotype M1 strains of
     group A Streptococcus. Proc Natl Acad Sci U S A 1998;95:3128-33.
  9. Granlund M, Oberg L, Sellin M, Norgren M. Identification of a novel
     insertion element, IS1548, in group B streptococci, predominantly in
     strains causing endocarditis. J Infect Dis 1998;177:967-76.
 10. Berge A, Rasmussen M, Bjorck L. Identification of an insertion
     sequence located in a region encoding virulence factors of
     Streptococcus pyogenes. Infect Immun 1998;66:3449-53.
 11. Whatmore AM, Kapur V, Sullivan DJ, Musser JM, Kehoe MA. Non-congruent
     relationships between variation in emm gene sequences and the
     population genetic structure of group A streptococci. Mol Microbiol
     1994;14:619-31.
 12. Harbaugh MP, Podbielski A, Hugl S, Cleary PP. Nucleotide substitutions
     and small-scale insertion produce size and antigenic variation in
     group A streptococcal M1 protein. Mol Microbiol 1993;8:981-91.
 13. Johnson LP, Schlievert PM. Group A streptococcal phage T12 carries the
     structural gene for pyrogenic exotoxin type A. Mol Gen Genet
     1984;194:52-6.
 14. Musser JM, Kapur V, Kanjilal S, Shah U, Musher DM, Barg NL, et al.
     Geographic and temporal distribution and molecular characterization of
     two highly pathogenic clones of Streptococcus pyogenes expressing
     allelic variants of pyrogenic exotoxin A (scarlet fever toxin). J
     Infect Dis 1993;167:337-46.
 15. Alland D, Kalkut GE, Moss AR, McAdam RA, Hahn JA, Bosworth W, et al.
     Transmission of tuberculosis in New York City. An analysis by DNA
     fingerprinting and conventional epidemiologic methods. N Engl J Med
     1994;330:1710-6.
 16. van der Zee A, Mooi F, van Embden J, Musser J. Molecular evolution and
     host adaptation in Bordetella spp.: phylogenetic analysis using
     multilocus enzyme electrophoresis and typing with three insertion
     sequences. J Bacteriol 1997;179:6609-17.
 17. Robinson DA, Hollingshead SK, Musser JM, Parkinson AJ, Briles DE,
     Crain MJ. The IS1167 insertion sequence is a phylogenetically
     informative marker among isolates of serotype 6B Streptococcus
     pneumoniae. J Mol Evol 1998;47:222-9.
 18. Lawrence JG, Dykhuizen DE, DuBose RF, Hartl DL. Phylogenetic analysis
     using insertion sequence fingerprinting in Escherichia coli. Mol Biol
     Evol 1989;6:1-14.
 19. Stanley J, Jones CS, Threlfall EJ. Evolutionary lines among Salmonella
     enteritidis phage types are identified by insertion sequence IS200
     distribution. FEMS Microbiol Lett 1991;66:83-9.
 20. Suvorov A, Ferretti J. Physical and genetic chromosomal map of an M
     type 1 strain of Streptococcus pyogenes. J Bacteriol 1996;178:5546-9.
 21. Groenen PMA, Bunschoten AE, van Soolingen D, van Embden JDA. Nature of
     DNA polymorphism in the direct repeat cluster of Mycobacterium
     tuberculosis; application for strain differentiation by a novel typing
     method. Mol Microbiol 1993;10:1057-65.
 22. Kamerbeek J, Schouls L, Kolk A, van Agterveld M, van Soolingen D,
     Kuijper S, et al. Simultaneous detection and strain differentiation of
     Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin
     Microbiol 1997;35:907-14.
 23. Sreevatsan S, Pan X, Stockbauer KE, Connell ND, Kreiswirth BN, Whittam
     TS, et al. Restricted structural gene polymorphism in the
     Mycobacterium tuberculosis complex indicates evolutionarily recent
     global dissemination. Proc Natl Acad Sci U S A 1997;94:9869-74.
 24. Musser JM, Kapur V, Peters JE, Hendrix CW, Drehner D, Gackstetter GD,
     et al. Real-time molecular epidemiologic analysis of an outbreak of
     Streptococcus pyogenes invasive disease in US Air Force trainees. Arch
     Pathol Lab Med 1994;118:128-33.
 25. Yeh RW, Ponce de Leon A, Agasino CB, Hahn JA, Daley CL, Hopewell PC,
     et al. Stability of Mycobacterium tuberculosis DNA genotypes. J Infect
     Dis 1998;177:1107-11.
 26. Musser JM, Gray BM, Schlievert PM, Pichichero ME. Streptococcus
     pyogenes pharyngitis: characterization of strains by multilocus enzyme
     genotype, M and T protein serotype, and pyrogenic exotoxin gene
     probing. J Clin Microbiol 1992;30:600-3.
 27. LaPenta D, Rubens C, Chi E, Cleary PP. Group A streptococci
     efficiently invade human respiratory epithelial cells. Proc Natl Acad
     Sci U S A 1994;91:12115-9.
 28. Cleary PP, McLandsborough L, Ikeda L, Cue D, Krawczak J, Lam H.
     High-frequency intracellular infection and erythrogenic toxin A
     expression undergo phase variation in M1 group A streptococci. Mol
     Microbiol 1998;28:157-67.
 29. Muotiala A, Seppala H, Huovinen P, Vuopio-Varkila J. Molecular
     comparison of group A streptococci of T1M1 serotype from invasive and
     noninvasive infections in Finland. J Infect Dis 1997;175:392-9.
 30. Davies HD, McGeer A, Schwartz B, Green K, Cann D, Simor AE, et al.
     Invasive group A streptococcal infections in Ontario, Canada. N Engl J
     Med 1996;335:547-53.
 31. Zurawski CA, Bardsley MS, Beall B, Elliott JA, Facklam R, Schwartz B,
     et al. Invasive group A streptococcal disease in metropolitan Atlanta:
     a population-based assessment. Clin Infect Dis 1998;27:150-7.
 32. Hughes MK, Hughes AL. Natural selection on Plasmodium surface
     proteins. Mol Biochem Parasitol 1995;71:99-113.

Emerging Infectious Diseases
National Center for Infectious Diseases
Centers for Disease Control and Prevention
Atlanta, GA

URL: ftp://ftp.cdc.gov/pub/EID/vol5no2/ascii/hoe.txt
