SURVEY OF TRINUCLEOTIDE REPEATS IN THE HUMAN GENOME: ASSESSMENT OF THEIR UTILITY AS GENETIC MARKERS
Julie M. Gastier (1), Jacqueline C. Pulido (1, 2), Sara Sunden (3), Thomas Brody (1),
Kenneth H. Buetow (4), Jeffrey C. Murray (5), James L. Weber (6), Thomas J. Hudson (7),
Val C. Sheffield (3), Geoffrey M. Duyk (1, 2)*
(1) Department of Genetics, (2) Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115;
(3) Department of Pediatrics, University of Iowa, Iowa City, IA 52242;
(4) Fox Chase Cancer Research Center, Philadelphia, PA 19111;
(5) Departments of Pediatrics and Biology, University of Iowa, Iowa City, IA 52245;
(6) Marshfield Medical Research Foundation, Marshfield, WI 54449;
(7) Center for Genome Research, Whitehead Institute/Massachusetts Institute of Technology, Cambridge, MA 02139
* To whom correspondence should be addressed at present address:
Millennium Pharmaceuticals Incorporated, 640 Memorial Drive, Cambridge, MA 02139
Phone (617) 374-9480 x218, FAX (617) 374-9379
ABSTRACT
Genetic markers based upon PCR amplification of short tandem repeat-containing sequence tagged sites
(STSs) have become the standard for genetic mapping. We have completed a survey based on the direct
isolation of representative members of each of the ten trinucleotide repeat classes to determine their
relative abundance, repeat size distribution, and general utility as genetic markers. Trinucleotide repeats,
depending on the repeat class, are 1 to 2 orders of magnitude less frequent than (AC)n repeats. The
average size of trinucleotide repeats sequenced was less than 15 repeat units in length, and only three
of the STSs developed for this study contained more than 25 repeat units. The (AAT)n class of repeats
is the most abundant and also the most frequently polymorphic. Other trinucleotide repeat classes
observed to be frequently polymorphic include (AAC)n, (ACT)n, (ATC)n, and (AAG)n; however, these
classes are less abundant than the (AAT)n class. Based
upon this initial survey, we have initiated saturation cloning of the (AAT)n class of repeats. At the
time of submission of this manuscript, we have developed, as part of the Cooperative Human Linkage Center
(CHLC), more than 415 new high heterozygosity (AAT)n genetic markers (>2 alleles in 4 individuals) and
200 new low heterozygosity (AAT)n STSs from this larger screening effort combined with the initial survey.
INTRODUCTION
High resolution genetic maps are important tools for the mapping of disease genes and serve
as frameworks for the construction of physical maps. In recent years, collaborative efforts have
provided genetic maps with an average resolution of 3 cM (1,2,3), and ongoing efforts will result
in the production of human genetic maps with an average resolution of <1 cM (4). The majority of the
available genetic markers are based on (AC)n simple sequence repeats. The (AC)n repeats are extremely
abundant and can be found on average once every 30-60 Kb (5-7). (AC)n repeats are generally polymorphic
if the repeat length is greater than ten units (2,8).
In contrast to (AC)n repeats, the ten classes of trinucleotide repeats have not been
comprehensively surveyed to determine their utility as genetic markers, although a limited set of
genetic markers based on trinucleotide repeats has been published (1,9). Identifying additional
sources of PCR-based genetic markers is extremely important, since completing the current generation
of genetic maps will require filling gaps that result from the local clustering of (AC)n repeats. High
resolution genetic maps will be critical for methods such as allele association analysis of the genetic
components of complex traits. In addition, large numbers of markers and STSs will facilitate completion of physical maps.
There are several advantages to incorporating markers based on higher order short tandem repeats into genetic maps.
Genetic markers based upon trinucleotide and tetranucleotide repeats produce cleaner PCR amplification
products than dinucleotide repeats and are more readily co-amplified. In addition, the amplification
products derived from the trinucleotide and tetranucleotide classes of simple sequence repeats are more
suitable for detection by silver staining than (AC)n repeat markers (unpublished data). The increasing
use of trinucleotide and tetranucleotide repeats should therefore result in more rapid and accurate
genotyping. The availability of marker sets based upon combinations of di-, tri-, and tetranucleotide
repeat markers may increase the utility of hybridization-based approaches for high-throughput genotyping.
Multiplex sets could be assembled based upon the size ranges of expected alleles as well as the class of simple
sequence repeat. Amplification products would then be detected by serial rounds of probing with
repeat class-specific probes. Beyond their role in genetic mapping, STSs based on trinucleotide
repeats may help define candidate disease loci based upon expansion of repeat sequences within defined
genes. To date, the expansion of a trinucleotide repeat has been associated with at least 9 diseases and 4 fragile sites (10-23).
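The multiplexing idea above can be illustrated with a small sketch. It assumes that each marker is described by its repeat class and an expected allele size range; the `Marker` fields, the greedy first-fit strategy, and the 10 bp spacing margin are illustrative assumptions, not a protocol used in this study. Markers of different classes may share a size window because they would be read out with different class-specific probes, whereas markers of the same class must not overlap in size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Hypothetical description of one STS marker."""
    name: str
    repeat_class: str  # e.g. "AAT"; determines which probe detects it
    min_bp: int        # smallest expected allele (bp)
    max_bp: int        # largest expected allele (bp)


def compatible(a: Marker, b: Marker, gap_bp: int) -> bool:
    """Two markers can share a pool if they are detected with different
    probes, or if their size ranges are separated by more than gap_bp."""
    if a.repeat_class != b.repeat_class:
        return True
    return a.min_bp > b.max_bp + gap_bp or b.min_bp > a.max_bp + gap_bp


def assemble_multiplex(markers: list[Marker], gap_bp: int = 10) -> list[list[Marker]]:
    """Greedy first-fit: place each marker (smallest alleles first) into the
    first pool where it is compatible with every marker already present."""
    pools: list[list[Marker]] = []
    for marker in sorted(markers, key=lambda m: m.min_bp):
        for pool in pools:
            if all(compatible(marker, other, gap_bp) for other in pool):
                pool.append(marker)
                break
        else:
            pools.append([marker])
    return pools
```

With two overlapping (AAT)n markers, the second is pushed into a new pool, while a tetranucleotide marker of the same size shares the first pool. First-fit is chosen only for simplicity; it does not guarantee the minimum number of pools.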
Previously, we have described a method for generating small insert genomic libraries which have
been highly enriched for clones containing short tandem repeats (24,25). We have used this method to survey
the ten classes of trinucleotide repeats in order to determine their usefulness as genetic markers. Starting
with at least fifty small insert human genomic clones for each repeat class, we have estimated the size
distribution of each type of repeat in the genome, the association with repetitive DNA elements, and the
polymorphic character of the repeat class. By screening a cosmid library with probes for each of the
trinucleotide repeats, we have been able to estimate the relative abundance of the trinucleotide repeats
in the human genome.
The information provided by this survey will help direct genome wide efforts towards generating high
resolution genetic linkage maps. For example, we have screened the most polymorphic and abundant of the
trinucleotide repeat classes, (AAT)n, on a much larger scale than this initial survey to generate large
numbers of new markers for the Cooperative Human Linkage Center (4). In this paper, by analyzing the
correlation between repeat length and heterozygosity in markers typed on more individuals, we present evidence
that the first 200 (AAT)n markers from that screen are of similar polymorphic quality to the markers
generated in the initial survey. In addition, data from the trinucleotide repeat survey will be useful to
other researchers searching for new markers near a disease locus to build the local, very high
resolution genetic maps that have become essential for positional cloning strategies. Such markers,
whether or not they are highly polymorphic, contribute to the establishment of integrated, high quality
physical and genetic maps, improve the localization of recombinant breakpoints, and facilitate the localization
of genes through the detection of "ancestral recombinants" by linkage disequilibrium methods.
RESULTS
Relative abundance of individual classes of trinucleotide repeats
The most useful classes of simple sequence repeats for developing large numbers of genetic
markers would be highly abundant in the human genome and would lead to the rapid and easy development
of highly robust polymorphic markers. Previous studies suggested that trinucleotide simple
sequence repeats are less frequent than dinucleotide repeats (26). In order to estimate the frequency
of each of the trinucleotide repeat classes, we screened an amplified human cosmid library with probes for
each type of repeat. Table 1 shows the frequency estimates of each of the ten classes of trinucleotide repeats.
Estimates were made by screening approximately 30,000 cosmids for each repeat class with a non-radioactive
screening system from Lifecodes, Inc. Frequency estimations are based on an average insert size for the
cosmids of 35 Kb, and the human genome was assumed to be 3000 Mb. Hybridization conditions were such that
positives include repeats of 5 units or greater, based on hybridization control clones. However, the most
useful estimates for trinucleotide repeats would include only those with 8 repeat units or greater.
Therefore, frequencies were adjusted by the percentage of clones that, when sequenced, had repeats of
8 units or longer. Since the (ACG)n and (CCG)n clones were difficult to sequence (see below), the frequency
estimates for these classes could not be adjusted. It is critical to note that these frequencies are rough
estimates, valid only under the conditions specified. Comparisons of such screens between laboratories
are difficult because of inherent differences in experimental and hybridization procedures.
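The frequency estimate described above amounts to dividing the amount of DNA surveyed by the number of adjusted positive clones. The sketch below assumes that each positive cosmid carries exactly one repeat of the class screened (which would underestimate the frequency if some cosmids carry several) and that the adjustment is a simple multiplication by the fraction of sequenced clones with 8 or more units. The function names and the positive counts in the example are hypothetical, not values from Table 1.

```python
def repeat_spacing_kb(positives: int,
                      cosmids_screened: int = 30_000,
                      insert_kb: float = 35.0,
                      fraction_ge8: float = 1.0) -> float:
    """Approximate average distance (kb) between repeats of one class.

    positives     -- hybridization-positive cosmids (repeats of >= ~5 units)
    fraction_ge8  -- fraction of sequenced positives with >= 8 repeat units
    Assumes one repeat per positive cosmid.
    """
    if positives <= 0 or not 0.0 < fraction_ge8 <= 1.0:
        raise ValueError("need positives > 0 and 0 < fraction_ge8 <= 1")
    surveyed_kb = cosmids_screened * insert_kb  # total DNA screened
    adjusted = positives * fraction_ge8         # positives with >= 8 units
    return surveyed_kb / adjusted


def repeats_per_genome(spacing_kb: float, genome_kb: float = 3_000_000) -> float:
    """Number of repeats implied by the spacing, for a 3000 Mb genome."""
    return genome_kb / spacing_kb
```

With the stated screening parameters, 30,000 cosmids × 35 Kb is about 1,050 Mb of DNA, or roughly 0.35 genome equivalents. A hypothetical 2,100 adjusted positives would then correspond to one repeat every 500 Kb, or about 6,000 repeats per genome.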
Distribution of repeat lengths
We sequenced at least 50 hybridization-positive small insert clones for each repeat class,
and the profiles of the repeat lengths for the clones sequenced from each class are shown in Figure 1.
The data presented are based on unique clones in which a repeat length of at least five units was observed. Two
of the classes, (ACG)n and (CCG)n, proved to be very difficult to analyze. Clones harboring plasmids
containing these repeat sequences appeared to grow poorly, and only small quantities of template could be
obtained from our standard minipreps. Furthermore, most of the DNA sequences derived from these plasmids
were of low quality, possibly because of the high G/C content of both the repeats and their
flanking sequences. Since our aim was to determine which classes of repeats would be useful for rapidly
generating large numbers of informative markers, these two classes were excluded from further analyses.
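Tabulating repeat lengths, as in Figure 1, requires counting the longest run of perfect, contiguous copies of the motif in each clone's sequence. A minimal sketch of one way to do this follows; it is not the software used in this study, and the function names are hypothetical. It checks every rotation of the motif on both strands and counts only perfect tandem copies, so interrupted or imperfect repeats are scored by their longest perfect stretch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_variants(motif: str) -> set[str]:
    """All rotations of the motif and of its reverse complement."""
    motif = motif.upper()
    revcomp = motif.translate(COMPLEMENT)[::-1]
    return {m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m))}


def longest_perfect_run(seq: str, motif: str) -> int:
    """Largest number of contiguous perfect copies of any motif variant."""
    seq = seq.upper()
    best = 0
    for unit in motif_variants(motif):
        k = len(unit)
        # Try every start position so that no phase of the repeat is missed.
        for start in range(len(seq) - k + 1):
            copies = 0
            while seq.startswith(unit, start + copies * k):
                copies += 1
            best = max(best, copies)
    return best
```

Under this sketch, a clone would be included in the length profile only if `longest_perfect_run(read, "AAT") >= 5`. The exhaustive scan is quadratic in the worst case, which is acceptable for single-pass reads of a few hundred bases.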
Development of Sequence Tagged Sites (STSs)
Table 2 presents the success and failure rates for the development of
STSs from each class of trinucleotide repeats. The minimum repeat length used for the design of primer
sets was n=7. This cutoff was chosen because most (AC)n repeats of 10 units or greater (20 bp total)
tend to be polymorphic (2,5), and a (NNN)7 repeat spans a comparable 21 bp. This initial survey of
trinucleotide simple sequence repeat lengths focused on the shortest observed allele size associated
with highly polymorphic trinucleotide repeat-based STSs. From these initial clones, we developed
hybridization controls that served to calibrate our screening conditions. Because very few polymorphic
STSs were developed from clones containing fewer than 8 repeat units, large scale marker development has
subsequently been restricted to clones with a hybridization signal equal to or greater than that of the (NNN)8 control.
In order to assess the ease of marker development and to optimize the process, we sought to
characterize the common causes of failure that led to rejection of a clone or sequence for the development
of a marker (Table 2). For example, if most losses were due to short repeats, the hybridization
conditions were made more stringent for subsequent large scale screenings. In addition, duplication rates
were useful for assessing the quality of a small insert library as well as for determining whether a
screen had reached "saturation." The high rate of clone duplication initially observed in the
restriction enzyme-based libraries led to the decision to develop higher complexity, randomly sheared libraries.
Similarly, the relatively large number of clones that either required bidirectional sequencing to define
the simple sequence repeat, or were lost because no repeat was observed after sequencing, allowed us to
optimize the insert size for library construction. Clones from repeat classes frequently associated with
Alu repetitive elements were prescreened for the presence of Alu sequences in subsequent marker development
(C. Yandava, et al., in preparation). Finally, one class of repeats, (AGG)n, was frequently very complex,
yielding long stretches of C/T rich sequence with few perfect repeats, suggesting that it would be difficult
to obtain large numbers of polymorphic markers from this class of repeats.
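The failure categories described above can be tallied per repeat class, as in Table 2, with a simple rule-based triage. In the sketch below, the `Clone` fields, the order in which the rules are applied, and the category labels are assumptions made for illustration. Only the n = 7 minimum repeat length for primer design is taken from the text.

```python
from collections import Counter
from dataclasses import dataclass

MIN_UNITS_FOR_PRIMERS = 7  # minimum repeat length used for primer design


@dataclass
class Clone:
    """Hypothetical per-clone summary produced after single-pass sequencing."""
    clone_id: str
    is_duplicate: bool          # sequence matches an earlier clone
    repeat_units: int           # longest perfect run found (0 = none)
    repeat_runs_off_read: bool  # repeat extends to the end of the read
    alu_in_flank: bool          # Alu element prevents unique primers


def triage(clone: Clone) -> str:
    """Assign the first failure category that applies, else 'primer design'."""
    if clone.is_duplicate:
        return "duplicate"
    if clone.repeat_units == 0:
        return "repeat not observed"
    if clone.repeat_runs_off_read:
        # Length is unreliable until the other end is sequenced.
        return "needs bidirectional sequencing"
    if clone.repeat_units < MIN_UNITS_FOR_PRIMERS:
        return "repeat too short"
    if clone.alu_in_flank:
        return "Alu in flanking sequence"
    return "primer design"


def tally(clones: list[Clone]) -> Counter:
    """Counts per category, analogous to one row of a success/failure table."""
    return Counter(triage(c) for c in clones)
```

Checking duplicates first means a duplicate clone is never counted under a second category, which keeps the per-class counts summing to the number of clones sequenced.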
Polymorphism analysis
Following the strategy initially employed by Weissenbach et al. (2) to estimate the
likelihood that a newly isolated STS would be polymorphic in the reference CEPH pedigrees, we estimated
the polymorphic potential of all STSs developed in the initial survey by analyzing their alleles in 4 reference
CEPH individuals. If a given STS had at least three alleles in the 4 individuals (8 chromosomes), it was
categorized as polymorphic and was subsequently integrated into the high resolution genetic map of the CHLC. The
yield of highly polymorphic markers developed for each of the trinucleotide repeat classes is shown in Figure 2.
Figure 2. Yield of highly polymorphic markers in the initial trinucleotide repeat survey. % polymorphic =
(number of STSs with > 2 alleles in the four CEPH individuals) / (total number of STSs that amplified);
n = number of STSs that amplified.
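The Figure 2 criterion can be written out explicitly. An STS counts as polymorphic if the four CEPH individuals together show at least three distinct allele sizes, and the yield is computed over amplified STSs only. The sketch below assumes that genotypes are recorded as pairs of allele sizes in bp; the data structure and the example sizes are hypothetical.

```python
from typing import Optional

Genotype = tuple[int, int]  # two allele sizes (bp) for one individual


def distinct_alleles(genotypes: list[Genotype]) -> int:
    """Number of different allele sizes seen across all individuals."""
    return len({allele for pair in genotypes for allele in pair})


def is_polymorphic(genotypes: list[Genotype], min_alleles: int = 3) -> bool:
    """At least three alleles (i.e. > 2) among the typed individuals."""
    return distinct_alleles(genotypes) >= min_alleles


def percent_polymorphic(stss: dict[str, Optional[list[Genotype]]]) -> tuple[float, int]:
    """Return (% polymorphic, n), where n counts STSs that amplified.
    STSs that failed to amplify are recorded as None and excluded."""
    amplified = [g for g in stss.values() if g is not None]
    if not amplified:
        raise ValueError("no STSs amplified")
    n_poly = sum(is_polymorphic(g) for g in amplified)
    return 100.0 * n_poly / len(amplified), len(amplified)
```

For example, an STS typed as (150, 153), (150, 156), (153, 153), and (159, 150) shows four alleles and would be scored as polymorphic. An STS showing only two allele sizes would not.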
Development of an extended set of (AAT)n class of markers
The (AAT)n class of simple sequence repeats was determined to be relatively abundant, highly polymorphic,
and readily developed into robust markers. Therefore, we screened more than 60,000 additional
(AAT)n-enriched clones (8,000 from each of the six randomly sheared libraries and 4,000 from each of the
three restriction enzyme-digested libraries) in order to expand this pool of potential markers. Together,
the (AAT)n screens yielded 2877 positive clones that were sequenced, and 13.9% of these yielded
high heterozygosity markers (27). Failures were due to duplicate clones, Alu repeats, and the other causes
listed in Table 2. This percentage refers only to the success rate of the clones under the given screening
conditions; it does not reflect the percentage of (AAT)n repeats in the entire genome that are polymorphic.
For example, (AAT)n repeats located near Alu repeats may be polymorphic, but cannot be tested, because
unique primers cannot be designed. In the initial survey, we ascertained that the (AAT)n class of
repeats is frequently associated with a specific subset of Alu repetitive elements. To reduce
the number of sequences that could not be used for primer design because of an Alu element, we prescreened
clones with custom Alu repeat probes, which reduced the loss of sequences due to an Alu repeat
from 32% to <10% (C. Yandava, et al., in preparation). Using this larger set of (AAT)n markers, we asked
whether the length of the repeat relates to the degree of polymorphism of a given marker. We analyzed 110
new (AAT)n markers that had been typed on pooled DNA from 40 individuals, and the first 32 (AAT)n markers
integrated into the CHLC maps, which had been genotyped on 8 CEPH families (4) and therefore have measured
heterozygosities. The correlations are shown in Figure 3.
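To relate repeat length to informativeness, each marker needs a heterozygosity value, and the two quantities must then be correlated. The sketch below uses the standard expected heterozygosity, H = 1 - Σ p_i², computed from allele frequencies, together with a Pearson correlation coefficient. The text does not state that Figure 3 was computed exactly this way, so treat both choices, and the function names, as assumptions.

```python
import math


def expected_heterozygosity(allele_counts: list[float]) -> float:
    """H = 1 - sum(p_i^2); counts are normalized to frequencies first."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("need at least one observed allele")
    return 1.0 - sum((c / total) ** 2 for c in allele_counts)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between, e.g., repeat length and heterozygosity."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with >= 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Four equally frequent alleles give H = 0.75, and two give H = 0.5. A positive `pearson_r` between sequenced repeat lengths and H values would correspond to the trend toward more informative markers at longer repeat lengths.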
DISCUSSION
The development of highly informative markers remains a continuing goal of the Human Genome Project
because a lack of markers is often the limiting factor in positional cloning projects. We have performed
a survey of the ten classes of trinucleotide repeats to determine their potential usefulness as new
genetic markers. Plasmid clones containing the (ACG)n and (CCG)n classes were found to be difficult to grow,
screen, and obtain reliable sequence information from, so they were discarded after the initial analysis.
Eight classes of trinucleotide repeats were analyzed in detail.
We found that all classes of trinucleotide repeats are less frequent in the human genome than
the (AC)n repeats. The most abundant trinucleotide repeat class, (AAT)n, occurs approximately every 500 Kb,
while the least frequent class, (ACT)n, occurs approximately every 25,000 Kb. Since the repeats are so
infrequent, we have taken advantage of marker-enriched libraries, which have an estimated enrichment
factor of at least 100 (unpublished data). This method has allowed us to isolate large numbers of unique,
repeat-containing clones for any short tandem repeat class. A previous report based on database searches to
estimate the distribution of trinucleotide repeats in the human genome found (AAT)n, (AAC)n, and (AGC)n
to be the most frequent (28). Consistent with these data, we found (AAT)n and (AAC)n to be the most
frequent trinucleotide repeats in cosmid screenings. However, (AGC)n repeats were roughly 10-fold less
abundant in our screens. This inconsistency may result from an overrepresentation of (AGC)n repeats in
database entries, due to efforts to screen cDNAs for this type of repeat (29,30), or from an unrecognized
systematic experimental error for which we cannot account.
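The densities quoted above can be translated into the chance of finding a repeat of a given class in a candidate region, which bears on how useful each class is for local mapping. The sketch below assumes, purely for illustration, that repeats of a class are scattered independently along the genome (a Poisson model). Real repeats are not uniformly distributed, as the clustering of (AC)n repeats shows, so the numbers are only a rough guide.

```python
import math


def expected_repeats(region_kb: float, spacing_kb: float) -> float:
    """Mean number of repeats expected in a region of region_kb."""
    return region_kb / spacing_kb


def prob_at_least_one(region_kb: float, spacing_kb: float) -> float:
    """P(at least one repeat) under a Poisson model with mean region/spacing."""
    return 1.0 - math.exp(-expected_repeats(region_kb, spacing_kb))


# Spacings quoted in the text: (AAT)n ~ every 500 Kb, (ACT)n ~ every 25,000 Kb.
for name, spacing in [("AAT", 500.0), ("ACT", 25_000.0)]:
    print(name, round(expected_repeats(1_000, spacing), 2),
          round(prob_at_least_one(1_000, spacing), 3))
```

For a 1,000 Kb region, this model gives about 2 expected (AAT)n repeats (probability of at least one, about 0.86) but only 0.04 expected (ACT)n repeats (probability about 0.04).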
The distribution of the number of contiguous repeat units varies between the trinucleotide repeat
classes. In general, the range of the distribution of repeat lengths for a given class was indicative
of the utility of that particular class in marker development. This correlation suggests that some
classes are less likely to undergo expansion and contraction and will be less useful as genetic markers.
The biological basis for these differences in stability and lengths of simple sequence repeats remains unclear,
although G/C content appears to be at least one factor. It would be of interest to assess the relative
stability of these different classes in cells carrying mutations in the mismatch repair system (e.g. MSH2
or MLH1 genes) (31,32) and also to compare the new-mutation rates of the various
classes. Initial mutation rate studies of the CHLC tetranucleotide markers [mainly (AGAT)n and (AAGG)n]
suggest that the tetranucleotide-based markers have a mutation rate 3-4 times that of dinucleotide-based markers.
However, trinucleotide-based markers [mainly (AAT)n] have a mutation rate roughly equivalent to that of
dinucleotide repeat markers (unpublished data). Knowledge of mutation rates at specific loci could help
researchers choose markers for applications in which distinguishing a new mutation from a different
inherited allele is critical (e.g. paternity testing).
In addition, the mechanisms that alter repeat length by a few units appear to differ from those that
produce the expansion of trinucleotide repeats observed in some disease genes. Although the (AGC)n class
includes the (CAG)n repeats that expand in several of these diseases, we have developed over 300 STSs based on the
(AGC)n repeat class and have found that the majority of the repeats are short and not highly polymorphic
(unpublished data). These results must be interpreted cautiously, because there may be selection against
the propagation of plasmids carrying G/C rich inserts, and hence a systematic selection against clones
containing long G/C rich sequences.
In this survey of the trinucleotide repeats, we have categorized the reasons that a given clone
containing a repeat failed to be developed into a suitable genetic marker. This analysis is most useful
for selecting the classes that merit larger screens in comprehensive efforts of marker generation. In terms
of rate of polymorphism and quality of markers, the most useful classes of simple sequence repeats have been,
in order, (AAT)n > (ACT)n > (AAG)n > (ATC)n > (AAC)n. Although marker selection on a genome wide basis will
allow us to isolate markers from all of these classes in support of the CHLC effort, the low density of most
of the trinucleotide repeat classes reduces their utility in support of more focused local projects. (ACT)n
markers tend to be polymorphic, but they are extremely rare in the genome. (ATC)n markers are relatively
more abundant, but often less informative. (AAG)n and (AAC)n repeats are frequently associated with Alu
elements, making an Alu prescreening step necessary for large scale development. Interestingly, the five
most useful classes in this survey are all A/T rich, as are the most useful tetranucleotide classes
[(AGAT)n, (AATG)n, (AATC)n, (AAGG)n] (C. Yandava, et al., in preparation). The differences in polymorphic
quality among the classes may suggest that mutation is due to mismatch repair error. However, mutation could
also be due to "slippage" of the DNA polymerase, since slippage would be predicted to be greater for A/T rich sequences.
Since the (AAT)n class of trinucleotide repeats was most abundant and most frequently polymorphic,
large scale screening of this class was initiated. We have used the additional markers to correlate the
repeat length with the polymorphic quality of each STS. As has been observed for (AC)n repeats (8), there
was a trend toward more informative markers as the sequenced repeat length increased. To date, we have
developed more than 415 new (AAT)n markers as part of the Cooperative Human Linkage Center effort. Additional
markers will be developed from the (AAC)n, (AAG)n, (ACT)n, and (ATC)n classes of trinucleotide repeats.
This survey has been useful for determining the efficacy of these types of short tandem repeats as new
markers, and should help guide the choice of repeat classes when screening for additional markers near a disease region.
MATERIALS AND METHODS
Marker-enriched small insert libraries
Plasmid based, small insert marker selected genomic DNA libraries were constructed from AluI, HaeIII,
or EcoRV/SspI digested DNA or randomly sheared human genomic DNA as previously described (25). Briefly,
a short tandem repeat oligonucleotide was extended from single-stranded uracil-containing phage DNA prepared from
small-insert human genomic libraries. The extension products were transformed into wild type bacteria, which
select against the uracil-containing template strand, to enrich for the double stranded products that contained
short tandem repeats. The estimated complexity of the restriction enzyme-generated libraries was one genome
equivalent, and the complexity of the randomly sheared genomic library was estimated to be six genome
equivalents (27). As previously described, libraries were picked into 96-well microtiter plates and arrayed
by replica plating in order to simplify and accelerate the screening process (25).
Identification of Trinucleotide repeat containing clones
Human genomic cosmid clones (Stratagene) and small insert clones containing simple sequence repeats
based on the desired trinucleotide motif were detected by screening the cosmid library and the marker-enriched
libraries, respectively, using the Quick-Light hybridization system (FMC Corporation). This system uses
oligonucleotides corresponding to each repeat class, directly conjugated to alkaline phosphatase. Hybridization
and wash temperatures were as follows: (AAC)n, 57°C; (AAG)n, 56°C; (AAT)n, 32°C; (ACC)n, 55°C;
(ACG)n, 58°C; (ACT)n, 45°C; (AGC)n, 58°C; (AGG)n, 58°C; (ATC)n, 45°C. Clones for the
(CCG)n class of repeats were screened with an alternative protocol to reduce background due to the high
G/C content: a custom (CCG)15 probe was constructed by Lifecodes Corporation, hybridization was performed
at 48°C in Quick-Light hybridization buffer, and washes consisted of 2X SSC at room temperature;
3 M TEMAC, 0.1% SDS at 60°C; and 2X SSC at room temperature.
Development of Sequence Tagged Sites (STSs)
Hybridization-positive small insert clones were subjected to single pass cycle sequencing
using the M13 (-21), M13 reverse, and/or SP6 dye primer kits including Taq polymerase (Applied
Biosystems, Inc.) with the ABI373 automated sequencer (Applied Biosystems, Inc.). Template DNA
was prepared using the Magic Minipreps kit (Promega Corporation). Duplicate clones were detected
using Sequencher (GeneCodes Corporation). Primers flanking the repeat were chosen using the Primer
program (Whitehead Institute/MIT) as implemented by the CHLC primer pipeline. For information on
the CHLC pipeline server, send a blank email message to: primer-server@chlc.org.
Polymorphism analysis
The STSs generated in the initial survey of each trinucleotide simple sequence repeat class (from at least
50 clones per class) were genotyped on four reference CEPH individuals (1331 01, 1331 02, 1408 01, and 1408 02)
using standard PCR conditions. Subsequent STSs were analyzed for their degree of polymorphism and
ranges of allele sizes by PCR amplification of pooled DNA samples derived from 40 CEPH individuals (27).
Where heterozygosity frequencies are listed, the data reflect typing of 8 CEPH families (4).
Chromosome assignment
Tentative localization of each STS to a specific human chromosome was accomplished by PCR based
screening of the National Institute of General Medical Sciences (NIGMS) somatic cell
hybrid mapping panels #1 and/or #2.
Electronic access to data
CHLC maps and marker information are available through several electronic information sources:
anonymous ftp (ftp.chlc.org), Gopher (gopher.chlc.org), and World Wide Web (http://www.chlc.org).
Nomenclature
In this manuscript, the classes of trinucleotide repeats are referred to by their "alphabetically minimal"
names (listed in ref. 33) in order to simplify the literature on short tandem repeats. For example,
(AAT)n repeats are equivalent to (ATA)n, (TAA)n, (TAT)n, (TTA)n, and (ATT)n repeats. Unfortunately,
this nomenclature was not adopted when the CHLC database was started. The names of the trinucleotide
repeat-based STSs in the CHLC database are identical to those used in this paper except for the following
(name in this paper = name in the CHLC database):
(AAT)n = (ATA)n, (ACC)n = (CAC)n, (AGG)n = (CCT)n, (AAG)n = (CTT)n, (AGC)n = (GCT)n, and (CCG)n = (CGC)n.
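The "alphabetically minimal" convention can be computed directly: take all rotations of the motif and of its reverse complement, and keep the one that sorts first. The short sketch below uses illustrative function names. It also confirms that the 64 possible trinucleotides fall into exactly ten classes once the two mononucleotide classes, (A)n and (C)n, are set aside.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rotations(motif: str) -> list[str]:
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def canonical_name(motif: str) -> str:
    """Alphabetically first rotation of the motif or its reverse complement."""
    motif = motif.upper()
    return min(rotations(motif) + rotations(reverse_complement(motif)))


def trinucleotide_classes() -> list[str]:
    """Distinct canonical trinucleotide classes, excluding AAA and CCC."""
    names = {canonical_name("".join(p)) for p in product("ACGT", repeat=3)}
    # AAA/TTT and CCC/GGG are mononucleotide runs, not true trinucleotide classes.
    return sorted(n for n in names if len(set(n)) > 1)
```

For example, `canonical_name("CTT")` returns `"AAG"` and `canonical_name("CGC")` returns `"CCG"`, matching the correspondences listed above. `trinucleotide_classes()` returns the ten class names used in this paper.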
ACKNOWLEDGMENTS
We are grateful to David Kwiatkowski, Richard Baldarelli, and George M. Church for critical
reading of the manuscript. We thank the following for their invaluable technical assistance:
Jelveh Ghazizadeh (Harvard Medical School), Gretel Mattes, John Beck, Brian Thompson, Tom Businga,
Kerry Wiles, Dee Even (University of Iowa), Matt Stephenson, Donna David (Marshfield Clinic),
Robert K. Stodola, Frank J. Manion, Raymond Reichard, Michel van der List, and John Quillen
(Fox Chase Cancer Research Center).
REFERENCES
- NIH/CEPH Collaborative Mapping Group. (1992) A comprehensive genetic linkage map of the human genome.
Science, 258, 67-86.
- Weissenbach, J., Gyapay, G., Dib, C., Vignal, A., Morissette, J., Millasseau, P.,
Vaysseix, G., Lathrop, M. (1992) A second-generation linkage map of the human genome. Nature, 359, 794-801.
- Gyapay, G., Morissette, J., Vignal, A., Dib, C., Fizames, C., Millasseau, P., Marc, S.,
Bernardi, G., Lathrop, M., Weissenbach, J. (1994) The 1993-94 Genethon human genetic linkage map.
Nature Genet., 7, 246-339.
- Murray, J.C., Buetow, K.H., Weber, J.L., Ludwigsen, S., Scherpbier-Heddema, T., Manion, F., Quillen,
J., Sheffield, V.C., Sunden, S., Duyk, G., Weissenbach, J., Gyapay, G., Dib, C., Morissette,
J., Lathrop, G.M., Vignal, A., White, R., Matsunami, N., Gerken, S., Melis, R., Albertsen, H.,
Plaetke, R., Odelberg, S., Ward, D., Dausset, J., Cohen, D., Cann, H. (1994) A comprehensive human linkage
map with centimorgan density. Science, 265, 2049-2054.
- Weber, J. L., May, P. E. (1989) Abundant class of human DNA polymorphisms which can be typed using
the polymerase chain reaction. Am. J. Hum. Genet., 44, 388-396.
- Litt, M., Luty, J. A. (1989) A hypervariable microsatellite revealed by in vitro amplification of a
dinucleotide repeat within the cardiac muscle actin gene. Am. J. Hum. Genet., 44, 397-401.
- Stallings, R.L., Ford, A. F., Nelson, D., Torney, D. C., Hildebrand, C.E., Moyzis, R. K. (1991)
Evolution and distribution of (GT)n repetitive sequences in mammalian genomes. Genomics, 10, 807-815.
- Weber, J. (1990) Informativeness of human (dC-dA)n·(dG-dT)n polymorphisms. Genomics, 7, 524-530.
- Edwards, A., Civitello, A., Hammond, H.A., Caskey, C.T. (1991) DNA typing and genetic mapping
with trimeric and tetrameric tandem repeats. Am. J. Hum. Genet., 49, 746-756.
- Verkerk, A.J., Pieretti, M., Sutcliffe, J.S., Fu, Y.H., Kuhl, D.P., Pizzuti, A., Reiner, O.,
Richards, S., Victoria, M.F., Zhang, F., Eussen, B.E., van Ommen, G.-J.B., Blonden, L.A.J.,
Riggins, G.J., Chastain, J.L., Kunst, C.B., Galjaard, H., Caskey, C.T., Nelson, D.L., Oostra, B.A.,
Warren, S.T. (1991) Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint
cluster region exhibiting length variation in Fragile X syndrome. Cell, 65, 905-914.
- La Spada, A. R., Wilson, E. M., Lubahn, D. B., Harding, A. E., Fischbeck, K. H. (1991) Androgen
receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature, 352, 77-79.
- Brook, J. D., McCurrach, M.E., Harley, H.G., Buckler, A.J., Church, D., Aburatani, H.,
Hunter, K., Stanton, V.P., Thirion, J.-P., Hudson, T., Sohn, R., Zemelman, B., Snell, R.G.,
Rundle, S.A., Crow, S., Davies, J., Shelbourne, P., Buxton, J., Jones, C., Juvonen, V.,
Johnson, K., Harper, P.S., Shaw, D.J., Housman, D.E. (1992) Molecular basis of myotonic dystrophy:
Expansion of a trinucleotide (CTG) repeat at the 3' end of a transcript encoding a protein kinase
family member. Cell, 68, 799-808.
- Mahadevan, M., Tsilfidis, C., Sabourin, L., Shutler, G., Amemiya, C., Jansen, G., Neville, C.,
Narang, M., Barcelo, J., O'Hoy, K., Leblond, S., Earle-MacDonald, J., De Jong, P.J., Wieringa, B.,
Korneluk, R.G. (1992) Myotonic Dystrophy mutation: An unstable CTG repeat in the 3' untranslated
region of the gene. Science, 255, 1253-1255.
- Fu, Y.-H., Pizzuti, A., Fenwick, Jr., R.G., King, J., Rajnarayan, S., Dunne, P.W., Dubel, J.,
Nasser, G.A., Ashizawa, T., De Jong, P., Wieringa, B., Korneluk, R., Perryman, M.B., Epstein, H.F.,
Caskey, C.T. (1992) An unstable triplet repeat in a gene related to myotonic muscular dystrophy.
Science, 255, 1256-1258.
- The Huntington's Disease Collaborative Research Group. (1993) A novel gene containing a
trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes. Cell, 72, 971-983.
- Orr, H. T., Chung, M., Banfi, S., Kwiatkowski, Jr., T.J., Servadio, A., Beaudet, A.L., McCall, A.E.,
Duvick, L.A., Ranum, L.P.W., Zoghbi, H.Y. (1993) Expansion of an unstable trinucleotide CAG repeat
in spinocerebellar ataxia type 1. Nature Genet., 4, 221-226.
- Knight, S. J. L., Flannery, A.V., Hirst, M.C., Campbell, L., Christodoulou, Z., Phelps, S.R.,
Pointon, J., Middleton-Price, H.R., Barnicoat, A., Pembrey, M.E., Holland, J., Oostra, B.A., Bobrow, M.,
Davies, K.E. (1993) Trinucleotide repeat amplification and hypermethylation of a CpG island in FRAXE
mental retardation. Cell, 74, 127-134.
- Koide, R., Ikeuchi, T., Onodera, O., Tanaka, H., Igarashi, S., Endo, K., Takahashi, H.,
Kondo, R., Ishikawa, A., Hayashi, T., Saito, M., Tomoda, A., Miike, T., Naito, H., Ikuta, F.,
Tsuji, S. (1994) Unstable expansion of CAG repeat in hereditary dentatorubral-pallidoluysian
atrophy (DRPLA). Nature Genet., 6, 9-13.
- Nagafuchi, S., Yanagisawa, H., Sato, K., Shirayama, T., Ohsaki, E., Bundo, M., Takeda, T.,
Tadokoro, K., Kondo, I., Murayama, N., Tanaka, Y., Kikushima, H., Umino, K., Kurosawa, H., Nihei, K.,
Inoue, T., Sano, A., Komure, O., Takahashi, M., Yoshizawa, T., Kanazawa, I., Yamada, M. (1994)
Dentatorubral and pallidoluysian atrophy expansion of an unstable CAG trinucleotide on chromosome 12p.
Nature Genet., 6, 14-18.
- Nancarrow, J.K., Kremer, E., Holman, K., Eyre, H., Doggett, N.A., Le Paslier, D., Callen, D.F.,
Sutherland, G.R., Richards, R.I. (1994) Implications of FRA16A structure for the mechanism of
chromosomal fragile site genesis. Science, 264, 1938-1941.
- Burke, J.R., Wingfield, M.S., Lewis, K.E., Roses, A.D., Lee, J.E., Hulette, C.,
Pericak-Vance, M.A., Vance, J.M. (1994) The Haw River Syndrome: Dentatorubropallidoluysian
atrophy (DRPLA) in an African-American family. Nature Genet., 7, 521-524.
- Kawaguchi, Y., Okamoto, T., Taniwaki, M., Aizawa, M., Inoue, M., Katayama, S., Kawakami, H.,
Nakamura, S., Nishimura, M., Akiguchi, I., Kimura, J., Narumiya, S., Kakizuka, A. (1994)
CAG expansions in a novel gene for Machado-Joseph disease at chromosome 14q32.1. Nature Genet., 8, 221-227.
- Parrish, J.E., Oostra, B.A., Verkerk, A.J.M.H., Richards, C.S., Reynolds, J., Spikes, A.S.,
Shaffer, L.G., Nelson, D.L. (1994) Isolation of a GCC repeat showing expansion in FRAXF, a
fragile site distal to FRAXA and FRAXE. Nature Genet., 8, 229-235.
- Ostrander, E.A., Jong, P.M., Rine, J., Duyk, G. (1992) Construction of small-insert genomic
DNA libraries highly enriched for microsatellite repeat sequences. Proc. Natl. Acad. Sci. USA, 89, 3419-3423.
- Pulido, J.C., Duyk, G.M. (1994) In Dracopoli, N. C., Haines, J. L., Korf, B. R., Moir, D. T.,
Morton, C. C., Seidman, C. E., Seidman, J. G., Smith, D.R. (eds.), Current Protocols in Human
Genetics. Wiley, New York, pp. 2.2.1-2.2.33.
- Beckmann, J.S., Weber, J.L. (1992) Survey of human and rat microsatellites. Genomics, 12, 627-631.
- Sheffield, V.C., Weber, J.L., Buetow, K.H., Murray, J.C., Even, D.A., Wiles, K., Gastier, J.M.,
Pulido, J.C., Yandava, C., Sunden, S.L., Mattes, G., Businga, T., McClain, A., Beck, J., Duyk, G.M.
(1995) A collection of tri-and tetranucleotide repeat markers used to generate high quality, high
resolution human genome-wide linkage maps. Hum. Molec. Genet., this issue.
- Stallings, R. L. (1994) Distribution of trinucleotide microsatellites in different categories
of mammalian genomic sequence: Implication for human genetic diseases. Genomics, 21, 116-121.
- Riggins, G. J., Lokey, L. K., Chastain, J. L., Leiner, H. A., Sherman, S. L., Wilkinson, K. D.,
Warren, S. T. (1992) Human genes containing polymorphic trinucleotide repeats. Nature Genet., 2, 186-191.
- Li, S.-H., McInnis, M. G., Margolis, R. L., Antonarakis, S. E., Ross, C. A. (1993) Novel triplet
repeat containing genes in human brain: Cloning, expression, and length polymorphisms. Genomics, 16, 572-579.
- Fishel, R., Lescoe, M. K., Rao, M. R. S., Copeland, N. G., Jenkins, N. A., Garber, J., Kane, M.,
Kolodner, R. (1993) The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis
colon cancer. Cell, 75, 1027-1038.
- Liu, B., Nicolaides, N. C., Markowitz, S., Willson, J. K. V., Parsons, R. E., Jen, J.,
Papadopoulos, N., Peltomaki, P., de la Chapelle, A., Hamilton, S. R., Kinzler, K. W.,
Vogelstein, B. (1995) Mismatch repair gene defects in sporadic colorectal cancers with
microsatellite instability. Nature Genet., 9, 48-55.
- Jin, L., Zhong, Y., Chakraborty, R. (1994) The exact numbers of possible microsatellite motifs.
Am. J. Hum. Genet., 55, 582-583.
Last modified: Thur Nov 30 16:10:16 EDT 1995
The CHLC Informatics Group (help@chlc.org)