
Human Chromosome 5: Few Genes, Many Repeats

A team of JGI collaborators from across the United States has announced the completion of the finished sequence for human chromosome 5. The result reveals 177.7 million base pairs, many of which do not code for genes yet are still highly conserved. The collaborators have reported identification of 923 gene loci in a sequence finished to 99.99% accuracy. Both genes and conserved noncoding regions appear to play significant roles in human diseases. The initial analysis already provides new clues about spinal muscular atrophy, pathologies similar to Marfan syndrome, Paget's disease of the bone, allergic asthma, and Crohn's disease.

For intrachromosomal duplications, high % identity is found for longer alignment lengths

Large (>5 Kb), highly (>90%) similar segmental duplications on human chromosome 5. Blue lines connect intrachromosomal repeats; red lines, interchromosomal repeats. Purple bars indicate centromeres. (Chromosome 5 is enlarged to show detail.)

The shotgun sequencing process integrated data from all publicly available sequence for chromosome 5. Only four gaps remain, all in the long arm, in regions that appear to be currently uncloneable. The researchers estimated the total chromosome size to be 180.8 megabases (Mb).
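The relationship between the finished sequence and the estimated chromosome size can be checked with simple arithmetic. The sketch below uses only the two figures quoted in this article (177.7 Mb finished, 180.8 Mb estimated); the function and variable names are illustrative, and the article does not say how the remainder is divided among the four gaps and any other unsequenced regions.

```python
# Figures quoted in the article, in megabases (Mb).
FINISHED_MB = 177.7         # finished sequence
ESTIMATED_TOTAL_MB = 180.8  # estimated total chromosome size


def coverage_summary(finished_mb, total_mb):
    """Return (fraction of the chromosome finished, Mb not in the finished sequence)."""
    unaccounted_mb = total_mb - finished_mb
    return finished_mb / total_mb, unaccounted_mb


fraction, unaccounted = coverage_summary(FINISHED_MB, ESTIMATED_TOTAL_MB)
print(f"finished: {fraction:.1%}; not in finished sequence: {unaccounted:.1f} Mb")
```

On these numbers, the finished sequence covers roughly 98% of the estimated chromosome, leaving about 3 Mb outside it.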

The team succeeded in sequencing chromosome 5's many duplications. They found fewer segmental duplications (3.49%) than the genomewide average (5.3%), but these duplications had higher-than-average percentages of identical bases (≥97.5%), especially among duplications within chromosome 5. Intrachromosomal duplications account for the majority of repeats, and their high sequence identity suggests that the genetic events giving rise to them took place relatively recently, probably when humans were diverging from the great apes. Unlike other human chromosomes, chromosome 5 has relatively little duplication near the centromere. There does, however, appear to be a bias toward interchromosomal duplication near the telomeres. One notable area of duplication contains genes related to spinal muscular atrophy (SMA). The finished sequence offers the first detailed look at the complex arrangement of duplications there. Some repeats are intrachromosomal, whereas others map to chromosome 6. The collaborators annotated 14 gene loci in this region and documented extensive variations between two haplotypes.
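To make "percentage of identical bases" concrete, here is a minimal sketch of how percent identity might be computed for two already-aligned sequences, together with a filter using the thresholds from the figure caption (longer than 5 Kb, more than 90% similar). This is not the collaborators' pipeline: the function names are hypothetical, gap columns are skipped as one common convention, and 5 Kb is taken to mean 5,000 bp.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned, non-gap columns in which the two bases are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are excluded from the comparison
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


def is_segmental_duplication(length_bp, identity_pct,
                             min_length_bp=5000, min_identity_pct=90.0):
    """Apply the caption's thresholds: longer than 5 Kb and more than 90% similar."""
    return length_bp > min_length_bp and identity_pct > min_identity_pct


# Toy example: one mismatch in ten aligned bases.
print(percent_identity("ACGTACGTAC", "ACGTACGAAC"))  # 90.0
```

Real duplication detection works on alignments thousands of bases long; the toy strings here only illustrate the calculation.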

In comparing the sequence of chromosome 5 to the genomes of other vertebrates, the collaborators confirmed and extended previously known homologies. Segmental homology maps comparing the human chromosome and the genomes of the chimpanzee, mouse, rat, chicken, frog (Xenopus tropicalis), and fish (Fugu rubripes) showed many large-scale rearrangements that must have occurred since our last common ancestor with each of these species. One particularly lengthy rearrangement is an 80-Mb stretch (almost half the chromosome) that appears to be inverted relative to chimpanzee chromosome 4. Such large-scale rearrangements are thought to prevent successful interbreeding and thus likely mark key events in the speciation of humans versus other primates. In addition, roughly one-third of chromosome 5 shows similarity to the entire chicken sex chromosome Z, indicating that sex chromosomes have evolved independently since the split between birds and mammals. The researchers also found a nonrandom distribution of conserved noncoding regions, which may regulate relatively distant genes on the chromosome.

Analyses of nucleotide substitutions with respect to the chimpanzee genome suggested that there have been evolutionary constraints on coding sequences as well as on noncoding sequences that are conserved in rodents. The researchers also identified genes under positive selection in humans, of which the top-ranked two were clearly disease related. One (FBN2) is implicated in diseases similar to Marfan syndrome, the other (SQSTM1) in Paget's disease.

The interleukin gene cluster on the long arm of chromosome 5 is another region of medical interest, as it contains five genes that encode blood-cell growth factors as well as genes associated with allergic asthma and Crohn's disease. Comparison with mouse and chicken sequences confirmed a fast evolutionary rate for interleukin genes and revealed significant areas of noncoding conservation within the region.

Thus far, some 80 human diseases have been traced to chromosome 5, 66 to specific loci. As future analyses are done, the finished sequence will surely enable further gains in our understanding of genetic disorders.

Authors

J. Schmutz, J. Grimwood, E. Bajorek, S. Black, C. Caoile, Y.M. Chan, M. Denys, J. Escobar, D. Flowers, D. Fotopulos, M. Gomez, E. Gonzales, L. Haydu, F. Lopez, C. Medina, L. Ramirez, J. Retterer, A. Rodriguez, S. Rogers, A. Salazar, M. Tsai, N. Vo, J. Wheeler, K. Wu, J. Yang, M. Dickson, A. Olsen, and R.M. Myers (Stanford Human Genome Center, SHGC); J. Martin, A. Terry, D. Scott, W. Huang, U. Hellsten, A. Aerts, J.C. Detter, T. Glavina, D. Goodstein, I. Grigoriev, N. Hammon, T. Hawkins, S. Israni, J. Jett, K. Kadner, H. Kimball, Y. Lou, D. Martinez, J. Morgan, S. Pitluck, M. Pollard, P. Predki, A. Salamov, H. Tice, A. Ustaszewska, D.S. Rokhsar, P. Richardson, and S.M. Lucas (JGI); O. Couronne, S. Prabhakar, J. Priest, and J.-F. Cheng (Lawrence Berkeley National Laboratory, LBNL); M. Groza and R. Nandkeshwar (Lawrence Livermore National Laboratory, LLNL); J.F. Challacombe (Los Alamos National Laboratory, LANL); X. She and E.E. Eichler (Case Western Reserve University and University Hospitals of Cleveland); J.P. Noonan (Stanford University); L.A. Gordon, M. Tran-Gyamfi, E. Branscomb, A. Kobayashi, and A. Olsen (JGI and LLNL); G. Xie, M. Altherr, and N. Thayer (JGI and LANL); and L. Pennacchio and E.M. Rubin (JGI and LBNL)

 

pattern of interchromosomal and intrachromosomal duplications

Comparisons of the two sequenced haplotypes for the SMA region. At top is the gene content for SMAvar1. Immediately below it is a map of interchromosomal (red) and intrachromosomal (blue) duplications and corresponding % identities. Light pink areas are repeats on chromosome 5, dark pink are on chromosome 6, and yellow are on chromosome 3. Analogous information for SMAvar2 is at bottom. The center graph shows how the structures of the two variants map to each other.

 

segmental homology maps

Gene density (blue) and density of noncoding conservation (purple) with mouse plus (top to bottom) rat, chicken, frog, and fish across the length of chromosome 5.

 

Publication

"The DNA Sequence and Comparative Analysis of Human Chromosome 5", Nature 431, 268-274 (2004), doi: 10.1038/nature02919.

Funding

This research was funded by the U.S. Department of Energy Office of Biological and Environmental Research.