Posted on 8 Oct. 2004
UCSC has released v1.1 of the
UCSC
Proteome Browser. This updated version includes the
following major enhancements:
- direct access to a Proteome Browser gateway via the
"Proteome Browser" menu link
on the Genome Browser home page. The Proteome Browser gateway
page prompts the user for a protein ID or gene
symbol, then directly starts up the Proteome Browser,
eliminating the multiple Genome Browser steps required by
the previous release.
- extended protein coverage that includes all proteins in
the Swiss-Prot/TrEMBL databases, rather than just the
human, mouse and rat protein sets included in the previous
version.
In addition to providing direct access to the Proteome
Browser, the v1.1 release preserves the existing tight
coupling between the Proteome Browser and the Genome Browser
for the human, mouse and rat genomes. Users may still
navigate between the Genome Browser Known Genes track and
the Proteome Browser for quick, easy access to
the details of a gene's genomic and proteomic
sequence structures.
The UCSC team who worked on the Proteome Browser update
includes Fan Hsu, Robert Kuhn, Donna Karolchik and Tom
Pringle. Please send comments or questions to our mailing
list at
genome@soe.ucsc.edu.
Posted on 5 Oct. 2004
The UCSC Bioinformatics Group announces two seminars and
hands-on workshops on the UCSC Genome Browser, presented by
OpenHelix,
a bioinformatics training, software testing and consulting
company.
These introductory sessions are geared towards anyone with a
basic knowledge of genomic and biological concepts who is
interested in learning how to use the UCSC Genome Browser.
No programming experience is required. The seminars will
cover the topics necessary to learn how to effectively use
the browser tool set, including basic Genome Browser
functionality, searching and BLAT use, Table Browser use,
creating and using custom annotation tracks, and an
introduction to the Gene Sorter. Lectures will be
accompanied by hands-on computer exercises conducted directly
on the Genome Browser web site.
The first three-hour course will be held on Tuesday, 9 November
in the Washington, D.C. area. Two sessions will be
offered: 9am-12pm and 1pm-4pm.
The second seminar will be held in the Raleigh/Durham, N.C.
area on Wednesday, 10 November. Two sessions
will be offered: 9am-12pm and 1pm-4pm.
For registration information, visit the
OpenHelix website or call 1-888-861-5051.
Academic, student, and early registration discounts are
available.
Posted on 1 Oct. 2004
We are pleased to announce the release of an enhanced version
of the UCSC Table Browser.
This new release sports several improvements and additions
over the previous Table Browser, including:
- a new streamlined user interface
- support for generating filters that include fields from multiple tables, including those from non-positional tables
- an enhanced schema-viewing utility that displays all
tables associated with a track, as well as all tables linked
to a selected table
- the ability to restrict queries to include only data from ENCODE regions
- the addition of a GALA output option
- an option to save large output results sets directly to a
file rather than displaying them in the Table Browser
The preliminary
User's Guide
will be enhanced in upcoming weeks to include examples of
many common Table Browser queries generated by our users.
The new version of the Table Browser was produced by
Jim Kent, Donna Karolchik, Heather Trumbower, Hiram Clawson,
and Robert Kuhn, and incorporates code from the
previous version written by Angie Hinrichs.
We'd like to thank Mary Mangan and others on the
OpenHelix staff for their feedback on early versions of this
software.
The older version of the Table Browser will remain available
for a limited time at
http://genome.ucsc.edu/cgi-bin/hgText. Please send feedback and questions to our
mailing list at genome@soe.ucsc.edu.
Posted on 10 Sep. 2004
The Genoscope v7 Tetraodon nigroviridis genome
assembly is now available in the UCSC Genome Browser and
Blat server. This assembly, UCSC version tetNig1 dated
Feb. 2004, is the result of a collaboration between
Genoscope and the
Broad Institute of MIT and Harvard.
The v7 assembly was constructed using the whole genome shotgun
(WGS) approach, resulting in a sequence coverage of about
7.9X. The assembly contains 45,609 contigs and 25,773
scaffolds generated by the Arachne program and covers more
than 90% of the genome. Additional linking data were used to
build ultracontigs and to organize the assembly into
chromosomes. Genoscope estimates the size of the Tetraodon
genome to be about 385 Mb.
Downloads of the tetNig1 data and annotations may be obtained
from the UCSC Genome Browser
FTP server
or Downloads
page. These data have been freely
provided by Genoscope before publication with
specific conditions for use.
The initial set of annotation tracks was generated by
Genoscope and the UCSC Bioinformatics Group based on
data provided by Genoscope. Tetraodon gene predictions
generated by Genoscope using
GAZE will be available in the Genome
Browser within a few weeks.
Many thanks to Genoscope and the Broad Institute of
MIT and Harvard for this genome assembly. The UCSC team who
produced this browser are Rachel Harte, Robert Kuhn,
Donna Karolchik, and the Genome Browser sysadmin team.
See the
Credits
page for a detailed list of the organizations and individuals
who contributed to this release.
Posted on 1 Sep. 2004
The UCSC Genome Bioinformatics Group has released a Genome
Browser and Blat server for the Drosophila
pseudoobscura Freeze 1 draft assembly (Aug. 2003).
This assembly, UCSC version dp2, was
produced by the Human Genome Sequencing Center (HGSC) at
Baylor College of Medicine.
Freeze 1 is a whole genome shotgun assembly produced using
Baylor HGSC's assembly engine, Atlas. The assembly, which
provides approximately 7x coverage of the euchromatic portion
of the genome, contains 759 scaffolds. The scaffold N50 size
is 1,018,646 bp. The total scaffold size for this assembly
is 139.3 Mbp, with an average size of 184,465 bp. Due to an
assembly error, four large scaffolds "jumped" chromosomes.
These have been split into "A" and "B" parts in the
downloadable assembly files. See the gateway page for more
information.
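For readers unfamiliar with the scaffold N50 statistic quoted above, here is a minimal sketch of how it is commonly computed: sort the scaffolds from longest to shortest, add up their lengths, and report the length of the scaffold at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative only and are not taken from the Baylor HGSC assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


toy = [100, 80, 50, 30, 20, 10]   # made-up scaffold lengths in bp
print(n50(toy))                   # total is 290, so half is 145
```

Because N50 is weighted by length, a handful of very large scaffolds can place it well above the average scaffold size, as in the figures reported for this assembly.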
Baylor HGSC has provided a
putative chromosome assignment
for the majority of larger scaffolds (> 90% of unique
sequence), based on conservation between the Muller elements.
Downloads of the dp2 data and annotations may be obtained
from the UCSC Genome Browser
FTP server
or Downloads
page. The initial set of annotation tracks was generated by
UCSC.
Many thanks to the Baylor HGSC for providing the genome
assembly data. The UCSC team who produced this browser are
Angie Hinrichs, Heather Trumbower, Robert Kuhn,
Donna Karolchik, and the Genome Browser sysadmin team.
See the
Credits
page for a detailed list of the organizations and individuals
who contributed to this release.
Posted on 30 Aug. 2004
The UCSC Bioinformatics Group announces two seminars and
hands-on workshops on the UCSC Genome Browser, presented by
OpenHelix,
a bioinformatics training, software testing and consulting
company.
These introductory sessions are geared towards anyone with a
basic knowledge of genomic and biological concepts who is
interested in learning how to use the UCSC Genome Browser.
No programming experience is required. The seminars will
cover the topics necessary to learn how to effectively use
the browser tool set, including basic Genome Browser
functionality, searching and BLAT use, Table Browser use,
creating and using custom annotation tracks, and an
introduction to the Gene Sorter. Lectures will be
accompanied by hands-on computer exercises conducted directly
on the Genome Browser web site.
The first three-hour course will be held on Monday, 4 October
in Kirkland, WA (Seattle area). Two sessions will be
offered: 9:00 a.m. - noon or 1 - 4 p.m.
The second seminar will be held in Cupertino (San
Francisco Bay Area) on Tuesday, 5 October. Two sessions
will be offered: 1 - 4 p.m. or 6 - 9 p.m.
For registration information, visit the
OpenHelix website or call 1-888-861-5051.
Academic, student, and early registration discounts are
available.
Posted on 10 Aug. 2004
The UCSC Genome Bioinformatics Group has released a Genome
Browser and Blat server for the Anopheles gambiae
v. MOZ2 draft genome sequence (Feb. 2003). This assembly --
UCSC version anoGam1 -- was produced by the International
Anopheles Genome Project and downloaded from Ensembl.
The MOZ2 assembly is a 10x whole genome shotgun assembly.
The assembled portion of the genome is about 278 Mbp in
length with a total of 8,987 unique scaffolds, the largest
scaffold being 23.1 Mbp. Approximately 85% of the sequence
has been assigned to chromosomal locations. Chromosome arms
chr2L, chr2R, chr3L, chr3R, and
chrX are represented by 13, 49, 42, 28, and 10 large scaffolds
respectively. No scaffolds have yet been assigned to the Y
chromosome. The unassigned scaffolds, concatenated together
in arbitrary order, can be found in the artificial unknown
"chromosome" chrUn.
For more information about the initial A. gambiae
assembly, see Holt et al. (2002),
The Genome Sequence of the Malaria Mosquito
Anopheles gambiae,
Science 2002 298:129-149.
Downloads of the anoGam1 data and annotations may be obtained
from the UCSC Genome Browser
FTP server or
Downloads
page. The anoGam1 annotation tracks were generated by UCSC and
collaborators worldwide.
Many thanks to the International Anopheles Genome
Project and Ensembl for providing the sequence of this genome.
The UCSC team who produced this browser are Angie Hinrichs,
Galt Barber, Donna Karolchik, and sysadmins Paul Tatarsky and
Jorge Garcia. See the
Credits
page for a detailed list of the organizations and individuals
who contributed to this release.
Posted on 29 Jul. 2004
The Zv3 Zebrafish genome assembly (UCSC version danRer1) is
now available on the UCSC Genome Browser and Blat server.
This assembly was produced by The Wellcome Trust Sanger
Institute, Hinxton, UK, in collaboration with the Max Planck
Institute for Developmental Biology in Tuebingen, Germany,
and the Netherlands Institute for Developmental Biology
(Hubrecht Laboratory), Utrecht, The Netherlands.
The Zv3 assembly consists of 1,459,115,486 bp in 58,339
supercontigs, with a sequence coverage of approximately
5.7X. This zebrafish assembly is the first to be tied to
the FPC map: 1,083,447,588 bp (74%) of the sequence were
mapped in this way. Please note that this is a preliminary
assembly; a high level of misassembly is present due to
polymorphisms in the DNA source.
For more information about this assembly, see the Sanger
Institute's
Danio rerio Sequencing Project
web page.
UCSC plans to release the Zv4 version of the zebrafish
assembly on the Genome Browser in Fall '04.
Downloads of the Zebrafish data and annotations can be
obtained from the UCSC
FTP site or
Downloads
page. The danRer1 annotation tracks were generated by
UCSC and collaborators worldwide. See the
Credits
page for a detailed list of the organizations and individuals
who contributed to the success of this release.
We'd like to thank The Wellcome Trust Sanger Institute
and their collaborators for providing this assembly. A special
thanks to Yi Zhou, Anthony DiBiase and Leonard Zon from the
Children's Hospital in Boston, MA, USA for their collaboration
on this release. The UCSC Zebrafish
Genome Browser team is Rachel Harte, Heather Trumbower, and
Donna Karolchik.
Posted on 23 Jul. 2004
The latest human genome reference sequence (NCBI Build 35,
May 2004) is now available as database hg17 in the UCSC
Genome Browser and Blat server. This sequence was obtained
from NCBI and was produced by the International Human
Genome Sequencing Consortium.
Bulk downloads of the data are available via FTP at
ftp://hgdownload.cse.ucsc.edu/goldenPath/hg17
or through the Downloads link on this page. We recommend
that you use FTP rather than HTTP for the download of large
or multiple files.
We'd like to thank NCBI and the International Human Genome
Sequencing Consortium for furnishing the data, and the UCSC
team members who contributed to this release: Hiram Clawson,
Terry Furey, Heather Trumbower, Robert Kuhn, Donna Karolchik,
Kate Rosenbloom, Angie Hinrichs, Rachel Harte, Jim Kent and
our sysadmin team Patrick Gavin, Jorge Garcia,
and Paul Tatarsky.
Posted on 23 Jul. 2004
The UCSC Genome Bioinformatics Group has released a Genome
Browser and Blat server on a second species of fruitfly,
D. yakuba. The April 2004 Release 1.0 of this genome
(UCSC version droYak1) was sequenced and assembled by the
Genome Sequencing Center, Washington University (WUSTL)
School of Medicine in St. Louis.
D. yakuba is closely related to the model organism,
D. melanogaster, with which it shared a common
ancestor approximately 10 million years ago.
The D. yakuba genome is largely alignable to the
D. melanogaster genome, but differs sufficiently to
offer an interesting study of sequence divergence between
the two species. D. yakuba occupies a critical
intermediate position among several Drosophila species that
will facilitate evolutionary studies among the fruitflies.
For information about the D. yakuba assembly
and statistics, see the WUSTL Genome Sequencing Center
Drosophila yakuba web page.
Downloads of the droYak1 data and annotations can be obtained
from the UCSC Genome Browser
FTP server or
Downloads page.
The droYak1 annotation tracks were generated by UCSC and
collaborators worldwide.
Thanks to the Genome Sequencing Center at WUSTL School of
Medicine for providing the sequence and assembly of this
genome. The
UCSC D. yakuba Genome Browser was produced by Angie
Hinrichs, Michael Chalup, and Donna Karolchik. See the
Credits
page for a detailed list of the organizations and individuals
who contributed to the success of this release.
Posted on 16 Jul. 2004
The latest mouse assembly -- Build 33 from NCBI (UCSC version
mm5) -- is now available via the UCSC Genome Browser and
Blat server. This assembly includes approximately
2.6 gigabases of sequence.
Chromosome 11 is finished in Build 33; the Sanger
Institute has provided a corresponding agp file.
The whole genome N50 for this assembly is 22.3 Mb, in comparison
to 17.7 Mb for the previous build.
Please note: the UCSC mm5 assembly contains only the
reference strain C57BL/6J.
This assembly is a composite version in which phase 3 High
Throughput Genome Sequence (HTGS) was merged with the
Mouse Genome Sequencing Consortium v3 Whole Genome Shotgun
Assembly (MGSCv3). The assembly was performed by NCBI using
a "combined" tiling path that was created
automatically for the most part, but was manually curated
in places. This facilitated the placement of finished
sequence in the context of the MGSCv3 assembly. Draft
sequence was not included in this build: the slight
increase in coverage gained by using this would have been
offset by the increase in build errors.
More information about Build 33 will be available soon in
the NCBI
assembly notes and
Build 33 statistics.
The mm5 sequence and annotation data may be downloaded from
the Genome Browser
FTP
server or
Downloads
web page. The mm5 annotation tracks were generated by UCSC
and collaborators worldwide.
We'd like to thank Deanna Church, Richa Agrawala, and
the Mouse Genome Sequencing Consortium
for this assembly. We'd also like to
acknowledge the work of the UCSC mm5 team: Fan Hsu,
Hiram Clawson, Angie Hinrichs, Heather Trumbower, Mark
Diekhans, Donna Karolchik and our systems
administrators Jorge Garcia, Patrick Gavin and Paul Tatarsky.
Posted on 15 Jul. 2004
The v1.0 C. intestinalis draft assembly from the
US DOE Joint Genome Institute is now available for study
using the UCSC Genome Browser and Blat server (UCSC database
ci1).
The whole genome shotgun assembly was constructed with the
JGI assembler (JAZZ) using paired-end
sequencing reads at a coverage of 8.2X. The draft contains
116.7 million bp of nonrepetitive sequence in 2,501 scaffolds
greater than 3 kb. Of this, 60 Mbp has been assembled into
117 scaffolds longer than 190 Kbp, and 85% of the assembly
(104.1 Mbp) is found in 905 scaffolds longer than 20 kb. The
assembly, gene modeling and analysis were performed at the
JGI.
For more information about the ci1 assembly, see the JGI
C. intestinalis project page.
Additional information and an analysis of the euchromatic
regions of this genome may be found in Dehal et al.,
The Draft Genome of Ciona intestinalis:
Insights into Chordate and Vertebrate Origins. Science.
2002 Dec 13;298(5601):2157-67.
Bulk downloads of the sequence and annotation data are
available via the Genome Browser
FTP server or
Downloads page. The ci1 annotation tracks
were generated by UCSC and collaborators worldwide. See the
Credits page for a detailed list of the organizations and
individuals who contributed to this release.
Many thanks to the JGI and their collaborators for providing
the v1.0 sequence and annotations. The ci1 Genome Browser
was produced by
Brian Raney, Galt Barber, Heather Trumbower, Robert Kuhn,
Donna Karolchik and the Genome Browser sysadmin team -
Patrick Gavin, Jorge Garcia, and Paul Tatarsky. We'd also
like to thank Tom Pringle for his technical input and Mark
Diekhans for his work on the incremental updates for this
release.
Posted on 14 Jul. 2004
UCSC has released a Genome Browser and Blat server on the
July 2004 v1.0 dog genome sequenced and assembled by the
Broad Institute of MIT and Harvard and Agencourt Bioscience.
The whole genome shotgun (WGS) sequence is
based on 7.6X coverage of the dog genome, assuming a WGS
assembly size of 2.4 Gb. The assembly has an N50 contig
length of 123 kb and an N50 supercontig length of 41.6 Mb.
The dog genome, which contains approximately 2.5 billion
base pairs, is similar in size to the genomes of humans and
other mammals. The boxer breed was selected for the initial
sequencing effort, based on the lower variation rate in its
genome relative to other breeds. In addition to the boxer,
samples from nine other dog breeds, four wolves and a coyote
are being used to generate an initial set of single
nucleotide polymorphisms (SNPs) to facilitate disease studies.
The SNPs should be available soon from
dbSNP.
For more information about the dog draft assembly, see the
NHGRI
press release.
The dog sequence and annotation data can be downloaded from
the UCSC Genome Browser
FTP server
or downloads
page. These data have
specific
conditions for use.
Many thanks to the Broad Institute of MIT and Harvard, NHGRI,
Agencourt Bioscience, Children's Hospital Oakland Research
Institute, Centre National de la Recherche Scientifique,
North Carolina State University, and Fred Hutchinson Cancer
Research Center for their contributions to the sequencing,
assembly, and mapping efforts. The initial canFam1
annotation track set, generated by the UCSC Genome
Bioinformatics Group, will soon be
supplemented by annotations from collaborators
worldwide. See the
credits page
for a detailed list of the organizations and individuals who
contributed to the success of this release.
Posted on 24 Jun. 2004
We'd like to announce the release of UCSC Genome Browser
features tailored to the ENCODE project community, including
an
ENCODE-specific page to
highlight
the ENCODE contributors and their work, guidelines for data
submission, and a list of specific links to ENCODE regions
in the Genome Browser.
The initial resources include sequences
for the current human assemblies (hg16, hg15, hg13, and hg12),
sequence of the
comparative species from NISC, tools for coordinate
conversion between human assemblies, format descriptions for
data submission, and contact information for help with
submitting annotation data and analyses. Bulk downloads of
the sequence and annotations may be obtained from the ENCODE
Project
Downloads
page.
We'd like to thank NHGRI for their
support of this project and the various contributors of
annotations and analyses.
Posted on 10 Jun. 2004
The UCSC Bioinformatics Group announces a seminar and hands-on
workshop on the UCSC Genome Browser, presented by
OpenHelix,
a bioinformatics training, software testing and consulting
company.
This introductory session is geared towards anyone with a
basic knowledge of genomic and biological concepts who is
interested in learning how to use the UCSC Genome Browser.
No programming experience is required. The seminar will
cover the topics necessary to learn how to effectively use
the browser tool set, including basic Genome Browser
functionality, searching and BLAT use, Table Browser use,
creating and using custom annotation tracks, and an
introduction to the Gene Sorter. The lecture will be
accompanied by hands-on computer exercises conducted directly
on the Genome Browser web site.
The three-hour course will be held at Tufts University School
of Medicine, 145 Harrison Street, Boston, MA, on Tuesday,
August 10th. Two sessions will be offered: 1 - 4 p.m. or
6 - 9 p.m. For registration information, visit the
OpenHelix website or call 1-888-861-5051.
Academic, student, and early registration discounts are
available.
Posted on 28 May 2004
What's in a name? In an effort to clarify the role of the
UCSC Family Browser, we have changed its name to the UCSC
Gene Sorter. We think this name better describes this tool,
which lets the user collect information on groups of genes
that may be related in many different ways. The Gene Sorter
provides a wealth of information on gene expression, protein
homology (both within and across species), GO terms, and
Pfam domains, cross links to many other databases, and much
more.
If you haven't already tried this tool, we encourage you to
give it a spin. You'll find it at
http://genome.ucsc.edu/cgi-bin/hgNear,
or click the "Gene Sorter" link on any Genome Browser menu
bar.
Posted on 27 May 2004
As a follow-up to last week's FTP site switch, we are changing
the location of the UCSC Genome Browser downloads site to
http://hgdownload.cse.ucsc.edu/.
All downloadable files currently located in
http://genome.ucsc.edu/goldenPath
will be moved to the new server.
Please make a note of the new URL and update any references
to it. Users accessing downloads through the Genome Browser
Downloads page
will be redirected automatically to the new location.
Posted on 19 May 2004
We have changed the URL for the UCSC Genome Browser ftp site
to ftp://hgdownload.cse.ucsc.edu/.
This replaces the old URL of
ftp://genome.ucsc.edu/.
The old URL will be disabled within a few days.
Please make a note of the new URL and update any references
to it.
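For anyone with scripts or saved links that still point at the old FTP address, a small rewrite along these lines may help. It is only a sketch: it assumes that file paths under the old host are unchanged on the new one, which this announcement does not state explicitly, and the example path is hypothetical.

```python
OLD_PREFIX = "ftp://genome.ucsc.edu/"
NEW_PREFIX = "ftp://hgdownload.cse.ucsc.edu/"


def update_ftp_url(url):
    """Swap the old FTP host for the new one; leave other URLs alone.

    Assumes the path after the host is the same on both servers.
    """
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return url


# Hypothetical path, used only to show the substitution.
print(update_ftp_url("ftp://genome.ucsc.edu/goldenPath/example.txt"))
```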
Posted on 11 May 2004
UCSC has released a Genome Browser and Blat server on an
updated version of the C. elegans genome. The
March 2004 assembly -- UCSC version ce2 -- is based on
sequence version WS120 deposited into
WormBase
as of 1 March 2004. This assembly has a finishing error rate
of 1:10,000.
The ce2 sequence and annotation data may be downloaded from
the Genome Browser
FTP server
or Downloads web page.
The ce2 annotation tracks were generated by UCSC and
collaborators worldwide.
We'd like to thank the
Genome Sequencing Center
at Washington University in St. Louis and the
Sanger Institute
for their collaborative work in sequencing the
C. elegans genome. Many thanks to the
WormBase consortium for making the worm
sequence publicly available. We'd also like to acknowledge
the UCSC team who contributed to this release: Rachel Harte
(lead engineer), Hiram Clawson (WABA and miRNA annotations),
Mike Chalup (QA), Galt Barber (QA), Heather Trumbower (QA),
and Donna Karolchik (documentation).
Posted on 22 Apr. 2004
Proteome Browser functionality is now available on the
Oct. 2003 mouse genome assembly (mm4). Protein
information may be viewed for most genes in the Known Genes
track by clicking the Proteome Browser link on the gene's
details page.
For more information on the UCSC Proteome Browser, see the
news release dated 10 March 2004.
In conjunction with this release, the Known Genes and Gene
Family Browser protein data have been updated to the
Swiss-Prot version dated 15 March 2004.
Posted on 16 Apr. 2004
The UCSC Bioinformatics Group announces a seminar and
hands-on workshop on the UCSC Genome Browser, presented by
OpenHelix,
a bioinformatics training,
software testing and consulting company.
This introductory session is geared towards industry and
academic biologists engaged in genomics research. No
programming experience is required. The seminar will cover
the topics necessary
to learn how to effectively use the browser tool set,
including basic Genome Browser functionality, searching and
BLAT use, Table Browser use, creating and using Custom Tracks,
and an introduction to the Family Browser. The lecture will
be accompanied by hands-on computer exercises conducted directly on the Genome Browser web site.
The three-hour course will be held at the UCSC
Extension Campus computer lab in Cupertino, CA on Thursday,
May 6th, from 6 to 9 p.m. For pricing information or to reserve a
seat in the class, visit the
OpenHelix web site or call 1-888-861-5051.
Pre-registration is required. Academic, student, and early
registration discounts are available.
Posted on 12 Apr. 2004
Expression data from the GNF Gene Expression Atlas 2 are now
available on the July 2003 human genome assembly on the UCSC
web site. The data may be viewed graphically in the
Family Browser or via
the GNF Atlas 2 track in the Genome Browser. The track data
contain 2 replicates each of 61 mouse tissues and 79 human
tissues run over Affymetrix microarrays.
We'd like to thank the Genomics Institute of the Novartis
Research Foundation (GNF) for providing the expression data
underlying the browser displays. More information on the data
will be available in the paper Su et al. "A gene
atlas of the mouse and human protein-encoding transcriptomes"
(in press - PNAS).
Posted on 10 Mar. 2004
We are proud to announce a new addition to the
UCSC family of genome browsing and analysis tools. The UCSC Proteome
Browser presents a rich set of useful protein properties as well as
links to several protein and genomic data sources
on the Web. For the first time, Genome Browser users can have
both the genome and proteome worlds at their fingertips
simultaneously. The browser is accessible from the Genome
Browser via the "Proteome
Browser" link on the details page of any gene in the
Known Genes track. The initial release is available only on
Human Build 34 (hg16); Proteome Browsers for the latest mouse
and rat assemblies will follow.
For each protein, the browser displays the corresponding
genomic exon structure and its amino acid sequence.
Several protein property tracks are aligned to the sequence
to help a user pinpoint regions of interest.
Additional properties are plotted with histograms against
genome-wide protein data to
highlight significant trends and anomalies.
The Proteome Browser is tightly coupled with the UCSC Genome
Browser and UCSC Gene Family Browser, allowing easy navigation
among the tools. For example, clicking on an exon in the
Proteome Browser tracks display brings up the Genome Browser tracks
page showing
the genomic region of the exon together with a wealth of
relevant data. Similarly, clicking on the Proteome Browser's
"Family Browser" link
displays related gene family information.
The v1.0 release of the browser offers a variety of
data tracks, including amino acid and DNA sequence, exon
boundaries, hydrophobicity, polarity, cysteine and predicted
glycosylation sites, Superfamily/SCOP domains, and amino acid
anomalies. In addition, the browser includes histograms of
several properties on a genome-wide scale: pI, molecular
weight, exon count, number of cysteines, InterPro domain
counts, hydrophobicity, amino acid frequencies and anomalies.
The Proteome Browser also provides links to a variety of
external sites containing supplementary information on the protein,
including SwissProt, InterPro and
Pfam domains, 3-D structures at PDB and UCSF ModBase, and
pathway maps of KEGG, BioCarta (CGAP), and BioCyc.
We'd like to thank SwissProt for sharing their high quality
protein data and the pI calculation algorithm, as well as the other
external data sites linked to by the Proteome Browser.
We'd also like to acknowledge the hard work of Fan Hsu, lead
engineer on the project, and Jim Kent, Tom Pringle,
Donna Karolchik, and Robert Kuhn. The project received
technical input, review and support from several other members of
the UCSC Bioinformatics group.
Posted on 1 Mar. 2004
We've added the chicken genome to the collection of assemblies available
in the UCSC Genome Browser and Blat Server.
The Feb. 2004 assembly (UCSC version galGal2) was produced by
the Genome Sequencing Center at the Washington University
School of Medicine in St. Louis. The source of
this sequence was a female inbred Red Jungle Fowl (Gallus
gallus), the ancestor of domestic chickens. The chicken
genome is the first of the avian genomes to be sequenced.
The genome has been sequenced
to 6.63X coverage. Approximately 88% of the sequence has been
anchored to chromosomes, which include autosomes 1-24, 26-28,
and 32, and sex chromosomes W and Z. (In contrast to mammals,
the female chicken is heterogametic (ZW) and the male is
homogametic (ZZ).) The remaining unanchored
contigs have been concatenated into the virtual chromosome
"chrUn", separated by gaps of 10,000 bp. The
chicken mitochondrial sequence is also available as the
virtual chromosome "chrM".
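The chrUn layout described above, in which unanchored contigs are joined end to end with 10,000 bp gaps, can be sketched as follows. This is an illustration only: it assumes gaps are placed between consecutive contigs (none before the first or after the last) and uses 0-based, half-open coordinates. The contig names and lengths are hypothetical, and this is not the code UCSC used to build galGal2.

```python
GAP_SIZE = 10_000  # gap length stated in the announcement


def layout_chr_un(contigs, gap=GAP_SIZE):
    """Place (name, length) contigs on a concatenated virtual chromosome.

    Returns a dict of name -> (start, end) intervals, 0-based and
    half-open, plus the total length of the virtual chromosome.
    """
    placements = {}
    offset = 0
    for i, (name, length) in enumerate(contigs):
        if i > 0:
            offset += gap          # gap only between contigs (assumption)
        placements[name] = (offset, offset + length)
        offset += length
    return placements, offset


# Hypothetical contigs, for illustration only.
contigs = [("ctgA", 5_000), ("ctgB", 12_000), ("ctgC", 800)]
placements, total = layout_chr_un(contigs)
for name, (start, end) in placements.items():
    print(name, start, end)
print("chrUn length:", total)
```

A mapping like this is what lets a position on chrUn be traced back to the contig it came from; positions that fall in a gap belong to no contig.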
Washington University School of Medicine in St. Louis
created the physical map for this release. Genetic
mapping and linkage analysis were produced through a
collaborative effort led by Martien Groenen at Wageningen
University in the Netherlands.
SNP data based on three strains of domestic
chickens will soon be available in GenBank from an
international team of scientists led by the Beijing Genomics
Institute in China and supported by the Wellcome Trust in
Britain.
The chicken is considered to be the premier non-mammalian
vertebrate model organism. It plays an important role in
the research of viruses and cancer, and is a primary
model for the study of embryology and development. From an
evolutionary standpoint, the chicken's position
provides a good intermediate data point between mouse and
fugu. Comparative genomics analyses between the chicken and
other sequenced organisms should yield valuable
information on the evolution of gene order and
arrangement, thus improving our understanding of the
structure and function of genes.
To facilitate comparative genomics studies,
alignments of the chicken sequence to the human genome
will be available in the Genome Browser later this week.
Downloads of the comparative data are currently
available through the Downloads page (see below).
For more information about the release of the chicken genome
assembly, see the NHGRI
press release.
Additional background on the rationale behind the chicken genome
sequencing effort can be found in the
sequencing proposal.
Bulk downloads of the chicken sequence and annotations may be obtained from
the Genome Browser
FTP server or
Downloads page. These data have
specific conditions for use.
We'd like to thank the Genome Sequencing Center at the Washington University
School of Medicine in St. Louis, Wageningen University, and
the Chicken Mapping Consortium for providing these data.
The chicken browser annotation tracks were generated by UCSC and
collaborators worldwide. See the
Credits
page for a detailed list of acknowledgements. The UCSC Chicken Genome Browser
was produced by Angie Hinrichs, Heather Trumbower, Rachel Harte, and Donna
Karolchik.
Posted on 23 Feb. 2004
We are happy to announce the release of a Genome Browser and Blat server for the
chimpanzee (Pan troglodytes).
The 13 Nov. 2003 Arachne assembly -- labeled Chimp Build 1
Version 1 (UCSC version panTro1) -- was produced by the
Chimpanzee Genome Sequencing
Consortium.
This assembly covers
about 95 percent of the genome and is based on 4X sequence coverage.
It is composed of 361,782 contigs with an N50 length of 15.7 kb, and 37,849 supercontigs having an N50 length of 8.6 Mb (not including
gaps). The total contig length is 2.73 Gb, spanning 3.02 Gb.
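For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of all bases. The function name and the toy lengths are illustrative assumptions, not part of the panTro1 release or of any UCSC tool.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical, not taken from the chimp assembly).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # prints 8: 10 + 9 + 8 = 27, half of the 54 total
```

Applied to the real assembly, the same calculation would be run once over the contig lengths and once over the supercontig lengths (excluding gaps).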
The DNA donor for this genome assembly, "Clint",
is a captive-born West African chimpanzee (Pan troglodytes
verus).
Background information on the chimp genome
sequencing project and the initial news release about the chimp
assembly can be found on the NHGRI website.
Research has indicated that
the human and chimp genomes probably differ by approximately one
percent.
Because of this close relationship between chimpanzees and
humans, the assembly should facilitate comparative analyses
of the two genomes that have not been possible with any other species sequenced to date.
The initial release of the Chimp Browser provides several
annotation tracks comparing the chimp and human genomes.
More comparative annotations will be added in upcoming
weeks.
Bulk downloads of the chimp sequence and annotations may be obtained from the Genome
Browser FTP server or
Downloads page.
The complete set of sequence reads is available at the
NCBI trace archive.
Blat searches on chrUn_random are not supported in the initial
release, but will be available soon.
We'd like to thank NHGRI, the Eli & Edythe L. Broad Institute at
MIT/Harvard, and Washington University at St. Louis School of
Medicine for providing this sequence, and LaDeana Hillier,
Washington University School of Medicine, and the
Broad Institute for their work on the alignments. The chimpanzee
browser annotation tracks were generated by UCSC and collaborators
worldwide.
The UCSC team who worked on this release consisted of
Kate Rosenbloom, Jim Kent, Hiram Clawson, Heather Trumbower, Robert
Kuhn, Donna Karolchik, and the Genome Browser sysadmin team.
Posted on 12 Feb. 2004
The Genome Browser project now has a UCSC-supported mirror site
that may be used during power or network outages on the UCSC
campus. The mirror -- which can be found at
http://genome.brc.mcw.edu/
-- is located at the Medical College of Wisconsin in Milwaukee.
The site will be updated regularly by UCSC with the latest data and
software to closely replicate the main Genome Browser site at
http://genome.ucsc.edu.
Please continue to use the UCSC-based
site for routine Genome Browser and Blat access.
We'd like to thank the Department of Physiology at the Medical
College of Wisconsin -- and in particular Jeff Nie and Greg
McQuestion -- for their resources and collaboration on this
project.
We'd also like to acknowledge the hard work of UCSC's Paul
Tatarsky, who invested many hours in arranging the collaboration
and setting up the mirror.
Posted on 16 Jan. 2004
We've discovered a handful of hg16 chrN_random_gap
and chrN_random_gold tables on our public server that are
out of date. We have replaced the following tables with
updated versions:
- chr4_random_gap
- chr4_random_gold
- chr8_random_gap
- chr8_random_gold
- chrX_random_gap
- chrX_random_gold
- chrUn_random_gap
- chrUn_random_gold
Many thanks to Grigoriy Kryukov for discovering this
problem. We apologize for any inconvenience this may have
caused to our users.
Posted on 14 Jan. 2004
We are proud to add yeast
(S. cerevisiae) to our growing list
of genome assemblies. The study of brewer's yeast, the most
basic eukaryotic model system, has led to important discoveries
in a wide variety of areas, including metabolism, centromeres,
recombination, cell division control, meiosis and splicing.
This assembly (UCSC version sacCer1) is based on sequence dated 1 Oct. 2003 in the
Saccharomyces Genome
Database (SGD). The sequence, open reading frame (ORF), and gene annotations
were downloaded from the site
ftp://genome-ftp.stanford.edu/pub/yeast/data_download.
The S288C strain was used in this sequencing project. Reference information for
each chromosome may be found in the SGD
Systematic
Sequencing Table. For more information about the yeast genetic and physical
maps, see the paper Cherry JM et al.
Genetic and physical maps of Saccharomyces cerevisiae.
Nature 1997 387(6632 Suppl):67-73.
Downloads of the yeast data and annotations may be obtained from the UCSC Genome
Browser FTP server or
Downloads page.
We'd like to thank Stanford University, the SGD, the University of California
San Francisco (UCSF), Washington University in St. Louis, and the Eli & Edythe
L. Broad Institute at MIT/Harvard for providing the data and annotations for
this assembly. We'd also like to acknowledge the UCSC team who worked on this
release: Jim Kent, Heather Trumbower, Robert Kuhn, Donna Karolchik, and our
sysadmin team.
Posted on 10 Dec. 2003
UCSC has released alignments of the Nov. 2003
chimpanzee
draft assembly to the July 2003 human
assembly in the Genome Browser. These alignments may be
viewed on the
Human July 2003
assembly. This release coincides with today's
announcement
by the National Human Genome Research Institute (NHGRI) of the
first draft assembly of the chimpanzee genome.
The set of human/chimpanzee alignments consists of a
reciprocal best-in-genome net track and a
chimp chain track.
These alignments were generated using
the blastz program developed at Pennsylvania State
University and the programs blat, axtChain, chainNet, and netSyntenic
developed at UCSC by Jim Kent.
Research scientists should find these tracks useful for locating
orthologous regions and studying genome rearrangement in the
two species.
For more information about the alignment tracks, refer to
the track description pages. The tables may be downloaded from the
Genome Browser FTP server's
hg16 database
directory. The chimp sequence and alignment data are downloadable
from the
hg16 human/chimp
alignments directory.
The chimp sequence used in these alignments was obtained
from the 13 Nov. 2003 Arachne assembly. We'd like to thank
NHGRI, the Eli & Edythe L. Broad Institute at MIT/Harvard,
and Washington University School of Medicine for providing
this sequence, and LaDeana Hillier, Washington University School
of Medicine, and the Whitehead Institute for their work on the
alignments. We'd also like to acknowledge the members
of the UCSC team who contributed to the release of these
alignments in the Genome Browser: Jim Kent, Kate Rosenbloom,
Heather Trumbower, and Donna Karolchik.
Posted on 24 Nov. 2003
We have released a Genome Browser and Blat server for the
latest mouse genome assembly, NCBI Build 32
(UCSC v. mm4). Build 32 is a composite
assembly in which chromosomes were assembled by two
slightly different algorithms depending on the available
mapping data. Chromosomes 2, 4, 5, 7, 11, 15, 18, 19, X,
and Y were assembled using a clone-based tiling path file
(TPF) provided by the Mouse Genome Sequencing Consortium
(MGSC), with whole genome shotgun sequence used to fill gaps
when necessary. The remaining chromosomes were assembled
using the MGSCv3 whole genome shotgun assembly as the TPF
and merging High Throughput Genomic Sequence (HTGS) as
needed. The UCSC mm4 assembly contains only the reference
strain C57BL/6J.
Build 32 includes 2.6 gigabases of sequence, 1.2 Gb of which is finished. We
estimate that 90-96 percent of the mouse genome is present
in the assembly. For more information about this version,
see the NCBI
assembly
notes and
Build 32 statistics.
The mm4 sequence and annotation data may be downloaded from
the UCSC Genome Browser
FTP
site or
downloads
page.
We'd like to thank Deanna Church, Richa Agarwala, and
the Mouse Genome Sequencing Consortium
for this assembly. We'd also like to
acknowledge the work of the UCSC mm4 team: Hiram Clawson
(lead),
Terry Furey, Kate Rosenbloom, Heather Trumbower, Bob Kuhn
and Donna Karolchik, and our systems administrators Patrick
Gavin, Jorge Garcia and Paul Tatarsky.
Posted on 31 Oct. 2003
We have added the Drosophila melanogaster (fruitfly)
assembly to the growing collection of genomes available in
the UCSC Genome Browser and Blat servers. Release 3.1
(Jan. 2003) of the Drosophila annotated genome
sequence was provided by the
Berkeley
Drosophila Genome Project (BDGP). The 116.8 Mb euchromatic
sequence - which is virtually gap-free and of high accuracy -
contains six euchromatic chromosome arms represented by 13
scaffolds with a total of 37 sequence gaps. This release
has an estimated error rate of less than one
in 100,000 base pairs in the unique portion of the sequence,
and less than one in 10,000 base pairs in the repetitive portion.
The Release 3.1 sequence was reannotated using the
Apollo
Genome Annotation and Curation Tool. We also provide data
comparing the genome of D. melanogaster with that of
D. pseudoobscura.
The fruitfly, one of the first organisms to be used in systematic
scientific investigations, has been the subject of intensive study
in genetics for nearly a century and remains a major model organism
in biomedical research, population biology and evolution.
We are pleased to add the fruitfly to the roster of assemblies
available on our site.
Downloads of the Drosophila data and annotations may be
obtained from the UCSC Genome Browser
ftp site.
We'd like to thank BDGP and the
Flybase Consortium
(Harvard University,
University of Cambridge,
Indiana University,
the University of
California, Berkeley and the European Bioinformatics Institute (EBI))
for providing the sequence, assembly, and analysis of this
genome. We'd also like to acknowledge the members of the
UCSC Genome Bioinformatics group who contributed to this
release: Angie Hinrichs (lead engineer), Heather Trumbower,
Robert Kuhn, Donna Karolchik, and Jim Kent, along with the system
administrators Jorge Garcia, Patrick Gavin and Paul Tatarsky.
Posted on 17 Oct. 2003
Daily and weekly incremental updates of mRNA, RefSeq,
and EST data are now in place for several of the UCSC Genome
Browser assemblies. Data sets that are updated incrementally
from GenBank include the latest human (hg16), mouse (mm3),
rat (rn3), and Fugu (fr1). Others will soon be added to the
list.
Previously, these tables were updated only when we loaded a
new genome assembly into the Genome Browser or made a major
revision to a table. By updating the data on a nightly basis,
we are able to provide researchers with the most current
versions available in GenBank. All new genome assemblies
released after this date will incorporate the incremental
update technology.
Data are updated on the following schedule:
- native and xeno mRNA and RefSeq tracks - updated daily at
4:30 p.m. Pacific Time (weekdays), early Saturday morning
(weekends)
- EST data - updated once per week on Saturday morning
- downloadable data files - updated weekly on Sunday morning
- outdated sequences - removed once per quarter
Mirror sites are not required to migrate to an incremental
update process, and should not experience problems as a
result of this upgrade. Mirror site questions should be
addressed to
genome-mirror@soe.ucsc.edu.
We'd like to acknowledge the hard work of Mark Diekhans in the
implementation of this new feature, and thank the QA and
sysadmin teams (particularly Paul Tatarsky) for their
support in this release.
Posted on 17 Oct. 2003
The UCSC Table
Browser is an excellent tool for retrieving and searching
the data underlying the Genome Browser. We've recently added
some new features to the Table Browser to make it even
easier to query and download data.
Many of our users have requested a batch query utility that
will allow them to paste in or upload a list of terms on
which to search. You can now do this by clicking the "Item
name/accession" button, then uploading a list of search
terms by selecting the "Paste in" or "Upload" option. Note
that the Paste option supports wildcards, but the Upload
option does not.
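As a rough illustration of what wildcard matching over a pasted list of terms means, the sketch below filters a list of item names against a set of patterns. It assumes shell-style wildcards (`*`), which the announcement does not specify; the function name, item names, and search terms are placeholders, and this is not the Table Browser's own implementation.

```python
from fnmatch import fnmatchcase


def match_terms(patterns, names):
    """Return the names that match any pattern, in their original order."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


# Placeholder item names and pasted search terms (illustrative only).
names = ["NM_000492", "NM_000493", "BC012345", "AK098765"]
pasted = ["NM_00049*", "AK098765"]
print(match_terms(pasted, names))
# prints ['NM_000492', 'NM_000493', 'AK098765']
```

Under the same assumption, an uploaded list would behave like the second term here: each entry is compared literally, with no pattern expansion.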
We've also added two new lists of searchable tables/tracks.
The Browser tracks list contains the names of annotation
tracks in the currently selected assembly as they appear in
the Genome Browser. This list is useful if you don't know
the name of the underlying database table that contains the
data in which you're interested. The Custom tracks list
contains the names of all custom annotation tables currently
loaded into the Genome Browser for the given assembly. This
includes tracks that have been created/loaded by the user as
well as custom annotations created via the Table Browser.
If you have feedback or questions about the Table
Browser, please send us email at
genome@soe.ucsc.edu.
Posted on 29 Sep. 2003
We are pleased to announce the release of the
UCSC
Gene
Family Browser. This major new addition to our website is
a useful tool for collecting information on groups of genes
that may be related in many different ways. The
Browser provides information on gene expression,
protein homology (both within and across species), GO terms,
and Pfam domains, and cross links to many other databases.
To access the Family Browser, click the link on the top
menu on this page. The first time you use the Browser, the
application will display a brief overview of the tool and
information for starting and configuring it. To read a more
detailed description of the Browser, see the Family Browser
User's
Guide.
We are always interested in hearing feedback
about the tools on our site. If you have comments or questions
about the Family Browser, please email us at
genome@soe.ucsc.edu.
Posted on 23 Sep. 2003
We have added the Takifugu rubripes (Japanese pufferfish) assembly to
the growing collection of genomes available in the UCSC Genome Browser
and Blat servers. The Fugu v.3.0
(Aug. 2002) whole genome shotgun assembly --
which is the fourth
vertebrate assembly to be added to the UCSC Genome Browser -- was
provided by
the US
DOE Joint Genome Institute (JGI) as part of the
International Fugu Genome Consortium led by JGI and the
Singapore Institute of Molecular and Cell Biology (IMCB).
This assembly was
constructed with the JGI assembler, JAZZ, from paired end
sequencing reads produced by JGI and IMCB, at JGI, Myriad Genetics, and
Celera Genomics, resulting in a sequence coverage of 5.7X.
All reads are plasmid, cosmid, or BAC end sequences, with
the predominant coverage derived from 2 Kb insert plasmids.
This assembly contains 20,379 scaffolds totaling 319 million
base pairs. The largest 679 scaffolds total 160 million base
pairs. The Fugu genome was annotated using the Ensembl
system by the Fugu informatics group at IMCB.
The Fugu genome, which was one of the first
vertebrate genomes to be draft-sequenced after human,
serves an important role in the exploration of the human
genome. In contrast
to other vertebrates that have been sequenced, the intergenic
and intron regions of the Fugu are highly compressed and
uncluttered with repetitive sequence, resulting in a
genome that is unusually compact in size. The Fugu genome
has proved useful in gene discovery and the identification and
characterization of gene regulatory elements in other genomes.
Bulk downloads of the Fugu sequence and annotation data are
available via FTP at
ftp://hgdownload.cse.ucsc.edu/goldenPath/fr1
or through the Downloads link on the Genome Browser home page. We recommend that FTP be used rather than
HTTP for the download of large or multiple files.
We'd like to thank JGI and the other members of the
International Fugu Genome Consortium, including IMCB,
the UK Human Genome Mapping Project (Hinxton),
the Molecular Sciences Institute (Berkeley) and the
Institute for Systems Biology (Seattle),
for providing the sequence, assembly, and analysis of this
genome. We'd also like to acknowledge the members of the
UCSC Genome Bioinformatics group who contributed to this
release: Kate Rosenbloom (lead engineer), Heather Trumbower,
Robert Kuhn, Donna Karolchik, and Jim Kent.
Posted on 13 Aug. 2003
The UCSC Genome Bioinformatics group has released a browser
and blat server on the first of more than 100 targeted
genomic regions being sequenced in multiple species and
analyzed by the NIH Intramural Sequencing Center (NISC)
Comparative Sequencing Program sponsored by NHGRI. This
release coincides with the publication of the results of the
study in the 14 Aug 2003 issue of
Nature
(Thomas,
J.W. et al. (2003) Comparative analyses of multi-species
sequences from targeted genomic regions. Nature 424:788-793).
The browser displays sequence and annotations on a large
region
containing 10 previously identified genes - including the
gene mutated in cystic fibrosis - in 13 vertebrate species.
Organisms in the study include human, chimpanzee,
baboon, cat, dog, cow, pig, rat, mouse, chicken, zebrafish
and two species of pufferfish (Fugu and Tetraodon).
The NISC Comparative Sequencing Program data may be accessed
by clicking the Browser link on the Genome Browser home page
and then selecting the "Zoo" option from the genome list.
The research team, led by NHGRI Scientific Director Eric
D. Green, included scientists from Pennsylvania State
University, University of California Santa Cruz (UCSC),
and the University of Washington in Seattle. In the study,
the investigators systematically compared the patterns of
transposon insertions among the species' sequences. One
key result of the analysis was the confirmation of recently
proposed mammalian evolutionary
trees suggesting that primates are more closely related to
rodents than to carnivores or artiodactyls. Another
significant outcome was the discovery of a
substantial number of previously unidentified non-coding DNA
segments that are conserved across a wide range of species.
Many of these regions could be identified only through
comparisons of sequence from multiple species, demonstrating
the importance of studying the genomes of a wide range of
organisms as a means for identifying functional elements in
the human genome.
UCSC built a customized version of the browser to display the
target region for this study, allowing scientists to
interactively explore the data and predictions generated
by this project, contribute data of their own, and track
the project as data from additional species are generated.
In addition to the browser, the UCSC team also
contributed to the analytical portion of the
study. Mathieu Blanchette
identified the regions that are most highly conserved among
species. Adam Siepel performed the phylogenetic analysis of
rates of substitution. The UCSC team worked with Arian Smit
to obtain definitive evidence that rodents branched off from
the common ancestor later than carnivores and artiodactyls.
For more information on the NISC study, see the Science
Daily
press release.
Flat files of the assembled sequence and annotations may be
obtained from http://www.nisc.nih.gov/data/ or via the
Downloads link on the Genome
Browser home page.
We'd like to thank the NISC Comparative Sequencing Program
team for providing the data and comparative analysis
for this Genome Browser release. Special thanks go to
Elliott Margulies at NHGRI for serving as the main liaison
between NHGRI and UCSC, and for contributing several
annotation tracks to the browser. We'd also like to
acknowledge
the efforts of the many faculty, grad students, and staff
members of the UCSC Genome Bioinformatics group who
contributed to the research effort and browser
development for this project.
Posted on 8 Aug. 2003
The latest human genome reference sequence (NCBI Build 34,
July 2003) is now available as database hg16 in the UCSC
Genome Browser and blat server.
There are
2,843,433,602 finished sequenced bases in the ordered and
oriented portion of the assembly, which is an increase of
0.4 percent, or approximately 11 Mb, over the Build 33
assembly.
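The two figures above can be checked against each other with a short calculation. This sketch treats the rounded "0.4 percent" as exact, which is an assumption, so the result is only approximate; the variable names are illustrative.

```python
build34_bases = 2_843_433_602  # finished bases, from the announcement
increase_fraction = 0.004      # "0.4 percent", assumed exact here

# If Build 34 = Build 33 * (1 + increase), solve for the Build 33 total.
build33_bases = build34_bases / (1 + increase_fraction)

# The difference between the two builds, in megabases.
added_mb = (build34_bases - build33_bases) / 1e6
print(f"approx. {added_mb:.1f} Mb added")
```

The result lands a little above 11 Mb, consistent with the "approximately 11 Mb" quoted in the text.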
Of particular note in this release is the addition of the
pseudoautosomal regions of the Y chromosome. This sequence
was taken from the corresponding regions in the X chromosome
and is an exact duplication of that sequence.
Some sequence joins between adjacent clones in this assembly
could not be computationally
validated because the clones originated from different
haplotypes and contained polymorphisms in the overlapping
sequence, or the overlap was too small to be reliable.
In these instances, the sequencing center responsible for
the particular chromosome has provided data to support
the join in the form of an electronic certificate. The
Build 34 certificates may be reviewed
here.
Bulk downloads of the data are available via FTP at
ftp://hgdownload.cse.ucsc.edu/goldenPath/hg16
or through the Downloads
link. We recommend that FTP be used rather
than HTTP for the download of large or multiple files.
We'd like to thank NCBI and the International Human Genome
Sequencing Consortium for
furnishing the data, and the UCSC team members who
contributed to this release: Terry Furey, Hiram Clawson,
Heather Trumbower, Mark Diekhans, Robert Baertsch,
Donna Karolchik, Jim Kent and our sysadmin team Patrick
Gavin, Jorge Garcia, and Paul Tatarsky.
Posted on 14 Jul. 2003
The UCSC Genome Bioinformatics Group has released a browser
and BLAT server on the v. 3.1 rat genome assembly from the
Rat Genome Sequencing Consortium. This
assembly (UCSC version rn3, June 2003) was produced by the
Atlas group at Baylor Human Genome Sequencing Center (HGSC).
This assembly is a minor update to the 3.0 release.
Sequence changes affect only chromosomes 7 and X. No
additional assembly releases are planned prior to the
publication of the rat genome analysis papers.
The 3.x assemblies reflect several sequence additions and
software improvements over the previous 2.x assemblies,
including the sequencing of over 1100 new BACs to cover gaps,
an improved marker set from the Medical College of Wisconsin,
a new FPC map from the BC Genome Sciences Centre, and
improved linking of bactigs. For detailed information and
statistics about the 3.x assemblies, see the Baylor HGSC
README.
Downloads of the rat sequence and annotation data are
available at
ftp://hgdownload.cse.ucsc.edu/goldenPath/rnJun2003/ or via
the Downloads link on this page. These data are made
available with
specific conditions for use.
We'd like to thank the Rat Genome Sequencing Consortium and
Baylor HGSC for providing this assembly, collaborators
from other institutions who have contributed annotations,
and Arian Smit for updating RepeatMasker for this release.
We'd also like
to acknowledge the contributions of several individuals at
UCSC, including Hiram Clawson, Heather Trumbower, Robert Kuhn,
Yontao Lu, Terry Furey, Mark Diekhans, Robert Baertsch,
Donna Karolchik, Jim Kent, and our sysadmin team Jorge
Garcia, Patrick Gavin, and Paul Tatarsky.
Posted on 24 Jun. 2003
UCSC has just released browsers and blat servers for 2 worms:
C. elegans version WS100 (May 2003) and C. briggsae version
cb25.agp8 (July 2002). The browsers are based on sequence
obtained from WormBase.
We are pleased to add the nematodes to the roster of genomes
available on our site. C. elegans is a major model organism
used for biomedical research, and is the first multicellular
animal to have a fully sequenced genome. In contrast, the
whole genome shotgun assembly of the C. briggsae genome is
estimated to have achieved 98% coverage. Draft chromosome
sequences are not available for C. briggsae, due to the lack
of dense chromosomal maps that allow assignment of
ultracontigs to chromosomal locations. As a result, all data
in the C. briggsae browser maps to chrUn.
Both worms played a significant role in the early history
of the UCSC Genome Browser. The browser code originated with
a C script that displayed a splicing diagram for a gene
prediction from C. elegans. Tracks for mRNA alignments and
for homology with C. briggsae were added, and the tool
morphed into the precursor of the Genome Browser, the
"Intronerator" (Kent, WJ and Zahler, AM (2000).
The intronerator: Exploring introns and alternative splicing
in C. elegans. Nucleic Acids Res. 28: 91-93).
Downloads of the C. elegans sequence and annotation data are
available at
ftp://hgdownload.cse.ucsc.edu/goldenPath/ceMay2003/; C. briggsae
downloads can be found at
ftp://hgdownload.cse.ucsc.edu/goldenPath/cbJul2002/.
Both genomes can also be downloaded via the Downloads link on
this page.
We'd like to thank the
Genome Sequencing Center
at Washington University in St. Louis and the
Sanger Institute
for their collaborative work in sequencing the C. elegans
and C. briggsae genomes. Many thanks to the
WormBase consortium
for making the worm sequence publicly available.
We'd also like to acknowledge several UCSC people who
contributed to this release: Hiram Clawson (browser and
annotation tracks engineering), Jim Kent (WABA and
chaining/netting), Al Zahler (WABA), Heather Trumbower (QA
and project management), and Donna Karolchik (project
management and documentation).
Posted on 23 Jun. 2003
You may notice that we've removed the Genome pulldown menu
and genome assembly information from our home page.
Genome-specific information and links, as well as genome
selection, are now available on the gateway pages for our
tools. To open up a gateway page, simply click the Browser,
Blat, or Tables link in the left sidebar.
Posted on 23 May 2003
Today we'd like to announce the release of a genome browser and BLAT
server for the SARS coronavirus TOR2 draft assembly. The browser - which is based on
sequence deposited into GenBank as of 14 April 2003 - provides seven
annotations showing gene predictions, locations of putative proteins, and
viral mRNA and protein alignments. Of particular note are the protein
structure analysis and predictions, determined by using the
Sequence Alignment and Modeling (SAM) T02 tool.
This browser marks a departure from our usual collection of vertebrate
genomes. Its inception was inspired by one of our engineers - Angie
Hinrichs - who was vacationing in New Zealand when the SARS draft assembly
was initially released. Struck by the impact of SARS in that part of the
world, she downloaded the sequence and built the initial tracks from a
terminal at an Internet cafe! The rest of the team joined in on the
grassroots effort, generating the additional annotations and SAM
T02 protein analyses and predictions. Victor Solovyev chimed in with
Fgenesv+ gene predictions from Softberry Inc. UCSC does not intend to
provide a comprehensive collection of viral genomes in the future, but
will maintain this browser as long as scientific and public interest in
SARS persists.
Downloads of the annotation data are available
at
ftp://hgdownload.cse.ucsc.edu/goldenPath/scApr2003/database or via the Downloads
link on this page.
We'd like to thank everyone who worked on this release, including
Angie Hinrichs, Robert Baertsch, Fan Hsu, Matt Schwartz, Heather Trumbower, Jim
Kent, Kevin Karplus, Donna Karolchik, Brian Raney, Hiram Clawson, Kate
Rosenbloom, Victor Solovyev, and
our extremely dedicated systems administrators Paul Tatarsky, Patrick
Gavin, and Jorge Garcia.
Posted on 21 Apr. 2003
The file that we originally used to build the agp files for the
April 2003 human release (Build 33) erroneously contained 2 contigs on
chromosome 8 that were listed twice: NT_078037 and NT_008183. We've
received a corrected version and have updated the following files on our
website: contigAgp.zip, chromAgp.zip, liftAll.zip. You can obtain the
newer versions of these files from our ftp site at
ftp://hgdownload.cse.ucsc.edu/goldenPath/10april2003/bigZips/.
Posted on 14 Apr. 2003
The International Human Genome Sequencing Consortium today announced the
successful completion of the Human Genome Project. The most
significant outcome of this project is the reference sequence of
the human genome. The sequencing of
the 3 billion letters of DNA in the human genome - which many consider to
be one of the most ambitious scientific undertakings in history - was
completed 2 years ahead of schedule and at substantially less cost than
original estimates. The reference sequence will serve as a new foundation
for research in the fields of medicine and human biology.
In conjunction with this announcement, the UCSC Genome Bioinformatics
group is proud to release a genome browser and BLAT server on the
reference sequence (NCBI Build 33), along with bulk downloads of the
sequence and annotation data. The initial browser provides a preliminary
set of annotations that will be expanded in coming weeks. Bulk downloads
of the data are available via FTP at
ftp://hgdownload.cse.ucsc.edu/goldenPath/10april2003
or through the Downloads link on this page.
We
recommend that FTP be used rather than HTTP for the download
of large or multiple files.
The reference sequence covers about 99 percent of the human genome's gene-containing
regions, and has been sequenced to an accuracy of 99.99 percent. The
missing portions are essentially contained in fewer than 400 defined gaps
that represent DNA regions with unusual structures that can't be reliably
sequenced using current technology. The average DNA letter now lies within
a stretch of approximately 27,332,000 base pairs of uninterrupted
sequence!
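A statistic of this kind, the length of the uninterrupted stretch that a randomly chosen base falls in, is a length-weighted average: longer stretches count more because they contain more bases. The announcement does not say how its figure was derived, so the sketch below shows only one natural reading, with a hypothetical function name and toy lengths.

```python
def mean_stretch_for_random_base(lengths):
    """Expected length of the uninterrupted stretch containing a base
    chosen uniformly at random: sum(L^2) / sum(L)."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("need at least one non-empty stretch")
    return sum(length * length for length in lengths) / total


# Toy example (hypothetical): one stretch of 1 base and one of 3 bases.
# Three of the four bases sit in the longer stretch, so the result
# is (1*1 + 3*3) / 4 = 2.5, not the plain average of 2.
print(mean_stretch_for_random_base([1, 3]))  # prints 2.5
```

Because of the weighting, a small number of very long gap-free stretches can pull this value far above the ordinary mean stretch length.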
Chromosomal sequences for this release were assembled by the
International Human Genome Sequencing Consortium sequencing centers and verified by
NCBI and UCSC. In some cases, sequence joins between adjacent clones
could not be computationally validated, e.g. due to polymorphisms in the
overlapping sequence. In situations like this, supporting evidence for
the join has been provided by the sequencing center responsible for that
particular chromosome. The
Non-standard Join Certificates table displays this information. The
annotations on the UCSC website have been provided
by UCSC and collaborators worldwide. See the
Credits page for a detailed
list of organizations and individuals who contributed to the success of
this release.
We'd like to congratulate the many people worldwide who have worked on
the Human Genome Project for this landmark achievement. We'd also like to acknowledge the UCSC
Genome Browser project team who worked many long hours to ensure that
the genome browser and sequence data were released on time for this
announcement: David Haussler, Jim Kent, Terry Furey, Matt Schwartz, Heather Trumbower,
Angie Hinrichs, Fan Hsu,
Donna Karolchik, Jorge Garcia, Patrick Gavin, Chuck Sugnet, Yontao Lu, Mark Diekhans, Ryan Weber, Robert Baertsch, Krishna Roskin, and the many other students in the UCSC Genome Bioinformatics group.
Posted on 2 Apr. 2003
The Dec. 2001 Human assembly (hg10) and the Nov. 2001 Mouse assembly
(mm1) have been moved to the archives. They are no longer accessible
from the main browser, but instead can be found by clicking the
Archives link on this
page.
Posted on 26 Mar. 2003
We've added an updated rat assembly to our site: Rat Jan 2003 (rn2). This
corresponds to the Version 2.1 Jan 2003 Update of the rat genome
assembly, produced by the Atlas group at Baylor HGSC as part of the Rat
Genome Sequencing Consortium.
This update corrects duplications that were assembly artifacts in the
previous version and improves the linking of bactigs to create larger
"ultrabactigs". Compared with the previous rat assembly, sequence mapped
to specific chromosomal coordinates is reduced by about 1.6 percent.
Loosely mapped and unmapped sequence is reduced by 17 percent. For more
details and statistics on the Jan 2003 assembly, see the Baylor HGSC
README for this release.
UCSC has released a Genome Browser and BLAT server for this assembly
update. The initial
browser contains 16 annotation tracks, with more to follow in coming
weeks. Sequence downloads are currently available at
ftp://hgdownload.cse.ucsc.edu/goldenPath/rnJan2003/ or via the Downloads
link on this page. A complete set of database downloads will be available at the beginning
of next week. These data are made available with
specific
conditions for use.
Thanks to the Atlas group at Baylor HGSC, the Rat Genome Sequencing Consortium,
the UCSC Genome Bioinformatics group, and contributors worldwide for
making this release available.
Posted on 13 Mar. 2003
We're happy to announce an update to the mouse genome sequence. This new
version (Mouse Feb. 2003) includes 705 megabases of finished sequence,
compared to 96 megabases of finished sequence in the previous assembly.
Many people in the Mouse Genome Sequencing Consortium contributed to
this update. The Sanger Institute in particular contributed a large
amount of finished sequence. Richa Agarwala, Deanna Church, and
coworkers at NCBI layered the finished clones on top of the Arachne whole genome shotgun assembly. Arian Smit constructed a new RepeatMasker
library.
UCSC has released a Genome Browser and BLAT server for the Feb. 2003
Mouse genome. The initial
browser contains 14 annotation tracks, with more to follow in coming
weeks. Sequence downloads are currently available at
ftp://hgdownload.cse.ucsc.edu/goldenPath/mmFeb2003/ or via the Downloads
link on this page. Database downloads will be available at the beginning
of next week.
Thanks to everybody at UCSC and around the world who contributed to the
success of this release!
Posted on 5 Feb. 2003
We're proud to announce the release of version 17 of the UCSC Genome
Browser.
This version contains powerful new features, numerous improvements to
the annotation track display, additional annotation
tracks, and a number of bug fixes. In this release cycle, we've also
introduced
an enhanced QA process that formalizes our testing and verification
of the Genome Browser software and the data displayed in the browser.
New functionality in v.17:
-- Numerous enhancements to the table browser that allow the user to
conduct more complex and specific searches. New features include support
for intersections of tracks, a new summary statistics output format, and
the ability to output query results as a custom annotation track that
can be viewed in the Genome Browser.
The new
Table Browser User's Guide contains a detailed
description of the new features and provides a wealth of information and
examples for conducting various types of searches on the database tables.
-- Two new display modes available for most annotation tracks: pack and
squish modes. In pack mode display, annotation track features are fully
displayed, but more than one feature may be displayed on the same line.
This greatly reduces the amount of display space needed by a track
when a user wishes to view a large number of individual features at one
time. Squish mode is similar to pack mode, but displays features
at 50% height and without labels. This mode is particularly
useful for viewing tracks in which a large number of features align to
the same section of a chromosome, e.g. EST tracks.
-- Functional groupings of annotation track controls. This
makes it much easier to find a particular item in the track control list
and gives a better visual overview
of the annotations available in a particular category, e.g. comparative
genomics tracks or gene prediction tracks.
-- A mechanism for saving the annotation tracks image in postscript or
PDF format. This much-requested feature enables Genome Browser users to print an image at
high resolution, edit it with a drawing program, or display it in a
postscript or PDF viewer.
-- A collection of custom annotation tracks supplied by
Genome Browser users and members of the UCSC Genome Bioinformatics lab.
Additional contributions to this collection are welcome! Contact
genome@soe.ucsc.edu
if you have an annotation you'd like to share.
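The pack mode described in the list above puts more than one feature on a line as long as the features do not collide. A minimal sketch of that idea follows, assuming a simple greedy first-fit placement over half-open (start, end) intervals. The function name pack_rows and the sample features are hypothetical, and this is not the Genome Browser's own layout code, which also has to leave room for feature labels.

```python
def pack_rows(features):
    """Assign each (start, end, name) feature to the first row where it fits.

    Intervals are treated as half-open [start, end). Returns a list of
    (name, row) pairs in order of feature start position.
    """
    row_ends = []  # end coordinate of the last feature placed in each row
    layout = []
    for start, end, name in sorted(features):
        for row, last_end in enumerate(row_ends):
            if start >= last_end:      # no overlap with this row's last feature
                row_ends[row] = end
                break
        else:
            row = len(row_ends)        # nothing fits, so open a new row
            row_ends.append(end)
        layout.append((name, row))
    return layout


features = [(100, 200, "a"), (150, 250, "b"), (210, 300, "c"), (260, 400, "d")]
print(pack_rows(features))
```

In this example the four features need only two rows, where a one-feature-per-line display would need four.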
New annotation tracks in v.17:
-- a new Known Genes track (Human Builds 30 & 31, Feb 2002 Mouse,
Nov 2002 Rat) that shows known
protein coding genes based on proteins from SWISS-PROT, TrEMBL, and
TrEMBL-NEW and their corresponding mRNAs from GenBank. Features within
the track are color-coded according to origin and review status.
-- a new Superfamily track (Human Build 30, Feb 2002 Mouse) that shows proteins having
homologs with known structures or functions. Each entry in the track
shows the coding region of a gene (based on Ensembl gene prediction).
The feature label consists of the names of all known protein domains coded
by the gene, and usually contains structural and/or function
descriptions that provide valuable information for getting a quick grasp
of the biological significance for the gene.
We have also released several additional annotation
tracks on the latest human and rat assemblies in the past
month.
Bug fixes in v.17:
-- Approximately 40 bugs (mostly minor problems) have been fixed in
this version.
To take full advantage of the new display features in this release, we
recommend that you reset your browser to the new default settings. NOTE:
you may not want to reset your browser if you have customized settings
that you wish to preserve. You can reset your browser by clicking
the "Click here to reset" link on the Browser Gateway page.
We hope this new release facilitates your work with the UCSC Genome Browser.
If you have any questions or comments about the new release, send email
to genome@soe.ucsc.edu.
Posted on 23 Dec. 2002
We're pleased to announce the release of the latest human genome
assembly, Build 31 (UCSC version hg13). This assembly was produced at NCBI
based on sequence information submitted into GenBank as of Nov. 14, 2002.
Release notes for this assembly are available from the
NCBI web site. Because UCSC now obtains its assembly directly from
NCBI, the UCSC Build 31 data is identical to that of NCBI and Ensembl.
Build 31 is an excellent high-quality assembly that shows a remarkable
amount of progress toward the milestone of finishing
the human genome. Greater than 95% of the euchromatic region of the
genome is now complete, with more than 90% of the sequence in a
finished state. The number of clone contig gaps has decreased by one
third from the previous assembly, and the overall number of sequenced
contigs has been reduced by one half. Seven chromosomes are considered to
be in a finished state: 6, 7, 13, 20, 21, 22, and Y.
The initial release of the Build 31 Genome Browser contains 25 annotation
tracks, with several more to follow in the upcoming weeks. Bulk
downloads of the data are available from our FTP site at
ftp://hgdownload.cse.ucsc.edu/goldenPath/14nov2002 or via the
Downloads link on this page.
UCSC has generated a set of high-level comparisons of the Build 31
draft sequence against various types of information (STS maps, BAC end
pairs, and clone overlaps). This information, as well as statistics for
Build 31, is accessible from the
Chromosome Reports,
Genome Map Plots, and
Summary Statistics
links in the "Technical Information about the Assembled Sequence"
section below.
We'd like to thank NCBI as well as all the people who collaborated on the
data and annotations for this release.
Posted on 6 Dec. 2002
We're pleased to announce the release of a UCSC Genome Browser on the
Nov. 2002 rat assembly produced by the Baylor College of Medicine Rat
Genome Sequencing Center and the Rat Genome Sequencing Consortium.
The sequence was assembled using a hybrid approach that combines the
clone by clone and whole genome shotgun methods. A new software program -
ATLAS - was developed for this effort. The assembly process resulted in
a 6.5-fold coverage of the rat genome, which is estimated to be
approximately 2.8 Gigabases in size.
Downloads of the rat data and annotations are available through our
ftp site at ftp://genome-archive.cse.ucsc.edu/goldenPath/rnNov2002
or via the archived downloads
link. This data contains
specific
conditions for use. The sequence is also available from the
Rat
Genome Project website for the Human Genome Sequencing Center at
Baylor College of Medicine or from GenBank.
We'd like to thank the Baylor team and the Rat Genome Sequencing
Consortium for their collaboration on this project. See the
Credits page
for a complete list of acknowledgments. For more
information on the rat genome, the assembly process, and the
Rat Genome Sequencing Consortium, refer to the website for the
Human Genome Sequencing Center at Baylor College of Medicine.
Posted on 5 Dec. 2002
The International Mouse Genome Sequencing Consortium has announced
the publication of a high-quality draft sequence of the mouse genome,
together with a comparative analysis of the mouse and human genomes.
The results from this analysis can be found in the Mouse Genome
Browser on this website. The paper appears in the Dec. 5 issue of the
journal Nature at
http://www.nature.com/nature/mousegenome/.
The co-author list includes several members of the UCSC Genome
Bioinformatics Group: CBSE Director David Haussler, Research
Scientist Jim Kent and research team members
Robert Baertsch, Mark Diekhans, Terrence Furey, Angie Hinrichs, Fan Hsu,
Donna Karolchik, Krishna Roskin, Matt Schwartz, Charles Sugnet and Ryan
Weber.
Posted on 29 Oct. 2002
We've added several new directories of downloadable data to the 28 June
2002 human genome assembly. These directories contain mouse/human
alignments of the June 2002 human assembly vs. the Feb. 2002 mouse assembly. You can access these
directories from our archived downloads
link or via our ftp site at ftp://genome-archive.cse.ucsc.edu/goldenPath/28jun2002/vsMm2/.
Within the main directory vsMm2 are 3 subdirectories that contain all
the alignments (axtAll), alignments filtered to provide only the best alignment
for any given region of the human genome (axtBest), and a relatively stringent
subset of the axtBest alignments (axtTight). For more information
about the format of the alignment files and the methods used to generate
the alignments, consult the README.txt file in the vsMm2 directory.
Posted on 18 Oct. 2002
We've rolled out a new version of the Genome Browser - v.16. In
addition to several bug fixes, this release contains some interesting
new features.
The Table Browser has undergone major enhancements. Users
can now restrict their queries by specifying a value or range for any of
the fields in a table, and by selecting which fields should be displayed in
the output. The Table Browser also provides the ability to do a free-form SQL
query on a table and supports several new output formats.
We've extended the capabilities of the DNA retrieval functionality
in the Genome Browser and the Table Browser. The new
mechanism offers the user several options for configuring the amount and
type of sequence region that is retrieved, and options for formatting the
sequence output. The retrieval options vary based on the type of table
selected.
The Genome Browser's gene prediction tracks now offer a Comparative
Sequence link in addition to the predicted protein, mRNA sequence, and
genomic sequence links. The Comparative Sequence feature displays
annotated codons and translated protein for the region in alignment to
another species.
We've recently added a few new tracks/tables to the hg12 and mm2 Browsers. On the
latest human assembly (hg12), we now have Chimp Blat and Chimp BAC tracks
provided by Ingo Ebersberger, Joshua Bacher, and Svante Pääbo at the Max
Planck Institute for Evolutionary Anthropology. Jim Kent at UCSC has generated a
new annotation - Gene Boundaries - that shows the boundaries of genes
and the direction of transcription as deduced from clustering spliced
ESTs and mRNAs against the genome. Daryl Thomas of UCSC has also added SNP tracks to both
hg12 and the latest mouse assembly (mm2), based on data from the SNP
Consortium and NIH.
We encourage you to experiment with these new features. Comments,
questions, and suggestions are always welcome at
genome@soe.ucsc.edu.
Posted on 17 Oct. 2002
We have found errors with the RepeatMasker track on the
Feb. '02 mouse assembly (mm2). This problem affects the RepeatMasker track
and RepeatMasked DNA obtained via the browser's DNA links. It does
not affect data downloaded from the browser's downloads page or ftp
site. We have replaced the erroneous data set with a corrected version.
If you have questions about how this change may affect your project, email
genome@soe.ucsc.edu.
Posted on 4 Oct. 2002
Please note that the coordinate range for a portion of the Build 30 chr22 in
the UCSC Genome
Browser differs from that of NCBI and the Sanger Centre. The latter version
contains a 100K bp gap
inserted into chr22 just after the centromere. This
modification was made after UCSC released the Build 30 assembly in the
Genome Browser. All of the chr22 annotations displayed by the UCSC
browser are correctly positioned relative to one another. However, the
coordinates of all features past the centromere will be 100K less than those
of NCBI and Sanger. The Ensembl Genome Browser is consistent with the
UCSC browser.
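For anyone comparing chr22 positions between the two coordinate systems, the offset described above can be applied with a small helper. This is a rough sketch only: it takes "100K" to mean exactly 100,000 bases, and the position at which the gap was inserted is not given in this notice, so gap_insert_pos is a hypothetical parameter that you would need to supply from the NCBI assembly data. The function names are likewise illustrative.

```python
GAP_SIZE = 100_000  # assumes "100K bp" means exactly 100,000 bases


def ucsc_to_ncbi_chr22(pos, gap_insert_pos):
    """Convert a UCSC Build 30 chr22 position to the NCBI/Sanger position.

    gap_insert_pos is the UCSC coordinate just after the centromere where
    the gap was later inserted (hypothetical; not stated in the notice).
    Positions at or before it are unchanged; later ones shift by GAP_SIZE.
    """
    return pos + GAP_SIZE if pos > gap_insert_pos else pos


def ncbi_to_ucsc_chr22(pos, gap_insert_pos):
    """Inverse conversion; positions inside the inserted gap have no UCSC equivalent."""
    if pos <= gap_insert_pos:
        return pos
    if pos <= gap_insert_pos + GAP_SIZE:
        raise ValueError("position falls inside the inserted gap")
    return pos - GAP_SIZE


# Made-up insertion point, for illustration only.
example_insert = 13_000_000
print(ucsc_to_ncbi_chr22(20_000_000, example_insert))
```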
Posted on 19 Sep. 2002
We're pleased to announce the release of the Mouse Cons (Human/Mouse
Evolutionary Conservation Score) annotation track for the June 2002
human genome assembly. This track allows a user to interactively explore
conservation between the human and mouse genomes and identify highly conserved
regions. Highest levels of conservation are typically seen over coding exons.
High levels of conservation are also frequently associated with noncoding RNA,
promoters, other regulatory elements, and pseudogenes.
The Mouse Cons annotation is displayed using a new type of Genome
Browser graphical track that plots a continuous function along a chromosome. The
conservation levels are calculated over 50bp windows in the human genome
that have at least 15 bp aligned to mouse. The score for a window reflects
the probability that the level of observed conservation in that 50bp region
would occur by chance under neutral evolution. This information is given on
a logarithmic scale and displays in the track as "mountain ranges". Details
pages associated with the individual peaks in the track provide access to
the base level alignments for the whole region and for the individual 50bp
windows.
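To make the windowing idea concrete, here is a toy sketch of a per-window score. It is not the method used to build the track: it assumes non-overlapping windows, a simple binomial null model, and a made-up neutral identity rate (neutral_identity), all of which are illustrative choices. Only the 50-base window size and the 15-aligned-base threshold come from the description above.

```python
import math


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def window_scores(human, mouse, window=50, min_aligned=15, neutral_identity=0.67):
    """Score non-overlapping windows of human bases against aligned mouse bases.

    human and mouse are equal-length strings, one character per human base;
    '-' in mouse marks a human base with no aligned mouse base.
    Returns (window_start, score) pairs; score is -log10 of the chance of
    seeing at least the observed number of matches under the toy null model,
    or None when fewer than min_aligned bases are aligned.
    """
    scores = []
    for start in range(0, len(human) - window + 1, window):
        pairs = [
            (h, m)
            for h, m in zip(human[start:start + window], mouse[start:start + window])
            if m != "-"
        ]
        if len(pairs) < min_aligned:
            scores.append((start, None))
            continue
        matches = sum(h == m for h, m in pairs)
        p = binom_tail(len(pairs), matches, neutral_identity)
        scores.append((start, -math.log10(p)))
    return scores


human = "ACGT" * 25                      # 100 bases, two windows
mouse = human[:50] + "-" * 40 + human[90:]
print(window_scores(human, mouse))
```

In the example, the first window is perfectly conserved and gets a high score, while the second has only 10 aligned bases and is left unscored.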
The team that produced this track includes Ryan Weber, Krishna Roskin,
Mark Diekhans, Jim Kent, Scott Schwartz, and Webb Miller.
Posted on 12 Sep. 2002
Nature Genetics has just published User's Guide to the Human Genome, a
hands-on tutorial for using genome browsers as web tools for browsing and
analyzing data from the Human Genome Project and other sequencing efforts. The
3 browsers featured in the tutorial include the UCSC Genome Browser, NCBI's
Map Viewer, and the Ensembl Genome Browser. The guide is organized around a
collection of step-by-step solutions to 13 typical research questions,
and serves as a nice supplement to the documentation materials
available on the UCSC Genome Browser web site. The guide is accessible as a
link off the Nature Genetics home page at
http://www.nature.com/ng.
Posted on 12 Sep. 2002
We've corrected a problem with the Human June 2002 (hg12) cytoBand annotation
track that affected chromosome Y. The clones on this chromosome were
erroneously pushed from the q-arm onto the p-arm, creating some confusion. The
currently available version of the cytoBand data on this website contains this
correction.
Posted on 15 Aug. 2002
We've just released several new annotation tracks/tables for the human genome.
On the June 2002 assembly, we've added Gene Bounds,
UniGene, CpG Islands, Nonhuman mRNA & EST, SNPs, NCI60, and GNF Ratio.
On the April 2002 assembly we've added a Fgenesh++ Genes track, and have also
updated the RepeatMasker track. The Dec. 2001 release now includes a Sanger 22
track.
On the mouse genome, we have 2 new tracks for the Feb. 2002 assembly:
TIGR Gene Index and RNA Genes.
Posted on 6 Aug. 2002
We have fixed an error with six of the chr3 contigs in the bigZips/contigAgp.zip
file. The following .agp files were corrected: NT_005684.agp, NT_005663.agp,
NT_022554.agp, NT_022459.agp, NT_006031.agp, and NT_022419.agp. The
chr3.agp file in bigZips/chromAgp.zip was also modified. This change does not
affect the .gl files, the .fa files, the
lift files, or the annotations. Alignments made on the previous version
of chr3 are still good. Updated versions of the contigAgp.zip and
chromAgp.zip files were
uploaded to our site today. You can download the new versions via ftp at
ftp://genome-archive.cse.ucsc.edu/goldenPath/28jun2002 or via the
archived
downloads link.
Posted on 2 Aug. 2002
The problems with the June 2002 Build 30 (hg12) RepeatMasker track have been
resolved. The new RepeatMasker track, along with regenerated Fish Blat
and Genscan tracks, are now available in the Browser and
through our
Downloads link. We've also added a few new annotation tracks for the
June 2002 release, and will be adding more over the next 2 weeks.
The latest Genome Browser has 2 new features. We've added filter
functionality to the Table Browser, accessible via the Filter Fields
button on the Table Browser main page. Also, some of the Dec. 2001 human genome tracks (e.g. RefSeq Genes) now have a
Comparative Sequence link from the details page that shows annotated
codons and translated protein with alignment to the mouse genome.
Posted on 27 Jul. 2002
We've experienced some RepeatMasker problems on Build 30 and are
rerunning it. This will directly affect the RepeatMasker track and the
masking of the fasta files. The Fish Blat and Genscan tracks may also
change slightly once we've redone this. The EST, mRNA, and RefSeq tracks
should not be affected. We will also post a new RepeatMasker track for
Build 29 (see news item below) as soon as the Build 30 tracks are
completed. We apologize for any rework this may cause.
Posted on 26 Jul. 2002
Bulk downloads of the June 2002 Build 30 human genome assembly (hg12) are now
available from
ftp://genome-archive.cse.ucsc.edu/goldenPath/28jun2002. You can also
access the data via the archived downloads link. This initial
release of the annotation database download contains a limited set of
tables. Additional files will be available for download next week.
Posted on 24 Jul. 2002
The BLAT server and the coordinates conversion feature for human genome
assembly Build 30 (hg12) are now functional.
Posted on 23 Jul. 2002
We're pleased to announce the pre-release of a browser for
human genome assembly Build 30 from NCBI (UCSC version hg12). This assembly was
produced at NCBI based on sequence information submitted into GenBank as of
June 28, 2002. Build 30 release notes and statistics will soon be
available from the NCBI web
site.
Build 30 is an excellent high-quality assembly. It contains nearly 87%
finished sequence, and 94%-97% coverage. The sequence coverage of this build
is much higher than in previous releases, and there is a high level of
correspondence between the sequence and the map. Currently, the human genome
project appears to be on track to achieve the goal of finishing at least 95%
of the human genome (using Bermuda standards) by April 2003.
UCSC has generated a set of high-level comparisons of the Build 30 draft
sequence against various types of information (STS maps, BAC end pairs, and
clone overlaps), accessible from the Chromosome Reports and Genome
Map Plots links in the "Technical Information about the Assembled
Sequence" section below.
A Blat server for Build 30 is not yet available, but should be
accessible from this site later this week. Data for the mitochondrial genome
and several more annotation tracks will be posted for this release as they
become available. Bulk downloads of the hg12 data should be available from
this site in a few days.
Posted on 9 Jul. 2002
We've found some problems with the repeat-masking of the Build 29 (hg11)
human sequence. We're in the process of replacing the RepeatMasker
track, but do not plan to redo the other tracks due to the imminent
release of Build 30. Because of this, we advise that you do not use the
cross-species tracks for statistical purposes.
Posted on 1 Jul. 2002
The UCSC Genome Bioinformatics home page is sporting an updated interface to
accommodate the growing number of organisms supported by the UCSC Genome
Browser, BLAT, and Table Browser. The list of assembly versions accessible
through each of these tools can now be found on the tool's Gateway page. To
reach the Gateway page, choose an organism from the dropdown list on the left
sidebar of this page, then click the Browser, BLAT, or Tables link. New
organisms will be added to the list in the months ahead.
The UCSC site continues to provide a variety of bulk downloads of genome
assemblies and annotations. The list of downloadable data has been removed
from the home page, but is readily available through the Downloads link on
the left sidebar. The downloads list can also be accessed directly at
http://hgdownload.cse.ucsc.edu/downloads.html or through
our ftp site at
ftp://hgdownload.cse.ucsc.edu/goldenPath/.
Several new annotation tracks have been added to our site in the past
couple weeks. The Feb. 2002 mouse assembly now has tracks for BAC End pairs,
Fgenesh++ gene predictions, and AltGenie gene predictions based on Affymetrix's
Genie gene-finding software. New to the Apr. 2002 human assembly is the
GenMapDB Clones track, which shows placements of BAC clones from the GenMapDB
database based on BAC end sequencing information and confirmed using STS
markers by Vivian Cheung's lab at U. Penn. We've also changed the Known Genes
track name to RefSeq Genes in all assemblies.
This release also includes an updated
User's Guide
and more detailed documentation on creating & using
custom annotation tracks.
Posted on 24 May 2002
Bulk downloads of the April 2002 hg11 human genome assembly (NCBI Build 29)
are now available from ftp://genome-archive.cse.ucsc.edu/goldenPath/05apr2002. You
can also access the data from the Downloads link in the left sidebar.
Posted on 22 May 2002
We've just released a browser and BLAT server on
the latest Build 29 human genome assembly from NCBI (UCSC
version hg11). This assembly is based
on sequence information submitted into GenBank as of Apr. 5, 2002. As with the
Dec. 2001 (hg10) release, this assembly was produced at
NCBI rather than at UCSC. Consult NCBI's
Build 29 release notes and
statistics for more information about this release.
This assembly contains nearly 75% finished sequence.
Currently, the human genome project appears to be on track to achieve the goal
of finishing at least 95% of the human genome (using Bermuda standards) by April 2003.
Although the NCBI human genome assembly has been steadily
improving over the past year, mapping problems still exist in
the current release. Most are small, relatively local rearrangements.
Larger scale problems include a rearrangement in the p-arm of Chr16 and
several discrepancies in Chr17. Researchers - especially positional
cloners - are strongly encouraged to use the tools provided
(comparison plots, chromosome reports) to evaluate the
accuracy of the assembly in specific regions of interest.
Bulk downloads of the hg11 data should be available from this site
in approximately one week. New annotation tracks will be posted as soon
as they become available.
Posted on 24 Apr. 2002
Bulk downloads of the February 2002 mouse genome assembly are now available
from ftp://genome-archive.cse.ucsc.edu/goldenPath/mmFeb2002. You can
also access the data from the archived downloads link.
Posted on 19 Apr. 2002
The February 2002 mouse genome assembly is now available in the browser and
for BLAT searching. This assembly was produced at the Whitehead Institute
using their Arachne software. We'd like to thank them and the Mouse Genome
Sequencing Consortium for providing this assembly, which has
specific conditions for use.
Bulk downloads of the data should be available in approximately one week.
Coordination with mouse genome data access at
Ensembl and
NCBI is
in progress. We'd also like to acknowledge the UCSC team that produced this
release: Jim Kent, Terry Furey, Matt Schwartz, Fan Hsu, Yontao Lu,
Donna Karolchik, Chuck Sugnet, and Ryan Weber.
Posted on 9 Apr. 2002
Bulk downloads of the November 2001 mouse genome assembly are now available
from ftp://genome-archive.cse.ucsc.edu/goldenPath/mmNov2001.
You can also access the data from the archived downloads link.
Posted on 2 Apr. 2002
An updated version of the UCSC Genome Browser (v.11) is now available.
Along with the v.11 browser, we've released several new annotation tracks on
the latest human and mouse assemblies. The new Human Dec. 2001 tracks include:
Mouse Synteny, Ensembl, Genscan, CpG Islands, Mouse Blat, Fish Blat,
Unigene/SAGE, NCI60 Microarray, GNF Affymetrix Microarray, Rosetta
Microarray, and SNPs. An STS Markers track has been added to the Mouse
Nov. 2001 browser.
Posted on 14 Mar. 2002
The November 2001 mouse genome assembly is now available
for viewing in the browser and for BLAT searching. This assembly
was produced at the Sanger Center using the Phusion software developed
by Jim Mullikin and Zemin Ning, and was tied to the
mouse
fingerprint map by Tim Hubbard. We'd like to
thank them and the Mouse Genome Sequencing Consortium for providing this
assembly, which has specific
conditions for use. Bulk downloads of the data
will be available in approximately one week. Coordination with
mouse genome data access at
Ensembl and
NCBI
is in progress. We'd also like to acknowledge the UCSC team that produced
this release: Jim Kent, Terry Furey, Matt Schwartz, Fan Hsu, Yontao Lu,
and Donna Karolchik.
Posted on 16 Feb. 2002
A new assembly based on sequence submitted to GenBank as of Dec. 22 (Build 28) is
now available in the browser and for BLAT search.
This assembly was produced at NCBI rather than UCSC, primarily by
Richa Agarwala, Greg Schuler, and Paul Kitts. The NCBI assembly has
been steadily improving over the past year. Currently it shows slightly
better local order and orientation compared to the UCSC assembly on the
same sequence, but somewhat worse tracking of the chromosome level maps.
The NCBI assembly has the advantage that it can be generated significantly
faster than the UCSC assembly. With the human genome sequencing now
in the end game - over two thirds of the human clones are now finished -
we feel it more productive to focus worldwide annotation efforts on a
single assembly rather than continue producing competing assemblies.
We're working with NCBI to improve their map tracking.
Posted on 4 Feb. 2002
Chromosome Reports detailing correspondence with STS map, overlap,
and BAC end sequence information are available under the "Technical
Information About the Assembled Sequence" section below. This also
gives information about the clone map on which the assembled sequence
is based.
Posted on 18 Dec. 2001
There are some major enhancements to the browser. The complete user interface settings
including track controls, labels, and position are now saved from session to session.
You can configure the browser once to your liking and it will stay that way. This feature
will only work if cookies are enabled in your browser. If you
want to restore the default settings, use the reset all button under the main graphic.
Also under the main graphic are new controls that move just the start or just the end
of the genome window. These are useful for getting exactly the right view without having
to do arithmetic on the position. These controls by default will move two guideline units
at a time, but you can specify other increments. There's a new page associated with each
track. This page is accessible by clicking on the mini-buttons to the left of the track
in the main graphic, or by clicking on the new hyperlink associated with the track in the
track controls section under the graphic. These pages contain a description of the track
and in many cases new controls. The mRNA and EST associated controls let you color or
filter the display according to tissue, author, organism, and so forth.
As with any new enhancement there are likely to be a few new bugs too.
Many of these have been spotted and fixed already. Please let us know
if you find a problem that persists more than a day or two. It's always helpful to include
the freeze and genomic position with a problem report.
Posted on 30 Nov. 2001
There is now a link from the known genes details page to the Jackson Lab's MGI Mouse Ortholog
when the ortholog is known. Thanks to Carol Bult for her help setting up this link.
Posted on 29 Nov. 2001
A duplications track is now available in the August browser. This track
shows duplicate blocks of sequence larger than 1000 bases. The track is hidden
by default. To open it look for 'Duplications' in the third row of track controls
under the main graphic window, and change the setting to 'dense'. Thanks to Evan Eichler
and Jeff Bailey for this track.
Posted on 28 Nov. 2001
Sanger curated gene annotations are now available on chromosome 20. Thanks to
Jennifer Ashurst, James Gilbert, and all the annotators at the Sanger Institute.
Posted on 27 Nov. 2001
A new track has been added to the August freeze browser showing
haplotype blocks derived from common SNPs on Chromosome 21 by Perlegen,
as described in "Common
High-Resolution Haplotypes." Patil, N. et al. Science 294:1719-1723 (2001).
Posted on 19 Nov. 2001
The SNP and Mouse Blat tracks are now available
for August. The Mouse Blat track uses a partial assembly of
the public whole genome shotgun data courtesy of Whitehead's
Arachne program.
Posted on 8 Nov. 2001
The detail web pages for each of the tracks have been updated to reflect
the overall look and design of this site. You will now see the familiar
blue navigation bar with links to the Browser, BLAT, Downloads, and the
FAQ page from each of the track detail pages.
Posted on 8 Nov. 2001
The STS Markers track has been updated on the April and August browsers to
now include much more information on the detail page including links to
UniSTS and details on the alignments of the markers to the draft sequence.
In addition, all known aliases of the markers can be entered in the
"position" window, and the corresponding marker will be found and
displayed if its location has been determined.
Posted on 8 Nov. 2001
A new FISH Clones track has been added to the April and August browsers.
Previously, this information has been included in the STS Markers track.
Now, this has been broken out into a separate track with additional
information provided on the detail page not previously shown.
Posted on 6 Nov. 2001
The fgenesh++ gene prediction and the cross-species
mRNA tracks are now available in the August browser.
Posted on 31 Oct. 2001
The 'DNA' button at the top of the browser has been significantly
upgraded. By default it now returns DNA that has repeating elements
in lower case and other DNA in upper case. There is also an option to
color the DNA output with various tracks. You can have the case and font
features such as underline, bold, and italic follow tracks too.
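If you want to reproduce the default case convention on your own sequence, the idea can be sketched in a few lines. This is only an illustration of the convention (repeats in lower case, everything else in upper case); the function name soft_mask and the half-open (start, end) interval format are assumptions and do not describe how the browser does it internally.

```python
def soft_mask(seq, repeats):
    """Return seq in upper case with the given repeat intervals in lower case.

    repeats is an iterable of half-open (start, end) positions, 0-based.
    Intervals that run past the ends of the sequence are clipped.
    """
    out = list(seq.upper())
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(out))):
            out[i] = out[i].lower()
    return "".join(out)


print(soft_mask("ACGTACGTACGT", [(2, 5), (8, 10)]))  # prints ACgtaCGTacGT
```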
Posted on 29 Oct. 2001
There is now a TIGR Gene Index track
in the April 2001 freeze browser. The TIGR Gene Index is based
on alignments of assembled ESTs from a number of species.
Be sure to click into the track and follow the outside links to
the TIGR site, which contains a wealth of information on the genes.
Posted on 29 Oct. 2001
The Acembly track on the August 2001 freeze
has been updated to include predictions based on human ESTs
and GenBank mRNAs as well as RefSeq human mRNAs. Protein
predictions are now also available in the details page for this
track. The outside link for this track is also very informative.
Posted on 27 Oct. 2001
You can now share your custom tracks with the
community. The easiest way to do this is to construct a link from your own
web pages to the browser. Here is an example of a URL for
such a link:
http://genome.ucsc.edu/cgi-bin/hgTracks?
position=chr22:1-20000&db=hg8&
hgt.customText=http://genome.ucsc.edu/test.bed
The position variable tells the browser which part of the genome to
display. The db variable refers to the freeze number.
'hg8' corresponds to the August 2001 freeze. The customText
variable should refer to a URL containing plain text in one of the
formats described in
http://genome.ucsc.edu/goldenPath/help/customTrack.html.
Note that generally we only keep the last three versions of the
genome online (hg6, hg7, and hg8). You'll have to update
your link and track about every 4 months as a result.
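If you generate such links from a script, the URL can be assembled from its three variables. The sketch below uses Python's standard urllib.parse module; the helper name custom_track_link is hypothetical, and the values are taken from the example URL above.

```python
from urllib.parse import urlencode


def custom_track_link(position, db, custom_text_url,
                      base="http://genome.ucsc.edu/cgi-bin/hgTracks"):
    """Build a browser link that loads a custom track from a remote URL."""
    params = {
        "position": position,               # region of the genome to display
        "db": db,                           # freeze, e.g. hg8 for August 2001
        "hgt.customText": custom_text_url,  # URL of the plain-text track data
    }
    # Leave ':' and '/' unescaped so the link reads like the example above;
    # characters such as '&' or '?' inside a value are still percent-encoded.
    return base + "?" + urlencode(params, safe=":/")


link = custom_track_link("chr22:1-20000", "hg8", "http://genome.ucsc.edu/test.bed")
print(link)
```

The printed link is the example URL above on a single line.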
Please send in the URLs of tracks you'd like to share to
genome@soe.ucsc.edu,
along with a brief description of
the track and the genome version it is tied to. We'll create
an index page of these here.
You can also access an external custom track by including
the URL of the track data (on a separate line starting with
http://) in the custom track box at the bottom of the browser gateway.
Posted on 15 Oct. 2001
Fresh tab-delimited files from the browser database
are now available. They will now be updated automatically
every Sunday evening. The table browser queries the database
directly, so it is always up to the minute.
Posted on 12 Oct. 2001
Several new sets of gene predictions came in this week.
We now have fgenesh++ predictions for the April freeze,
and Genscan and Acembly predictions for the August freeze.
Posted on 11 Oct. 2001
Why struggle with massive genomic file downloads when the UCSC
Table Browser
lets you select exactly the track data desired via a convenient web
interface? Major improvements by Krish Roskin have made this feature
more powerful and simpler to use. It is now available for the three
most recent assemblies.
Posted on 8 Oct. 2001
The October 2000 assembly has been moved to the
Archives
to make room for the August assembly.
Posted on 5 Oct. 2001
A revised August 2001 freeze assembly is now up. The problems
with flipped contigs of finished clones and high levels of
sequence duplication are fixed. You can now download this
assembly in bulk as
well as browse it. Chromosome-by-chromosome and annotation
database files will follow over the next day or two.
Posted on 2 Oct. 2001
You can now convert
coordinates between different versions of the draft
using a new program, hgCoordConv, by Chuck Sugnet.
hgCoordConv attempts to cut out sequences of the original
draft and align them to the new draft. When aligning the
sequences to the new draft, hgCoordConv makes sure that the
sequences are in the same order and orientation and have the
correct distances between them.
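The consistency check described above (same order, same orientation, correct distances) might look roughly like the following Python sketch. It is not hgCoordConv's actual code: the Placement record, the function name, and the tolerance parameter are all hypothetical, and the alignment step itself is assumed to have been done already.

```python
from typing import NamedTuple, Sequence


class Placement(NamedTuple):
    """Hypothetical record of one fragment cut from the original draft."""
    old_start: int   # start of the fragment in the original draft
    new_start: int   # start of its alignment in the new draft
    strand: str      # '+' or '-' orientation of the alignment


def placements_consistent(placements: Sequence[Placement], tolerance: int = 0) -> bool:
    """Check order, orientation, and spacing of aligned fragments."""
    ordered = sorted(placements, key=lambda p: p.old_start)
    if not ordered:
        return True
    # Orientation: every fragment must align to the same strand.
    if len({p.strand for p in ordered}) > 1:
        return False
    sign = 1 if ordered[0].strand == "+" else -1
    for prev, cur in zip(ordered, ordered[1:]):
        old_gap = cur.old_start - prev.old_start
        new_gap = sign * (cur.new_start - prev.new_start)
        # Order: fragments must appear in the same relative order.
        if new_gap <= 0:
            return False
        # Distance: spacing must match the original within the tolerance.
        if abs(new_gap - old_gap) > tolerance:
            return False
    return True


good = [Placement(100, 1100, "+"), Placement(500, 1500, "+"), Placement(900, 1900, "+")]
print(placements_consistent(good))
```

A conversion would be trusted only when a check of this kind passes. The tolerance argument is an assumption added to allow for small insertions or deletions between drafts.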
Chuck has also implemented a SAGE/Unigene track in the browser.
This track displays data from the
SAGEMap project at NCBI. UniGene cluster sequences are
displayed in the browser and colored according to their average SAGE
counts over a series of experiments. Selecting one of the UniGene
representative sequences displays the SAGE results for UniGene
sequences.
Posted on 21 Sep. 2001
Some systematic problems were found in the clone order on the
preliminary August 2001 freeze assembly. The sequencing center at
Washington University, EBI's Ensembl group, and our group here at UCSC
are currently working together to revise the merged BAC clone maps and
the assembly process to fix these problems. We hope to update the August
browser with a revised assembly soon. Then, after further testing via
the browser, we will release the assembled August freeze genome sequence
itself.
Posted on 11 Sep. 2001
A preliminary assembly of the August 2001 freeze is now
available in the genome browser. Due to significant progress
by the mapping and finishing groups of the international
public consortium, this assembly is a major improvement over
the April 2001 freeze assembly. Imre Vastrik, Ewan Birney and
colleagues at Ensembl have computed a merge of BAC clone maps
provided by the individual sequencing centers with
fingerprint-based maps prepared at Washington University. These
merged maps were used for the first time in this August
assembly.
The August assembly has successfully passed our internal
quality control tests. We will release the sequence and
annotations in bulk downloadable form in a week or so, after
the external testers have had a chance to further verify
it. Meanwhile, if you notice any systematic problems, please let
us know at
genome@soe.ucsc.edu.
Though the state of the working
draft has improved considerably, remember that where you see
solid marks in the 'gap' track, the relative order and
orientation of flanking contigs is still uncertain. In some
cases of complex repeat structure it is also possible that
the assembly may be incorrect even in the absence of gaps.
Also, sometimes ambiguities in the data cause a BAC clone to be
split, with parts of it placed at opposite ends of a run of other clones.
Localized errors of this type should be corrected by additional
finishing efforts at the individual sequencing centers and
should not be reported to UCSC. However, please report any
large-scale or systematic problems you detect with this assembly
that could have been caused by our data processing.
The tracks available on the August 2001 browser are quite
limited at the moment. More tracks will show up over time.
Archived on 9/11/01:
The April 2001 assembly is now the default for the browser.
The SNP and Ensembl gene tracks have come in for this version.
There is also a new track depicting non-human vertebrate mRNA
alignments.
The Sept. 2000 and July 2000 versions of the genome are
now only available on our archive site. Please see the link
in the blue box to the left for more details.
The August 6 freeze is progressing through the pipeline.
We've recently received an updated accession map from Wash U.
Ensembl will shortly be integrating this with chromosome-specific
maps from the sequencing centers. We are still
on track for the next release in early September.
Archived on 8/28/01:
Meanwhile we've been continuing work on the genome browser.
It's now possible to upload your own annotations to be displayed
alongside the built-in tracks. Please scroll to the bottom of the
browser gateway pages for further information. The browser has
also been sped up, particularly on the larger chromosomes, by
using a 'binning' technique suggested by Lincoln Stein and Richard Durbin.
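For readers unfamiliar with binning, here is a minimal Python sketch of one hierarchical scheme. The entry above does not give the details, so everything here is an assumption for illustration: the five bin sizes (128 kb, 1 Mb, 8 Mb, 64 Mb, 512 Mb), the constants, and the function name may not match what the browser used at the time.

```python
# Assumed hierarchy: 128 kb bins at the finest level, each coarser level
# 8 times larger, up to a single 512 Mb bin. Offsets give the first bin
# number at each level, finest level first.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
FIRST_SHIFT = 17   # 2**17 = 128 kb, size of the smallest bin
NEXT_SHIFT = 3     # each level is 2**3 = 8 times larger


def bin_from_range(start, end):
    """Return the smallest bin that fully contains [start, end).

    Coordinates are 0-based and half-open; this is an illustrative
    sketch under the assumptions stated above.
    """
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("range %d-%d does not fit in the largest bin" % (start, end))


print(bin_from_range(0, 1))
```

Each feature is stored with the number of the smallest bin that contains it. A query for a region then needs to look only at the few bins that can overlap that region, rather than scanning every feature on a large chromosome.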
Archived on 8/23/01:
Tracks continue to be added to the
April 2001 browser.
Our old friend the Exofish track is back. The blat mouse homology
track is now up as well, computed at somewhat more sensitive settings
than it was in the
December 2000 browser.
We've recently received some significant funding from NHGRI to
maintain and extend this site. This has allowed us, among other things,
to hire an artist, Jenny Draper, who is responsible for the new look.