Displaying Your Own Annotations in the Genome Browser
The Genome Browser provides dozens of aligned annotation tracks
that have been computed at UCSC or have been provided by
outside collaborators. In addition to these standard tracks, it is
also possible for users to upload their own annotations for
temporary display in the Genome Browser. These custom annotation
tracks are viewable only on the machine from which they were uploaded and
are kept only for 8 hours after the last time they were accessed. Users
can optionally make custom annotations viewable by others as well.
Genome Browser annotation tracks are based on files in line-oriented format.
Each line in the file defines a display characteristic for the annotation
track or defines a data item within the track.
Annotation files contain 3 types of lines: browser lines, track lines,
and data lines. Empty lines and lines starting with # in the
annotation file are ignored.
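The three line types can be told apart by the first word on the line. The following Python sketch illustrates the rule described above; the function name classify_line is hypothetical and is not part of the Genome Browser software.

```python
def classify_line(line: str) -> str:
    """Return 'ignored', 'browser', 'track', or 'data' for one annotation-file line."""
    # Empty lines and lines starting with '#' are skipped by the browser.
    if not line.strip() or line.startswith("#"):
        return "ignored"
    first_word = line.split()[0]
    if first_word == "browser":
        return "browser"
    if first_word == "track":
        return "track"
    # Anything else is treated as a data line (BED, GFF, GTF, PSL, ...).
    return "data"


example = [
    "# my annotations",
    "browser position chr22:1-20001",
    'track name=spacer description="My blue ticks" color=0,0,255',
    "chr22 0 1",
    "",
]
for line in example:
    print(classify_line(line), "|", line)
```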
To construct an annotation file and display it in the Genome
Browser, follow these steps:
Step 1. Format the data set
Formulate your data set as a tab-separated file using one of the formats
supported by the Genome Browser. Annotation data can be in
standard GFF format or in a format designed specifically
for the Human Genome Project or UCSC Genome Browser, including
GTF, PSL, BED,
WIG, or MAF.
GFF and GTF files must be tab-delimited rather than
space-delimited to display correctly. You may include more than one data
set in your annotation file. However, all of the data lines for a given
annotation track must be in the same format.
Step 2. Define the Genome Browser display characteristics
Add one or more optional browser lines to the
beginning of your formatted data file to configure the overall
display of the Genome Browser when it initially displays your annotation
data. Browser lines allow you to configure such things
as the genome position that the Genome Browser will initially open to,
the width of the display, and the configuration of the other
annotation tracks that are shown (or hidden) in the initial display.
NOTE: If the browser position is not explicitly set in the annotation
file, the initial display will default to the position setting most
recently used by the user, which may not be an
appropriate position for viewing the annotation track.
Step 3. Define the annotation track display characteristics
Following the browser lines - and immediately preceding the formatted
data - add a track line to
define the display attributes for your annotation data set. Track lines
allow you to define annotation track characteristics
such as the name, description, colors, initial display mode, use score,
etc. If you have included more than one data set in your annotation
file, insert a track line at the beginning of each new set of data.
Example:
Here is an example of an annotation file that defines 2 separate
annotation tracks in BED format. The first track displays blue one-base
tick marks every 10000 bases at the beginning of chr 22.
The second track displays red 100-base features alternating with
blank space in the same region of chr 22.
browser position chr22:1-20001
track name=spacer description="My blue ticks" color=0,0,255
chr22 0 1
chr22 10000 10001
chr22 20000 20001
track name=even description="My red 100 bases tracks" color=255,0,0
chr22 0 100 first
chr22 200 300 second
chr22 400 500 third
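An annotation file like the one above can also be generated by a short script rather than typed by hand. The Python sketch below rebuilds the same two tracks; the function names are hypothetical, and the use of tabs between BED fields is an assumption made for consistency with Step 1.

```python
def tick_track(chrom: str, step: int, count: int) -> list:
    """One-base BED features every `step` bases, starting at position 0."""
    return [f"{chrom}\t{i * step}\t{i * step + 1}" for i in range(count)]


def build_annotation_file() -> str:
    """Assemble browser line, track lines, and data lines in the required order."""
    lines = [
        "browser position chr22:1-20001",
        'track name=spacer description="My blue ticks" color=0,0,255',
    ]
    lines += tick_track("chr22", 10000, 3)
    lines.append('track name=even description="My red 100 bases tracks" color=255,0,0')
    # 100-base features alternating with 100 bases of blank space.
    for i, label in enumerate(["first", "second", "third"]):
        start = i * 200
        lines.append(f"chr22\t{start}\t{start + 100}\t{label}")
    return "\n".join(lines) + "\n"


print(build_annotation_file())
```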
Example:
This example shows an annotation file containing one data set in BED
format. The track displays paired features with a thick end and thin
end, and hatch marks indicating the direction of transcription. The track
labels display in green (0,128,0), and the gray level of each feature
reflects the score value of that line. NOTE: The track line in this example
has been split over 2 lines for documentation purposes.
If you paste this example into the Genome Browser, you must remove the line
break to display the track successfully.
Click
here for a copy of this example that can be pasted into the browser without editing.
browser position chr22:1000-10000
browser hide all
track name=pairedReads description="Clone Paired Reads" visibility=2
color=0,128,0 useScore=1
chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512
chr22 2000 6000 cloneB 200 - 2000 6000 0 2 433,399, 0,3601
Step 4. View your annotation track in the Genome Browser
To view your annotation data in the Genome Browser, open the
Genome Browser home page (http://genome.ucsc.edu/) and click the Genome
Browser link in the top menu bar. On the Genome Browser Gateway page that
displays, select the genome and assembly on which your
annotation data is based, then click the Add Your Own Tracks button. Upload
your annotation file by entering the name of your file in the Annotation File
box or by pasting the contents of your file into the large edit
box. Click the Submit button to
display the Genome Browser track window with your annotation. If you
encounter difficulties in displaying your annotation, read the section
Troubleshooting Annotation Display Problems.
To upload a custom annotation track from another machine or web site, paste the
URL of the track into the large edit box. Custom tracks can be displayed in
conjunction with ordinary BLAT tracks.
Step 5. (Optional) Add details pages for individual track features
After you've constructed your track and have successfully displayed it in the
Genome Browser, you may wish to customize the details pages for individual track
features. The Genome Browser automatically creates a default details page for each
feature in the track containing the feature's name, position information, and a
link to the corresponding DNA sequence. To view the
details page for a feature in your custom annotation track (in full, pack, or
squish display
mode), click on the item's label in the annotation track window.
You can add a link from a details page to an external web page containing
additional information about the feature by using the track line url attribute.
In the annotation file, set the url attribute in the track line to point
to a publicly available page on a web server. Each occurrence of '$$' in
the URL string is replaced with the name of the feature, as given in the name
field of its data line. You can take advantage of this feature to provide
individualized information for each feature in your track by creating HTML anchors
that correspond to the feature names in your web page.
Example:
Here is an example of a file in which the url attribute has been set to
point to the file http://genome.ucsc.edu/goldenPath/help/clones.html. The '#$$'
appended to the end of the file name in the example points to the HTML NAME tag
within the file that matches the name of the feature (cloneA, cloneB, etc.).
NOTE: The track line in this example has been split over 3 lines for
documentation purposes. If you paste this example into
the browser, you must remove the line breaks to display the track successfully.
Click here for a copy of this example that can be pasted into the browser without editing.
browser position chr22:1000-10000
browser hide all
track name=clones description="Clones" visibility=2
color=0,128,0 useScore=1
url="http://genome.ucsc.edu/goldenPath/help/clones.html#$$"
chr22 1000 5000 cloneA 960
chr22 2000 6000 cloneB 200
chr22 5000 9000 cloneC 700
chr22 6000 10000 cloneD 600
chr22 11000 15000 cloneE 300
chr22 13000 17000 cloneF 100
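The '$$' substitution in the example above amounts to a plain string replacement. The Python sketch below shows the link that would result for each feature; details_link is a hypothetical helper name, not a Genome Browser function.

```python
def details_link(url_template: str, item_name: str) -> str:
    """Replace every '$$' in the track's url attribute with the feature name."""
    return url_template.replace("$$", item_name)


template = "http://genome.ucsc.edu/goldenPath/help/clones.html#$$"
for name in ["cloneA", "cloneB", "cloneC"]:
    print(details_link(template, name))
```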
Step 6. (Optional) Share your annotation track with others
The previous steps showed you how to upload annotation data for your own use
on your own machine. However, many users would like to
share their annotation data with members of their research group on
different machines or with colleagues at other sites. To learn how
to make your Genome Browser annotation track viewable by others, read
the section Sharing Your Annotation Track with
Others.
Browser lines configure the overall display of the Genome Browser
window when your annotation file is uploaded. Each line defines one display
attribute. Browser lines have the format:
browser attribute_name attribute_value(s)
For example, if the browser line browser
position chr22:1-20000 is included in the annotation file, the Genome
Browser window will initially display the first 20000 bases of chr 22.
The following browser line attribute name/value options are available:
- position <position> - Determines the part of the genome
that the Genome Browser will initially open to, in chromosome:start-end format.
- pix <width> - Sets the Genome Browser window to the
specified width in pixels.
- hide all - Hides all annotation tracks except for custom ones.
- hide <track_name(s)> - Hides the listed tracks.
Tracks must be referenced by their symbolic names. Multiple track names should be space-separated.
- dense all - Displays all tracks in dense mode.
- dense <track_name(s)> - Displays the specified tracks in
dense mode. Symbolic names must be used. Multiple track names should be
space-separated.
- pack all - Displays all tracks in pack mode.
- pack <track_name(s)> - Displays the specified tracks in
pack mode. Symbolic names must be used. Multiple track names should be
space-separated.
- squish all - Displays all tracks in squish mode.
- squish <track_name(s)> - Displays the specified tracks in
squish mode. Symbolic names must be used. Multiple track names should be
space-separated.
- full all - Displays all tracks in full mode.
- full <track_name(s)> - Displays the specified tracks in
full mode. Symbolic names must be used. Multiple track names should be
space-separated.
Note that the Genome Browser will open to the range defined in the
Gateway page position box or the position saved as the default unless the browser
line position attribute is defined in the annotation file. Although this
attribute is optional, it's recommended that you set this value in your
annotation file to ensure that the track will appear in the display range when
it is uploaded into the Genome Browser.
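The browser line syntax described above can be checked before uploading. The following Python sketch splits a browser line into its attribute name and values; the function name and the regular expression for the position value are assumptions made for illustration, not the browser's own parser.

```python
import re

# Display-mode attributes listed above; each takes 'all' or a list of track names.
DISPLAY_MODES = {"hide", "dense", "pack", "squish", "full"}
# Assumed shape of a position value: chromosome:start-end
POSITION_RE = re.compile(r"^(\w+):(\d+)-(\d+)$")


def parse_browser_line(line: str) -> dict:
    """Split 'browser attribute_name attribute_value(s)' into a small dict."""
    words = line.split()
    if len(words) < 3 or words[0] != "browser":
        raise ValueError(f"not a browser line: {line!r}")
    attribute, values = words[1], words[2:]
    if attribute == "position":
        match = POSITION_RE.match(values[0])
        if match is None:
            raise ValueError(f"position must be chromosome:start-end, got {values[0]!r}")
        chrom, start, end = match.groups()
        return {"attribute": "position", "chrom": chrom,
                "start": int(start), "end": int(end)}
    if attribute == "pix":
        return {"attribute": "pix", "width": int(values[0])}
    if attribute in DISPLAY_MODES:
        return {"attribute": attribute, "tracks": values}
    raise ValueError(f"unknown browser attribute: {attribute!r}")


print(parse_browser_line("browser position chr22:1-20000"))
print(parse_browser_line("browser hide all"))
```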
Track lines define the display attributes for all lines in an
annotation data set. If more than one data set is included in the
annotation file, each group of data must be preceded by a track line
that describes the display characteristics for that set of data.
A track line begins with the word track, followed by one or
more attribute=value pairs. Unlike browser lines - in which each
attribute is defined on a separate line - all of the track attributes for a
given set of data are listed on one line with no line
breaks. The inadvertent insertion of a line break into a track line
will generate an error when you attempt to upload the annotation track
into the Genome Browser.
The following track line attribute=value pairs are defined in the
Genome Browser:
- name=<track_label> - Defines the track label
that will be displayed to the left of the track in the Genome
Browser window, and also the label of the track control at the
bottom of the screen. The name can consist of up to 15
characters, and must be enclosed in quotes if the text contains
spaces. The default value is User Track.
- description=<center_label> - Defines the center label of the track
in the Genome Browser window. The description can consist of up
to 60 characters, and must be enclosed in quotes if the text contains
spaces. The default
value is User Supplied Track.
- visibility=<display_mode> - Defines the
initial display mode of the annotation track. Values for
display_mode include: 0 - hide, 1 - dense, 2 - full, 3 - pack,
and 4 - squish. The default is 1.
- color=<RRR,GGG,BBB> - Defines the main color
for the annotation track. The track color consists of three comma-separated
RGB values from 0-255. The default value is 0,0,0 (black).
- altColor=<RRR,GGG,BBB> - Defines the
secondary color for the track. The alternate color consists of
three comma-separated RGB values from 0-255. The default is a
lighter shade of whatever the color attribute is set to.
- useScore=<use_score> - If this attribute is
present and is set to 1, the score field in each of the track's data
lines will be used to determine the level of shading in which
the data is displayed. The track will display in shades of gray unless
the color attribute is set to 100,50,0 (shades of brown) or
0,60,120 (shades of blue). The default setting for useScore is 0.
- priority=<priority> - Defines the display
position of the track relative to other tracks in the Genome
Browser window.
- offset=<offset> - Defines a number to be added
to all coordinates in the annotation track. The default is 0.
- url=<external_url> - Defines a URL for an
external link associated with this track. This URL will be used
in the details page for the track. Any '$$' in this string will be substituted
with the item name. There is no default for this attribute.
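Because a track line is a single line of attribute=value pairs with optional quoting, it can be split with a shell-style tokenizer. The Python sketch below uses the standard shlex module and checks a few of the limits listed above; both function names are hypothetical, and the checks are a partial illustration rather than the browser's actual validation.

```python
import shlex


def parse_track_line(line: str) -> dict:
    """Split a track line into a dict of attribute -> value (quotes removed)."""
    words = shlex.split(line)
    if not words or words[0] != "track":
        raise ValueError("a track line must begin with the word 'track'")
    attributes = {}
    for word in words[1:]:
        if "=" not in word:
            raise ValueError(f"missing = in attribute pair: {word!r}")
        key, value = word.split("=", 1)
        attributes[key] = value
    return attributes


def check_track_attributes(attributes: dict) -> list:
    """Return a list of problems found, based on the limits described above."""
    problems = []
    if len(attributes.get("name", "")) > 15:
        problems.append("name is longer than 15 characters")
    if len(attributes.get("description", "")) > 60:
        problems.append("description is longer than 60 characters")
    if attributes.get("visibility", "1") not in {"0", "1", "2", "3", "4"}:
        problems.append("visibility must be 0, 1, 2, 3, or 4")
    for key in ("color", "altColor"):
        if key in attributes:
            parts = attributes[key].split(",")
            if len(parts) != 3 or not all(p.isdigit() and int(p) <= 255 for p in parts):
                problems.append(f"{key} must be three comma-separated values from 0-255")
    return problems


line = 'track name=pairedReads description="Clone Paired Reads" visibility=2 color=0,128,0 useScore=1'
attrs = parse_track_line(line)
print(attrs)
print(check_track_attributes(attrs))
```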
BED format provides a flexible way to define the data lines that
are displayed in an annotation track. BED lines have three required fields and
nine additional optional fields. The number of fields per line must be consistent
throughout any single set of data in an annotation track.
The first three required BED fields are:
- chrom - The name of the chromosome (e.g. chr3, chrY,
chr2_random) or contig (e.g. ctgY1).
- chromStart - The starting position of the feature in the
chromosome or contig. The first base in a chromosome is numbered 0.
- chromEnd - The ending position of the feature in the
chromosome or contig. The chromEnd base is not included in the
display of the feature. For example, the first 100 bases of a
chromosome are defined as chromStart=0, chromEnd=100, and span
the bases numbered 0-99.
The 9 additional optional BED fields are:
- name - Defines the name of the BED line. This label is
displayed to the left of the BED line in the Genome Browser
window when the track is open to full display mode or directly to the
left of the item in pack mode.
- score - A score between 0 and 1000. If the track line
useScore attribute is set to 1 for this annotation data set, the
score value will determine the level of gray in which
this feature is displayed (higher numbers = darker gray).
- strand - Defines the strand - either '+' or '-'.
- thickStart - The starting position at which the feature
is drawn thickly (for example, the start codon in gene
displays).
- thickEnd - The ending position at which the feature is
drawn thickly (for example, the stop codon in gene displays).
- reserved - This should always be set to zero.
- blockCount - The number of blocks (exons) in the
BED line.
- blockSizes - A comma-separated list of the block
sizes. The number of items in this list should correspond to
blockCount.
- blockStarts - A comma-separated list of block starts.
All of the blockStart positions should be calculated relative to
chromStart. The number of items in
this list should correspond to blockCount.
Example:
Here's an example of an annotation track that uses a complete BED definition:
track name=pairedReads description="Clone Paired Reads" useScore=1
chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512
chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601
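To see how the block fields relate to chromStart and chromEnd, the Python sketch below parses one full 12-field BED line and converts the relative blockStarts into absolute coordinates. The function name parse_bed12 is hypothetical; the coordinates follow the zero-based, end-exclusive convention described above.

```python
def parse_bed12(line: str) -> dict:
    """Parse a 12-field BED line and compute absolute block coordinates."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, found {len(fields)}")
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    block_count = int(fields[9])
    # The lists may end with a trailing comma, so empty items are dropped.
    block_sizes = [int(x) for x in fields[10].split(",") if x]
    block_starts = [int(x) for x in fields[11].split(",") if x]
    if len(block_sizes) != block_count or len(block_starts) != block_count:
        raise ValueError("blockSizes and blockStarts must each have blockCount items")
    # blockStarts are relative to chromStart; ends are exclusive.
    blocks = [(chrom_start + start, chrom_start + start + size)
              for start, size in zip(block_starts, block_sizes)]
    return {
        "chrom": fields[0],
        "chromStart": chrom_start,
        "chromEnd": chrom_end,
        "name": fields[3],
        "score": int(fields[4]),
        "strand": fields[5],
        "blocks": blocks,
    }


record = parse_bed12("chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512")
print(record["blocks"])
```

For cloneA, the two blocks cover bases 1000-1566 and 4512-4999, so the last block ends exactly at chromEnd.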
PSL lines represent alignments, and are typically taken from files generated by BLAT
or psLayout. See the
BLAT
documentation for more details. All of the following fields are
required on each data line within a PSL file:
- matches - Number of bases that match that aren't repeats
- misMatches - Number of bases that don't match
- repMatches - Number of bases that match but are part of repeats
- nCount - Number of 'N' bases
- qNumInsert - Number of inserts in query
- qBaseInsert - Number of bases inserted in query
- tNumInsert - Number of inserts in target
- tBaseInsert - Number of bases inserted in target
- strand - '+' or '-' for query strand. In mouse, second '+' or '-' is for genomic strand
- qName - Query sequence name
- qSize - Query sequence size
- qStart - Alignment start position in query
- qEnd - Alignment end position in query
- tName - Target sequence name
- tSize - Target sequence size
- tStart - Alignment start position in target
- tEnd - Alignment end position in target
- blockCount - Number of blocks in the alignment
- blockSizes - Comma-separated list of sizes of each block
- qStarts - Comma-separated list of starting positions of each block in query
- tStarts - Comma-separated list of starting positions of each block in target
Example:
Here is an example of an annotation track in PSL format. Note that line
breaks have been inserted into the PSL lines in this example for
documentation display purposes.
Click
here for a copy of this example that can be pasted into the browser without editing.
track name=fishBlats description="Fish BLAT" useScore=1
59 9 0 0 1 823 1 96 +- FS_CONTIG_48080_1 1955 171 1062 chr22
47748585 13073589 13073753 2 48,20, 171,1042, 34674832,34674976,
59 7 0 0 1 55 1 55 +- FS_CONTIG_26780_1 2825 2456 2577 chr22
47748585 13073626 13073747 2 21,45, 2456,2532, 34674838,34674914,
59 7 0 0 1 55 1 55 -+ FS_CONTIG_26780_1 2825 2455 2676 chr22
47748585 13073727 13073848 2 45,21, 249,349, 13073727,13073827,
Be aware that the coordinates for a negative strand in a PSL
line are handled in a special way. In the
qStart and qEnd fields, the coordinates indicate
the position where the query matches from
the point of view of the forward strand, even when the match is on the reverse strand.
However, in the qStarts list, the coordinates are reversed.
Example:
Here is a 31-mer (query positions 0-30) containing 2 blocks that align on the minus strand and
2 blocks that align on the plus strand (this sometimes can happen in response to
assembly errors):
0         1         2         3  tens position in query
0123456789012345678901234567890  ones position in query
            ++++          +++++  plus strand alignment on query
    --------    ----------       minus strand alignment on query

Plus strand:
qStart=12
qEnd=31
blockSizes=4,5
qStarts=12,26

Minus strand:
qStart=4
qEnd=26
blockSizes=10,8
qStarts=5,19
Essentially, the minus strand blockSizes and qStarts are
what you would get if you reverse-complemented the query.
However, the qStart and qEnd are not reversed. To convert one to the other:
qStart = qSize - revQEnd
qEnd = qSize - revQStart
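The conversion can be checked against the minus strand values in the example. The Python sketch below assumes a query size of 31, which is what the ruler (positions 0 to 30) and the plus strand qEnd=31 imply; the function names are hypothetical.

```python
def forward_query_range(q_size: int, rev_q_start: int, rev_q_end: int) -> tuple:
    """Apply qStart = qSize - revQEnd and qEnd = qSize - revQStart."""
    return q_size - rev_q_end, q_size - rev_q_start


def minus_blocks_on_forward(q_size: int, block_sizes: list, q_starts: list) -> list:
    """Convert minus-strand qStarts blocks to forward-strand (start, end) pairs."""
    blocks = []
    for size, start in zip(block_sizes, q_starts):
        rev_end = start + size  # end of the block in reversed coordinates
        blocks.append(forward_query_range(q_size, start, rev_end))
    return sorted(blocks)


Q_SIZE = 31  # assumption read off the ruler in the example
# Minus strand from the example: blockSizes=10,8 and qStarts=5,19
print(minus_blocks_on_forward(Q_SIZE, [10, 8], [5, 19]))
# The reversed alignment spans 5 to 27 (19 + 8), giving qStart=4 and qEnd=26.
print(forward_query_range(Q_SIZE, 5, 27))
```

The forward-strand blocks come out as 4-12 and 16-26 (end-exclusive), which matches the positions of the minus signs in the diagram.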
GFF (General Feature Format) lines are based on the GFF standard file format. GFF
lines have nine required fields that must be tab-separated. If the fields are
separated by spaces instead of tabs, the track will not display correctly. For more
information on GFF format, refer to http://www.sanger.ac.uk/Software/formats/GFF.
Here is a brief description of the GFF fields:
- seqname - The name of the sequence. Must be a chromosome or a contig.
- source - The program that generated this feature.
- feature - The name of this type of feature. Some examples of
standard feature types are "CDS", "start_codon", "stop_codon", and
"exon".
- start - The starting position of the feature in the sequence.
The first base is numbered 1.
- end - The ending position of the feature (inclusive).
- score - A score between 0 and 1000. If the track line
useScore attribute is set to 1 for this annotation data set, the
score value will determine the level of gray in which
this feature is displayed (higher numbers = darker gray). If there is no
score value, enter ".".
- strand - Valid entries include '+', '-', or '.' (for don't know/don't care).
- frame - If the feature is a coding exon, frame should
be a number between 0-2 that represents the reading frame of the
first base. If the feature is not a coding exon, the value should be '.'.
- group - All lines with the same group are linked together into a single item.
Example:
Here's an example of a GFF-based track.
Click
here for a copy of this example that can be pasted into the browser without editing.
NOTE: Paste operations on some operating systems will replace tabs with spaces, which
will result in an error when the GFF track is uploaded. You can circumvent this
problem by pasting the URL of the above example
(http://genome.ucsc.edu/goldenPath/help/regulatory.txt) instead of the text itself
into the custom annotation track text box.
track name=regulatory description="TeleGene(tm) Regulatory Regions"
chr22 TeleGene enhancer 1000000 1001000 500 + . touch1
chr22 TeleGene promoter 1010000 1010100 900 + . touch1
chr22 TeleGene promoter 1020000 1020000 800 - . touch2
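Since tabs that have been replaced by spaces are a common cause of GFF upload errors, it can help to check each data line before uploading. The Python sketch below splits strictly on tabs and also shows the equivalent zero-based, end-exclusive coordinates used by BED; the function name and the returned keys are hypothetical.

```python
def parse_gff_line(line: str) -> dict:
    """Parse one GFF data line, insisting on nine tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(
            f"expected 9 tab-separated fields, found {len(fields)}; "
            "check that tabs were not replaced by spaces"
        )
    seqname, source, feature, start, end, score, strand, frame, group = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),          # 1-based
        "end": int(end),              # inclusive
        "score": None if score == "." else float(score),
        "strand": strand,
        "frame": frame,
        "group": group,
        # Same interval in BED convention: 0-based start, exclusive end.
        "bed_start": int(start) - 1,
        "bed_end": int(end),
    }


good = "\t".join(["chr22", "TeleGene", "enhancer", "1000000", "1001000", "500", "+", ".", "touch1"])
print(parse_gff_line(good))
```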
GTF (Gene Transfer Format) is a refinement to GFF that tightens the specification.
The first eight GTF fields are the same as GFF. The group field has been
expanded into an attribute field that includes a list of
semicolon-separated attribute/value pairs. For more information on this format,
see http://genes.cs.wustl.edu/GTF2.html.
Some examples of entries for the attribute field include:
- gene_id value - A globally unique identifier for the genomic
source of the sequence.
- transcript_id value - A globally unique identifier for the
predicted transcript.
Example:
Here is an example of the ninth field in a GTF data line:
gene_id Em:U62317.C22.6.mRNA; transcript_id Em:U62317.C22.6.mRNA;
exon_number 1
The Genome Browser groups together GTF lines that have the same
transcript_id value. It only looks at EXON and CDS
type features.
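The attribute field can be split into its pairs with a few lines of code. The Python sketch below parses the example ninth field shown above; the function name is hypothetical, and stripping optional double quotes around values is an assumption about how such files are commonly written.

```python
def parse_gtf_attributes(field: str) -> dict:
    """Split a GTF attribute field into a dict of attribute -> value."""
    attributes = {}
    for part in field.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after a trailing semicolon
        key, _, value = part.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


field = "gene_id Em:U62317.C22.6.mRNA; transcript_id Em:U62317.C22.6.mRNA; exon_number 1"
attrs = parse_gtf_attributes(field)
print(attrs["transcript_id"])
```

Lines that yield the same transcript_id value would then be collected into a single item, as described above.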
Sharing Your Annotation Track with Others
To make your Genome Browser annotation track viewable by people on
other machines or at other sites, follow the steps below. (Note that
the URLs in this section must be entered on a single line, with no
spaces or line breaks.)
Step 1.
Put your formatted annotation file on your web site. Be sure that
the file permissions allow it to be read by others.
Step 2.
Construct a URL that will link this annotation file to the Genome
Browser. The URL must contain 3 pieces of information specific
to your annotation data:
- The genome freeze on which your annotation data is based. This
information is of the form db=database_name, where
database_name is the UCSC code for the genome freeze. For a list
of these codes, see the Genome Browser FAQ.
Examples of this include: db=hg16 (Human July 2003 freeze), db=mm3
(Mouse Feb 2003 freeze).
- The genome position that the Genome Browser should initially
open to. This information is of the form
position=chr_position, where chr_position is a
chromosome number, with or without a set of
coordinates. Examples of this include: position=chr22, position=chr22:15916196-31832390.
- The URL of the annotation file on your web site. This information is
of the form hgt.customText=URL, where
URL points to the annotation file on your website. An example of an
annotation file URL is http://genome.ucsc.edu/goldenPath/help/test.bed.
Combine the above pieces of information into a URL of the following
format, where database_name, chr_position, and URL stand for the information
specific to your annotation file:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=database_name&position=chr_position&hgt.customText=URL
Example:
The following URL will open up the Genome Browser
window to display chr 22 of the July 2003 freeze of the
human genome (hg16), and will display the annotation track
pointed to by the URL
http://genome.ucsc.edu/goldenPath/help/test.bed:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg16&position=chr22&hgt.customText=http://genome.ucsc.edu/goldenPath/help/test.bed
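If you share many tracks, the URL can be assembled by a small script. The Python sketch below joins the three pieces in the same form as the example above; the function name is hypothetical, and the annotation URL is inserted as-is without percent-encoding, simply because that is how the example is written.

```python
def custom_track_url(db: str, position: str, annotation_url: str) -> str:
    """Combine db, position, and hgt.customText into one Genome Browser URL."""
    return (
        "http://genome.ucsc.edu/cgi-bin/hgTracks"
        f"?db={db}&position={position}&hgt.customText={annotation_url}"
    )


print(custom_track_url("hg16", "chr22", "http://genome.ucsc.edu/goldenPath/help/test.bed"))
```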
Step 3. Provide the URL to others. To upload a custom annotation
pointed to by a URL into the Genome Browser, paste the URL into the large text edit
box in the Add Your Own Tracks section of the Genome Browser Gateway page, then click
the Submit button.
If you'd like to share your annotation track with a broader
audience, send the URL for your track - along with a description of
the format, methods, and data used - to the UCSC Genome mailing list
genome@soe.ucsc.edu.
Troubleshooting Annotation Display Problems
Occasionally users encounter problems when uploading
annotation files to the Genome Browser. In most cases, these problems
are caused by errors in the format of the annotation file and
can be tracked down using the information displayed in the error
message. This section contains suggestions for resolving common
display problems. If you are still unable to successfully
display your annotation data after reading this section, contact
the genome mailing list at
genome@soe.ucsc.edu
for further assistance.
Problem: When I try one of your examples by cutting and pasting
it into the Genome Browser, I get an error message.
Solution: Check that none of the browser lines, track lines, or data
lines in your annotation file contains a line break. If the example contains GFF
or GTF data lines, check that all the fields are tab-separated rather than
space-separated.
Problem: When I click the Submit button, I get
the error message "line 1 of custom input:".
Solution: Check that none of the browser lines,
track lines, or data lines in your annotation file contains a line break. A
common source for this problem is the track line: all of the attribute
pairs must be on the same line and must not be separated by
a line break. If you are uploading your annotation file by pasting it
into the text box on the Genome Browser Gateway page, check that the cut and paste operation
did not inadvertently insert unwanted line feeds into the longer lines.
Problem: When I click the Submit button, I get the
error message "line # of custom input: missing = in var/val pair".
Solution: Check for incorrect syntax in the track
lines in the annotation file. Be sure that each track line
attribute pair has the format attribute=value.
Problem: When I click the Submit button, I get the
error message "line # of custom input: BED chromStarts[i] must be in ascending order".
Solution: This is most likely caused by a logical conflict in the
Genome Browser software. It accepts custom GFF tracks that have multiple "exons" at
the same position, but not BED tracks. Because the browser translates GFF tracks
to BED format before storing the custom track data, GFF tracks with multiple
exons at the same position will cause an error when the BED is read back in. To work around this
problem, remove duplicate lines in the GFF track.
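Duplicate lines can be removed with a short script before uploading. The Python sketch below drops repeated data lines within each track while leaving browser lines, track lines, and comments untouched; the function name is hypothetical, and treating only exact repeats as duplicates is a simplifying assumption.

```python
def remove_duplicate_data_lines(lines: list) -> list:
    """Drop data lines that exactly repeat an earlier line in the same track."""
    seen = set()
    kept = []
    for line in lines:
        words = line.split()
        if words and words[0] == "track":
            seen = set()  # duplicates are only checked within one track
        is_data = (bool(words)
                   and words[0] not in ("browser", "track")
                   and not line.startswith("#"))
        if is_data:
            if line in seen:
                continue
            seen.add(line)
        kept.append(line)
    return kept


gff = [
    "track name=regulatory",
    "chr22\tTeleGene\tpromoter\t1010000\t1010100\t900\t+\t.\ttouch1",
    "chr22\tTeleGene\tpromoter\t1010000\t1010100\t900\t+\t.\ttouch1",
    "chr22\tTeleGene\tenhancer\t1000000\t1001000\t500\t+\t.\ttouch1",
]
for line in remove_duplicate_data_lines(gff):
    print(line)
```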
Problem: When I click the Submit button, the Genome Browser
track window displays OK, but my track isn't visible.
Solution: Check the browser and track lines in your annotation
file to make sure that you haven't accidentally set
the display mode for the track to hide. If you are using the
Annotation File box on the Genome Browser Gateway page to
upload the track, check that you've
entered the correct file name. If neither of these is the cause
of the problem, try resetting the Genome Browser to clear any
settings that may be preventing the annotation from displaying. To
reset the Genome Browser, click on "Click here to reset" on the Gateway
page. If the annotation track still doesn't display, you may
need to clear the cookies in your Internet browser as well
(refer to your Internet browser's documentation for further information).
Problem: I've gotten my annotation track to display,
but now I can't make it go away! How do I remove an annotation
track from my Genome Browser display?
Solution: Reset the Genome Browser by clicking the
"reset all" button on the Genome Browser tracks window or by clicking on
"Click here to reset" on the Gateway page. This should reset
your Genome Browser display to its default settings.