Definitions for LLNL's Shotgun Sequencing



"Finished" Sequence:

    The finished sequence from a given clone (e.g. cosmid, BAC, PAC or P1) submitted to the database is completely contiguous, with all ambiguities resolved and >95% of the sequence represented on both strands. We require a minimum coverage of 3 high-quality reads, ideally from different sub-clones, with one of these reads on the opposite strand (e.g. 2 forward, 1 reverse). Any parts of the clone that are not double-stranded in this manner are sequenced with dye terminator chemistry to resolve putative compressions. Occasionally the end of the insert from a clone may be missing; the clone is still submitted to the database "as is" if sequence overlap with the adjacent clone is detected in that region.
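    The coverage rule above (at least 3 reads, with both strands represented) can be checked position by position. The following is a minimal sketch, not the laboratory's own tooling: the Read record, its coordinate convention, and the function name are assumptions made for illustration, and the reads are assumed to have been filtered for quality already.

```python
from collections import namedtuple

# Hypothetical read record: start/end are 0-based, end-exclusive positions
# on the assembled clone; strand is "+" (forward) or "-" (reverse).
Read = namedtuple("Read", "start end strand")


def undercovered_positions(reads, length, min_reads=3):
    """Return positions with fewer than min_reads reads or only one strand."""
    fwd = [0] * length
    rev = [0] * length
    for r in reads:
        counts = fwd if r.strand == "+" else rev
        # Clip each read to the clone so stray coordinates cannot overflow.
        for pos in range(max(r.start, 0), min(r.end, length)):
            counts[pos] += 1
    return [
        pos
        for pos in range(length)
        if fwd[pos] + rev[pos] < min_reads or fwd[pos] == 0 or rev[pos] == 0
    ]


# Usage: two forward reads span the whole 10-base toy clone, but the single
# reverse read stops short, so the last two positions are flagged.
reads = [Read(0, 10, "+"), Read(0, 10, "+"), Read(0, 8, "-")]
flagged = undercovered_positions(reads, 10)
```

    Positions returned by such a check would be the candidates for additional dye terminator reads.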


    All sequences submitted to the database from Livermore have been verified by at least three separate restriction digests (EcoRI, BglII, and an EcoRI/BglII double digest). Any potentially weak joins in the assembly are verified by PCR.
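    Digest verification presumably compares the fragment sizes predicted from the assembled sequence with those observed on a gel. A minimal sketch of the prediction step follows; it assumes a linear sequence without vector, and the function and table names are hypothetical. The recognition sites used are the standard ones (EcoRI cuts G^AATTC, BglII cuts A^GATCT); both are palindromic, so only one strand needs to be scanned.

```python
import re

# Recognition site and cut offset (number of site bases left of the cut).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BglII": ("AGATCT", 1),  # A^GATCT
}


def digest(seq, enzyme_names):
    """Return sorted fragment lengths for a linear sequence cut by the enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        # Lookahead pattern so overlapping site occurrences are all found.
        for match in re.finditer("(?=%s)" % site, seq):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


# Usage on a toy 24-base sequence with one site for each enzyme.
toy = "AAAAGAATTCCCCCAGATCTGGGG"
eco = digest(toy, ["EcoRI"])
bgl = digest(toy, ["BglII"])
double = digest(toy, ["EcoRI", "BglII"])
```

    An assembly whose predicted fragment sizes disagree with the observed digest pattern would point to a misassembly or a missing region.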


    Annotation has been completed as described below.

The stages leading to "finished" sequence are defined as follows:


Project / Region:

    A project is defined as a cosmid, BAC, PAC, or P1 clone that has been submitted for shotgun sequencing. Multiple "projects" selected as a "minimal tiling path" (with minimal overlap) make up a region. Projects within a given region can be of any clone type, and are carefully selected from the available clones in that region based on restriction map analysis.
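    Choosing a minimal tiling path can be illustrated with a standard greedy interval cover. This is only a sketch under stated assumptions: clones are reduced to (name, start, end) coordinates on a map, and the greedy rule minimizes the number of clones rather than modeling the restriction-map analysis actually used for selection. All names are hypothetical.

```python
def tiling_path(clones, region_start, region_end):
    """Greedy interval cover over map coordinates.

    clones: list of (name, start, end) tuples.
    Returns the chosen clone names in order, or None if the region has a gap.
    """
    path = []
    covered = region_start
    remaining = sorted(clones, key=lambda c: c[1])
    i = 0
    while covered < region_end:
        best = None
        # Among clones starting at or before the covered point,
        # keep the one that reaches furthest to the right.
        while i < len(remaining) and remaining[i][1] <= covered:
            if best is None or remaining[i][2] > best[2]:
                best = remaining[i]
            i += 1
        if best is None or best[2] <= covered:
            return None  # no clone extends the covered stretch
        path.append(best[0])
        covered = best[2]
    return path


# Usage: clone B is redundant because C reaches further from the same overlap.
clones = [("A", 0, 40), ("B", 10, 45), ("C", 35, 80), ("D", 75, 120)]
chosen = tiling_path(clones, 0, 120)
```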

Random Phase:
    This category includes only projects (e.g. cosmid, BAC, PAC or P1 clones) for which roughly 300-800 random reads (for a cosmid-sized project) have been sequenced and assembled with phred and phrap. It corresponds to the "shotgun" phase at many other sequencing centers. Our random reads are generated from M13 sub-clones, using Thermosequenase and Energy Transfer (ET) dye primers. Once the generation of random reads is complete, the project moves to the "gap closure" phase.
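    To get a feel for what 300-800 random reads means in terms of redundancy, the usual estimate is c = N * L / G (reads times read length over insert length), with a Poisson approximation of 1 - exp(-c) for the fraction of bases hit by at least one read. The sketch below applies this; the 40 kb insert size and 400-base read length are illustrative assumptions, not figures from this page.

```python
import math


def expected_coverage(n_reads, read_len, insert_len):
    """Mean redundancy c = N * L / G."""
    return n_reads * read_len / insert_len


def expected_fraction_covered(n_reads, read_len, insert_len):
    """Poisson approximation: fraction of bases covered by at least one read."""
    return 1.0 - math.exp(-expected_coverage(n_reads, read_len, insert_len))


# Usage with assumed values: 400 reads of 400 bases on a 40,000-base insert.
c = expected_coverage(400, 400, 40000)
covered = expected_fraction_covered(400, 400, 40000)
```

    Under the same assumptions, 300 reads would give about 3-fold redundancy and 800 reads about 8-fold.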

Gap Closure Phase:
    This phase contains projects for which >95% of the insert is represented in 2-5 contigs (depending on the size of the original clone) and which have been assigned to a finisher for gap closure. Gaps are closed in this phase by sequencing reverse reads from the ends of contigs, by using walking primers, and/or by PCR.

Ambiguity Resolution Phase:
    This phase contains projects in which the insert of the starting clone is contiguous, but which require further double-stranding or resolution of GC compressions and/or poor quality regions. Most finishing reads are generated from M13 sub-clones or PCR templates using Taq Dye Terminator chemistry. Projects are currently edited using consed (D. Gordon, University of Washington).
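    Read together, the phase definitions up to this point amount to a simple decision rule. The sketch below is only an illustration: the function name and its arguments are hypothetical bookkeeping fields, the >95% representation criterion is left out for brevity, and cases the definitions do not address (for example more than five contigs) are simply treated as gap closure.

```python
def sequencing_phase(random_reads_complete, n_contigs, ambiguities_remaining):
    """Map a project's state to one of the phase names defined in the text.

    random_reads_complete  - True once random read generation is finished
    n_contigs              - number of contigs in the current assembly
    ambiguities_remaining  - True if any region still needs double-stranding
                             or has unresolved compressions or poor quality
    """
    if not random_reads_complete:
        return "random"
    if n_contigs > 1:
        return "gap closure"
    if ambiguities_remaining:
        return "ambiguity resolution"
    return "analysis and annotation"


# Usage: a contiguous project that still has single-stranded regions.
phase = sequencing_phase(True, 1, True)
```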

Analysis and Annotation Phase:
    To complete the sequencing process, error-checking is performed to identify potential problems missed by the finisher. Then for all clones:


    Gene models are constructed from the XGRAIL and BLAST results, translated, and checked against the nr database. All clones have been submitted either to GSDB (NCGR), where annotations are viewable with their "Annotator" tool, or to NCBI/GenBank.


Web page maintained by BBRP Webmaster (BBRPWebmaster@humpty.llnl.gov).
UCRL-MI 117208-95


Last modified March 21, 1997.