Scientific Supercomputing at the NIH

STADEN

Staden is a package of software for sequence handling and analysis.

Main Programs (more programs not listed):

Version

1.7.0

Sample Sessions

First, log in to helix. In the following examples, replace 'user' with your helix ID. Create a directory called /home/user/pregap_intro:

% mkdir /home/user/pregap_intro
% cd /home/user/pregap_intro

Obtaining an initial set of ABI sequencer data:

% cp /usr/local/staden/course/data/ABI_Data/* .
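The directory setup and copy steps above can also be scripted. The following is a minimal sketch and is not part of the Staden Package: the function name stage_data is hypothetical, and it assumes only that the source directory is readable.

```shell
# stage_data SRC DEST: create DEST if needed and copy every regular file
# from SRC into it. Hypothetical helper for illustration only.
stage_data() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    n=0
    # Quote every path so that file names containing spaces are handled.
    for f in "$src"/*; do
        if [ -f "$f" ]; then
            cp "$f" "$dest"/ || return 1
            n=$((n + 1))
        fi
    done
    echo "copied $n files to $dest"
}
```

For this exercise it would be called as: stage_data /usr/local/staden/course/data/ABI_Data /home/user/pregap_intro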

Obtaining copies of the vector sequences for screening the readings:

For this exercise, the required cloning vector (lorist2) sequence file lorist2.vector is in the same directory as your ABI data. The sequencing vector is m13mp18 and is already held in the Staden Package installation directory. The cloning site used is SmaI.

Using pregap4 to prepare a set of ABI sequencer files for entry into a sequencing project database:

Make sure an X Window server is running and that your current directory is /home/user/pregap_intro:
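Both preconditions can be checked from the shell before launching pregap4. This is a sketch under the assumption that your X setup exports the DISPLAY environment variable, as X forwarding normally does; the function name preflight is hypothetical.

```shell
# preflight DIR: succeed only if DISPLAY is set and the current directory is DIR.
# Hypothetical helper; it does not test that the X server is actually reachable.
preflight() {
    expected_dir=$1
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; start an X server or enable X forwarding" >&2
        return 1
    fi
    # Compare physical paths so that symbolic links do not cause a false mismatch.
    if [ "$(pwd -P)" != "$(cd "$expected_dir" 2>/dev/null && pwd -P)" ]; then
        echo "current directory is not $expected_dir" >&2
        return 1
    fi
    echo "ok"
}
```

For example, preflight /home/user/pregap_intro prints ok when it is safe to start pregap4.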

% /usr/local/staden/linux-x86_64-bin/pregap4

A Pregap4 window pops up:

Click the Files to Process tab, click Add files, change to the pregap_intro directory, and select Any as the file type. The browser should list all the files in this directory. We will process all of the binary sample files, which contain the raw trace data along with sequence and other information.

Select all five Sample XXX files (where XXX is a three-digit number) by holding down the Control key and clicking each file, then press OK.

Click the 'Run' button. A Trev window will pop up:

Trev is a graphical tool that allows you to:

When trev is run as a pregap4 module, only the second two functions should be considered. By default, sequence editing is disabled in this context. For more info, see section 1.7 of the documentation.

Click File, Save. Pregap4 will create several new files.

Customizing the modules of Pregap4

Select Modules, Add/Remove Modules to change the module list as needed. When finished, click File, Save module list. A new file called pregap4.config will be created.

Select File, Exit to quit pregap4.

Copy sample files from /usr/local/staden/course/data/phred_data/ into a new directory. In the following example, a couple of hundred ZTR files and a lorist6.vector file were copied into /home/user/exercise.

% cd /home/user/exercise
% pregap4

In the pregap4 window, click Add files, choose ZTR as the file type, select all files by clicking any file and then pressing Ctrl-A, and press OK.

Click Configure Modules tab

Disable Estimate Base Accuracies

Disable Trace Format Conversion

Enable Initialise Experiment Files

Enable Augment Experiment Files, click Experiment File Line Types, do this:

Click OK and save

Enable Quality Clip

Enable Sequencing Vector Clip

Enable Screen for Unclipped Vector

Enable Cloning Vector Clip, specify lorist6.vector as the vector file name

Enable Gap4 shotgun assembly, type testdb as the Gap4 database name, and tick the Create new database option. Then click on any other module name; this changes the status next to the Gap4 shotgun assembly module from edit to ok.

Disable everything else.

Under File, Load Naming Scheme, click Browse, select sanger_names_old.p4t, then click OK twice.

Click File, Save All Parameters (in all modules)

Quit pregap4

In a Unix terminal window, type pregap4 -nowin *.ztr

When the run finishes, a gap4 database has been created. It consists of the files named testdb.0*
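The batch run and the check for the resulting database can be wrapped in two small shell functions. This is a sketch, not part of the Staden Package: the function names and the PREGAP4 override variable are hypothetical, and db_present assumes the database consists of at least a main file and a matching .aux file (testdb.0 and testdb.0.aux here).

```shell
# run_pregap_batch: run pregap4 without its GUI on every .ztr file in the
# current directory. Set PREGAP4 to override the command name or path.
run_pregap_batch() {
    set -- *.ztr
    # If the glob matched nothing, "$1" is the literal pattern and no such file exists.
    if [ ! -e "$1" ]; then
        echo "no .ztr files in $(pwd)" >&2
        return 1
    fi
    "${PREGAP4:-pregap4}" -nowin "$@"
}

# db_present NAME [VERSION]: succeed if NAME.VERSION and NAME.VERSION.aux
# both exist in the current directory. VERSION defaults to 0.
db_present() {
    name=$1
    version=${2:-0}
    [ -f "$name.$version" ] && [ -f "$name.$version.aux" ]
}
```

A typical use would be: run_pregap_batch && db_present testdb && echo "testdb.0 is ready"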

GAP4

Continuing the example above, in /home/user/exercise, type /usr/local/staden/bin/gap4

Open the database by selecting File, Open, testdb.0.aux, OK.

The Contig Selector window will open.

Select Edit, Contig Editor, OK. The following window is shown:

Editing the consensus sequence

There are two types of editing action available, replace and insert. When you start the contig editor it will be in replace mode. You can toggle between modes by clicking (left mouse button) on the box that is labelled Insert. The contig editor will allow you to edit anything in any way. See section 2.6.4 in the documentation.

Finding problems and editing them

The places in your contig most likely to require editing are those where the consensus sequence is undetermined. Click Next Search, choose problem and forward, then click the Search button. You will mostly find * characters, with a very occasional -.

By altering the value in the box labelled Qual in the contig editor, you control a simple display of base quality, although there are better ways of showing this, as you will see later. Increasing the Qual value will gradually turn bases red (a base turns red when its confidence value is lower than the Qual value).
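The rule behind the Qual display can be illustrated outside gap4. The sketch below is not gap4 code; it simply applies the comparison described above to a list of confidence values, one per line, and the function name flag_low_confidence is hypothetical.

```shell
# flag_low_confidence QUAL: read one confidence value per line on stdin and
# label each "red" if it is lower than QUAL, otherwise "plain".
flag_low_confidence() {
    awk -v q="$1" '{ print $1, ($1 < q ? "red" : "plain") }'
}
```

With a Qual value of 10, an input of 5, 10 and 30 is labelled red, plain and plain; raising Qual to 20 would also turn the second base red.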

Checking the trace data

You can have gap4 automatically display the traces that are most useful for verifying and solving problems. Click Settings, Trace display, Auto-display traces. From now on, the Search button will display up to three traces when searching for problems:

For more information, see the documentation.

Editing with confidence

Set up gap4 for use with confidence values. In the gap4 main window, select Options, Consensus algorithm, OK:

Now, to see what the phred base-calls look like, select Edit, Edit Contig, OK, then in Settings enable show reading quality, show consensus quality, highlight disagreement, and by background color:

To list the error rates:

In the contig editor, select Commands, List Confidence, accept the defaults, and click Apply. The editor information line (at the bottom of the window) now contains something like the following:

Expected no. of errors between 1 and 9569 is 3.87. Error rate = 1/2471
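The two numbers in this line are related: 9569 bases divided by an error rate of 1/2471 gives about 3.87 expected errors. The sketch below shows one way such figures can be derived. It assumes that confidence values are phred-style, so that a value Q corresponds to an error probability of 10^(-Q/10), and that the expected number of errors is the sum of those probabilities. It is an illustration, not gap4's own implementation, and the function name expected_errors is hypothetical.

```shell
# expected_errors: read one confidence value per line on stdin and print the
# number of bases, the expected number of errors, and the error rate as 1/N.
# Assumes phred-style values: error probability = 10^(-Q/10).
expected_errors() {
    awk '
        { e += 10 ^ (-$1 / 10); n++ }
        END {
            if (n == 0 || e == 0) exit 1
            # N is the number of bases per expected error, rounded to an integer.
            printf "bases=%d expected_errors=%.2f error_rate=1/%.0f\n", n, e, n / e
        }'
}
```

Ten bases that each have a confidence value of 10 (error probability 0.1) give one expected error, that is an error rate of 1/10.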

The main gap4 output window should show:

Documentation

http://staden.sourceforge.net/

course_unix.pdf