CAP3 is a sequence assembly program for small-scale assembly of EST sequences with or without quality values. PCAP is for large-scale assembly of genomic sequences with quality values and with or without forward-reverse read pairs. CAP3 & PCAP were developed at Iowa State University (CAP3/PCAP website).
PCAP can handle a genome of 300 Mb (requires ~5 GB of memory and 22 GB of disk space) on Helix, and a genome of 3 Gb on the Biowulf cluster. Any genome assembly project larger than 300 Mb should be performed using PCAP on the Biowulf cluster. As a rule of thumb, assembling N base pairs requires about 15N bytes of memory and 75N bytes of disk space. Please contact the Helix staff (staff@helix.nih.gov) if you have any questions about disk space, memory, or where to run the program.
For a detailed description of the assembly protocol, see Generating a Genome Assembly with PCAP, by X. Huang and S-P Yang, in Current Protocols in Bioinformatics (2005), available online through the NIH library.
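The sizing rule above can be turned into a quick estimate. The following is a minimal sketch, assuming the 15N and 75N figures are in bytes and that 1 GB means 10^9 bytes; the function names are illustrative and not part of PCAP.

```python
# Rule-of-thumb resource estimate for a PCAP assembly.
# Assumption: 15 bytes of memory and 75 bytes of disk per base pair.
MEMORY_BYTES_PER_BP = 15
DISK_BYTES_PER_BP = 75
HELIX_LIMIT_BP = 300_000_000  # 300 Mb, the largest genome suggested for Helix
GB = 10**9


def estimate_pcap_resources(genome_bp):
    """Return the estimated memory and disk needs, in GB, for genome_bp base pairs."""
    return {
        "memory_gb": genome_bp * MEMORY_BYTES_PER_BP / GB,
        "disk_gb": genome_bp * DISK_BYTES_PER_BP / GB,
    }


def recommended_system(genome_bp):
    """Genomes larger than 300 Mb go to Biowulf; anything else fits on Helix."""
    return "Biowulf" if genome_bp > HELIX_LIMIT_BP else "Helix"


if __name__ == "__main__":
    for size in (300_000_000, 3_000_000_000):
        print(size, estimate_pcap_resources(size), recommended_system(size))
```

For a 300 Mb genome this gives 4.5 GB of memory and 22.5 GB of disk, which agrees with the figures quoted for Helix.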
Version
Type '/usr/local/cap3/cap3' or '/usr/local/pcap/pcap' with no parameters. The version date will be displayed on the terminal, along with a brief description of usage.
Usage
CAP3
Usage: ./cap3 File_of_reads [options]
File_of_reads is a file of DNA reads in FASTA format
If the file of reads is named 'xyz', then the file of quality values must be named 'xyz.qual', and the file of constraints named 'xyz.con'.
Options (default values):
  -a N  specify band expansion size N > 10 (20)
  -b N  specify base quality cutoff for differences N > 15 (20)
  -c N  specify base quality cutoff for clipping N > 5 (12)
  -d N  specify max qscore sum at differences N > 20 (200)
  -e N  specify clearance between no. of diff N > 10 (30)
  -f N  specify max gap length in any overlap N > 1 (20)
  -g N  specify gap penalty factor N > 0 (6)
  -h N  specify max overhang percent length N > 2 (20)
  -i N  specify segment pair score cutoff N > 20 (40)
  -j N  specify chain score cutoff N > 30 (80)
  -k N  specify end clipping flag N >= 0 (1)
  -m N  specify match score factor N > 0 (2)
  -n N  specify mismatch score factor N < 0 (-5)
  -o N  specify overlap length cutoff > 15 (40)
  -p N  specify overlap percent identity cutoff N > 65 (90)
  -r N  specify reverse orientation value N >= 0 (1)
  -s N  specify overlap similarity score cutoff N > 250 (900)
  -t N  specify max number of word matches N > 30 (300)
  -u N  specify min number of constraints for correction N > 0 (3)
  -v N  specify min number of constraints for linking N > 0 (2)
  -w N  specify file name for clipping information (none)
  -x N  specify prefix string for output file names (cap)
  -y N  specify clipping range N > 5 (100)
  -z N  specify min no. of good reads at clip pos N > 0 (3)
PCAP
The 'autopcap' script will run a sequence of PCAP programs with default parameters.
Usage: ./pcap File_of_file_names [options]
File_of_file_names is a file of names of read files
If File_of_file_names is named 'xyz', then the file of constraints must be named 'xyz.con'.
Options (default values):
  -a N  specify band expansion size N > 10 (15)
  -c N  specify base quality cutoff for clipping N > 5 (10)
  -e N  specify segment pair score cutoff N > 30 (40)
  -f N  specify chain score cutoff N > 60 (80)
  -g N  specify gap penalty factor N > 0 (6)
  -i N  specify max length of a read end to clip N > 50 (400)
  -j N  specify max sum of quality values to clip N > 1000 (3500)
  -k N  specify max sum of qv outside similarity N > 100 (400)
  -l N  specify min depth of coverage for repeats N > 20 (75)
  -m N  specify match score factor N > 0 (2)
  -n N  specify mismatch score factor N < 0 (-5)
  -o N  specify overlap length cutoff > 20 (30)
  -r N  specify directory name for base/quality files (null)
        Note: If base/quality files are in the current directory, then the -r option must not appear on the command line.
  -s N  specify overlap similarity score cutoff N > 100 (1000)
  -t N  specify number of segment pairs cutoff N > 10 (150)
  -w N  specify number of words cutoff N > 20 (500)
  -x N  specify prefix string for output file names (pcap)
  -y N  specify number of processors N > 0 (1)
  -z N  specify processor id N >= 0 (0)
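Both usage listings give a lower bound for most numeric options, so a wrapper script can catch out-of-range values before submitting a job. The sketch below is illustrative only: the helper name and the bounds tables are hypothetical, only a handful of options are covered, and the bounds are copied from the listings above. It builds an argument list and does not run either program.

```python
# Lower bounds copied from the usage listings; only a subset of options is covered.
# Each entry maps a flag to (bound, inclusive): "N > bound", or "N >= bound" if inclusive.
CAP3_BOUNDS = {"-o": (15, False), "-p": (65, False), "-y": (5, False), "-k": (0, True)}
PCAP_BOUNDS = {"-o": (20, False), "-l": (20, False), "-y": (0, False), "-z": (0, True)}


def build_command(program, input_file, options, bounds):
    """Return an argument list for the program, rejecting out-of-range option values."""
    args = [program, input_file]
    for flag, value in options.items():
        if flag not in bounds:
            raise ValueError(f"no bound recorded for option {flag}")
        bound, inclusive = bounds[flag]
        in_range = value >= bound if inclusive else value > bound
        if not in_range:
            relation = ">=" if inclusive else ">"
            raise ValueError(f"{flag} must be {relation} {bound}, got {value}")
        args += [flag, str(value)]
    return args


if __name__ == "__main__":
    # Program paths are the ones given in the Version section.
    print(build_command("/usr/local/cap3/cap3", "xyz", {"-o": 40, "-p": 90}, CAP3_BOUNDS))
    print(build_command("/usr/local/pcap/pcap", "fofn", {"-y": 2, "-z": 0}, PCAP_BOUNDS))
```

The returned list can be handed to `subprocess.run` as-is, which avoids shell quoting problems with file names.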
Sample session with the PCAP example data
helix% ls
fofn      others.fasta.screen.gz       plasmid.fasta.screen.gz
fofn.con  others.fasta.screen.qual.gz  plasmid.fasta.screen.qual.gz
helix% cat fofn
plasmid.fasta.screen
others.fasta.screen
helix% /usr/local/pcap/autopcap fofn -y 2
Stringent qual diff score cutoff: -d 130
Min depth of coverage for repeats: -l 75
Amount of available memory in GB: -m 1
Running pcap jobs in parallel: -p 1
Adjusted overlap score cutoff: -s 4500
Overlap percent identity cutoff: -t 92
Number of pcap jobs: -y 2
ProcessOverlaps: lowid 0 and highid 1035
Number of bdocs jobs must be set to 1
ProcessOverlaps: Space allocated
ProcessOverlaps: depth of overlaps
ReadLenAndNameSpace is done
NameQualCalClip is done
ProcessOverlaps is done
ReadConstraints is done
The autopcap job is completed.
helix% ls
contigs.bases                     fofn.pcap.contigs1.gz     fofn.pcap.scaffold.new1
contigs.quals                     fofn.pcap.contigs1.links  fofn.pcap.scaffold0
fofn                              fofn.pcap.contigs1.qual   fofn.pcap.scaffold0.ace
fofn.con                          fofn.pcap.contigs1.snp    fofn.pcap.scaffold1
fofn.con.pcap.results             fofn.pcap.docs.info0      fofn.pcap.scaffold1.ace
fofn.con.pcap.results.bpair.info  fofn.pcap.docs0.gz        fofn.pcap.singleton0.ace
fofn.con.pcap.sort                fofn.pcap.goodoverlap0    fofn.pcap.singleton1.ace
fofn.con.pcap.sort.stat           fofn.pcap.goodoverlap1    fofn.pcap.singlets
fofn.pcap.bform.info              fofn.pcap.info0           fofn.pcap.super0
fofn.pcap.cap3out0                fofn.pcap.info1           fofn.pcap.super1
fofn.pcap.cap3out1                fofn.pcap.joins1          fofn.pcap.unused0
fofn.pcap.clean.info              fofn.pcap.joins2          fofn.pcap.unused1
fofn.pcap.clustersize             fofn.pcap.multiple0       others.fasta.screen.gz
fofn.pcap.consen.info0            fofn.pcap.multiple1       others.fasta.screen.qual.gz
fofn.pcap.consen.info1            fofn.pcap.n50             plasmid.fasta.screen.gz
fofn.pcap.consen.pros0            fofn.pcap.overlap0.gz     plasmid.fasta.screen.qual.gz
fofn.pcap.consen.pros1            fofn.pcap.overlap1.gz     readpairs.contigs
fofn.pcap.contigs0.gz             fofn.pcap.repeat0.gz      readpairs.reads
fofn.pcap.contigs0.links          fofn.pcap.repeat1.gz      reads.placed
fofn.pcap.contigs0.qual           fofn.pcap.scaffold.info   reads.unplaced
fofn.pcap.contigs0.snp            fofn.pcap.scaffold.new0   supercontigs
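Before starting a long assembly it can help to confirm that every read file named in the file of file names is present together with its quality file. The following is a rough sketch under several assumptions: the read files sit in the same directory as the file of file names (that is, -r is not used), each quality file is named by appending '.qual' to the read file name as in the sample listing, and a '.gz' copy counts as present because the sample directory holds only compressed files. The function name is hypothetical and is not part of the PCAP suite.

```python
from pathlib import Path


def missing_inputs(fofn_path):
    """List read and quality files named via the fofn that are absent from its directory."""
    fofn = Path(fofn_path)
    missing = []
    for line in fofn.read_text().splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines
        for candidate in (name, name + ".qual"):
            plain = fofn.parent / candidate
            gzipped = fofn.parent / (candidate + ".gz")
            # Assumption: either the plain or the gzipped file is acceptable.
            if not plain.exists() and not gzipped.exists():
                missing.append(candidate)
    return missing


if __name__ == "__main__":
    if Path("fofn").exists():
        print(missing_inputs("fofn"))
```

An empty list means every file named in 'fofn' was found; the constraint file 'fofn.con' is not checked here.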
Documentation
- Generating a Genome Assembly with PCAP, by X. Huang and S-P Yang, in Current Protocols in Bioinformatics (2005), available online through the NIH library. This article has a detailed description of the process of assembling a genome with PCAP.
- See /usr/local/pcap/doc and /usr/local/cap3/doc for brief documentation. (type 'more /usr/local/pcap/doc')
- Type the name of any program in the PCAP and CAP3 suite with no parameters to get a brief description of the program usage and options.