This is a summary of the executables contained in this directory. They are
from the Jim Kent library jksrc454.zip (Nov 2003).

aNotB - List symbols that are in a but not b
usage:
   aNotB aFile bFile outFile
A symbol in this context is the first word in a line
options:
   -xxx=XXX

addCols - Sum columns in a text file.
usage:
   addCols XXX

Aladdin - not nearly as good as Genie, but still capable of telling exon
from intron some of the time.
usage:
   aladdin in.fa outFile

ali2alx - produces an index file for each chromosome into an ali file.
usage:
   ali2alx in.ali alxDir

aliGlue - tell where a cDNA is located quickly.
usage:
   aliGlue genomeListFile cdnaListFile ignore.ooc 5and3.pai outRoot
The program will create the files outRoot.hit outRoot.glu outRoot.ok
which contain the cDNA hits, gluing cDNAs, and a sign that the program
ended ok respectively.

ameme - find common patterns in DNA
usage:
   ameme good=goodIn.fa [bad=badIn.fa] [numMotifs=2] [background=m1]
         [maxOcc=2] [motifOutput=fileName] [html=output.html] [gif=output.gif]
where goodIn.fa is a multi-sequence fa file containing instances
of the motif you want to find, badIn.fa is a file containing similar
sequences but lacking the motif, numMotifs is the number of motifs
to scan for, background is m0, m1, or m2 for various levels of Markov
models, maxOcc is the maximum occurrences of the motif you expect to
find in a single sequence, and motifOutput is the name of a file to
store just the motifs in.

ave - Compute average and basic stats
usage:
   ave file
options:
   -col=N Which column to use. Default 1

aveCols - average together columns
usage:
   aveCols file
File may be stdin.

axtAndBed - Intersect an axt with a bed file and output axt.
usage:
   axtAndBed in.axt in.bed out.axt
options:
   -xxx=XXX

axtBest - Remove second best alignments
usage:
   axtBest in.axt chrom out.axt
options:
   -winSize=N - Size of window, default 10000
   -minScore=N - Minimum score alignments to consider. Default 1000
   -minOutSize=N - Minimum score of piece to output.
       Default 10
   -matrix=file.mat - override default scoring matrix
Alignments scoring over minScore (where each matching base counts
about +100 in the default scoring scheme) are projected onto the
target sequence. The score within each overlapping 1000 base window
is calculated, and the best scoring alignments in each window are
marked. Alignments that are never the best are thrown out.
The best scoring alignment for each window is the output, chopping
up alignments if necessary

axtCalcMatrix - Calculate substitution matrix and make indel histogram
usage:
   axtCalcMatrix file(s).axt

axtChain - Chain together axt alignments.
usage:
   axtChain in.axt tNibDir qNibDir out.chain
options:
   -psl Use psl instead of axt format for input
   -faQ qNibDir is a fasta file with multiple sequences for query
   -minScore=N Minimum score for chain, default 1000
   -details=fileName Output some additional chain details
   -linearGap=filename Read piecewise linear gap from tab delimited file
   sample linearGap file
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

axtDropOverlap - deletes all overlapping self alignments.
usage:
   axtDropOverlap in.axt tSizes qSizes out.axt
Where tSizes and qSizes are tab-delimited files with

axtDropSelf - Drop alignments that just align same thing to itself
usage:
   axtDropSelf in.axt out.axt
options:
   -xxx=XXX

axtFilter - Filter axt files. Output goes to standard out.
usage:
   axtFilter file(s)
options:
   -q=chr1,chr2 - restrict query side sequence to those named
   -notQ=chr1,chr2 - restrict query side sequence to those not named
   -notQ_random - restrict query side sequence, no *_random to be used
   -t=chr1,chr2 - restrict target side sequence to those named
   -notT=chr1,chr2 - restrict target side sequence to those not named
   -minScore=N - restrict to those scoring at least N
   -maxScore=N - restrict to those scoring less than N
   -qStartMin=N - restrict to those with qStart at least N
   -qStartMax=N - restrict to those with qStart less than N
   -tStartMin=N - restrict to those with tStart at least N
   -tStartMax=N - restrict to those with tStart less than N
   -strand=? - restrict strand (to + or -)

axtIndex - build index of axt file
usage:
   axtIndex in.axt out.axt.ix
options:
   -xxx=XXX

axtPretty - Convert axt to more human readable format.
usage:
   axtPretty in.axt out.pretty
options:
   -line=N Size of line, default 70

axtQueryCount - Count bases covered on each query sequence
usage:
   axtQueryCount in.axt
options:
   -xxx=XXX

axtRecipBest - create file for dot plot using recip best
usage:
   axtRecipBest chrom tSizes qSizes <.axt file(s) for target>
where tSizes, qSizes is a tab-delimited file with
options:
   -minScore=5000 (default 5000) throw out any alignment below this score

axtSort - Sort axt files
usage:
   axtSort in.axt out.axt
options:
   -query - Sort by query position, not target

axtSplitByTarget - Split a single axt file into one file per target
usage:
   axtSplitByTarget in.axt outDir

axtSwap - Swap source and query in an axt file
usage:
   axtSwap source.axt target.sizes query.sizes dest.axt
options:
   -xxx=XXX

axtToBed - Convert axt alignments to simple bed format
usage:
   axtToBed in.axt out.bed
options:
   -xxx=XXX

axtToMaf - Convert from axt to maf format
usage:
   axtToMaf in.axt tSizes qSizes out.maf
Where tSizes and qSizes are files that contain the sizes of the
target and query sequences. Very often this will be a chrom.sizes file
options:
   -qPrefix=XX. - add XX.
       to start of query sequence name in maf
   -tPrefix=YY. - add YY. to start of target sequence name in maf

axtToPsl - Convert axt to psl format
usage:
   axtToPsl in.axt tSizes qSizes out.psl
Where tSizes and qSizes are tab-delimited files with columns.
options:
   -xxx=XXX

bedCoverage - Analyse coverage by bed files - chromosome by
chromosome and genome-wide.
usage:
   bedCoverage database bedFile
Note bed file must be sorted by chromosome
   -restrict=restrict.bed Restrict to parts in restrict.bed

binGood - convert text format alignment file to binary format
usage:
   binGood good.txt good.ali

blastStats - gather statistics on blast files
usage:
   blastStats blastFile(s)

bwana - do batch coarse alignment of C. briggsae and C. elegans genomes.
usage:
   bwana lots outFile file-with-list-of-fa-files
   bwana few outFile file1.fa file2.fa ... fileN.fa
   bwana frag outFile file.fa start end

calc - Little command line calculator
usage:
   calc this + that * theOther / (a + b)

calcGap - calculate gap scores
usage:
   calcGap chainFile
options:
   -scale=N Amount to scale scores by - default 94.
   -maxGap=N Largest gap to look at. Default 100
   -near=N How close to consider something 'almost' an indel
   -samples=N,M - List of points to sample

catDir - concatenate files in directory to stdout.
For those times when too many files for cat to handle.
usage:
   catDir dir(s)
options:
   -r Recurse into subdirectories
   -suffix=.suf This will restrict things to files ending in .suf
   '-wild=*.???' This will match wildcards.
   -nonz Prints file name of non-zero length files

catUncomment - Concatenate input removing lines that start with '#'
Output goes to stdout
usage:
   catUncomment file(s)

ccCp - copy a file to cluster
usage:
   ccCp sourceFile destFile [hostList]
This will copy sourceFile to destFile for all machines in hostList
options:
   -crossMax=N (default 40) - maximum copies across switches
example:
   ccCp h.zip /var/tmp/h.zip newHosts

cdnaOff - creates sorted offset files that position cDNAs in chromosome.
usage:
   cdnaOff good.txt outputDir\

chainFilter - Filter chain files. Output goes to standard out.
usage:
   chainFilter file(s)
options:
   -q=chr1,chr2 - restrict query side sequence to those named
   -notQ=chr1,chr2 - restrict query side sequence to those not named
   -t=chr1,chr2 - restrict target side sequence to those named
   -notT=chr1,chr2 - restrict target side sequence to those not named
   -id=N - only get one with ID number matching N
   -minScore=N - restrict to those scoring at least N
   -maxScore=N - restrict to those scoring less than N
   -qStartMin=N - restrict to those with qStart at least N
   -qStartMax=N - restrict to those with qStart less than N
   -qEndMin=N - restrict to those with qEnd at least N
   -qEndMax=N - restrict to those with qEnd less than N
   -tStartMin=N - restrict to those with tStart at least N
   -tStartMax=N - restrict to those with tStart less than N
   -tEndMin=N - restrict to those with tEnd at least N
   -tEndMax=N - restrict to those with tEnd less than N
   -strand=? - restrict strand (to + or -)
   -long - output in long format
   -zeroGap - get rid of gaps of length zero
   -minGapless=N - pass those with minimum gapless block of at least N
   -qMinGap=N - pass those with minimum gap size of at least N
   -tMinGap=N - pass those with minimum gap size of at least N
   -qMinSize=N - minimum size of spanned query region
   -tMinSize=N - minimum size of spanned target region
   -noRandom - suppress chains involving 'random' chromosomes

chainMergeSort - Combine sorted files into larger sorted file
usage:
   chainMergeSort file(s)
Output goes to standard output
options:
   -saveId - keep the existing chain ids.

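The idea behind chainMergeSort, combining inputs that are already sorted, can be pictured as a k-way merge. The following is a minimal Python sketch only. It assumes each input is a list of (score, chainId) pairs sorted by descending score; that record layout and the function name are illustrative assumptions, not the chain file format or the program's actual code.

```python
import heapq

def merge_sorted_by_score(*sorted_inputs):
    """Merge inputs that are each already sorted by descending score.

    Each record is a hypothetical (score, chain_id) tuple; real chain
    files carry much more information per record.
    """
    # heapq.merge walks the inputs lazily, looking at one record per
    # input at a time, so nothing has to be re-sorted from scratch.
    return list(heapq.merge(*sorted_inputs, key=lambda rec: rec[0], reverse=True))

a = [(9000, "a1"), (4000, "a2"), (1500, "a3")]
b = [(7000, "b1"), (1000, "b2")]
merged = merge_sorted_by_score(a, b)
print(merged)
```

Because only the head of each input is compared at any moment, this style of merge suits collections that are too large to sort in memory in one go.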
chainNet - Make alignment nets out of chains
usage:
   chainNet in.chain target.sizes query.sizes target.net query.net
where:
   in.chain is the chain file sorted by score
   target.sizes contains the size of the target sequences
   query.sizes contains the size of the query sequences
   target.net is the output over the target genome
   query.net is the output over the query genome
options:
   -minSpace=N - minimum gap size to fill, default 25
   -minFill=N - default half of minSpace
   -minScore=N - minimum chain score to consider, default 0
   -verbose - make copious output

chainPreNet - Remove chains that don't have a chance of being netted
usage:
   chainPreNet in.chain target.sizes query.sizes out.chain
options:
   -dots=N - output a dot every so often
   -pad=N - extra to pad around blocks to decrease trash
            (default 1)

chainSort - Sort chains. By default sorts by score.
Note this loads all chains into memory, so it is not
suitable for large sets. Use chainMergeSort for that
usage:
   chainSort inFile outFile
Note that inFile and outFile can be the same
options:
   -target sort on target start rather than score
   -query sort on query start rather than score

chainSplit - Split chains up by target or query sequence
usage:
   chainSplit outDir inChain(s)
options:
   -q - Split on query (default is on target)

chainSwap - Swap target and query in chain
usage:
   chainSwap in.chain out.chain

chainToPsl - Convert chain file to psl format
usage:
   chainToPsl in.chain tSizes qSizes target.lst query.lst out.psl
Where tSizes and qSizes are tab-delimited files with columns.
The target and query lists can either be fasta files, nib files, or a list
of fasta and/or nib files one per line
options:
   -xxx=XXX

chopFaLines - Read in FA file with long lines and rewrite it with shorter lines
usage:
   chopFaLines in.fa out.fa

cluster
usage:
   cluster good.txt c2g.sanger c2g.merged nameless.html

convolve - perform convolution of probabilities
usage:
   convolve [-count=N] [-logs] \
            [-html]
   -count=N - number of times to run convolution
   -logs - input data is in log base 2 format, not probabilities
   -html - output in html table row format

correctEst - Correct ESTs by passing them through genome
usage:
   correctEst oldEst.fa ali.psl nibDir out.fa
The corrected sequence will be in upper case
options:
   -xxx=XXX

countChars - Count the number of occurrences of a particular char
usage:
   countChars char file(s)
Char can either be a two digit hexadecimal value or
a single letter literal character

detab - remove tabs from program
usage:
   detab inFile outFile
options:
   -tabSize=N (default 8)
Note currently this just replaces tabs with tabSize spaces.
Tabs actually add a variable number of spaces if properly implemented,
going to the next column that's an even multiple of tabSize.
If you need this please implement it and put in a -proper flag
or something. -jk

dnsInfo - Get info from DNS about a machine
usage:
   dnsInfo machine

dynAlign - uses dynamic programming to find an alignment between two
nucleotide sequences
usage:
   dynAlign test string1 string2
       aligns two short sequences in command line.
   dynAlign two query.fa target.fa outfile
       aligns two sequences in fa files
   dynAlign worms bwana.out outFile
       aligns C. briggsae genomic fragments against full C. elegans genome.
       cbali.out is the output of the cbAli program and outFile is the file
       to produce.
   dynAlign starting cbali.out outFile startIx
       same as dynAlign worms, but starts part way through cbali.out
   dynAlign range cbali.out outFile startIx endIx
       same as dynAlign worms, but specifies start and end areas to work on

endsInLf - Check that last letter in files is end of line
usage:
   endsInLf file(s)
options:
   -zeroOk

This program aligns cDNA with genomic sequence.
usage:
   exonAli named output cdnaName(s)
   exonAli in output listFile
   exonAli all output faFile ntDir
   exonAli starting output faFile ntDir startingIx [count]
   exonAli resume output faFile ntDir

faCmp - Compare two .fa files
usage:
   faCmp [options] a.fa b.fa
options:
   -softMask - use the soft masking information during the compare
               Differences will be noted if the masking is different.
default: no masking information is used during compare.
         It is as if both sequences were not masked.

faCount - count base statistics and CpGs in FA files.
usage:
   faCount file(s).fa

faFilterN - Get rid of sequences with too many N's
usage:
   faFilterN in.fa out.fa maxPercentN
options:
   -out=in.fa.out
   -uniq=self.psl

faFlyBaseToUcsc - Convert Flybase peptide fasta file to UCSC format
usage:
   faFlyBaseToUcsc in.faa out.faa
options:
   -xxx=XXX

faFrag - Extract a piece of DNA from a .fa file.
usage:
   faFrag in.fa start end out.fa
options:
   -mixed - preserve mixed-case in FASTA file

faNcbiToUcsc - Convert FA file from NCBI to UCSC format.
usage:
   faNcbiToUcsc inFile outFile
options:
   -split - split into separate files
   -ntLast - look for NT_ on last bit
   -wordBefore=xx The word before the accession, default 'gb'
   -wordIx=N The word (starting at zero) the accession is in

faNoise - Add noise to .fa file
usage:
   faNoise inName outName transitionPpt transversionPpt insertPpt deletePpt chimeraPpt
options:
   -upper - output in upper case

faOneRecord - Extract a single record from a .FA file
usage:
   faOneRecord in.fa recordName

faRc - Reverse complement a FA file
usage:
   faRc in.fa out.fa
In.fa and out.fa may be the same file.
options:
   -keepName - keep name identical (don't prepend RC)

faSimplify - Simplify fasta record headers
usage:
   faSimplify in.fa startPat endPat out.fa
This will write out the stuff between startPat and endPat
options:
   -prefix=XXX This will add XXX as a prefix
   -suffix=YYY This will add YYY as a suffix

faSize - print total base count in fa files.
usage:
   faSize file(s).fa
Command flags
   detailed=on outputs name and size of each record

faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -xxx=XXX

faSplit - Split an fa file into several files.
usage:
   faSplit how input.fa count outRoot
where how is either 'base' 'sequence' or 'size'.
Files split by sequence will be broken at the nearest
fa record boundary, while those split by base will be
broken at any base. Files broken by size will be broken
every count bases.
Examples:
   faSplit sequence estAll.fa 100 est
This will break up estAll.fa into 100 files
(numbered est001.fa est002.fa, ... est100.fa).
Files will only be broken at fa record boundaries
   faSplit base chr1.fa 10 1_
This will break up chr1.fa into 10 files
   faSplit size input.fa 2000 outRoot
This breaks up input.fa into 2000 base chunks
   faSplit about est.fa 20000 outRoot
This will break up est.fa into files of about 20000 bytes each by record.
   faSplit byname scaffolds.fa outRoot
This breaks up scaffolds.fa using sequence names as file names.
   faSplit gap chrN.fa 20000 outRoot
This breaks up chrN.fa into files of at most 20000 bases each,
at gap boundaries if possible.
options:
   -maxN=N - Suppress pieces with more than maxN n's. Only used with size.
             default is size-1 (only suppresses pieces that are all N).
   -oneFile - Put output in one file. Only used with size
   -out=outFile Get masking from outfile. Only used with size.
   -lift=file.lft Put info on how to reconstruct sequence from
                  pieces in file.lft. Only used with size and gap.
   -minGapSize=X Consider a block of Ns to be a gap if block size >= X.
                 Only used with gap.

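The 'size' mode of faSplit and its -maxN default can be illustrated with a short Python sketch. This is a hypothetical re-implementation of the behaviour described above, not the program's code: the function name, the in-memory string input, and the (start, piece) return shape are all assumptions made for the example.

```python
def split_by_size(seq, size, max_n=None):
    """Yield (start, piece) chunks of at most `size` bases.

    Pieces containing more than max_n N's are suppressed. The default
    of size - 1 follows the -maxN description, so only pieces that are
    entirely N are dropped (illustrative sketch, not faSplit itself).
    """
    if max_n is None:
        max_n = size - 1
    for start in range(0, len(seq), size):
        piece = seq[start:start + size]
        if piece.upper().count("N") > max_n:
            continue  # too many N's: suppress this piece
        yield start, piece

pieces = list(split_by_size("ACGTNNNNACGTAC", 4))
print(pieces)
```

Keeping the start offset alongside each piece is one way to retain the information needed to reassemble the sequence, which is the role the text gives to the -lift file.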
faToNib - Convert from .fa to .nib format
usage:
   faToNib [options] in.fa out.nib
options:
   -softMask - create nib that soft-masks lower case sequence
               Note gfServer/gfClient don't know about this yet
   -hardMask - create nib that hard-masks lower case sequence

faTrans - Translate DNA .fa file to peptide
usage:
   faTrans in.fa out.fa
options:
   -stop stop at first stop codon (otherwise puts in Z for stop codons)
   -offset=N start at a particular offset.

fato4nt - a program to convert .fa files to .4nt files
usage:
   fato4nt in.fa out.4nt

usage:
   ./i386/findCdna inputFile

fixCr - strip CRs from ends of lines
fixcr - removes trailing carriage returns from files.
usage:
   fixcr file(s)

gb2cdi - convert GenBank (GB) files to .fa and cDna Info (CDI) file.
usage:
   gb2cdi file(s).gb file.fa file.cdiFile

gbtofa - converts from GenBank to fa format.
usage:
   gbtofa in.gb out.fa

gcForBed - Calculate g/c percentage and other stats for regions covered by bed
usage:
   gcForBed in.bed nibDir
options:
   -xxx=XXX

geniegff - makes up a gdf file from Genie gene predictions
usage:
   geniegff genigene.gdf c2gFile
This must be run in the same directory as I.gff, II.gff, etc.
generated by Genie

gffPeek - Look at a gff file and report some basic stats
usage:
   gffPeek file.gff
file can be stdin.
options:
   -seq - include seq in output

gffgenes - creates files that store extents of genes for intronerator
usage:
   gffgenes c2g file.gl
This needs to be run in the directory with the Xgenes.gff files.

htmlPics - create an html file from a list of pictures
usage:
   htmlPics picFile(s)
The html will be printed to standard out.

This program makes an index file for a .fa file
usage:
   ./i386/indexfa file.fa indexFile

This program makes an index file for a .gl file
usage:
   indexgl file.gl indexFile

Introns - finds the introns in a file and writes them to gff.
usage:
   introns good.txt introns.gff introns.txt altintrons.txt altgenes.txt introns.fa

ixali
This program makes a name index file for an .ali file
usage:
   ixali file.ali file.ix

ixword1
This program makes an index file for text file, indexing the
first word of each line.
usage:
   ./i386/ixword1 textFile indexFile

ixword3
This program makes an index file for text file, indexing the
third word of each line.
usage:
   ./i386/ixword3 textFile indexFile

jkUniq - remove duplicate lines from file. Lines need not
be next to each other (plain Unix uniq works for that)
usage:
   jkUniq file(s)

knownVsBlat - Categorize BLAT mouse hits to known genes
usage:
   knownVsBlat database table output.stats
options:
   -dots=N - Output a dot every N known genes
   -chrom=chrN - Restrict to a single chromosome
   -percentId - calculate percent identity. Only works for psl tables. Slow
   -format=type. Type = 'bed' or 'psl'

kvsSummary - Summarize output of a bunch of knownVsBlats
usage:
   kvsSummary outputFile inputFile(s)
options:
   -xxx=XXX

lavToAxt - Convert blastz lav file to an axt file (which includes sequence)
usage:
   lavToAxt in.lav tNibDir qNibDir out.axt
options:
   -fa qNibDir is interpreted as a fasta file of multiple dna seq
       instead of directory of nibs

lavToPsl - Convert blastz lav to psl format
usage:
   lavToPsl in.lav out.psl
options:
   -target-strand=c set the target strand to c (default is no strand)

libScan - Scan libraries to help find 5' capped ones
usage:
   libScan database output.html
options:
   -dots=N - write a dot every N mrnas/ests

lineCount - Count lines in a file
usage:
   lineCount file(s)
options:
   -xxx=XXX

mafCoverage - Analyse coverage by maf files - chromosome by
chromosome and genome-wide.
usage:
   mafCoverage database mafFile
Note maf file must be sorted by chromosome,tStart
   -restrict=restrict.bed Restrict to parts in restrict.bed
   -count=N Number of matching species to count coverage.
       Default = 3

mafToAxt - Convert from maf to axt format
usage:
   mafToAxt in.maf tName qName out.axt
Where tName and qName are the names for the target and query
sequences respectively. tName should be maf target since it must
always be oriented in "+" direction.

makepgo - Make Predicted Gene Offset files. One for each chromosome.
usage:
   makepgo c2g outputDir .pgo/
This results in a bunch of i.pgo, ii.pgo etc. in output dir.
   makepgo c2c outputDir .coo/
This results in a bunch of i.coo, ii.coo etc. in output dir.

mmUnmix - Help identify human contamination in mouse and vice versa.
usage:
   mmUnmix xAli.pslx fragDir suspect.out mouseBad.out humanBad.out cloneBad.out
options:
   -bed=contam.bed
   -html=contam.html

moresyn - find more gene/ORF synonyms
usage:
   moresyn oldSyn newSyn newOrf2Gene orfInfo

mousePoster - Search database info for making foldout
usage:
   mousePoster chrM chrN ...
Note - you'll need to edit the source to use different data

netChainSubset - Create chain file with subset of chains that appear in the net
usage:
   netChainSubset in.net in.chain out.chain
options:
   -gapOut=gap.tab - Output gap sizes to file
   -type=XXX - Restrict output to particular type in net file

netClass - Add classification info to net
usage:
   netClass in.net tDb qDb out.net
options:
   -tNewR=dir - Dir of chrN.out.spec files, with RepeatMasker .out format
                lines describing lineage specific repeats in target
   -qNewR=dir - Dir of chrN.out.spec files for query
   -noAr - Don't look for ancient repeats

netFilter - Filter out parts of net. What passes
filter goes to standard output. Note a net is a
recursive data structure. If a parent fails to pass
the filter, the children are not even considered.
usage:
   netFilter in.net(s)
options:
   -q=chr1,chr2 - restrict query side sequence to those named
   -notQ=chr1,chr2 - restrict query side sequence to those not named
   -t=chr1,chr2 - restrict target side sequence to those named
   -notT=chr1,chr2 - restrict target side sequence to those not named
   -minScore=N - restrict to those scoring at least N
   -maxScore=N - restrict to those scoring less than N
   -minGap=N - restrict to those with gap size (tSize) >= minSize
   -minAli=N - restrict to those with at least given bases aligning
   -syn - do filtering based on synteny.
   -nonsyn - do inverse filtering based on synteny.
   -type=XXX - restrict to given type
   -fill - Only pass fills, not gaps. Only useful with -line.
   -gap - Only pass gaps, not fills. Only useful with -line.
   -line - Do this a line at a time, not recursing
   -noRandom - suppress chains involving 'random' chromosomes

netSplit - Split a genome net file into chromosome net files
usage:
   netSplit in.net outDir
options:
   -xxx=XXX

netStats - Gather statistics on net
usage:
   netStats summary.out inNet(s)
options:
   -gap=gapFile
   -fill=fillFile
   -top=topFile
   -nonSyn=topFile
   -syn=synFile
   -inv=invFile
   -dupe=dupeFile

netSyntenic - Add synteny info to net.
usage:
   netSyntenic in.net out.net
options:
   -xxx=XXX

netToAxt - Convert net (and chain) to axt.
usage:
   netToAxt in.net in.chain tNibDir qNibDir out.axt
options:
   -qChain - net is with respect to the q side of chains.
   -maxGap=N - maximum size of gap before breaking. Default 100
   -gapOut=gap.tab - Output gap sizes to file

netToBed - Convert target coverage of net to a bed file.
usage:
   netToBed in.net out.bed
options:
   -maxGap=N - break up at gaps of given size or more
   -minFill=N - only include fill of given size or above.

newIntron - find introns present in one species but not the other
usage:
   newIntron firstSpecies wabaFile outFile
where 'firstSpecies' is either 'elegans' or 'briggsae'

newProg - make a new C source skeleton.
usage:
   newProg progName description words
This will make a directory 'progName' and a file in it 'progName.c'
with a standard skeleton
options:
   -cvs This will also check it into CVS. 'progName' should include
        full path in source repository

nibFrag - Extract part of a nib file as .fa (all bases/gaps lower case by default)
usage:
   nibFrag [options] file.nib start end strand out.fa
options:
   -masked - use lower case characters for bases meant to be masked out
   -hardMasked - use upper case for not masked-out and 'N' characters
                 for masked-out bases
   -upper - use upper case characters for all bases
   -name=name Use given name after '>' in output sequence

nt4Frag - Extract a piece of a .nt4 file to .fa format
usage:
   nt4Frag file.nib start end strand out.fa

olly - Look for matches and near matches to short sequences genome-wide
Output can be loaded as a wiggle track
usage:
   olly nibDir chrom start stop out.sample
example:
   olly ~/oo/mixedNib chr1 0 1000 chr1_0_1000.sample
options:
   -maxDiff=N (default 3) Maximum variation in bases. Must be 3 or less.
   -ollySize=N (default 25) Size of oligo.
   -makeBatch=parasolSpec Make batch file for parasol
       In this case just do olly nibDir -makeBatch=spec.
       Spec will be a parasol spec to do everything in nibDir
   -batchSize=N Default number of oligos to query in batch
       For maxDiff 3, 10000 is good, for maxDiff 2 or less
       100000 is good
   -batchChrom=chrN Restrict batch to one chromosome
   -extendedOut=file Put extended output in file

orthologBySynteny - Find syntenic location for a list of gene predictions
on a single chromosome
usage:
   orthologBySynteny from-db to-db geneTable netTable chrom chainFile
options:
   -name=field name in gene table (from-db) to get name from [default: name]
   -tName=field name in gene table (to-db) to get name from [default: name]
   -track=name of track in to-db - find gene overlapping with syntenic location
   -psl=geneTable is psl instead of gene prediction
   -gff=output gene prediction
   -filter=max gene prediction allowed [default 2mb]

phToPsl - Convert from Pattern Hunter to PSL format
usage:
   phToPsl in.ph qSizes tSizes out.psl
options:
   -tName=target (defaults to 'in')

pslMrnaCover - Make histogram of coverage percentage of mRNA in psl.
usage:
   pslMrnaCover mrna.psl mrna.fa
options:
   -minSize=N - default 100. Minimum size of mRNA considered
   -listZero=zero.tab - List accessions that don't align in zero.tab

pslToXa - Convert from psl to xa alignment format
usage:
   pslToXa [options] in.psl out.xa qSeqDir tSeqDir
options:
   -masked - use lower case characters for masked-out bases

randomLines - Pick out random lines from file
usage:
   randomLines inFile count outFile
options:
   -decomment - remove blank lines and those starting with

refineAli - This program turns rough alignments into fine ones.
usage:
   refineAli roughInputFile cdnaBase chromDir goodAlignFile badAlignFile
             coolFile errorFile startIx endIx [c2gFile]
example:
   refineAli ea\all.out cDNA\allcdna chrom ra\good.txt ra\bad.txt
             ra\cool.txt ra\err.txt 0 100000 features\c2g

regionPicker - Code to pick regions to annotate deeply.
Stratifies genome based on mouse non-transcribed homology and
spliced EST density.
usage:
   regionPicker database axtBestDir output
options:
   -html=output.html - where to write hyperlinks for region
   -region=chrN - restrict to a single chromosome
   -region=file - File has chrN:start-end on each line
   -printWin - Print stats on each window
   -avoid=file - File of regions to avoid
   -randSeed=N - Seed for random number generator
   -bigWinSize=N - default 500000
   -bigStepSize=N - default 100000
   -smallWinSize=N - default 125
   -theshold=0.N - minimum base identity in small window. default 0.8
   -chromLimit=file - File that has limits for picks per chromosome
   -picksPer=N - number of picks per strata, default 5

rikenBestInCluster - Find best looking in Riken cluster
usage:
   rikenBestInCluster database output.tab
options:
   -xxx=XXX

rmFaDup - remove duplicate records in FA file
usage:
   rmFaDup oldName.fa newName.fa

scaffoldFaToAgp - generate an AGP file, gap file, and lift file
from a scaffold FA file.
usage:
   scaffoldFaToAgp source.fa
options:
   -minGapSize Minimum threshold for calling a block of Ns a gap.
The resulting files will be source.{agp,gap,lft}
Note: gaps of 1000 bases are inserted between scaffold records
as contig gaps in the .agp file. N's within scaffolds are
represented as frag gaps in the .gap file only

scrambleFa - scramble the order of records in an fa file
usage:
   scrambleFa in.fa out.fa

dynAlign - uses dynamic programming to find an alignment between two strings
usage:
   dynAlign string1 string2

sortFilt - merge, sort, and filter patSpace .hit output.
usage:
   sortFilt output.sf histogram.txt matchThreshold sizeThreshold repeatMax infile(s).hit
where the matchThreshold is the minimum 'pseudo percentage' match to take,
size threshold is the minimum size (on the query sequence) to take,
and repeatMax is the maximum number of repeats to allow

spacedToTab - Convert fixed width space separated fields to tab separated
Note this requires two passes, so it can't be done on a pipe
usage:
   spacedToTab in.txt out.tab
options:
   -xxx=XXX

splitFile - Split up a file
usage:
   splitFile source linesPerFile outBaseName
options:
   -head=file - put head in front of each output
   -tail=file - put tail at end of each output

splitSim - Simulate gapless distribution size
usage:
   splitSim XXX
options:
   -xxx=XXX

stToXao - make indices into st file, one for each chromosome.
usage:
   stToXao infile.st outDir/

stitchea - joins together EA files into one big one, throwing
out overlaps. Will complain if there's any missing data.
usage:
   stitchea outFile inFile(s)

stitcher - third pass of genomic/genomic alignment. Stitches
together 2000x5000 base 7-state alignments into longer contigs.
usage:
   stitcher in.dyn out.sti
   stitcher in.dyn out.st compact

stringify - Convert file to C strings
usage:
   stringify in.txt
A stringified version of in.txt will be printed to standard output.

subChar - Substitute one character for another throughout a file.
usage:
   subChar oldChar newChar file(s)
oldChar and newChar can either be single letter literal characters,
or two digit hexadecimal ascii codes

Subs - a utility to perform massive string substitutions on source
usage:
   subs [options] file1 ... filen [options]
options:
   -f file (get files to do subs on from file)
   -s file (get substitutions to perform from file.
            by default subs looks for subs.in)
   -r (read only - don't write out substitutions)
   -b (don't create .bak files on changed files)
   -e (looks for embedded substrings as well as entire C symbols)
   -c char (use char as the separator in substitution file.
            Only matters in embedded case. '|' by default.)
   -i (interactive query on each substitution.)
The format of subs.in is oldstring<sep>newstring.
In the normal case <sep> can be any white space and newstring is required.
In the embedded case <sep> defaults to '|' and if there is no newstring,
oldstring will be eliminated. Hashing algorithm doesn't work for one
character sub sources.
There can be more than one substitution in the file.
Subs does take wildcards in the list of files to substitute.

subsetAxt - Rescore alignments and output those over threshold
usage:
   subsetAxt in.axt out.axt matrix threshold
options:
   -xxx=XXX

subsetTraces - Build subset of mouse traces that actually align
usage:
   subsetTraces traceDir pslDir subset.fa
options:
   -abbr=junk - abbreviate names in fa files.
   '-tracePat=*.fa' - just use .fa files in traceDir
   '-pslPat=*.psl' - just use .psl files in pslDir

tableSum - Summarize a table somehow
usage:
   tableSum tableFile
options:
   -row Sum all rows
   -col Sum all columns
   -colEven=N Output columns that are sum of N columns of input
   -rowEven=N Output rows that are sum of N rows of input
   -colDiv=30,60,10 Produce table that sums the first 30 columns
                    in input to first column in output, next 60
                    columns to second column in output, and next
                    10 columns to third column in output.
   -rowDiv=30,60,10 Similar to colDiv, but for rows, may be combined
   -scale=X Multiply everything by X
   -average Compute average instead of sum

textHist2 - Make two dimensional histogram table out
of a list of 2-D points, one per line.
usage:
   textHist2 input
options:
   -xBins=N - number of bins in x dimension
   -yBins=N - number of bins in y dimension
   -xBinSize=N - size of bins in x dimension
   -yBinSize=N - size of bins in y dimension
   -xMin=N - minimum x number to record
   -yMin=N - minimum y number to record
   -ps=output.ps - make PostScript output
   -psSize=N - Size in points (1/72th of inch)
   -labelStep=N - How many bins to skip between labels
   -margin=N - Margin in points for PostScript output
   -log - Logarithmic output (only works with ps now)
   -postScale=N (default 1.000000) - What to scale by after normalization

textHistogram - Make a histogram in ascii
usage:
   textHistogram inFile
Where inFile contains one number per line.
options:
   -binSize=N Size of bins, default 1
   -maxBinCount=N Maximum # of bins, default 25
   -minVal=N Minimum value to put in histogram, default 0
   -log Do log transformation before plotting
   -noStar Don't draw asterisks
   -col=N Which column to use. Default 1
   -aveCol=N A second column to average over. The averages
             will be output in place of counts of primary column.

tickToDate - Convert seconds since 1970 to time and date
usage:
   tickToDate ticks
Use 'now' for current ticks and date

toLower - Convert upper case to lower case in file. Leave other chars alone
usage:
   toLower in out

toUpper - Convert lower case to upper case in file. Leave other chars alone
usage:
   toUpper in out

trackOverlap - Determine how much of a track is overlapped by other tracks
and vice versa. This is done by correlating series of bitmap
projections (i.e. featureBits multiple times).
usage:
   trackOverlap database chromosome track listFile
Where listFile is a file with one featureBits specification per line.
For example: 'intronEst:exon:10', see featureBits for more examples of syntax.

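The "bitmap projection" idea mentioned for trackOverlap can be sketched in a few lines of Python. This is an illustration of the general technique only: the half-open [start, end) coordinates, the function names, and the plain interval lists are assumptions for the example, and nothing here reproduces featureBits or the database access the real program performs.

```python
def project(intervals, chrom_size):
    """Project half-open [start, end) intervals onto a per-base bitmap."""
    bits = bytearray(chrom_size)
    for start, end in intervals:
        # Clip to the chromosome so stray coordinates cannot overflow.
        for pos in range(max(start, 0), min(end, chrom_size)):
            bits[pos] = 1
    return bits

def overlap_fraction(track, other, chrom_size):
    """Fraction of bases covered by `track` that `other` also covers."""
    a = project(track, chrom_size)
    b = project(other, chrom_size)
    covered = sum(a)
    both = sum(x & y for x, y in zip(a, b))
    return both / covered if covered else 0.0

frac = overlap_fraction([(0, 10), (20, 30)], [(5, 25)], 40)
print(frac)
```

Swapping the two arguments gives the "vice versa" direction, since the fraction is taken relative to whichever track is passed first.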
undupFa - rename duplicate records in FA file
usage:
   undupFa faFile(s)

upper - strip numbers, spaces, and punctuation; turn to upper case
usage:
   upper in out

venn - Do venn diagram calculations
usage:
   venn aSize abSize bSize
options:
   -xxx=XXX

wabToSt - Convert WABA output to something Intronerator understands better
usage:
   wabToSt out.st in1.wab ... inN.wab

whyConserved - Try and analyse why a particular thing is conserved
usage:
   whyConserved database chromosome homologyTrack
Use 'all' in the chromosome to cover the whole genome

wigAsciiToBinary - convert ascii Wiggle data to binary file
usage:
   wigAsciiToBinary [-offset=N] [-binsize=N] [-dataSpan=N] \
                    [-chrom=chrN] [-wibFile=] [-name=] \
                    [-verbose]
   -offset=N - add N to all coordinates, default 0
   -binsize=N - # of points per database row entry, default 1024
   -dataSpan=N - # of bases spanned for each data point, default 1
   -chrom=chrN - this data is for chrN
   -wibFile=chrN - to name the .wib output file
   -name= - to name the feature, default chrN or -chrom specified
   -verbose - display process while underway
   - list of files to process
If the name of the input files are of the form: chrN.<....> this will
set the output file names. Otherwise use the -wibFile option.
Each ascii file is a two column file. Whitespace separator
First column of data is a chromosome location.
Second column is data value for that location, range [0:127]

wordLine - chop up words by white space and output them with one
word to each line.
usage:
   wordLine inFile(s)
Output will go to stdout.

xmfaToMaf - Convert from xmfa to maf format
usage:
   xmfaToMaf in.xmfa out.maf org1=db1 org2=db2 ... orgN=dbN
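
The two-column ascii input described for wigAsciiToBinary above can be read with a short Python sketch. It covers only the parsing, the -offset adjustment, the [0:127] range check, and grouping points by a -binsize style count. Treating the values as integers is an assumption, the function names are hypothetical, and the actual .wib binary layout is not specified here, so it is not reproduced.

```python
def parse_ascii_wiggle(lines, offset=0):
    """Parse 'position value' lines; values must lie in [0, 127].

    Values are assumed to be integers (an assumption of this sketch).
    """
    points = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pos = int(fields[0]) + offset  # mirrors the -offset=N description
        val = int(fields[1])
        if not 0 <= val <= 127:
            raise ValueError(f"value {val} out of range [0:127]")
        points.append((pos, val))
    return points

def bin_points(points, bin_size=1024):
    """Group points into rows of at most bin_size points (cf. -binsize)."""
    return [points[i:i + bin_size] for i in range(0, len(points), bin_size)]

points = parse_ascii_wiggle(["1 5", "2 127", "", "3 0"], offset=10)
rows = bin_points(points, bin_size=2)
print(rows)
```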