User's guide to TagZilla version 1.0

Introduction

This user's guide explains the use of the TagZilla program. TagZilla generates tag SNPs for any given set of SNPs with genotypes. The TagZilla SNP selection algorithm estimates pair-wise r2 and d' (d-prime) linkage disequilibrium (LD) statistics on genotype data from unrelated individuals. It then estimates bins using a greedy maximal approach similar to that of Carlson et al. (2004), evaluates all the tags for each bin based upon user-specified criteria, and recommends an optimal tag for each bin if possible.

 

The program provides many options, such as minimum MAF (minor allele frequency), d' threshold and r2 threshold, include/exclude/subset, optimization of numbers of bins for fixed sized panels, or thresholds to reach a desired level of coverage. It also provides options for different input formats such as HapMap, Linkage, and FESTA. Assay design scores and weighting criteria can be incorporated into the analysis in order to choose optimal tags for each bin.

Notice

TagZilla version 1.1 will be released at the end of next week. It will contain numerous updates, including several corrections and extensions to the multi-population tagging capabilities.

 

TagZilla options

 

TagZilla provides you with many options for controlling its execution. It can read in multiple genotype files that contain data from disjoint genomic regions. TagZilla allows each input genotype file to utilize different minimum MAF, r2, completion rate and other parameters.

 

Usage: python tagzilla.py [options] genotype_file [options] genotype_file

 

Example:

python tagzilla.py -p pedinfo2sample_CEU.txt -b summary -o pairs.out -O loci.out -D designscore.txt:0.6 genotypes_chr21_CEU.txt.gz

 

TagZilla supports both short options and long options.

 

You can use the --version option to check the program version and the --help/-h option to print out help messages. The options are grouped into four categories: genotype and LD estimation options, binning options, input options, and output options.

 

The table for each category serves as a quick reference to the options. More detailed discussions about the purpose and usage of the options are in the notes below each table.

 

Genotype and LD estimation options:

 

Option                     Explanation
-a FREQ, --minmaf=FREQ     Minimum minor allele frequency (MAF) (default=0.05)
-A FREQ, --minobmaf=FREQ   Minimum minor allele frequency (MAF) for obligate tags (defaults to -a/--minmaf)
-c N, --mincompletion=N    Drop loci with less than N valid genotypes (default=0)
--mincompletionrate=N%     Drop loci with completion rate less than N% (0-100, default=0)
-m D, --maxdist=D          Maximum inter-marker distance in kb for LD comparison (default=200)
-P p, --hwp=p              Filter out loci that fail to meet a minimum significance level (p value) for the test of Hardy-Weinberg proportion

 

Notes:

-a/--minmaf and -A/--minobmaf:

Both options specify the MAF threshold used to filter loci with low MAF out of the analysis. You can set different thresholds for obligate tags and for all other loci. The default for -a/--minmaf is 0.05, and -A/--minobmaf defaults to the value set for -a/--minmaf.
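As an illustration of the MAF filter, here is a minimal sketch written for a current Python 3 interpreter. It is not taken from the TagZilla source: the function names are hypothetical, genotypes are assumed to be two-letter strings with 'N' for a missing allele, and the locus is assumed to be biallelic.

```python
from collections import Counter


def minor_allele_frequency(genotypes, missing="N"):
    """Return the MAF of one locus from two-letter genotype strings."""
    counts = Counter()
    for genotype in genotypes:
        for allele in genotype:
            if allele != missing:
                counts[allele] += 1
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no valid data, or a monomorphic locus
    # Assumption: biallelic locus, so the rarest allele is the minor allele.
    return min(counts.values()) / total


def passes_maf(genotypes, min_maf=0.05):
    """Mimic the effect of -a/--minmaf for a single locus."""
    return minor_allele_frequency(genotypes) >= min_maf


if __name__ == "__main__":
    sample = ["AA", "AA", "AC", "AA", "NN", "CC"]
    print(minor_allele_frequency(sample), passes_maf(sample))
```

How TagZilla itself treats half-missing genotypes is not stated in this guide; the sketch simply counts every called allele.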

 

-c/--mincompletion and --mincompletionrate:

These options are used to drop loci with low number or rate of valid genotypes among all the genotyped samples.
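The two completion filters can be sketched as follows. This is an illustrative Python 3 example with hypothetical function names, assuming that a genotype containing 'N' counts as invalid; it is not the TagZilla implementation.

```python
def completion(genotypes, missing="N"):
    """Return (number of valid genotypes, completion rate in percent)."""
    valid = sum(1 for genotype in genotypes if missing not in genotype)
    rate = 100.0 * valid / len(genotypes) if genotypes else 0.0
    return valid, rate


def keep_locus(genotypes, min_completion=0, min_completion_rate=0.0):
    """Apply thresholds analogous to -c/--mincompletion and --mincompletionrate."""
    valid, rate = completion(genotypes)
    return valid >= min_completion and rate >= min_completion_rate


if __name__ == "__main__":
    locus = ["AA", "NN", "AC", "AA"]
    print(completion(locus))
    print(keep_locus(locus, min_completion_rate=80.0))
```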

 

-m/--maxdist:

Linkage disequilibrium between loci usually decreases as the distance between them increases. TagZilla does not consider the LD between two loci if the distance between them is greater than the value (in kb) specified by this option.

 

-P/--hwp:

This option is used to specify the threshold of P value for the Hardy-Weinberg Equilibrium test. If the count of the minor alleles in the set of genotypes is less than 1000, TagZilla applies the exact test based on Wigginton JE et al. (2005), otherwise it simply uses the standard Chi-square test. Loci that fail to meet this threshold are filtered from the analysis.
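For readers unfamiliar with the chi-square form of the test, the following Python 3 sketch shows the textbook one-degree-of-freedom calculation from genotype counts. It is only an illustration with a hypothetical function name; the exact test of Wigginton et al. (2005) used for small minor allele counts is not shown, and TagZilla's own code may differ in detail.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P value of the 1-df chi-square test for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic locus: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_hwp(n_aa, n_ab, n_bb, threshold):
    """Keep the locus only if its p value reaches the threshold."""
    return hwe_chi_square_p(n_aa, n_ab, n_bb) >= threshold


if __name__ == "__main__":
    print(hwe_chi_square_p(25, 50, 25))
    print(hwe_chi_square_p(50, 0, 50))
```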

 

Binning options:

 

Option                           Explanation
-C crit, --tagcriteria=crit      Use the specified criteria to choose the optimal tag for each bin. Currently supported tag selection criteria:
                                     maxtag: choose the tag having largest minimum-r2 with any tag snps in the bin
                                     maxsnp: choose the tag having largest minimum-r2 with all snps in the bin
                                     avgtag: choose the tag having maximum average-r2 with non-tag snps in the bin
                                     avgsnp: choose the tag having maximum average-r2 with all snps in the bin
-d DPRIME, --dthreshold=DPRIME   Minimum d-prime threshold to output (default=0)
-r N, --rthreshold=N             Minimum r-squared threshold to output (default=0.8)
-t N, --targetbins=N             Stop when N bins have been selected (default=0 for unlimited)
-T N, --targetloci=N             Stop when N loci have been tagged (default=0 for unlimited)
-M N, --multipopulation=N        Multipopulation tagging where N is the number of populations
--multimerge                     Merge populations when performing multipopulation tagging [not recommended]
-z N, --locipertag=N             Ensure that bins contain more than one tag per N loci. Bins with insufficient tags will be reduced.
-Z B, --loglocipertag=B          Ensure that bins contain more than the ceiling of log_B(loci) tags. Bins with insufficient tags will be reduced.

 

Notes:

-C/--tagcriteria:

 

Example: -C maxsnp:2

Give half the weight to each tag that does not meet the maxsnp criteria.

 

This option can be used together with the -D/--designscores option to specify how the optimal tag should be selected for each bin. -C/--tagcriteria provides the weights, and -D/--designscores provides the designscores. TagZilla will compute a weighted score and thus determine which tag is recommended to the user.

 

-d/--dthreshold and -r/--rthreshold:

Both are used as cut-off criteria so that only locus pairs satisfying these thresholds are considered in the binning process.
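To make the two statistics concrete, here is a small Python 3 sketch that computes r2 and |D'| for one pair of biallelic loci and applies the thresholds. It assumes the four haplotype counts are already known; TagZilla works from unphased genotypes and has to estimate them, and that estimation step is not shown. All names are hypothetical.

```python
def ld_statistics(n11, n12, n21, n22):
    """Return (r2, |D'|) from haplotype counts.

    n11 = A-B, n12 = A-b, n21 = a-B, n22 = a-b haplotypes.
    """
    total = float(n11 + n12 + n21 + n22)
    if total == 0:
        return 0.0, 0.0
    p_a = (n11 + n12) / total  # frequency of allele A at locus 1
    p_b = (n11 + n21) / total  # frequency of allele B at locus 2
    d = n11 / total - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0, 0.0  # at least one locus is monomorphic
    r2 = d * d / denom
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


def pair_passes(r2, d_prime, rthreshold=0.8, dthreshold=0.0):
    """Cut-off analogous to -r/--rthreshold and -d/--dthreshold."""
    return r2 >= rthreshold and d_prime >= dthreshold


if __name__ == "__main__":
    r2, d_prime = ld_statistics(40, 10, 0, 50)
    print(r2, d_prime, pair_passes(r2, d_prime))
```

The example pair has a complete |D'| but an r2 well below 0.8, which shows why the two thresholds are not interchangeable.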

 

-t/--targetbins and -T/--targetloci:

Both options are used as stopping criteria. In either case, once the criteria are met, TagZilla produces residual bins instead of maximal bins.
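The interaction between greedy binning and a bin-count stopping criterion can be sketched as below. This is a simplified Python 3 illustration in the spirit of the greedy approach of Carlson et al. (2004), not TagZilla's algorithm: it ignores obligate tags, exclusions and residual-bin construction, it simply returns the loci left over when the target is reached, and every name in it is hypothetical.

```python
def greedy_bins(loci, ld_pairs, target_bins=0):
    """Greedy binning sketch.

    loci: iterable of locus names.
    ld_pairs: iterable of (a, b) pairs that already meet the LD thresholds.
    target_bins: stop after this many bins (0 means unlimited).
    """
    neighbors = {locus: {locus} for locus in loci}
    for a, b in ld_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    remaining = set(neighbors)
    bins = []
    while remaining:
        if target_bins and len(bins) >= target_bins:
            break
        # Pick the locus in LD with the most unbinned loci (ties: by name).
        best = max(sorted(remaining), key=lambda l: len(neighbors[l] & remaining))
        members = neighbors[best] & remaining
        # A tag must be in LD with every other locus of its bin.
        tags = sorted(l for l in members if members <= neighbors[l])
        bins.append({"tags": tags, "loci": sorted(members)})
        remaining -= members
    return bins, sorted(remaining)


if __name__ == "__main__":
    pairs = [("a", "b"), ("a", "c"), ("d", "e")]
    print(greedy_bins("abcde", pairs))
    print(greedy_bins("abcde", pairs, target_bins=1))
```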

 

-M/--multipopulation and --multimerge:

You can specify the number of populations via the -M/--multipopulation option. Unless --multimerge is set, TagZilla uses the minLD method to bin the loci with genotypes from different populations and thus generates a set of tags applicable to all the populations. The --multimerge option is not recommended.

 

-z/--locipertag and -Z/--loglocipertag:

Both options control the ratio between tags and loci. If a bin is too large, and thus the number of loci per tag is too high, a genotyping failure on the tag loses information on the many loci represented only by that tag. Instead of picking another candidate tag from a large bin in a post-processing step, TagZilla incorporates this user requirement into the binning process and generates only bins that satisfy it.

 

Input options:

 

Option                         Explanation
-p FILE, --pedfile=FILE        Pedigree file for HapMap, PrettyBase or raw genotype files (optional). This option can be specified multiple times on the command line.
-s FILE, --subset=FILE         File containing loci that define the subset to be analyzed of the loci that are read
-l FILE, --loci=FILE           Locus description file for genotypes input in Linkage format
-i FILE, --includetag=FILE     File containing loci that are obligates
-L N, --limit=N                Limit the number of loci considered to N for testing purposes (default=0 for unlimited)
-f NAME, --format=NAME         Format for genotype/pedigree or LD input data. Values: hapmap (default), linkage, festa, prettybase, raw.
-e FILE, --excludetag=FILE     File containing loci that are excluded from being a tag
-D FILE, --designscores=FILE   Read in design scores or other weights to use as criteria to choose the optimal tag for each bin. This option can be specified multiple times on the command line.
                                   Example: -D designscore1.txt:0.5:1
                                   0.5 is the threshold and 1 is the scale. Both are optional; the default threshold is 0 and the default scale is 1.
-R S-E,… --range=S-E,…         Ranges of genomic locations to be analyzed, specified as a comma-separated list of start and end coordinates "S-E". If either S or E is not specified, the range is assumed to be open on that side. The end coordinate is exclusive and not included in the range.
                                   Example: -R 10000-20000,30000-80000

 

Notes:

-p/--pedfile:

This option specifies the pedigree file for those genotypes provided in the format of Hapmap, Prettybase or raw. It is not meaningful to specify a pedigree file when reading genotype or LD data in linkage or FESTA format. The genotypes for the non-founders as found in the pedigree file won't be considered in the binning process. Note that if the pedigree file is incomplete, we assume all the individuals not contained in the pedigree file are founders.

 

-s/--subset:

Besides providing a file containing the subset of loci to be analyzed, the user can also specify a comma-separated list of loci on the command line. In that case the string value for this option has to start with a colon. For example,

-s :rs12355,rs12365,rs12488

 

-l/--loci:

The locus description file for genotype input in linkage format. TagZilla reads in the location for each locus from this file.

 

-i/--includetag and -e/--excludetag:

As with the -s/--subset option, you can specify a colon-prefixed list of loci as the value for either option instead of a file name. The specified loci are either forced in as tags (-i/--includetag) or excluded from being chosen as tags (-e/--excludetag).

 

-D/--designscores:

This option can be used alone or together with -C/--tagcriteria to choose the optimal tag among all the valid tags for a bin.
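The FILE:threshold:scale form of the -D/--designscores argument can be split as in the Python 3 sketch below. The function name is hypothetical and the parsing rules are inferred from the option description only (threshold defaults to 0, scale to 1); a file name that itself contains a colon, such as a Windows drive path, would need extra handling that is not shown.

```python
def parse_designscore_spec(spec):
    """Split 'FILE[:threshold[:scale]]' into (filename, threshold, scale)."""
    parts = spec.split(":")
    filename = parts[0]
    threshold = float(parts[1]) if len(parts) > 1 and parts[1] else 0.0
    scale = float(parts[2]) if len(parts) > 2 and parts[2] else 1.0
    return filename, threshold, scale


if __name__ == "__main__":
    print(parse_designscore_spec("designscore1.txt:0.5:1"))
    print(parse_designscore_spec("designscore.txt:0.6"))
```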

 

-L/--limit:

This option is useful for testing purposes. If the genotype data are too big to complete a run quickly, you can limit the number of loci by specifying a value for the option.

 

-f/--format:

The current version supports five different formats: hapmap (default), linkage, festa, prettybase and raw (case insensitive).

 

-R/--range:

A list of genomic location pairs can be specified via this option to filter out the loci located outside these ranges.
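A range specification such as 10000-20000,30000-80000 can be interpreted as in the following Python 3 sketch. The function names are hypothetical. The exclusive end coordinate and the open-ended ranges follow the option description; treating the start coordinate as inclusive is an assumption.

```python
def parse_ranges(spec):
    """Parse 'S-E,S-E,...' into a list of (start, end); None marks an open side."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start, sep, end = part.partition("-")
        if not sep:
            raise ValueError("range must have the form S-E: %r" % part)
        ranges.append((int(start) if start else None, int(end) if end else None))
    return ranges


def in_ranges(position, ranges):
    """True if position lies in any range (start assumed inclusive, end exclusive)."""
    for start, end in ranges:
        if (start is None or position >= start) and (end is None or position < end):
            return True
    return False


if __name__ == "__main__":
    ranges = parse_ranges("10000-20000,30000-80000")
    print(ranges, in_ranges(15000, ranges), in_ranges(20000, ranges))
```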

 

Output options:

 

Option                      Explanation
-O FILE, --locusinfo=FILE   Output locus information to FILE ('-' for standard out)
-o FILE, --output=FILE      Output tabular LD information for bins to FILE ('-' for standard out)
-x, --extra                 Output inter-bin LD statistics to the file specified in -o/--output option.
-H N, --histomax=N          Largest bin size output in summary histogram output (default=10)
-k, --skip                  Skip output of untagged or excluded loci
-b FILE, --summary=FILE     Output summary tables for all bins to FILE (default to standard out)
-B FILE, --bininfo=FILE     Output summary information about each bin to FILE

 

 

Notes:

-o/--output, -x/--extra and -k/--skip:

-o/--output specifies the name of the output file containing LD information for the bins, -x/--extra appends the inter-bin LD statistics to the same file, and -k/--skip skips output of a pair of loci if the disposition of the bin is either obligate-exclude or residual, or if either locus of the pair is in the exclude set.

 

-b/--summary and -H/--histomax:

-b/--summary specifies the name of the output file containing the histogram table summaries for all the bins, and -H/--histomax is the largest bin size that is included in the table.

 

-B/--bininfo:

This option specifies the name of the output file containing the summary information including tags, non-tags, bin size, location and spacing about each bin.

-O/--locusinfo:

This option specifies the name of the output file containing the locus information such as location, MAF, bin number and disposition for each locus.

 

File formats

 

This section describes the file format for both input and output files. Following are the allowable input files:

 

Following are the output files:

 

Some simple examples are included to illustrate the format more clearly, and some example files are included in the TagZilla package, which can be referred to if needed.

 

 

The header of Hapmap format files looks like 'rs# SNPalleles chrom pos strand genome_build center protLSID assayLSID panelLSID QC_code' followed by a list of sample identifiers. If a pedigree file is specified on the command line with the '-p' option, the program checks against it and keeps only the genotypes from unrelated individuals (i.e. founders) for each locus for further analysis. Here is a sample line of the hapmap format genotype file:

 

 

The following table describes the valid values for each column:

 

Column header   Description
rs#             A string of characters starting with the letters 'rs' followed by digits, e.g. rs12345
SNPalleles      All possible alleles (A, G, C or T) for the SNP, separated by a forward slash, e.g. A/G
chrom           Three-letter string 'Chr' followed by a number from 1 to 22 or a letter X or Y for sex chromosome, e.g. Chr22
pos             Position of the SNP, an integer number
strand          One single character, either '+' or '-'. '+' refers to a strand going from 5-prime telomere to 3-prime telomere, and '-' refers to a strand going from 3-prime telomere to 5-prime telomere.
genome_build    A string of characters, e.g. ncbi_b35.1
center          A string of characters, e.g. broad
protLSID        A string of characters, e.g. urn:LSID:affymetrix.hapmap.org:Protocol:genotype_protocol_1:1
assayLSID       A string of characters, e.g. urn:lsid:affymetrix.hapmap.org:Assay:1612756:1
panelLSID       A string of characters, e.g. urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1
QC_code         Either 'QC+' or 'QC-'
genotypes       A pair of letters, with each letter chosen from the set of (A,G,C,T,N), e.g. AG

 

Note that all columns are case-sensitive unless mentioned otherwise, and no spaces are allowed within a column.

 

 

The pedigree file for Hapmap format genotype data is specified by the '-p' option on the command line. The valid values for each column of the file are:

 

Column number   Description
1               pedigree id, an integer
2               individual id, an integer
3               father id, an integer
4               mother id, an integer
5               sex, a single digit, 1 for male, and 2 for female
6               hapmap individual id, a string of characters. Example: urn:lsid:dcc.hapmap.org:Individual:CEPH1420.09:1
7               hapmap sample id, a string of characters. Example: urn:lsid:dcc.hapmap.org:Sample:NA12003:1

 

Rows having the same pedigree id constitute individuals belonging to the same family. If an individual is a founder, they will have both father and mother id set to 0, otherwise, the values will relate to the individual identifiers from other lines within the file. TagZilla will utilize only founders in the input data.
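Founder selection from a pedigree file can be sketched as follows in Python 3. The function name is hypothetical, columns are assumed to be whitespace-separated in the order given in the table above, and the sketch returns the individual ids from column 2; how TagZilla maps those ids onto genotype columns is not shown.

```python
def read_founders(lines):
    """Return the set of individual ids whose father and mother ids are both 0."""
    founders = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        individual, father, mother = fields[1], fields[2], fields[3]
        if father == "0" and mother == "0":
            founders.add(individual)
    return founders


if __name__ == "__main__":
    # Hypothetical trio: individuals 1 and 2 are the parents of individual 3.
    pedigree = ["1 1 0 0 1", "1 2 0 0 2", "1 3 1 2 1"]
    print(sorted(read_founders(pedigree)))
```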

 

 

Linkage format is the required input format for Haploview. A linkage format genotype file should not have any header lines. The valid values for each column are:

 

Column number   Description
1               Pedigree name: a unique alphanumeric identifier for this individual's family. Unrelated individuals shouldn't share a common pedigree name.
2               Individual ID: a unique alphanumeric identifier for this individual within his family.
3               Father ID: father's individual ID or '0' for unknown father. Note that if a father ID is specified, the father must also appear in the file.
4               Mother ID: mother's individual ID or '0' for unknown mother. Note that if a mother ID is specified, the mother must also appear in the file.
5               Gender: individual's gender (1 for male, 2 for female)
6               Affectation status: used for association tests (0 for unknown, 1 for unaffected and 2 for affected).
>6              Marker genotypes: each marker is represented by two columns (one for each allele, separated by a space) and coded 1-4 where: 1=A, 2=C, 3=G, 4=T. A 0 in any of the marker genotype positions indicates missing data.

 

 

Files should also follow these two guidelines:

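·        Families should be listed consecutively within the file (i.e. all the lines with the same pedigree ID should be adjacent).

·        Any individual with a non-zero father or mother ID must have that parent listed on its own line in the file.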

 

3 12 8 9 1 2 1 2 3 3 0 0 4 2
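A line in this format can be decoded as in the Python 3 sketch below. The function and dictionary names are hypothetical, fields are assumed to be whitespace-separated, and the example line is purely illustrative.

```python
# Allele coding described in the column table: 1=A, 2=C, 3=G, 4=T, 0=missing.
ALLELE_CODES = {"1": "A", "2": "C", "3": "G", "4": "T", "0": None}


def parse_linkage_line(line):
    """Decode one linkage-format line into a dictionary."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2:
        raise ValueError("expected 6 leading columns plus allele pairs")
    pedigree, individual, father, mother, sex, status = fields[:6]
    alleles = [ALLELE_CODES[code] for code in fields[6:]]
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return {
        "pedigree": pedigree,
        "individual": individual,
        "founder": father == "0" and mother == "0",
        "sex": sex,
        "status": status,
        "genotypes": genotypes,
    }


if __name__ == "__main__":
    # Illustrative line: family 3, individual 12, parents 8 and 9, four markers.
    print(parse_linkage_line("3 12 8 9 1 2 1 2 3 3 0 0 4 2"))
```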

 

 

The locus description file is required when processing linkage format genotype data, and it is specified with the '-l' option on the command line. Each line of the file has two columns. The first column is the locus name and the second column is the locus location. For example:

 

rs169757 9928594

 

 

TagZilla reads in the genotype data in raw format if the '-f raw' option is specified on the command line. A raw format genotype file has a header line of the form 'rs#<tab>chr<tab>pos<tab>' followed by a tab-delimited list of sample ids. The following table describes the valid values for each column:

 

Column      Description
rs#         A string of characters starting with the letters 'rs' followed by digits, e.g. rs12345
chr         Three-letter string 'Chr' followed by a number from 1 to 22 or a letter X or Y for sex chromosome, e.g. Chr22
pos         Position of the SNP, an integer number, e.g. 54321
genotypes   A pair of letters with each letter chosen from the set of (A,G,C,T,N), e.g. AG

 

 

TagZilla checks against the pedigree file, if one is given, and keeps only the genotypes from founders for each locus for further analysis.

 

Following is a sample line of the raw format genotype file.

 

rs169757 Chr21 9928594 AA AA AA AC AA AA AC AA AA AA AA AC AA AA
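Reading a raw format file can be sketched in Python 3 as follows. The function name and the sample ids S1 to S3 are hypothetical; the sketch splits on any whitespace so that it accepts tab-delimited input as described above.

```python
def parse_raw(lines):
    """Parse raw-format genotype lines (header first) into a list of loci."""
    rows = [line.split() for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    samples = header[3:]  # everything after rs#, chr, pos
    loci = []
    for fields in data:
        name, chromosome, position = fields[0], fields[1], int(fields[2])
        genotypes = dict(zip(samples, fields[3:]))
        loci.append({"name": name, "chr": chromosome, "pos": position,
                     "genotypes": genotypes})
    return loci


if __name__ == "__main__":
    text = ["rs#\tchr\tpos\tS1\tS2\tS3",
            "rs169757\tChr21\t9928594\tAA\tAC\tNN"]
    print(parse_raw(text))
```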

 

 

A pedigree file is optionally used when processing raw format genotype data.

Note that the individual id must be unique in the file and can be mapped to one of the genotype columns in the header line of the raw format genotype data file. Refer to “Pedigree file for Hapmap format genotype data” for more details.

 

The following table describes the valid values for each column of the prettybase format genotype file:

 

Column          Description
Site position   An integer uniquely identifying the locus
Individual id   A string of characters uniquely identifying the individual, case-sensitive
First allele    One character chosen from the set (A,G,C,T,?), with '?' for unknown
Second allele   One character chosen from the set (A,G,C,T,?), with '?' for unknown

 

Following are some sample lines of a prettybase genotype file:

 
10110 PT01B    C        C
10110 PT02B    G        G
10110 PT03B    G        G
10110 PT04B    G        G
10110 PT05B    G        G
10110 PT06B    G        G
10287 PT01B    ?        G
10287 PT02B    C        C
10287 PT03B    C        C
10287 PT04B    C        C
10287 PT05B    C        C
10287 PT06B    C        C
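Because prettybase files hold one genotype per line, a reader has to regroup them by locus. The Python 3 sketch below shows one way to do that; the function name is hypothetical and columns are assumed to be whitespace-separated.

```python
from collections import defaultdict


def parse_prettybase(lines):
    """Group prettybase lines into {site: {individual: (allele1, allele2)}}."""
    loci = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        site, individual, first, second = fields
        # '?' marks an unknown allele; store it as None.
        loci[int(site)][individual] = (
            None if first == "?" else first,
            None if second == "?" else second,
        )
    return dict(loci)


if __name__ == "__main__":
    sample = ["10110 PT01B    C        C",
              "10110 PT02B    G        G",
              "10287 PT01B    ?        G"]
    print(parse_prettybase(sample))
```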
 

 

TagZilla can read in FESTA format files containing pre-computed pair-wise LD parameters between the SNPs in a certain region. For details about the format of these files, refer to this link: http://www.sph.umich.edu/csg/qin/FESTA/sample_files/

 

 

The subset, include and exclude files contain lists of loci for the purpose of sub-setting, specifying loci that must be included as tags, or excluding loci from being tags during the analysis process.

These files all have the same simple format: no headers, with one locus name on each line. The locus name is case-sensitive. For example:

 

rs150379

rs469673

rs212121

rs210499

rs469536

 

However, if the first character of the argument to any of these options is a colon ':', then the remainder of the argument is processed as a comma-delimited list of loci. For example, -i :rs512331,rs1221. This method is sometimes convenient when running TagZilla iteratively from the command line.
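The two ways of supplying a locus list can be handled by one small helper, sketched here in Python 3. The function name is hypothetical and the behaviour is inferred from the description above only.

```python
def load_locus_list(argument):
    """Return locus names from ':a,b,c' or from a file with one name per line."""
    if argument.startswith(":"):
        # Inline form: everything after the colon is a comma-delimited list.
        return [name.strip() for name in argument[1:].split(",") if name.strip()]
    with open(argument) as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    print(load_locus_list(":rs12355,rs12365,rs12488"))
```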

 

 

Design score files contain the design score information for SNPs. Each line of the file must contain the name of the SNP and its design score. TagZilla allows multiple design score files to be specified on the command line, and the information in all files will be considered during the tag selection stage.

If the design score for a SNP is 0 or below the given threshold, that SNP will be forced into the exclude set. If this SNP also happens to be in the include set, then the disposition of the bin containing this SNP will be obligate-include, the SNP will be reported as obligate_tag (because it is in the include set) and also as one of the excluded_as_tags (because it is forced into the exclude set). Therefore, include will take priority over exclude in our program.

 

Following are some sample lines of a design score file:

 

rs150379  0.8

rs469673  0.9

rs212121  0.7

rs210499  0.6

rs469536  0.5
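The threshold rule described above (a score of 0 or below the threshold forces the SNP into the exclude set) can be sketched in Python 3 as follows. The function name is hypothetical, the scale factor is deliberately left out because this guide does not say how it is applied, and the interaction with the include set is not modelled.

```python
def read_design_scores(lines, threshold=0.0):
    """Return (scores, excluded) from 'name score' lines."""
    scores = {}
    excluded = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, score = fields[0], float(fields[1])
        scores[name] = score
        if score == 0 or score < threshold:
            excluded.add(name)
    return scores, excluded


if __name__ == "__main__":
    sample = ["rs150379  0.8", "rs469673  0.9", "rs212121  0.7",
              "rs210499  0.6", "rs469536  0.5"]
    print(read_design_scores(sample, threshold=0.6))
```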

 

There are four different output files. Only one of them can be directed to standard output; the others must be written to the files named in the command line options. The output will contain the following information about each bin chosen by TagZilla:

 

 

The name and location of the bin info file are specified with the '-B' option. The bin number will appear multiple times as we output all the information for that bin. This format is an expanded version of the output produced by the program ldSelect (Carlson et al., 2004). The following table describes the information produced for each bin:

 

Row number   Description
1            summary line: contains the total number of sites for the bin, the number of tags, the number of non-tags, the number of required tags, the width, and the average MAF for the bin
2            detailed location information: minimum, median, average and maximum location of all the loci in the bin
3            detailed spacing information: minimum, median, average and maximum spacing among all the loci in the bin
4            tag SNPs
5            recommended tag SNP
6            other SNPs
7            excluded tag SNPs
8            bin disposition (four possible values: 'obligate-include', 'maximal-bin', 'residual', 'obligate-exclude')
9            number of loci that would have been covered by the bin; note that for obligate-include bins only the obligatory tags are considered.

 

Here is an example bin info file generated by TagZilla:

 

 

 

The name and location of the LD data output file are specified with the '-o' option. The first line is the header line. The following table describes each column in the LD data output file:

 

Column number   Description
1               a sequence number for identifying each bin
2               the first locus name of the pair
3               the second locus name of the pair
4               the rsquared value for the pair
5               disposition (see the tables below for details)

 

All possible values for the disposition of each LD pair are summarized in the two tables below. The first table describes different dispositions for the tags paired with themselves, and the second table is for the rest of the LD pairs within each bin.

 

Tags paired with themselves:

Disposition     Description
obligate-tag    An obligate tag
alternate-tag   A tag in an obligate-include bin, but not the obligate tag
excluded-tag    A tag for a bin that contains all obligatorily excluded loci
candidate-tag   A tag for a non-obligate bin with more than one possible tag
necessary-tag   A tag for a bin that has only one possible tag
lonely-tag      A tag for a bin with no other loci, but which originally covered more loci. These additional loci were removed by previous iterations of the binning algorithm. This disposition is primarily to distinguish these bins from singletons, which intrinsically are in insufficient LD with any other locus.
singleton-tag   A tag that is not in significant LD with any other locus based upon the specified LD threshold.

 

Note: 'recommended' will be appended to the above disposition to indicate that it is also an optimal tag chosen among all the possible tags for a bin by comparing the score and checking certain criteria provided that these options are set from the command line.

 

Other LD pairs in the bin:

Disposition   Description
tag-tag       LD between tags within a bin
other-tag     LD between a non-tag and a tag
tag-other     LD between a tag and a non-tag
other-other   LD between non-tags within a bin

 

Note that for residual bins, the dispositions for all LD pairs within each bin will have a 'residual' qualifier appended to them, and for obligate-exclude bins, the dispositions for all LD pairs will have an 'excluded' qualifier appended to them. Also, if the user specifies the '-x' option, the 'interbin' qualifier will appear in the disposition column for all residual LD pairs, which are listed in the bottom part of this output file. The LD pairs are formed separately for each genotype input file, i.e., TagZilla doesn't look for significant LD among loci in different input files. The LD data is sorted by rsquared, then locus1, and then locus2 for each bin. Following is an example of an LD data output file:

 

 

 

The file name and location can be specified by the '-O' option on the command line. The first line is the header line. The following table describes each column in the locus info data output file:

 

Column   Description
1        Locus name
2        Location of the locus
3        MAF (Minor Allele Frequency)
4        Bin number
5        Disposition

 

There are two possible disposition categories for each locus.

 

Following is an example of a locus data info output file. The contents in the file are sorted by bin number, and within each bin sorted by tags first and then non tags.

 

 

 

There are four types of bins: obligate-include, maximal-bin, residual and obligate-exclude.

 

The bin summary file includes a table summarizing the bin statistics by bin size (a histogram) for each type of bin. The largest bin size shown as its own row in the table can be configured with the '-H' option, and all the tables in this output file share a common set of columns. The following table describes the content of each column:

 

Column      Description
Bin size    The number of loci in the bin
Bins        The number of bins with specified bin size
%           The percent of bins with specified bin size
Loci        The number of loci contained in all the bins with specified bin size
%           The percent of loci contained in all the bins with specified bin size
Tags        The number of tags for all the bins with specified bin size
Non tags    The number of non tags for all the bins with specified bin size
Avg tags    The average number of tags per bin for all the bins with specified bin size
Avg width   The average width for all the bins with specified bin size. The width for a bin is the difference between the maximum location and minimum location of the loci in a bin.
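The per-size statistics in this table can be reproduced from a list of bins as in the Python 3 sketch below. The function name and the input structure (dictionaries with 'tags', 'others' and 'locations' keys) are hypothetical, the percentage columns are omitted, and the handling of bins larger than the '-H' limit is not modelled.

```python
from collections import defaultdict


def summarize_by_size(bins):
    """Aggregate bin counts, tag counts and average width by bin size."""
    rows = defaultdict(lambda: {"bins": 0, "loci": 0, "tags": 0,
                                "non_tags": 0, "width": 0})
    for entry in bins:
        size = len(entry["tags"]) + len(entry["others"])
        row = rows[size]
        row["bins"] += 1
        row["loci"] += size
        row["tags"] += len(entry["tags"])
        row["non_tags"] += len(entry["others"])
        # Width: difference between the maximum and minimum locus location.
        row["width"] += max(entry["locations"]) - min(entry["locations"])
    summary = {}
    for size, row in sorted(rows.items()):
        summary[size] = {
            "bins": row["bins"],
            "loci": row["loci"],
            "tags": row["tags"],
            "non_tags": row["non_tags"],
            "avg_tags": row["tags"] / row["bins"],
            "avg_width": row["width"] / row["bins"],
        }
    return summary


if __name__ == "__main__":
    example = [
        {"tags": ["rs1"], "others": ["rs2"], "locations": [100, 300]},
        {"tags": ["rs3", "rs4"], "others": [], "locations": [1000, 1500]},
        {"tags": ["rs5"], "others": [], "locations": [50]},
    ]
    print(summarize_by_size(example))
```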

 

At the end of the file there is a final summary table showing the bin statistics further summarized by bin disposition. Following is an example of the bin summary file:

 

 

TagZilla Package

 

The package contains the following:

 

Users need to download and install Python for the desired platform from its official website, http://www.python.org/. TagZilla version 1.0 requires Python 2.4 or above and can run on different OS platforms including Unix/Linux and Windows. UNIX users will need to run 'python setup.py install' to install TagZilla and build the accelerators, or 'python setup.py build_ext -i' if they have no root access. Windows users will receive precompiled binary accelerators distributed with the package, and those libraries need to be placed in the site-packages directory of the Python installation.

 

NOTE:

Users should expect significantly increased computation times when TagZilla is run without native code accelerators. A warning will be printed when accelerators are not available or properly installed. Please do not post comparative timing or benchmark results when running TagZilla in this manner.

 

TagZilla License

 

Copyright 2006 Science Applications International Corporation ("SAIC").

 

The software subject to this notice and license includes both human readable source code form and machine readable, binary, object code form ("the TagZilla Software"). The TagZilla Software was developed in conjunction with the National Cancer Institute ("NCI") by NCI employees and employees or contractors of SAIC. To the extent government employees are authors, any rights in such works shall be subject to Title 17 of the United States Code, section 105.

 

This TagZilla Software License (the "License") is between NCI and You. "You (or "Your") shall mean a person or an entity, and all other entities that control, are controlled by, or are under common control with the entity. "Control" for purposes of this definition means (i) the direct or indirect power to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

 

This License is granted provided that You agree to the conditions described below. NCI grants You a non-exclusive, worldwide, perpetual, fully-paid-up, no-charge, irrevocable, transferable and royalty-free right and license in its rights in the TagZilla Software to (i) use, install, access, operate, execute, copy, modify, translate, market, publicly display, publicly perform, and prepare derivative works of the TagZilla Software; (ii) distribute and have distributed to and by third parties the TagZilla Software and any modifications and derivative works thereof; and (iii) sublicense the foregoing rights set out in (i) and (ii) to third parties, including the right to license such rights to further third parties. For sake of clarity, and not by way of limitation, NCI shall have no right of accounting or right of payment from You or Your sublicensees for the rights granted under this License. This License is granted at no charge to You.

 

1.      Your redistributions of the source code for the Software must retain the above copyright notice, this list of conditions and the disclaimer and limitation of liability of Article 6, below. Your redistributions in object code form must reproduce the above copyright notice, this list of conditions and the disclaimer of Article 6 in the documentation and/or other materials provided with the distribution, if any.

 

2.      Your end-user documentation included with the redistribution, if any, must include the following acknowledgment: "This product includes software developed by SAIC and the National Cancer Institute." If You do not include such end-user documentation, You shall include this acknowledgment in the Software itself, wherever such third-party acknowledgments normally appear.

 

3.      You may not use the names "The National Cancer Institute", "NCI" "Science Applications International Corporation" and "SAIC" to endorse or promote products derived from this Software. This License does not authorize You to use any trademarks, service marks, trade names, logos or product names of either NCI or SAIC, except as required to comply with the terms of this License.

 

4.      For sake of clarity, and not by way of limitation, You may incorporate this Software into Your proprietary programs and into any third party proprietary programs. However, if You incorporate the Software into third party proprietary programs, You agree that You are solely responsible for obtaining any permission from such third parties required to incorporate the Software into such third party proprietary programs and for informing Your sublicensees, including without limitation Your end-users, of their obligation to secure any required permissions from such third parties before incorporating the Software into such third party proprietary software programs. In the event that You fail to obtain such permissions, You agree to indemnify NCI for any claims against NCI by such third parties, except to the extent prohibited by law, resulting from Your failure to obtain such permissions.

 

5.      For sake of clarity, and not by way of limitation, You may add Your own copyright statement to Your modifications and to the derivative works, and You may provide additional or different license terms and conditions in Your sublicenses of modifications of the Software, or any derivative works of the Software as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

 

6.      THIS SOFTWARE IS PROVIDED "AS IS," AND ANY EXPRESSED OR IMPLIED WARRANTIES, (INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT AND FITNESS FOR A PARTICULAR PURPOSE) ARE DISCLAIMED. IN NO EVENT SHALL THE NATIONAL CANCER INSTITUTE, SAIC, OR THEIR AFFILIATES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 

References

 

  1. Carlson C.S. et al. (2004) Selecting a maximally informative set of single-nucleotide polymorphisms for association analysis using linkage disequilibrium. Am. J. Hum. Genet. 74, 106-120.
  2. Wigginton J.E. et al. (2005) A Note on Exact Tests of Hardy-Weinberg Equilibrium. Am. J. Hum. Genet. 76, 887-893.
  3. Qin Z.S. et al. (2006) An efficient comprehensive search algorithm for tagSNP selection using linkage disequilibrium criteria. Bioinformatics 22(2), 220-225.

 

Contact information

 

In case of any questions regarding this documentation or the TagZilla program, please contact Kevin Jacobs by email at jacobske@mail.nih.gov.