************************************************
*                                              *
*                   PDT 5.1                    *
*                                              *
*    written by: Eden Martin, Carol Haynes,    *
*       Liling Warren, and Meredyth Bass       *
*                                              *
************************************************

**************
 Introduction
**************

PDT is designed to look at both linkage and association in extended
pedigrees. It performs allele-specific analysis and genotype-specific
analysis for single markers. In addition, PDT now runs analyses for
combinations of genotypes over multiple marker loci. This is not the same
as haplotype analysis, however, as no phase information is utilized.

PDT takes up to five command line arguments: *.dat, *.ped, *.out, the
triad/discordant sibpair (dsp) flag, and the statistic flag. The first
three arguments (the names of the input and output files) are required;
the last two use default values unless otherwise specified by the user.

The two input files for PDT are a *.dat file and a *.ped file, both in a
(post-MAKEPED) format readable by the LINKAGE programs. You can use the
publicly available MAKEPED and PRELINK programs to make these files.

The triad/dsp flag values are:

   TD0   all individuals [default]
   TD1   dsp's only
   TD2   triads only
   TD3   use only triads unless both parents aren't typed, then use
         dsp's if they are available for that family

The statistic flag values are:

   S0    both allele-specific statistics, sumPDT and avePDT [default]
   S1    avePDT only
   S2    sumPDT only
   S3    geno-PDT
   S4    multi-locus geno-PDT

Here are some examples of commands executed on the command line (for more
details, see below):

   pdt example.dat example.ped example.out > example.log
   pdt example.dat example.ped example.out TD0 S0 > example.log

**********************
 References
**********************

When publishing PDT analyses, please cite the following references:

   "A Test for Linkage and Association in General Pedigrees: The Pedigree
   Disequilibrium Test" by Martin ER, Monks SA, Warren LL, and Kaplan NL.
   Am J Hum Genet 67:146-154, 2000.

   "Correcting for a Potential Bias in the Pedigree Disequilibrium Test"
   by Martin ER, Bass MP, and Kaplan NL. Am J Hum Genet 68:1065-1068, 2001.

When using the geno-PDT, please cite the following reference:

   "A genotype-based association test for general pedigrees: the
   genotype-PDT" by Martin ER, Bass MP, Gilbert JR, Pericak-Vance MA, and
   Hauser ER. Genetic Epid. 2003. (in press)

**********************
 System Information
**********************

Solaris
   The PDT was compiled in C++ on a Solaris 7 (unix) workstation using the
   gcc 2.95.3 compiler.

Linux
   The PDT was compiled in C++ on a Linux 2.4 workstation using the
   gcc 3.2 compiler.

If you require an alternate version, please check our web site at
http://wwwchg.duhs.duke.edu/index.html to see whether additional versions
have been made available.

For technical support (including requests for alternate platforms), please
feel free to contact pdt@chg.duhs.duke.edu.

**********************
 Disclaimer
**********************

No warranty, either expressed or implied, is made with respect to the
functioning and accuracy of this program. No responsibility is assumed by
the authors. Please report any problems or bugs to the authors. Please do
not alter or distribute this program without the permission of the authors.
A patent for the methods implemented in this program is pending.

*********
 Dat file
*********

The dat file should look like the following (comments -- "<< ..." should
NOT be included):

2 0 0 5                << NO. of LOCI, RISK LOCUS, RISK ALLELE, SEXLINKED (IF 1) PROGRAM
0 0.0 0.0 0            << MUT LOCUS, MUT RATE, HAP FREQUENCIES (IF 1)
1 2                    << DX ALLELES (normal, affected)
1 2 #DISEASE           << LOCUS CODE (use 1 for dx locus), NO. OF ALLELES, DX NAME (USE # TO START IT)
0.999990 0.000010      << DX GENE FREQUENCIES
1                      << NO. OF LIABILITY CLASSES
0.0000 1.0000 1.0000   << PENETRANCES (0, 1, and 2 dx alleles, respectively)
3 2 #MARKER            << LOCUS CODE (use 3 for markers), NO. OF ALLELES, MARKER NAME (USE # TO START IT)
0.50000 0.50000        << ALLELE FREQUENCIES
0 0                    << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.1000                 << RECOMBINATION VALUES
1 0.050 0.1500         << REC VARIED, INCREMENT, FINISHING VALUE (for LOD scores at varying thetas)
0.200 0.10 0.400       << ADDITIONAL THETA LINE (begin, inc, end)

** You need to fill in the disease and marker name, as well as the number
   of alleles. The essential information for PDT is contained in lines
   1, 4, 6, and 8. **

For an example using multiple markers, see the multi.dat example file
provided with the download. Be sure that the first value of the first line
is set to the number of markers plus 1 (for the disease locus).

For more information about the *.dat file, see the online user's manual
for the LINKAGE software package, accessible at
http://linkage.rockefeller.edu/soft/linkage/.

**********
 Ped file
**********

The ped file contains family structure information, as well as genotype
information, for all pedigrees in your data set. All individuals in a
pedigree must be listed with sequential ID numbers (i.e., 1, 2, 3, ...),
with no gaps in the sequence. For nuclear pedigrees, both parents of
children must always be listed, even if both aren't genotyped. In larger
pedigrees, all connecting relatives must be listed. For example, with two
affected cousins, at a minimum all parents (4) and both grandparents would
be listed. Individuals who aren't genotyped or who are missing a genotype
are listed with a "0 0" genotype.

For more detail about formatting a pedigree file, please see the Pedigree
Information (PEDFILE) section of the LINKAGE online user's guide at
http://linkage.rockefeller.edu/soft/linkage/.
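
The sequential-ID requirement above can be checked before running PDT.
The following is a minimal sketch in Python, not part of the PDT
distribution. It assumes a whitespace-separated ped file in which the
first two fields are the pedigree number and the individual ID (as in the
column layout of this README). The function name find_id_gaps and the
sample rows are hypothetical, and the sketch reports only gaps in the ID
sequence; it does not check genotypes or family structure.

```python
from collections import defaultdict


def find_id_gaps(ped_lines):
    """Return {pedigree number: sorted missing IDs} for pedigrees with gaps.

    Assumes field 1 is the pedigree number and field 2 the individual ID.
    """
    ids = defaultdict(set)
    for line in ped_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        pedigree, individual = int(fields[0]), int(fields[1])
        ids[pedigree].add(individual)

    problems = {}
    for pedigree, seen in ids.items():
        # IDs are expected to run 1, 2, 3, ... up to the largest ID seen.
        expected = set(range(1, max(seen) + 1))
        missing = sorted(expected - seen)
        if missing:
            problems[pedigree] = missing
    return problems


# Hypothetical rows for illustration only: pedigree 1 skips individual 3.
example = [
    "1 1 0 0 4 0 0 1 0 1 1 2",
    "1 2 0 0 4 0 0 2 0 1 1 2",
    "1 4 1 2 0 0 0 1 1 2 1 2",
]
print(find_id_gaps(example))  # {1: [3]}
```

To check a real file, pass the open file object instead of the list, for
example find_id_gaps(open("example.ped")).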

Each line of the ped file should have the following format:

   Column 1:   Pedigree number (must be an integer)
   Column 2:   Individual ID number (must be an integer and in sequential
               order)
   Column 3:   ID of father (0 if this individual is a founder)
   Column 4:   ID of mother (0 if this individual is a founder)
   Column 5:   First offspring ID
   Column 6:   Next paternal sibling ID
   Column 7:   Next maternal sibling ID
   Column 8:   Sex (1=male, 2=female, 0=unknown)
   Column 9:   Proband status (1=proband; all others have a 0 in this field)
   Column 10:  Affection status (2=affected, 1=unaffected, 0=unknown)
   Column 11:  Liability class (if # of liability classes > 1; otherwise,
               marker data starts here)
   Column 12+: Marker data ("0 0" for missing genotype; "." is not allowed)

The essential information for PDT is contained in columns 1, 2, 3, 4, 10,
and 11 or 12+. Liability class information, if included in the *.ped file,
isn't used in the calculation of the PDT statistics. However, this column
must be present if you have specified more than 1 liability class in your
*.dat file.

************
 Running PDT
************

To run PDT, on the command line type:

   pdt example.dat example.ped example.out

With triad_dsp and/or stat flags, a command might look like this:

   pdt example.dat example.ped example.out TD1
   pdt example.dat example.ped example.out S2
   pdt example.dat example.ped example.out TD2 S1

We recommend redirecting the output that is normally printed to the screen
into a log file, so that you can verify that the data has been input as
intended. To do this, add "> logfile" to the end of your command:

   pdt example.dat example.ped example.out > example.log
   pdt example.dat example.ped example.out TD1 > example.log
   pdt example.dat example.ped example.out S2 > example.log
   pdt example.dat example.ped example.out TD2 S1 > example.log

Note: For all two-point analyses (statistic flags S0, S1, S2, and S3), the
PDT program expects one marker per pedigree file.
The only exception to this rule is when using the multimarker.pl script
provided with the download (see "Multiple markers" below). The multi-locus
genoPDT option (statistic flag S4) expects all markers to appear in one
pedigree file.

********
 Output
********

The log file:

   Family information is printed to the screen. This includes individual
   ID #, marker genotype(s), and affection status. This data can be output
   to a log file as well by redirecting the output to a file (see above).

The main output file (*.out):

   The summary counts given are:

   Number of families used
      the number of extended pedigrees with at least one typed triad
      and/or dsp.

   Number of triads used
      the number of parent-affected child triads in which both parents and
      the affected child are typed.

   Number of discordant sib pairs used
      the number of typed discordant sib pairs.

   The output file gives the test results for each allele, genotype, or
   combination of genotypes, as well as a global statistic for the marker
   locus or marker set. The counts of the number of individuals used to
   calculate the allele- or genotype-specific and global statistics are
   given below the table. Counts are the number of typed parents used for
   triads, and the number of typed affected and unaffected children in
   families with at least one discordant sibpair. If an affected sibling
   is used for more than one discordant sibpair, s/he is still only
   counted once in the table of observed counts.

   If you have a bi-allelic marker, the p-values for each allele and for
   the global test should all be the same.

**********************
 VERY IMPORTANT NOTES!
**********************

1. Check for genotype inconsistencies before using the program. PDT
   assumes the genotypes are correct.

2. Use integers for family number and individual numbers. Individuals
   need to be numbered sequentially and in ascending order. There should
   be no gaps in the sequence of individual numbers.

3. Use the SAME pedigree # for each individual in an extended pedigree.

4. Do not nuclearize the pedigrees first, because that will lead to an
   invalid test of association.

5. For two-point analyses (stat flags S0, S1, S2, and S3), there must be
   only one marker per pedigree file. The exception to this is when using
   the multimarker.pl script (see next section). For multi-locus genoPDT
   analysis, all markers must appear in one pedigree input file.

*****************
 Multiple markers
*****************

For two-point analysis of multiple markers, you can use the multimarker.pl
perl program. For allele-specific analysis, type

   multimarker.pl multi.dat multi.ped pdt multi.out

where multi.dat, multi.ped, and multi.out are filenames used as examples.
You may name your files anything that you choose.

For genotype-specific two-point analyses, you will need to type 'genopdt'
on the command line:

   multimarker.pl multi.dat multi.ped genopdt multi.out

The script creates separate *.dat and *.ped files for each marker and
outputs all the results into the specified *.out file. Output normally
printed to the screen (or redirected to a log file) is suppressed.

When running analyses on multiple markers using multimarker.pl, you must
put all marker data in a single *.ped file, and all marker information
into a single *.dat file.

****************
 Example Data
****************

The files example.dat, example.ped, and example.out are included in your
download. An example log file (example.log) is also included. You may use
these files to verify that the PDT program is running correctly on your
system.

Example input files with multiple markers are now included in your
download as well. These files are called multi.dat, multi.ped, and
multi.out.

***************
 Modifications
***************

Version 2.1 of PDT addresses the following:

1) There was a bug in the calculation of the Z statistic.
   A discordant sibship in an extended family or a nuclear family with
   informative parents was not included in the calculation of the
   statistic, although it should have been, when the average number of
   times the allele of interest occurs in the affected sibs is equal to
   the average number of times it occurs in the unaffected sibs. For
   example, the following sibship with informative parents should have
   been included in the calculations:

      parent 1   -- 1 / 2
      parent 2   -- 1 / 2
      unaff sib  -- 1 / 1
      aff sib    -- 1 / 2
      unaff sib  -- 2 / 2

   So for the 1 allele:
      The average number of times it occurs in the affected sib is 1.
      The average number of times it occurs in the unaffected sibs is
      also 1.

2) The counts of informative independent pedigrees and informative
   discordant sibships used have been corrected.

Version 2.11 of PDT addresses the following:

   Whether or not discordant sibpairs were informative was not being
   assessed for each allele separately. This caused an error in the
   statistic when using markers with more than 2 alleles. This has been
   resolved. For example, the following discordant sibpairs would have
   been counted as informative for all alleles (including allele 3), even
   though all three sibs should be considered "homozygous not-3", and thus
   not informative for allele 3:

      unaff sib  -- 1 / 1
      aff sib    -- 1 / 2
      unaff sib  -- 2 / 2

Version 3.11 of PDT

   Version 3.11 of the PDT now calculates two PDT statistics: the avePDT
   and the sumPDT. Briefly, the sumPDT gives more weight to larger
   families, while the avePDT gives equal weight to all families in a data
   set. For more information on these statistics, please refer to Martin
   ER, et al., 2001 (listed above). The log file now lists the D
   statistics for both, unless a single statistic is specified by the
   user.

Version 3.12 of PDT

   Version 3.12 of the PDT adds a new triad_dsp flag, TD3.
   When this option is chosen, only triads are used to calculate the PDT
   statistics, except in the case where one or the other parent is not
   genotyped. In such cases, discordant sib pairs are used if they are
   available for that nuclear family.

Version 4.0 of PDT

   Version 4.0 of the PDT now allows you to calculate genotype-specific
   PDT statistics and a global PDT statistic across all observed
   genotypes. For more information on the geno-PDT, please refer to Martin
   ER, et al., 2003 (listed above).

Version 5.1 of PDT

   Version 5.1 of the PDT allows the user to calculate the geno-PDT
   statistics for a combination of genotypes from a set of more than one
   marker. A global multi-locus geno-PDT score is also calculated.
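
The sibship examples in the Version 2.1 and 2.11 notes above can be
checked by hand or with a few lines of Python. The sketch below only
reproduces the allele-count arithmetic from those examples. It is not the
PDT statistic and not the program's own test for informativeness, and the
function name mean_allele_count is hypothetical.

```python
def mean_allele_count(genotypes, allele):
    """Average number of copies of `allele` over a list of (a1, a2) genotypes."""
    return sum(g.count(allele) for g in genotypes) / len(genotypes)


# Sibship from the examples: one affected sib and two unaffected sibs.
affected = [(1, 2)]
unaffected = [(1, 1), (2, 2)]

for allele in (1, 2, 3):
    aff = mean_allele_count(affected, allele)
    unaff = mean_allele_count(unaffected, allele)
    print(allele, aff, unaff)
# Prints:
# 1 1.0 1.0
# 2 1.0 1.0
# 3 0.0 0.0
```

For allele 1 the two averages are equal (both 1), which is the case
described under Version 2.1. For allele 3 every sib carries zero copies,
which is the "homozygous not-3" case described under Version 2.11.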