                   ************************************************
                   *                                              *
                   *                   PDT 5.1                    *
                   *                                              *
                   *   written by: Eden Martin, Carol Haynes,     *
                   *     Liling Warren, and Meredyth Bass         *
                   *                                              *
                   *                                              *
                   ************************************************

**************
Introduction
**************

PDT is designed to look at both linkage and association in extended pedigrees.  
It will perform allele-specific analysis and genotype-specific analysis 
for single markers.  In addition, PDT will now run analyses for combinations of 
genotypes over multiple marker loci.  This is not the same as haplotype analysis,
however, as no phase information is utilized.

PDT takes up to five command line arguments: *.dat, *.ped, *.out, the triad/
discordant sibpair (dsp) flag, and the statistic flag.  The first three arguments (the
names of input and output files) are required, and the last two use default values 
unless otherwise specified by the user.  The two input files for PDT are a *.dat 
file and a *.ped file, both in a (post-MAKEPED) format readable by the LINKAGE 
programs.  You can use the publicly available MAKEPED and PRELINK programs to make these 
files.  The triad/dsp flag values are 
	TD0 	all individuals [default], 
	TD1 	dsp's only, 
	TD2 	triads only, or 
	TD3 	use only triads unless either parent isn't typed; in that case use 
		dsp's if they are available for that family.  

The statistic flag values are 
	S0 	both allele-specific statistics, sumPDT and avePDT [default], 
	S1 	avePDT only, 
	S2 	sumPDT only, 
	S3 	geno-PDT, or
	S4	multi-locus geno-PDT.  
	
Here are some examples of commands executed on the command 
line (for more details, see below):

	pdt example.dat example.ped example.out > example.log
	pdt example.dat example.ped example.out TD0 S0 > example.log
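
If you launch PDT from a script, the argument rules above can be checked before
the program is called.  The following is only a minimal sketch in Python, not part
of the PDT distribution: the function names build_pdt_command and run_pdt are
hypothetical, and the sketch assumes an executable named "pdt" is on your path.

```python
import subprocess

# Flag values as listed above; omitting a flag means PDT uses its default.
TD_FLAGS = {"TD0", "TD1", "TD2", "TD3"}
STAT_FLAGS = {"S0", "S1", "S2", "S3", "S4"}


def build_pdt_command(dat, ped, out, td_flag=None, stat_flag=None, exe="pdt"):
    """Return the argument list for one PDT run (hypothetical helper)."""
    if td_flag is not None and td_flag not in TD_FLAGS:
        raise ValueError("unknown triad/dsp flag: %s" % td_flag)
    if stat_flag is not None and stat_flag not in STAT_FLAGS:
        raise ValueError("unknown statistic flag: %s" % stat_flag)
    cmd = [exe, dat, ped, out]
    if td_flag is not None:
        cmd.append(td_flag)
    if stat_flag is not None:
        cmd.append(stat_flag)
    return cmd


def run_pdt(cmd, log_path):
    """Run the command and write screen output to a log file, like '> example.log'."""
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stdout=log, check=False).returncode
```

For example, build_pdt_command("example.dat", "example.ped", "example.out", "TD0", "S0")
gives the argument list for the second command shown above.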


**********************
References
**********************

When publishing PDT analyses, please cite the following references:

"A Test for Linkage and Association in General Pedigrees: The Pedigree Disequilibrium 
Test" by Martin ER, Monks SA, Warren LL, and Kaplan NL. Am J Hum Genet 67:146-154, 
2000.

"Correcting for a Potential Bias in the Pedigree Disequilibrium Test" by Martin ER, Bass
MP, and Kaplan NL.  Am J Hum Genet 68:1065-1068, 2001.

When using the geno-PDT, please cite the following reference:

"A genotype-based association test for general pedigrees: the genotype-PDT" by Martin ER, Bass
MP, Gilbert JR, Pericak-Vance MA, and Hauser ER.  Genetic Epid. 2003. (in press)

**********************
System Information
**********************

Solaris 
The PDT was compiled in C++ on a Solaris 7 (unix) workstation using the gcc 2.95.3
compiler.  

Linux
The PDT was compiled in C++ on a Linux 2.4 workstation using the gcc 3.2 compiler.

If you require an alternate version, please check our web site at 
http://wwwchg.duhs.duke.edu/index.html to see whether additional versions have been
made available.

For technical support (including requests for alternate platforms), please feel free
to contact pdt@chg.duhs.duke.edu.

**********************
Disclaimer
**********************

No warranty, either expressed or implied, is made with respect to the
functioning and accuracy of this program.  No responsibility is assumed
by the authors.  Please report any problems or bugs to the authors.

Please do not alter or distribute this program without the permission of 
the authors.  A patent for the methods implemented in this program is 
pending.

*********
Dat file
*********
The dat file should look like the following (the "<< ..." comments should NOT be included):

2 0 0 5 		<< NO. of LOCI, RISK LOCUS, RISK ALLELE, SEXLINKED (IF 1) PROGRAM
0 0.0 0.0 0 		<< MUT LOCUS, MUT RATE, HAP FREQUENCIES (IF 1)
 1 2			<< DX ALLELES (normal, affected)
 1 2 #DISEASE 		<< LOCUS CODE (use 1 for dx locus), NO. OF ALLELES, DX NAME (USE # TO START IT) 
0.999990 0.000010       << DX GENE FREQUENCIES
 1    			<< NO. OF LIABILITY CLASSES
0.0000 1.0000 1.0000 	<< PENETRANCES (0, 1, and 2 dx alleles, respectively)
 3 2 #MARKER 		<< LOCUS CODE (use 3 for markers), NO. OF ALLELES, MARKER NAME (USE # TO START IT)
0.50000 0.50000 	<< ALLELE FREQUENCIES
0 0 			<< SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.1000 			<< RECOMBINATION VALUES
1 0.050 0.1500 		<< REC VARIED, INCREMENT, FINISHING VALUE (for LOD scores at varying thetas)
0.200 0.10 0.400	<< ADDITIONAL THETA LINE (begin, inc, end)

** You need to fill in the disease and marker name, as well as the number of alleles. 
   The essential information for PDT is contained in lines 1, 4, 6, and 8.

** For an example using multiple markers, see the multi.dat example file provided with the
   download.  Be sure that the first value of the first line is set to the number of markers
   plus 1 (for the disease locus).

For more information about the *.dat file, see the online user's manual for the LINKAGE
software package, accessible at http://linkage.rockefeller.edu/soft/linkage/.
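
As a quick sanity check on a *.dat file, the number of loci on the first line can
be compared with the locus definitions that follow.  The sketch below is a
simplified reader, not a full LINKAGE parser: it assumes that every locus
definition line has the form "code n_alleles #NAME" as in the example above, and
the function names are hypothetical.

```python
def strip_comment(line):
    """Drop an explanatory '<< ...' comment (these must not appear in a real file)."""
    return line.split("<<", 1)[0].strip()


def read_dat_summary(text):
    """Return (number of loci from line 1, list of (locus code, allele count, name))."""
    lines = [strip_comment(raw) for raw in text.splitlines()]
    lines = [line for line in lines if line]
    n_loci = int(lines[0].split()[0])
    loci = []
    for line in lines[1:]:
        parts = line.split()
        # Assumption: a locus definition is recognised by its '#NAME' third field.
        if len(parts) >= 3 and parts[2].startswith("#"):
            loci.append((int(parts[0]), int(parts[1]), parts[2][1:]))
    return n_loci, loci


example_dat = """2 0 0 5
0 0.0 0.0 0
 1 2
 1 2 #DISEASE
0.999990 0.000010
 1
0.0000 1.0000 1.0000
 3 2 #MARKER
0.50000 0.50000
0 0
0.1000
1 0.050 0.1500
0.200 0.10 0.400
"""

n_loci, loci = read_dat_summary(example_dat)
# The first value on line 1 should equal the number of markers plus 1.
counts_agree = (n_loci == len(loci))
```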

**********
Ped file
**********
The ped file contains family structure information, as well as genotype information for
all pedigrees in your data set.  All individuals in a pedigree must be listed with sequential
id numbers (i.e., 1, 2, 3, ...), with no gaps in the sequence.  For nuclear pedigrees, 
both parents of children must always be listed, even if one or both aren't genotyped.  In larger 
pedigrees, all connecting relatives must be listed.  For example, with two affected cousins, 
at a minimum all four parents and both shared grandparents would be listed.  Individuals who aren't 
genotyped or who are missing a genotype would be listed with a "0 0" genotype.  For more detail
about formatting a pedigree file, please see the Pedigree Information (PEDFILE) section of 
the Linkage online user's guide at http://linkage.rockefeller.edu/soft/linkage/.

Each line of the ped file should have the following format:

Column 1: Pedigree number (must be an integer)
Column 2: Individual ID number (must be an integer and in sequential order)
Column 3: ID of father (0 if this individual is a founder)
Column 4: ID of mother (0 if this individual is a founder)
Column 5: First offspring ID
Column 6: Next paternal sibling ID
Column 7: Next maternal sibling ID
Column 8: Sex (1=male, 2=female, 0=unknown)
Column 9: Proband status (1=proband, all others have a 0 in this field.)
Column 10: Affection status (2=affected, 1=unaffected, 0=unknown)
Column 11: Liability class (if # of liability classes > 1, otherwise, marker data starts here)
Column 12+: marker data ("0 0" for missing genotype; "." is not allowed)

The essential information for PDT is contained in columns 1, 2, 3, 4, and 10, plus the
marker data (starting in column 11, or in column 12 if a liability class is present).
Liability class information, if included in the *.ped file, isn't used in the
calculation of the PDT statistics.  However, this column must be present if you
have specified more than 1 liability class in your *.dat file.
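
The column layout above can be illustrated with a small reader for one line of a
ped file.  This is only a sketch, assuming whitespace-separated columns and
integer allele codes; parse_ped_line and its has_liability_class argument are
hypothetical names, not part of PDT.

```python
def parse_ped_line(line, has_liability_class=False):
    """Split one post-MAKEPED ped line into the fields PDT relies on."""
    fields = line.split()
    fixed = 11 if has_liability_class else 10  # columns before the marker data
    if len(fields) < fixed or (len(fields) - fixed) % 2 != 0:
        raise ValueError("unexpected number of columns: %d" % len(fields))
    # int() also rejects "." as a missing-genotype code, which is not allowed.
    alleles = [int(a) for a in fields[fixed:]]
    return {
        "pedigree": int(fields[0]),
        "id": int(fields[1]),
        "father": int(fields[2]),
        "mother": int(fields[3]),
        "sex": int(fields[7]),
        "affection": int(fields[9]),
        # One (allele, allele) pair per marker; (0, 0) means missing.
        "genotypes": [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)],
    }


record = parse_ped_line("1 3 1 2 0 0 0 1 1 2 1 2")
```

Here the record describes individual 3 of pedigree 1, an affected male with
parents 1 and 2 and a single marker genotype of 1 / 2.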

************
Running PDT
************

To run PDT, on the command line type:
  pdt example.dat example.ped example.out 

With triad_dsp and/or stat flags, a command might look like this:
  pdt example.dat example.ped example.out TD1
  pdt example.dat example.ped example.out S2
  pdt example.dat example.ped example.out TD2 S1

We recommend redirecting output which is normally printed to the screen into a log file
in order to verify that data has been input as intended.  To do this add "> logfile"
to the end of your command:

  pdt example.dat example.ped example.out > example.log
  pdt example.dat example.ped example.out TD1 > example.log
  pdt example.dat example.ped example.out S2 > example.log
  pdt example.dat example.ped example.out TD2 S1 > example.log

Note: For all two-point analyses (statistic flags S0, S1, S2, and S3), 
the PDT program expects one marker per pedigree file.  The only exception
to this rule is when using the multimarker.pl script provided with the 
download (see below "Multiple markers").  The multi-locus genoPDT option 
(statistic flag S4) expects all markers to appear in one pedigree file.

********
Output
********

The log file:

Family information is printed to the screen. This includes:
individual ID #, marker genotype(s), and affection status.  This data can be saved
in a log file as well by redirecting the output to a file (see above).


The main output file (*.out):

The summary counts given are:

Number of families used 		the number of extended pedigrees with at least
					one typed triad and/or dsp.
Number of triads used 			the number of parent-affected child triads in
					which both parents and the affected child are typed. 
Number of discordant sib pairs used 	the number of typed discordant sib pairs 
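
To make the definitions of these counts concrete, here is a rough sketch that
counts typed triads and typed discordant sib pairs in a single pedigree.  It is
an illustration of the definitions only, not PDT's own code, and PDT may apply
further criteria (for example, informativeness).  The record layout and function
names are assumptions made for this example.

```python
def is_typed(person):
    """A person counts as typed when neither allele is the missing code 0."""
    return all(allele != 0 for allele in person["genotype"])


def count_triads_and_dsps(people):
    """Return (typed triads, typed discordant sib pairs) for one pedigree."""
    by_id = {p["id"]: p for p in people}
    sibships = {}
    triads = 0
    for p in people:
        if p["father"] == 0 or p["mother"] == 0:
            continue  # founders have no listed parents
        sibships.setdefault((p["father"], p["mother"]), []).append(p)
        parents_typed = is_typed(by_id[p["father"]]) and is_typed(by_id[p["mother"]])
        if p["affection"] == 2 and is_typed(p) and parents_typed:
            triads += 1
    dsps = 0
    for sibs in sibships.values():
        typed = [s for s in sibs if is_typed(s)]
        n_aff = sum(1 for s in typed if s["affection"] == 2)
        n_unaff = sum(1 for s in typed if s["affection"] == 1)
        dsps += n_aff * n_unaff  # every affected/unaffected pairing in the sibship
    return triads, dsps


family = [
    {"id": 1, "father": 0, "mother": 0, "affection": 1, "genotype": (1, 2)},
    {"id": 2, "father": 0, "mother": 0, "affection": 1, "genotype": (1, 2)},
    {"id": 3, "father": 1, "mother": 2, "affection": 1, "genotype": (1, 1)},
    {"id": 4, "father": 1, "mother": 2, "affection": 2, "genotype": (1, 2)},
    {"id": 5, "father": 1, "mother": 2, "affection": 1, "genotype": (2, 2)},
]
triads, dsps = count_triads_and_dsps(family)
```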

The output file gives the test results for each allele, genotype, or combination of
genotypes, as well as a global statistic for the marker locus or marker set. The counts of the 
number of individuals used to calculate the allele- or genotype-specific and global 
statistics are given below the table.  Counts are the number of typed parents used for 
triads, and the number of typed affected and unaffected children in families with at 
least one discordant sibpair.  If an affected sibling is used for more than one discordant 
sibpair, s/he is still only counted once in the table of observed counts.  If you have 
a bi-allelic marker, the p-values for each allele and for the global test should all be the same.

**********************
VERY IMPORTANT NOTES!
**********************
1. Check for genotype inconsistencies before using the program. PDT assumes the 
	genotypes are correct.
2. Use integers for family number and individual numbers. Individuals need to be 
	numbered sequentially and in ascending order.  There should be no gaps in the 
	sequence of individual numbers.
3. Use the SAME pedigree # for each individual in an extended pedigree.
4. Do not nuclearize the pedigrees first because that will lead to an invalid test of 
	association.
5. For two-point analyses (stat flags S0, S1, S2, and S3), there must be only one
	marker per pedigree file.  The exception to this is when using the 
	multimarker.pl script (see next section).  For multi-locus genoPDT analysis,
	all markers must appear in one pedigree input file.
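
Note 2 can be checked automatically before running PDT.  The sketch below is one
possible check, assuming that numbering starts at 1 within each pedigree and that
the rows of a pedigree appear in ID order; find_id_gaps is a hypothetical name.

```python
from collections import defaultdict


def find_id_gaps(rows):
    """rows: (pedigree number, individual ID) pairs in file order.

    Returns {pedigree number: IDs as found} for every pedigree whose IDs
    are not exactly 1, 2, 3, ... in ascending order.
    """
    ids = defaultdict(list)
    for pedigree, individual in rows:
        ids[pedigree].append(individual)
    problems = {}
    for pedigree, seen in ids.items():
        if seen != list(range(1, len(seen) + 1)):
            problems[pedigree] = seen
    return problems


# Pedigree 2 skips individual 2, so it is reported.
problems = find_id_gaps([(1, 1), (1, 2), (1, 3), (2, 1), (2, 3)])
```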

*****************
Multiple markers
*****************

For two-point analysis of multiple markers, you can use the multimarker.pl perl 
program.  For allele-specific analysis, type

  multimarker.pl multi.dat multi.ped pdt multi.out

where multi.dat, multi.ped, and multi.out are filenames used as examples.  You may
name your files anything that you choose.  For genotype-specific two-point analyses, 
you will need to type 'genopdt' on the command line:

  multimarker.pl multi.dat multi.ped genopdt multi.out

The script creates separate *.dat and *.ped files for each marker and outputs all the 
results into a specified *.out file.  Output normally printed to the screen (or redirected
to a log file) is suppressed.  When running analyses on multiple markers using
multimarker.pl, you must put all marker data in a single ped file, and all marker
information into a single *.dat file.
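
The per-marker split can be pictured as follows.  This sketch is not the
multimarker.pl source; it only illustrates, under the column layout described in
the "Ped file" section, how one ped line with several markers maps onto one line
per marker.  The function name and arguments are hypothetical.

```python
def split_ped_line(line, n_markers, has_liability_class=False):
    """Return one single-marker ped line per marker for a multi-marker ped line."""
    fields = line.split()
    fixed = 11 if has_liability_class else 10  # columns before the marker data
    if len(fields) != fixed + 2 * n_markers:
        raise ValueError("expected %d columns, found %d"
                         % (fixed + 2 * n_markers, len(fields)))
    head = fields[:fixed]
    lines = []
    for m in range(n_markers):
        start = fixed + 2 * m
        lines.append(" ".join(head + fields[start:start + 2]))
    return lines


per_marker = split_ped_line("1 3 1 2 0 0 0 1 1 2 1 2 3 4", n_markers=2)
```

The first returned line carries genotype 1 / 2 and the second carries 3 / 4, each
with the same ten leading columns.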


****************
Example Data
****************
The files example.dat, example.ped, and example.out are included in your 
download.  An example log file (example.log) is also included.  You may use 
these files to verify that the PDT program is running correctly on your 
system.

Example input files with multiple markers are now included in your download
as well.  These files are called multi.dat, multi.ped, and multi.out.


***************
Modifications
***************
Version 2.1 of PDT addresses the following:

1) There was a bug in the calculation of the Z statistic.  In the case where
   there is a discordant sibship in an extended family or a nuclear family with
   informative parents in which the average number of times the allele of 
   interest occurs in the affected sibs is equal to the average number of times 
   it occurs in the unaffected sibs, then the sibship was not included in the 
   calculation of the statistic but should have been.  For example, if the 
   following sibship with informative parents occurs in a pedigree, then it 
   should have been included in the calculations.

                       parent 1 -- 1 / 2
                       parent 2 -- 1 / 2

                      unaff sib -- 1 / 1
                        aff sib -- 1 / 2
                      unaff sib -- 2 / 2
   
   So for the 1 allele:  The average number of times it occurs in the affected 
   sib is 1.  The average number of times it occurs in the unaffected sibs is
   also 1.  
   
2) The counts of informative independent pedigrees and informative discordant
   sibships used have been corrected.
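
The arithmetic in the sibship example under item 1 can be reproduced in a few
lines.  This is only a worked illustration of the averages quoted there; the
function name is hypothetical.

```python
def mean_allele_count(genotypes, allele):
    """Average number of copies of `allele` per individual."""
    return sum(g.count(allele) for g in genotypes) / len(genotypes)


affected = [(1, 2)]             # the one affected sib
unaffected = [(1, 1), (2, 2)]   # the two unaffected sibs

aff_mean = mean_allele_count(affected, 1)      # 1 copy in 1 sib
unaff_mean = mean_allele_count(unaffected, 1)  # 2 + 0 copies in 2 sibs
difference = aff_mean - unaff_mean
```

Both averages equal 1, so the difference is 0; this is the case in which the
sibship was previously left out of the calculation.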

Version 2.11 of PDT addresses the following:

   Whether or not discordant sibpairs were informative was not being assessed
   for each allele separately.  This caused an error in the statistic when using 
   markers with more than 2 alleles.  This has been resolved.

   For example, the following discordant sibpairs would have been counted as
   informative for all alleles (including allele 3), even though all should be
   considered "homozygous not-3", and thus not informative in that case.

                    unaff sib -- 1 / 1
                      aff sib -- 1 / 2
                    unaff sib -- 2 / 2
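
One way to picture the per-allele assessment is to recode each genotype as the
number of copies of the allele of interest and to compare the two sibs.  The
sketch below follows that reading of the text; it assumes a pair is informative
for an allele when the recoded genotypes differ, and it is not taken from the
PDT source.

```python
def copies(genotype, allele):
    """Number of copies of `allele` in a genotype (0, 1, or 2)."""
    return sum(1 for a in genotype if a == allele)


def dsp_informative(aff_genotype, unaff_genotype, allele):
    """True when the discordant pair differs in its count of `allele`."""
    return copies(aff_genotype, allele) != copies(unaff_genotype, allele)


aff = (1, 2)
unaffs = [(1, 1), (2, 2)]

# For each allele, is either discordant pair informative?
informative = {
    allele: any(dsp_informative(aff, u, allele) for u in unaffs)
    for allele in (1, 2, 3)
}
```

For allele 3 every sib is "homozygous not-3", so neither pair is informative,
while both pairs remain informative for alleles 1 and 2.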

Version 3.11 of PDT
  
   Version 3.11 of the PDT now calculates two PDT statistics: the avePDT and 
   the sumPDT.  Briefly, the sumPDT gives more weight to larger families, while
   the avePDT gives equal weight to all families in a data set.  For more 
   information on these statistics, please refer to Martin ER, et al., 2001 
   (listed above). 

   The log file now lists D statistics for both statistics, unless a single
   statistic is specified by the user.
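
   The difference in weighting can be seen in a small numerical sketch.  It is
   based on our reading of the cited papers and should be treated as an
   assumption, not as the PDT source: each family contributes a value D (the sum
   or the average of its per-triad and per-dsp differences), and the statistic
   is the sum of the D values divided by the square root of the sum of their
   squares.  The input values below are made up for illustration.

```python
import math


def family_d(x_values, average=False):
    """D for one family: sum (sumPDT) or mean (avePDT) of its triad/dsp differences."""
    total = sum(x_values)
    return total / len(x_values) if average else total


def pdt_statistic(families, average=False):
    """Z = sum(D) / sqrt(sum(D^2)) over families with at least one triad or dsp."""
    d = [family_d(x, average) for x in families if x]
    denom = math.sqrt(sum(v * v for v in d))
    if denom == 0:
        raise ValueError("no informative families")
    return sum(d) / denom


def two_sided_p(z):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: one large family (four units) and one small family (one unit).
families = [[1, 1, 1, 1], [1]]
z_sum = pdt_statistic(families)                # D = 4 and 1
z_ave = pdt_statistic(families, average=True)  # D = 1 and 1
```

   With the sum, the large family contributes a D of 4 against 1 for the small
   family; with the average, both families contribute a D of 1.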

Version 3.12 of PDT 

   Version 3.12 of the PDT adds a new triad_dsp flag, TD3.  When this option is
   chosen, only triads are used to calculate the PDT statistics, except in the 
   case where one or the other parent is not genotyped.  In such cases, discordant
   sib pairs are used if they are available for that nuclear family.
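
   The choice made for each nuclear family under the four triad_dsp flags can be
   summarized in a short sketch.  The function name and the (use_triads,
   use_dsps) return convention are hypothetical and only restate the flag
   descriptions given in this file.

```python
def units_used(td_flag, both_parents_typed):
    """Return (use_triads, use_dsps) for one nuclear family."""
    if td_flag == "TD0":
        return True, True      # all individuals
    if td_flag == "TD1":
        return False, True     # dsp's only
    if td_flag == "TD2":
        return True, False     # triads only
    if td_flag == "TD3":
        # Triads when both parents are typed; otherwise fall back to dsp's.
        return (True, False) if both_parents_typed else (False, True)
    raise ValueError("unknown triad/dsp flag: %s" % td_flag)
```

   Whether any triads or dsp's are actually available in the family still has to
   be determined from the genotype data.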

Version 4.0 of PDT

   Version 4.0 of the PDT now allows you to calculate genotype-specific PDT 
   statistics and a global PDT statistic across all observed genotypes.  
   For more information on the geno-PDT, please refer to Martin ER, et al.,
   2003 (listed above).

Version 5.1 of PDT

   Version 5.1 of the PDT allows the user to calculate the geno-PDT statistics for a
   combination of genotypes from a set of more than one marker.  A global multi-locus 
   geno-PDT score is also calculated.