Peter J. Munson

Unit on Biostatistical Methodology

Laboratory of Theoretical and Physical Biology

National Institute of Child Health and Human Development

National Institutes of Health

Bethesda, Md 20892 USA

November 1991

second printing

Acknowledgment

The author wishes to acknowledge the contributions of many people to this project. Most notable are Drs. David Rodbard and Karen Oerter, whose work with the DETECT project originally sparked my interest. Ms. Kim T. Chen converted the MATLAB code for PULSEFIT into a reliable, portable Fortran 77 code on the IBM PC. Dr. Alex Genazzani provided substantial data sets which motivated many of the techniques used in PULSEFIT. Dr. Vincenzo Guardabasso provided many helpful discussions and suggestions about the program.

Mr. Arnold Cushing aided in the creation and production of the user guide, especially with the program GRAFIT.

NOTICES

The PULSEFIT program and accompanying material are provided for scientific research purposes only. They are not intended for applications in clinical practice, except in a research setting. No claims of suitability, fitness for a particular purpose, or merchantability are made by the author, the National Institutes of Health or any government agency. The sole liability for its use lies with the user of the program.

Table of Contents

Appendix:

Pulse Detection in Hormone Data: Simplified, Efficient Algorithm

1 Introduction

PULSEFIT is a program used to analyze a time series of (usually pituitary) hormone data. The method is fully described in the accompanying reprint. Briefly, PULSEFIT uses a strict mathematical definition of a "pulse" as an instantaneous rise followed by an exponential decay, to model circulating hormone measurements. The optimal peak heights and clearance rate are determined by least squares. The optimal peak locations are determined by stepwise regression, and the optimal peak number (or peak frequency) is determined by minimizing the predictive error (using the GCV index).

Moreover, since it has been observed that the most interesting factor, the peak frequency, is often poorly determined by the data, we provide an alternative, more precise index of pulsatility (PULSATILITY INDEX), which quantitatively and objectively measures the "pulsiness" of a series. Specifically, the Pulsatility Index is defined to be the standard deviation of the positively constrained, optimal, discrete deconvolution of the logarithm of the circulating hormone measurements. The Pulsatility Index has been shown to be a sensitive measure of clinical differences between groups of patients when measuring LH during puberty, and in adults (in a forthcoming publication).

The program is written in Fortran 77, and implemented for an IBM-PC microcomputer. The program requires an input data file (ASCII format) containing a single column of numbers representing the series of hormone values. In addition, the program requires the user to enter certain additional information. Upon completion of the calculation, the program produces several output files. The .GRF file is suitable for graphing using the program GRAFIT, included on the disk.

Version 1.1 was issued in November 1990. Several bugs were removed and version 1.2 came out in July 1991.

2 Distribution Diskette

Pulsefit is distributed on a 5 1/4 inch floppy diskette. The distribution diskette should contain the following files:

Don't worry if the files PULSE.SSN and/or SERIES1.GRF are missing.

PULSEFIT.EXE should run on any IBM-PC or IBM compatible. PULSE87.EXE will only run on PCs equipped with a math coprocessor.

3 Input File

In our example, the single input file is named SERIES1, although any other name is suitable. Create the input file on your disk with any word processor or editor (save in ASCII mode) or even with a spreadsheet such as LOTUS 1-2-3 (save a .PRN file). When you TYPE the file it should look like this

A>TYPE SERIES1

6.437
4.678
5.468
4.595333
5.057666
5.886666
5.637
8.125
9.545666
8.718
8.441666
8.362666
7.86
7.559
7.207
7.291
7.460333
7.285
6.333666
6.316333
5.765
6.553666
6.522333
5.736
5.768333
5.398333
5.872
5.707
5.426
5.083333
4.955666
4.993333
5.006666
4.7185
7.314
4.747
7.919
7.493666
7.871333
7.652
6.697666
6.570666
6.917666
6.643
6.041666
5.930666
5.488666
5.924333
6.277
5.867
5.68
6.118333
6.047333
5.626
5.520333
5.464333
5.519666
5.322
5.646333
5.887333
5.826666
5.128
5.169333
4.772333
5.244333
5.812333
7.196333
7.342
8.474
8.436
9.475
7.781333
7.994
8.340666
6.438333
7.436
7.754
8.124
7.842
8.332666
7.084
6.407666
6.180333
4.837
6.6
6.861
5.981333
6.565333
5.992666
6.465333
6.341666
6.682666
5.653333
6.335666
5.609666
4.884666
5.76
6.688666
7.801666
9.122
8.806666
8.406
8.364333
7.904666
7.854333
7.303333
7.097666
7.007666
6.704
6.553666
6.879
6.336666
6.537666
6.318666
6.076
5.799666
6.988666
6.165666
5.259333
6.116
5.592
5.116
5.095333
4.941333
7.147666
5.083333
5.443
6.504666
5.175333
5.353
4.5
4.759666
5.651333
3.877
6.042
6.598333
5.035
6.232
5.957
6.266
6.321666

4 Running Pulsefit

After you have entered the data file, run the program PULSEFIT

A> PULSEFIT (What the user typed is shown in bold)

*******************************************
*******************************************
*****                                 *****
*****           WELCOME TO            *****
*****                                 *****
*****      PULSEFIT  VERSION 1.3      *****
*****                                 *****
*******************************************
*******************************************

ESTIMATE CLEARANCE RATE PARAMETER?  (Y) --> N

Entering N results in the program estimating the Clearance Rate (Yes and No are inadvertently switched in Version 1.2).

TIME UNIT PER SAMPLING INTERVAL --> 5

For this study, the sampling interval (time between two successive samples) was 5 minutes.

ONE FILE NAME PER LINE OR [RETURN] FOR "NO MORE"

ENTER DATA FILE NAME -->series1

Type in the file name.

ENTER DATA FILE NAME --> RETURN

Type in a second name, or hit the ENTER key to go on.

DATA FILE series1 OPENED.
FILE series1 CONTAINS 141 DATA VALUES.

*******************************************
********** RESULTS OF PULSEFIT **********
*******************************************

DATA FILENAME = series1
MEAN = 6.410661
STANDARD DEVIAION = 1.160897
NUMBER OF DATA POINTS = 141
STARTING PEAK = 1.771969
CLEARANCE RATE (PER TIME UNIT) = -4.656681E-03
NUMBER OF PEAKS = 12
NUMBER OF PEAKS IN 5% GCV RANGE = [ 5 <-->13 ]
T_1/2 = 148.850
SUM OF SQUARES FOR ERROR = .926987
MINIMUM GCV * 10000 = .908722
MEAN SQUARE ERROR = .729915E-02
RMS = .854350E-01
MEAN PEAK HEIGHT = .270038
PULSATILITY INDEX = .8483670E-01

PLOT FILE: series1.GRF IS CREATED.
THE RESULTS OF THIS FIT IS SAVED IN PULSE.SSN.
Stop - Program terminated.

EXPLANATIONS:

MEAN - The arithmetic mean of the original data points

STANDARD DEVIATION - The sample standard deviation of the original data series

NUMBER OF DATA POINTS - check this to make sure all points were read in

STARTING PEAK - Every series requires at least one initial peak, called po in the published description of PULSEFIT. Its value is the fitted value for the first data point, in the natural log scale.

CLEARANCE RATE - Estimated rate of hormone clearance. In this case, units are [per minute] because the sampling interval was given in minutes.

NUMBER OF PEAKS - Estimated number of peaks in this series, NOT counting the initial peak.

NUMBER OF PEAKS IN 5% GCV RANGE - This is a rough confidence interval for the number of peaks. A model with a number of peaks within this range would give a GCV score within 5% of the minimum GCV score obtained by any model. We do not include the initial peak in this count.

T_1/2 - Half-life of the hormone being measured, calculated from the clearance rate: T_1/2 = 0.69314 / |Clearance Rate|. The clearance rate is printed as a negative number, so its absolute value is used; here 0.69314/0.004656681 gives approximately 148.85 minutes.

MEAN SQUARE ERROR - Average squared deviations (MSE) of data from fitted model, in natural log scale.

RMS - Square root of the Mean Square Error. Because we have used natural logarithms, this is also approximately a Coefficient of Variation measure in the original units. Thus, a value of 0.08 corresponds to an average 8% deviation of the observed from the fitted LH concentration.

GCV - Minimum Generalized Cross Validation score. The GCV is used to pick from among the models fit to the data. The model with the lowest GCV score is likely to be best. We have multiplied the GCV value by 10,000 before printing to make the magnitude more convenient.

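GCV = SS/(n - 3(p+1) - k)^2, where SS is the sum of squares for error, n is the number of observations (141 in this case), p is the number of peaks (12 in this case), and k is the K value (1 by default). In this case GCV = .926987/(101)^2, which is .908722 after multiplying by 10,000.

Note: there is a slight problem of detail here, as the factor should be 2, not 3. The value printed by the program is computed with the factor 3.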

MSE - Average squared departure from the model: MSE = SS/(n-p-2) = .927/127 in this case, where p is the number of peaks (not including the first).

RMS - Square root of the MSE

MEAN PEAK HEIGHT - Mean value of the non-zero peaks found by the fitting procedure. In this case this is the mean of 12 numbers.

PULSATILITY INDEX - Standard deviation of ISR. The ISR is the instantaneous secretion rate, or the discrete, positively constrained deconvolution of the data, in the natural log scale. The pulsatility index is a more stable measure of the importance of the peaks found in this series than the peak count alone.
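As a cross-check on the definitions above, the derived quantities on the results screen can be recomputed by hand from the printed sum of squares, number of points, number of peaks and clearance rate. The following is only a sketch in Python (PULSEFIT itself is written in Fortran 77); the function names are ours, and we assume that k is the default K value of 1.

```python
import math

# Values copied from the example results screen for series1.
n = 141                    # NUMBER OF DATA POINTS
p = 12                     # NUMBER OF PEAKS (initial peak not counted)
k = 1                      # assumption: the default K value
ss = 0.9269873             # SUM OF SQUARES FOR ERROR
clearance = -4.656681e-03  # CLEARANCE RATE (PER TIME UNIT), printed as negative


def half_life(rate):
    """T_1/2 = ln(2) / |clearance rate|, in the time unit of the rate."""
    return math.log(2.0) / abs(rate)


def gcv(ss, n, p, k=1):
    """GCV = SS / (n - 3(p+1) - k)^2, with the factor 3 as printed by the program."""
    return ss / (n - 3 * (p + 1) - k) ** 2


def mse(ss, n, p):
    """MSE = SS / (n - p - 2)."""
    return ss / (n - p - 2)


t_half = half_life(clearance)
gcv_x10000 = 10000.0 * gcv(ss, n, p, k)
mean_sq_err = mse(ss, n, p)
rms = math.sqrt(mean_sq_err)

print("T_1/2       =", round(t_half, 3))
print("GCV * 10000 =", round(gcv_x10000, 6))
print("MSE         =", mean_sq_err)
print("RMS         =", rms)
```

The recomputed values agree with the screen output to within the rounding of the printed figures.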

Pulsefit Notes

- Each Pulsefit run can accept at most 20 data files.

- Each data file can contain at most 200 data values. No non-numeric characters are allowed.

- The default K value is 1.

- The sampling interval cannot be zero.

- One session file is generated for each run and named PULSE.SSN. The order of the parameters is the same as in the screen output.

- Three different plots are generated and kept in <fname.grf> for each run, where fname is the name of your input data file.
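The limits in these notes can be checked before a run. The following Python sketch is one possible pre-flight check of an input file's contents; it is not part of PULSEFIT. The function name is ours, skipping blank lines is our assumption, and the requirement that values be positive is also our assumption, made because the analysis is carried out on the natural log scale.

```python
MAX_VALUES = 200  # from the notes: at most 200 data values per file


def check_series_text(text):
    """Return the data values as floats, or raise ValueError describing the problem."""
    values = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        token = line.strip()
        if not token:
            continue  # assumption: this checker ignores blank lines
        try:
            value = float(token)
        except ValueError:
            raise ValueError("line %d is not a single number: %r" % (lineno, token))
        if value <= 0.0:
            # assumption: logarithms are taken, so values must be positive
            raise ValueError("line %d is not positive: %r" % (lineno, token))
        values.append(value)
    if not values:
        raise ValueError("no data values found")
    if len(values) > MAX_VALUES:
        raise ValueError("%d values found; the limit is %d" % (len(values), MAX_VALUES))
    return values
```

For example, check_series_text(open("SERIES1").read()) would return the list of 141 values in the example file, or stop with a message naming the first offending line.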

Graphics

Three graphs of the calculated data are available. They can be produced on your screen using the accompanying program GRAFIT.

First is a plot of the natural logarithm of the original data versus sample number. Use this plot to judge if the fit of the model to the data is adequate.

Second is a plot of the original and fitted data, the detected peaks (or ISR), and the residuals (observed data minus fitted value), all shown on a natural log scale, versus sample number. The upper trace is similar to the first plot. The middle trace gives the information about the position and relative size of the peaks. The bottom trace can again be used to judge the adequacy of fit. Obvious patterns in the lower trace suggest that the model did not fully capture all of the "signal" in the data stream. Ideally, the lower trace should look like "white" noise.

Third, we give a plot summarizing the various models fit by program PULSEFIT before it arrived at the GCV minimal model. Again, we use a natural log scale (although future versions will use decimal logs) on the vertical axis. The number of peaks in each model is plotted along the horizontal axis. First, the GCV index for each model is shown by the closed-square symbol. (Ignore the initial rise in this curve.) The chosen model should coincide with the minimum of this curve. Next is the clearance rate (open square) estimated for each model. Last are the average estimated pulse height and the standard deviation of pulse heights (Pulseindex), estimated for each model (X and diamond).

(Note that there appears to be a bug in Version 1.1 with this plot).

Output Files

After the program completes, two output files are created. One of them, PULSE.SSN, contains simply the output information already on the screen. However, this file is a handy place to store the results of many files for further analysis, possibly in a spreadsheet. The file PULSE.SSN should look like the following:

A> TYPE PULSE.SSN

"series1 " ,6.410661, 1.165036, 1.771969,

-4.656681E-03, 12, 5, 13,

148.850100, 9.269873E-01, 9.087220E-01, 7.299152E-03

8.543508E-02, 2.700375E-01, 8.483670E-02
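For further analysis outside a spreadsheet, a record like the one above can be read with a few lines of code. The Python sketch below is an illustration only. The field names are ours, assigned by matching the values in the listing against the results screen (the number of data points does not appear in the file), and we assume that each record starts with the quoted file name and holds 14 numbers separated by commas, spaces or line breaks.

```python
import re

# Field names are our own labels, inferred from the order of the screen output.
NUMERIC_FIELDS = [
    "mean", "standard_deviation", "starting_peak", "clearance_rate",
    "n_peaks", "gcv_range_low", "gcv_range_high", "t_half",
    "sum_of_squares", "min_gcv_x10000", "mean_square_error", "rms",
    "mean_peak_height", "pulsatility_index",
]
INTEGER_FIELDS = ("n_peaks", "gcv_range_low", "gcv_range_high")


def parse_ssn(text):
    """Return one dict per record found in the text of a PULSE.SSN file."""
    records = []
    # Each record: a quoted name, then everything up to the next quote.
    for name, body in re.findall(r'"([^"]*)"([^"]*)', text):
        tokens = [t for t in re.split(r"[,\s]+", body) if t]
        if len(tokens) != len(NUMERIC_FIELDS):
            raise ValueError("expected %d numbers after %r, found %d"
                             % (len(NUMERIC_FIELDS), name, len(tokens)))
        record = {"filename": name.strip()}
        for field, token in zip(NUMERIC_FIELDS, tokens):
            value = float(token)
            record[field] = int(value) if field in INTEGER_FIELDS else value
        records.append(record)
    return records
```

Calling parse_ssn(open("PULSE.SSN").read()) on the file listed above would give a single record whose n_peaks entry is 12 and whose t_half entry is 148.8501.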

A second file contains graphic information, and is rather lengthy, so we list only part of it. This file is ordinarily read by the program GRAFIT, which produces some useful plots from it.

The file SERIES1.GRF looks like this:

A> TYPE SERIES1.GRF

PULSEFIT V. 1.1 == DATA AND ESTIMATES == DATA FILE: series1

1

141

1.000000 141.000000 20.000000

1.355062 2.256087 2.252564E-01

1.000000 1.862063

2.000000 1.542871

3.000000 1.698913

4.000000 1.525041

5.000000 1.620905

6.000000 1.772690

.

.

.

32.000000 -1.907713 33.000000 -1.908686

33.000000 -1.908686 34.000000 -1.909556

34.000000 -1.909556 35.000000 -1.910338

5 Creating Plots on the IBM-PC

The GRAFIT program is an IBM-PC based program designed to assist users in plotting the results obtained from ALLFIT, FLEXIFIT, PULSEFIT and EXPFIT.

The GRAFIT program plots the data written to .GRF files by the applications mentioned above. GRAFIT has been designed to work on machines with the following graphics cards.

The type of video card is automatically detected by the program.

GRAFIT now supports laser printers that use the picture drawing language PostScript. The best hard copy plots are achieved using a laser printer. Dot matrix printers are still supported. Previously, the plots were produced at screen resolution; the software has now been modified to take advantage of the higher resolution available on 9-pin dot matrix printers. GRAFIT uses a printing protocol which works on most dot matrix printers.

In order to support still more printers, GRAFIT is capable of producing a Lotus 123 .PIC file. This file can be printed using the Printgraph program distributed with Lotus 123 version 2.

GRAFIT Operation

It is a good idea to make a backup copy of the distribution disk before using any of the programs.

The GRAFIT program exists as a single executable file. You may run it from a floppy drive or copy it to your hard disk.

Start the program by typing grafit at the DOS prompt.

> grafit

You will be prompted for a .grf file name.

Graphics file name (no extension) ? series1

(Don't enter the .grf file extension; it is assumed.)

The file series1.grf, found on the Pulsefit distribution diskette, will be used to demonstrate GRAFIT operation.

The following plot will be displayed on the screen.

The line at the bottom of the screen contains 7 different commands...

C:Change N:Next F:First K:Key P:Print R:Restart Q:Quit

A command is entered by typing the first letter of the command. No return is necessary. These commands are only active while a plot is on the screen and the commands are displayed at the bottom of the screen.

Change Command

Type the letter 'C'. The following screen will be displayed.

Graph Set Up Screen

Label 1: PULSEFIT V. 1.3 == DATA AND ESTIMATES

Label 2: DATA FILE: series1.dat

Title: title

X label: x

Y label: y

x min: -1.0000E+01

x max: -3.0000E+00

y min: 5.0000E+03

y max: 1.3000E+04

X divisions (1-15): 2

Y divisions (1-15): 2

Size of plotted points (1-10): 3

F2:Accept new values Esc:Cancel

You can change the plot labels or axis limits. The number of tick marks on the X and Y axes can be adjusted, along with the size of the symbols used to plot the data points. Strike the Esc key to ignore your changes and return to the plot. F2 will register your changes and display the updated plot.

Next Command

To display the next plot in the file, type the letter 'N'. The graph file will be read and the next plot will be displayed on the screen. If no more plots are in the file a message to this effect will be displayed.

First Command

You can display the first plot in the file at any time by typing the letter 'F'.

Key Command

After entering 'K' from the keyboard you will be presented with the following screen.

Key Set Up Screen

Write curve labels? (Y/N): N

Name of curve 1: Draw curve 1? (Y/N): Y

Name of curve 2: Draw curve 2? (Y/N): Y

F2:Accept new values Esc:Cancel

You may enter a label for each curve in the plot. Initially the key is not displayed. Display the key by answering 'Y' to the question 'Write curve labels?' . The key consisting of the curve labels and associated symbols will be displayed to the right of the plot. Be careful when printing not to position the plot too far to the right as this may push the key off of the page.

You may choose to display or not display any particular curve. Answering yes to the question 'Draw curve 1?' will cause the curve to be drawn; answer no and the curve will not be drawn.

Print Command

GRAFIT supports several methods of printing your plots. A plot may be printed directly on a 9-pin dot matrix printer, saved as a PostScript file, or saved as a Lotus 123 .PIC format file. All of these methods are initiated from the print screen.

While a plot is displayed on the screen, entering 'P' will result in the display of the print menu.

Printer Set Up Screen

These coordinates alter the size and position of the plot box.

The lower left hand corner of the page is (0,0).

Upper Left x page coordinate (inches): 2.00

Upper Left Y page coordinate (inches): 9.00

Lower Right x page coordinate (inches): 7.00

Lower Right y page coordinate (inches): 6.00

Post Script File (.ps will be appended): series1

Lotus 123 file (.pic will be appended):..series1

F1:PostScript F2:Save F3:Dot Matrix F4:Lotus Esc:Cancel

Creating a PostScript File

Entering F1 will result in the creation (or replacement) of the PostScript file gsample.ps. The default PostScript file name is the prefix of the .grf file. You may enter your own file name instead of using the default. To print this file you will have to exit this program (i.e. hit Esc and then Quit), and then download the PostScript file gsample.ps to your laser printer. The page coordinates apply to the PostScript file.

Dot Matrix Printing

Enter F3 to print the plot on a dot matrix printer hooked up to printer port LPT1. Once printing has started you may interrupt it by striking any key. GRAFIT supports 9-pin dot matrix printers which emulate the Epson FX series of printers. This method creates a plot at a much higher resolution than is available with screen dumps. You may alter the size of the plot using the page coordinates.


Creating Lotus 123 Format .PIC Files

Entering F4 will result in the creation of the Lotus 123 .PIC file gsample.pic. (You may enter a different file name. .PIC is always appended.) This .PIC file may be printed using the Lotus 123 program Printgraph which is supplied with version 2 of Lotus 123. This enables GRAFIT to indirectly support a wide range of printers. Note, the page coordinates do not apply to this method.

F1, F2, F3 or F4 will result in your changes being saved. If you make changes in this screen and then enter Esc, without having entered F1, F2, F3 or F4, then your changes will not be saved.

Restart Command

The restart command is used to change the .grf file.

Quit Command

Enter 'Q' to exit the program. Again, this command is only active when a plot is displayed.

Prt-Screen Keyboard Key

The prt-scr key is supported only when a plot is being displayed. Screen dumps are supported in Hercules, CGA, EGA and VGA video modes. You do not have to have the MS-DOS graphics driver loaded to use this function.

GRAFIT File Format

It is not necessary to know anything about the format of the .grf files in order to create plots. This information is provided for those who may wish to input the data into other plotting packages or perhaps develop their own plotting routines.

The file format for GRAFIT is discussed below, along with a listing of the sample graph file, GSAMPLE.GRF. This file is included on the distribution diskette.

The number of lines in any graph file is variable, depending on the number of curves. Here is a listing of GSAMPLE.GRF.

ALLFIT-PC: GSAMPLE.ALL FIT # 1

SESSION: GSAMPLE.SES

These first two lines are graph labels. They will appear at the top of the graph.

2

This line is the number of curves.

10 10

These two numbers indicate how many points are in each curve respectively. Remember that if the number of curves were 3, then there would be three numbers here.

-10 -3 1

5500 13000 999.9999

The first line is the minimum x value, the maximum x value, and the default step value for the x axis. The second line contains the same values for the y axis. These values are for the whole graph, not any specific curve.

-10 12570

-9 12208

-8 11789

-7.522879 11273

-7 10382

-6.522879 9828

-6 7918

-5.522879 6351

-5 6135

-4 5724

These points are the x,y pairs for the 10 points in curve 1.

-10 12556

-9 12421

-8 11743

-7.522879 11287

-7 11333

-6.522879 10328

-6 9443

-5.522879 8610

-5 7853

-4 5984

These points are the x,y pairs for the 10 points in curve 2.

55

This number is the number of lines to follow. The lines contain x,y pairs for the two end points of line segments. These line segments make up the curve on the graph. These are for curve 1.

-10 12299.78 -9.872727 12296.29

-9.872727 12296.29 -9.745455 12291.98

-9.745455 12291.98 -9.618182 12286.63

-9.618182 12286.63 -9.49091 12280.02

-9.49091 12280.02 -9.363637 12271.83

.

.

Not all 55 lines of numbers are shown here.

.

.

-3.509097 5503.595 -3.381825 5492.997

-3.381825 5492.997 -3.254552 5484.425

-3.254552 5484.425 -3.127279 5477.498

-3.127279 5477.498 -3.000007 5471.9

55

This number is the number of lines to follow. The lines contain x,y pairs for the two end points of line segments. These line segments make up the curve on the graph. These are for curve 2.

-10 12459.92 -9.872727 12443.04

-9.872727 12443.04 -9.745455 12424.3

-9.745455 12424.3 -9.618182 12403.5

-9.618182 12403.5 -9.49091 12380.42

-9.49091 12380.42 -9.363637 12354.82

.

.

Not all 55 lines of numbers are shown here.

.

.

-3.509097 5387.667 -3.381825 5248.426

-3.381825 5248.426 -3.254552 5118.917

-3.254552 5118.917 -3.127279 4998.845

-3.127279 4998.845 -3.000007 4887.851

End of graph file. GRAFIT can read multiple sets of curve data. For another graph, the above format of input can be repeated in the same file. Make sure not to put any blank lines in the graph file.
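For readers who wish to bring .grf data into another plotting package, the layout described above can be read as follows. This Python sketch is an illustration only, not GRAFIT's own reader. The function and key names are ours, and we assume that the point counts for all curves sit on a single line, as in the GSAMPLE.GRF listing, and that every curve has its own count of line segments.

```python
def _floats(line):
    """Convert one whitespace-separated line of numbers to a tuple of floats."""
    return tuple(float(tok) for tok in line.split())


def read_grf(text):
    """Return a list of graphs (dicts) parsed from the text of a .grf file."""
    # GRAFIT itself does not allow blank lines; this reader simply drops them.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    graphs = []
    pos = 0
    while pos < len(lines):
        labels = [ln.strip() for ln in lines[pos:pos + 2]]  # two graph labels
        n_curves = int(lines[pos + 2])                      # number of curves
        counts = [int(tok) for tok in lines[pos + 3].split()]
        if len(counts) != n_curves:
            raise ValueError("expected %d point counts, found %d"
                             % (n_curves, len(counts)))
        x_axis = _floats(lines[pos + 4])                    # x min, x max, step
        y_axis = _floats(lines[pos + 5])                    # y min, y max, step
        pos += 6
        points = []
        for count in counts:                                # x,y pairs per curve
            points.append([_floats(ln) for ln in lines[pos:pos + count]])
            pos += count
        segments = []
        for _ in range(n_curves):                           # x1,y1,x2,y2 per line
            n_seg = int(lines[pos])
            pos += 1
            segments.append([_floats(ln) for ln in lines[pos:pos + n_seg]])
            pos += n_seg
        graphs.append({"labels": labels, "x_axis": x_axis, "y_axis": y_axis,
                       "points": points, "segments": segments})
    return graphs
```

With this sketch, read_grf(open("GSAMPLE.GRF").read()) would return one graph with two curves of 10 points and 55 line segments each. A file written by PULSEFIT holds several graphs, which come back as successive entries of the list.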

APPENDIX:

"Pulse Detection in Hormone Data: Simplified, Efficient Algorithm"

Munson, P.J. and Rodbard, D., in 1989 Proceedings of the Statistical Computing Section of the American Statistical Association, Washington, DC.


Last updated : 1/20/2000