KERNEL DENSITY PLOT

Name:
    KERNEL DENSITY PLOT
The kernel density estimate of an unknown density f, based on n data points X_1, ..., X_n, is defined as

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$$

where K is the kernel function and h is the smoothing parameter (window width). Currently, Dataplot uses a Gaussian kernel function, which downweights points smoothly as their distance from x increases. The width parameter can be set by the user (see the Note: below), although Dataplot provides a default width that should produce reasonable results for most data sets. A kernel density plot can be considered a refinement of a histogram or frequency plot.
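As an illustration of this definition, the following Python sketch evaluates a Gaussian kernel density estimate directly, summing one kernel per data point at each grid position: f_hat(x) = 1/(n*h) * SUM K((x - X_i)/h). It is not Dataplot code. The function names, the 256-point grid, the grid limits and the width h = 0.3 are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel function K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kernel_density(x_grid, data, h):
    """Evaluate f_hat(x) = 1/(n*h) * sum_i K((x - X_i)/h) at each grid point."""
    x_grid = np.asarray(x_grid, dtype=float)
    data = np.asarray(data, dtype=float)
    # u[j, i] is the scaled distance from grid point j to observation i.
    u = (x_grid[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (data.size * h)


# Example: 1,000 normal random numbers (illustrative values only).
rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
grid = np.linspace(y.min() - 3.0, y.max() + 3.0, 256)
density = kernel_density(grid, y, h=0.3)
```

Evaluating the sum this way costs one kernel evaluation per pair of grid point and data point, which is why faster methods become attractive for large samples.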
    KERNEL DENSITY PLOT <x>   <SUBSET/EXCEPT/FOR qualification>
where <x> is the variable of raw data values; and where the <SUBSET/EXCEPT/FOR qualification> is optional.
    KERNEL DENSITY PLOT Y SUBSET TAG = 2
    KERNEL DENSITY PLOT Y FOR I = 1 1 800
This algorithm is based on the Fast Fourier Transform (FFT), which makes it much more computationally efficient than an algorithm based directly on the definition of the kernel function. The article that accompanies this algorithm details how the FFT is used and provides timing estimates of this implementation relative to the definition-based algorithm.
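The idea of using the FFT can be sketched in Python as follows. This is a simplified illustration, not the algorithm Dataplot uses: it assumes the data are first counted into equally spaced bins (each point is moved to its bin centre) and that the binned counts are then convolved with a sampled Gaussian kernel through zero-padded FFTs. The function name kde_fft, its parameters, and the choice of extending the grid 3 widths beyond the data are hypothetical.

```python
import numpy as np


def kde_fft(data, h, n_points=256, pad=3.0):
    """Approximate Gaussian kernel density estimate via binning and FFT.

    Returns (grid, density), where grid holds the n_points bin centres.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    lo, hi = data.min() - pad * h, data.max() + pad * h

    # Count the data into equal-width bins; grid points are the bin centres.
    edges = np.linspace(lo, hi, n_points + 1)
    grid = 0.5 * (edges[:-1] + edges[1:])
    delta = edges[1] - edges[0]
    counts, _ = np.histogram(data, bins=edges)

    # Gaussian kernel sampled at every possible grid offset.
    offsets = np.arange(-(n_points - 1), n_points) * delta
    kernel = np.exp(-0.5 * (offsets / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

    # Linear (not circular) convolution through zero-padded FFTs.
    size = counts.size + kernel.size - 1
    spectrum = np.fft.rfft(counts, size) * np.fft.rfft(kernel, size)
    conv = np.fft.irfft(spectrum, size)

    # Keep the part of the convolution that lines up with the grid.
    density = conv[n_points - 1 : 2 * n_points - 1] / n
    return grid, density


rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
grid, density = kde_fft(y, h=0.3)
```

The cost of the convolution grows with the number of grid points rather than with the product of grid points and data points. The price is a small binning error, which shrinks as the bin width becomes small relative to h.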
You can set the number of points for the density curve using the following command:
where <value> defines the number of points.
where s is the sample standard deviation and IQ is the sample interquartile range. Silverman provides justification for this choice. In short, it should perform reasonably well for a wide variety of distributions. Note that the optimal width depends on the underlying density function, which is what we are trying to estimate. If the underlying data are in fact normally distributed, then Silverman (1986) shows that the optimal width is
where n is the number of points in the raw data and s is the sample standard deviation of the raw data. It may be worthwhile to generate the density curve using several different values for the width. Silverman also recommends trying to transform skewed data sets to be symmetric. The width can be set with the following command:
    LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 1000
    KERNEL DENSITY PLOT Y
    LET YPDF = YPLOT
    LET XPDF = XPLOT
    LET YCDF = CUMULATIVE INTEGRAL YPDF XPDF
    TITLE ESTIMATE OF UNDERLYING CUMULATIVE DISTRIBUTION
    PLOT YCDF XPDF

You can also obtain an estimate of the percent point function (inverse cdf) with the following additional commands:

    LET YPPF = XPDF
    LET XPPF = YCDF

Note:
where YMINIMUM and YMAXIMUM are the minimum and maximum values of the raw data and H is the window width.
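The window width choices discussed in the notes above can be sketched in Python. The formulas themselves are not reproduced in this text, so the sketch assumes they are the standard rules of thumb from Silverman (1986): 0.9*min(s, IQ/1.34)*n^(-1/5) for the general-purpose default, and 1.06*s*n^(-1/5) for normally distributed data. The function names are hypothetical, and the quartiles use NumPy's default linear interpolation, which may differ slightly from Dataplot's definition.

```python
import numpy as np


def default_width(data):
    """Assumed default rule: 0.9 * min(s, IQ/1.34) * n**(-1/5)."""
    data = np.asarray(data, dtype=float)
    s = data.std(ddof=1)                      # sample standard deviation
    q75, q25 = np.percentile(data, [75, 25])
    iq = q75 - q25                            # sample interquartile range
    return 0.9 * min(s, iq / 1.34) * data.size ** (-0.2)


def normal_reference_width(data):
    """Assumed normal-optimal rule: 1.06 * s * n**(-1/5)."""
    data = np.asarray(data, dtype=float)
    return 1.06 * data.std(ddof=1) * data.size ** (-0.2)


rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
h_default = default_width(y)
h_normal = normal_reference_width(y)
```

Under these assumptions the default is never wider than the normal-optimal width. For data with outliers or long tails, IQ/1.34 is typically smaller than s, which keeps the default from oversmoothing.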
"Density Estimation for Statistics and Data Analysis", B. W. Silverman, Chapman & Hall, 1986. "Multivariate Density Estimation", David Scott, John Wiley, 1992.
    MULTIPLOT SCALE FACTOR 2
    MULTIPLOT 2 2
    MULTIPLOT CORNER COORDINATES 0 0 100 100
    .
    LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 1000
    X3LABEL 1,000 NORMAL RANDOM NUMBERS
    KERNEL DENSITY PLOT Y
    .
    LET Y = LOGNORMAL RANDOM NUMBERS FOR I = 1 1 1000
    X3LABEL 1,000 LOGNORMAL RANDOM NUMBERS
    KERNEL DENSITY PLOT Y
    .
    LET GAMMA = 2
    LET Y = WEIBULL RANDOM NUMBERS FOR I = 1 1 1000
    X3LABEL 1,000 WEIBULL RANDOM NUMBERS (GAMMA = 2)
    KERNEL DENSITY PLOT Y
    .
    LET Y = LOGISTIC RANDOM NUMBERS FOR I = 1 1 1000
    X3LABEL 1,000 LOGISTIC RANDOM NUMBERS
    KERNEL DENSITY PLOT Y
    END OF MULTIPLOT
Date created: 8/14/2001