Dataplot Vol 1 Auxiliary Chapter

KERNEL DENSITY PLOT

Name:
    KERNEL DENSITY PLOT
Type:
    Graphics Command
Purpose:
    Generates a kernel density plot.
Description:
    The kernel density estimate, f(n), of a set of n points from a density f is defined as:

      f(n)(x) = (1/(n*h))*SUM[j=1 to n]K{(x - X(j))/h}

    where K is the kernel function and h is the smoothing parameter, also called the window width.

    Currently, Dataplot uses a Gaussian kernel function. This downweights points smoothly as their distance from x increases. The width parameter can be set by the user (see the Note on the window width below), although Dataplot provides a default width that should produce reasonable results for most data sets.

    A kernel density plot can be considered a refinement of a histogram or frequency plot.
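
    As a rough illustration, the definition above can be evaluated directly at a single point with ordinary LET commands. The sketch below assumes the Gaussian kernel is computed with NORPDF (the standard normal density function). The names X0, H, U, KVAL, KSUM, and FHAT and the values 0 and 0.25 are arbitrary choices for illustration, not Dataplot defaults. This direct sum is not the FFT-based method that KERNEL DENSITY PLOT itself uses.

    ```
    . Direct evaluation of the kernel density estimate at one point X0
    LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 1000
    LET N = SIZE Y
    . Illustrative width and evaluation point (assumed values)
    LET H = 0.25
    LET X0 = 0
    . Scaled distances (X0 - X(j))/H, one per raw data point
    LET U = (X0 - Y)/H
    . Gaussian kernel weight for each raw data point
    LET KVAL = NORPDF(U)
    . Sum the weights and normalize by N*H
    LET KSUM = SUM KVAL
    LET FHAT = KSUM/(N*H)
    PRINT FHAT
    ```

    The printed value can be compared with the height of the curve from KERNEL DENSITY PLOT Y near x = 0 when the same width has been set with KERNEL DENSITY WIDTH.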

Syntax:
    KERNEL DENSITY PLOT <x>             <SUBSET/EXCEPT/FOR qualification>
    where <x> is the variable of raw data values;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
    KERNEL DENSITY PLOT TEMP
    KERNEL DENSITY PLOT Y SUBSET TAG = 2
    KERNEL DENSITY PLOT Y FOR I = 1 1 800
Note:
    Dataplot computes the kernel density estimate using Algorithm 176 from Applied Statistics (see Reference below). This code was contributed by B. W. Silverman.

    This algorithm is based on the Fast Fourier Transform (FFT), which makes it much more computationally efficient than evaluating the kernel sum directly. The article that accompanies the algorithm gives the details of how the FFT is used and provides timing estimates for this implementation relative to an algorithm based directly on the definition of the kernel density estimate.

Note:
    By default, the density curve is generated with 256 points. Note that this is the number of points on the density curve, not the number of points in the raw data.

    You can set the number of points for the density curve using the following command:

      KERNEL DENSITY POINTS <value>

    where <value> defines the number of points.
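
    For example, the following sketch requests a finer density curve. The value 512 is an illustrative choice. It is kept a power of two only as a cautious assumption, since the underlying algorithm is FFT-based and this page does not state which values are accepted.

    ```
    . Request 512 points on the density curve instead of the default 256
    KERNEL DENSITY POINTS 512
    LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 1000
    KERNEL DENSITY PLOT Y
    ```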

Note:
    Following the recommendation of Silverman (1986), DATAPLOT uses a default width of

      0.9*min(s,IQ/1.34)*n**(-1/5)

    where s is the sample standard deviation, IQ is the sample interquartile range, and n is the number of points in the raw data. Silverman provides justification for this choice: it should perform reasonably well for a wide variety of distributions. Note that the optimal width depends on the underlying density function, which is the very quantity being estimated.

    If the underlying data is in fact normally distributed, then Silverman (1986) shows that the optimal width is

      1.06*s*n**(-1/5)

    where n is the number of points in the raw data and s is the sample standard deviation of the raw data.

    It may be worthwhile to generate the density curve using several different values for the width. Silverman also recommends trying to transform skewed data sets to be symmetric.

    The width can be set with the following command:

      KERNEL DENSITY WIDTH <value>
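
    As a sketch of how the two width rules above might be computed and applied by hand, the commands below derive both widths from the raw data and plot the density with each in turn. The parameter names (N, S, Q1, Q3, IQ, IQS, SPREAD, HSILV, HNORM) are illustrative. The quartile definition used by LOWER QUARTILE and UPPER QUARTILE is assumed, not confirmed, to match the one Dataplot uses internally, so HSILV may differ slightly from the built-in value. The width is passed with the ^ substitution character so that its numeric value is inserted into the command.

    ```
    LET Y = LOGNORMAL RANDOM NUMBERS FOR I = 1 1 1000
    . Sample size, standard deviation, and interquartile range
    LET N = SIZE Y
    LET S = STANDARD DEVIATION Y
    LET Q1 = LOWER QUARTILE Y
    LET Q3 = UPPER QUARTILE Y
    LET IQ = Q3 - Q1
    . Silverman's general-purpose rule: 0.9*min(s,IQ/1.34)*n**(-1/5)
    LET IQS = IQ/1.34
    LET SPREAD = MIN(S,IQS)
    LET HSILV = 0.9*SPREAD/(N**0.2)
    . Rule that is optimal for normal data: 1.06*s*n**(-1/5)
    LET HNORM = 1.06*S/(N**0.2)
    PRINT HSILV HNORM
    . Compare the two widths side by side
    MULTIPLOT 1 2
    KERNEL DENSITY WIDTH ^HSILV
    KERNEL DENSITY PLOT Y
    KERNEL DENSITY WIDTH ^HNORM
    KERNEL DENSITY PLOT Y
    END OF MULTIPLOT
    ```

    Skewed data such as the lognormal sample used here is a case where the two rules can be expected to give noticeably different widths, because the interquartile range is less affected by the long tail than the standard deviation is.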
Note:
    Dataplot will not generate the density curve unless the input data set contains at least 20 data points. In practice, the sample size should be larger than this for a density plot to be an appropriate method.
Note:
    The KERNEL DENSITY PLOT command generates an estimate of the underlying density function. You can convert this to an estimate of the cumulative distribution function by integrating the density estimate. The following shows an example of doing this in Dataplot.
             LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 1000
             KERNEL DENSITY PLOT Y
             LET YPDF = YPLOT
             LET XPDF = XPLOT
             LET YCDF = CUMULATIVE INTEGRAL YPDF XPDF
             TITLE ESTIMATE OF UNDERLYING CUMULATIVE DISTRIBUTION
             PLOT YCDF  XPDF
          
    You can also obtain an estimate of the percent point function (inverse cdf) with the following additional commands:
             LET YPPF = XPDF
             LET XPPF = YCDF
          
Note:
    Dataplot computes the density curve from

      (YMINIMUM - 3*H, YMAXIMUM + 3*H)

    where YMINIMUM and YMAXIMUM are the minimum and maximum values of the raw data and H is the window width.
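
    These limits can be checked against a generated curve. The sketch below sets an explicit width so that H is known, then compares the computed limits with the smallest and largest x-coordinates of the curve saved in XPLOT. The width 0.3 and the parameter names (DMIN, DMAX, XLOW, XHIGH, XFIRST, XLAST) are illustrative. Agreement should be expected only up to rounding.

    ```
    LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 1000
    . Use an explicit width so that H is known (0.3 is an arbitrary choice)
    LET H = 0.3
    KERNEL DENSITY WIDTH ^H
    KERNEL DENSITY PLOT Y
    . Limits predicted from the raw data and the width
    LET DMIN = MINIMUM Y
    LET DMAX = MAXIMUM Y
    LET XLOW = DMIN - 3*H
    LET XHIGH = DMAX + 3*H
    . Limits actually used, taken from the plotted curve
    LET XFIRST = MINIMUM XPLOT
    LET XLAST = MAXIMUM XPLOT
    PRINT XLOW XFIRST XHIGH XLAST
    ```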

Default:
    The default window width is 0.9*min(s,IQ/1.34)*n**(-1/5) where n is the number of points in the raw data, s is the sample standard deviation, and IQ is the sample interquartile range. The density trace is generated with 256 points.
Synonyms:
    The following are synonyms for the KERNEL DENSITY PLOT command.
      KERNEL PLOT
      DENSITY PLOT
      DENSITY TRACE
Related Commands:
    HISTOGRAM = Generates a histogram.
    FREQUENCY PLOT = Generates a frequency plot.
    KERNEL DENSITY WIDTH = Sets the width factor, h, for the kernel density plot.
    KERNEL DENSITY POINTS = Sets the number of points generated for the kernel density plot.
Reference:
    "Kernel Density Estimation using the Fast Fourier Transform", B. W. Silverman, Applied Statistics, Royal Statistical Society, 1982, Vol. 31.

    "Density Estimation for Statistics and Data Analysis", B. W. Silverman, Chapman & Hall, 1986.

    "Multivariate Density Estimation", David Scott, John Wiley, 1992.

Applications:
    Density Estimation
Implementation Date:
    2001/8
Program:
        MULTIPLOT SCALE FACTOR 2
        MULTIPLOT 2 2
        MULTIPLOT CORNER COORDINATES 0 0 100 100
        .
        LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 1000
        X3LABEL 1,000 NORMAL RANDOM NUMBERS
        KERNEL DENSITY PLOT Y
        .
        LET Y = LOGNORMAL RANDOM NUMBERS FOR I = 1 1 1000
        X3LABEL 1,000 LOGNORMAL RANDOM NUMBERS
        KERNEL DENSITY PLOT Y
        .
        LET GAMMA = 2
        LET Y = WEIBULL RANDOM NUMBERS FOR I = 1 1 1000
        X3LABEL 1,000 WEIBULL RANDOM NUMBERS (GAMMA = 2)
        KERNEL DENSITY PLOT Y
        .
        LET Y = LOGISTIC RANDOM NUMBERS FOR I = 1 1 1000
        X3LABEL 1,000 LOGISTIC RANDOM NUMBERS
        KERNEL DENSITY PLOT Y
        END OF MULTIPLOT
        
    plot generated by sample program

Date created: 8/14/2001
Last updated: 4/4/2003
Please email comments on this WWW page to alan.heckert@nist.gov.