1.4.2.9.2. Graphical Output and Interpretation


Goal

The goal of this analysis is to determine a good distributional model for these data. A secondary goal is to provide estimates for various percent points of the data. Percent points answer questions such as "What is the polished window strength for the weakest 5% of the data?"

Initial Plots of the Data

The first step is to generate a histogram to get an overall feel for the data.

[Figure: histogram of the data]

The histogram shows the following:

The polished window strength ranges from slightly greater than 15 to slightly less than 50.
There are modes at approximately 28 and 38 with a gap in between.
The data are somewhat symmetric, but with a gap in the middle.

We next generate a normal probability plot.

[Figure: normal probability plot of the data]

The normal probability plot has a correlation coefficient of 0.980. We can use this number as a reference baseline when comparing the performance of other distributional fits.
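
As an illustration, the correlation coefficient of a normal probability plot can be computed with a short Python sketch. The plotting positions below use Filliben's approximation to the uniform order statistic medians; this is an assumption, since this section does not state which positions were used, and the function names are ours rather than Dataplot's.

```python
import math
from statistics import NormalDist

def filliben_positions(n):
    """Approximate uniform order statistic medians (assumed plotting positions)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m

def pearson_r(x, y):
    """Ordinary Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def normal_ppcc(data):
    """Correlation between the ordered data and standard normal quantiles."""
    ordered = sorted(data)
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf(p)
                   for p in filliben_positions(len(ordered))]
    return pearson_r(theoretical, ordered)
```

Calling `normal_ppcc` on the 31 strength values would give a number comparable to the 0.980 quoted above, although a different choice of plotting positions can change the trailing digits.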

Other Potential Distributions

A large number of distributions could serve as candidate models for these data. However, we restrict ourselves to the following distributional models because they have proven useful in reliability studies.

Normal distribution
Exponential distribution
Weibull distribution
Lognormal distribution
Gamma distribution
Power normal distribution
Fatigue life distribution

Approach

There are two basic questions that need to be addressed.

Does a given distributional model provide an adequate fit to the data?
Of the candidate distributional models, is there one distribution that fits the data better than the other candidate distributional models?

Probability plots and probability plot correlation coefficient (PPCC) plots provide answers to both of these questions.

If the distribution does not have a shape parameter, we simply generate a probability plot.

If we fit a straight line to the points on the probability plot, the intercept and slope of that line provide estimates of the location and scale parameters, respectively.
Our criterion for the "best fit" distribution is the linearity of the probability plot: the best-fitting distribution is the one with the most linear plot. The correlation coefficient of the line fitted to the points on the probability plot, referred to as the PPCC value, provides a measure of the linearity of the probability plot, and thus a measure of how well the distribution fits the data. The PPCC values for multiple distributions can be compared to address the second question above.
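
The line fit described above can be sketched in Python as follows. The function takes the theoretical quantiles of the standardized candidate distribution and the ordered data, and returns the intercept (location estimate), slope (scale estimate), and PPCC. The name `fit_probability_plot` and the plotting positions (i - 0.5)/n in the usage lines are illustrative assumptions, not the handbook's or Dataplot's definitions.

```python
import math

def fit_probability_plot(theoretical, ordered):
    """Least squares line: ordered ~ intercept + slope * theoretical.

    Returns (intercept, slope, ppcc), where intercept and slope estimate
    the location and scale parameters and ppcc measures linearity.
    """
    n = len(ordered)
    mx = sum(theoretical) / n
    my = sum(ordered) / n
    sxy = sum((t - mx) * (y - my) for t, y in zip(theoretical, ordered))
    sxx = sum((t - mx) ** 2 for t in theoretical)
    syy = sum((y - my) ** 2 for y in ordered)
    slope = sxy / sxx
    intercept = my - slope * mx
    ppcc = sxy / math.sqrt(sxx * syy)
    return intercept, slope, ppcc

# Usage with a distribution that has no shape parameter (standard exponential):
# quantiles are -ln(1 - p) at the assumed positions p = (i - 0.5) / n.
n = 10
exp_quantiles = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
```

Passing `exp_quantiles` together with a sorted sample of the same length to `fit_probability_plot` returns the three quantities discussed in the text.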

If the distribution does have a shape parameter, then we are actually addressing a family of distributions rather than a single distribution. We first need to find the optimal value of the shape parameter, and the PPCC plot can be used for this purpose. We will use the PPCC plots in two stages. The first stage covers a broad range of shape parameter values, while the second stage covers the neighborhood of the values that gave the largest PPCC in the first stage. Although we could go further than two stages, for practical purposes two stages are sufficient. After determining an optimal value for the shape parameter, we use the probability plot as above to obtain estimates of the location and scale parameters and to determine the PPCC value. This PPCC value can be compared to the PPCC values obtained from other distributional models.
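
A minimal sketch of the two-stage search, written for the Weibull family, is given below. The grid limits, the number of grid points, and the plotting positions (i - 0.5)/n are assumptions chosen for illustration; they are not the settings used to produce the results in this case study.

```python
import math

def weibull_scores(shape, n):
    """Standard Weibull quantiles at the assumed positions (i - 0.5) / n."""
    return [(-math.log(1.0 - (i - 0.5) / n)) ** (1.0 / shape)
            for i in range(1, n + 1)]

def ppcc(shape, ordered):
    """Correlation between the ordered data and the Weibull scores."""
    n = len(ordered)
    scores = weibull_scores(shape, n)
    mx = sum(scores) / n
    my = sum(ordered) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(scores, ordered))
    sxx = sum((a - mx) ** 2 for a in scores)
    syy = sum((b - my) ** 2 for b in ordered)
    return sxy / math.sqrt(sxx * syy)

def grid_search(ordered, lo, hi, steps):
    """Shape value on an evenly spaced grid that maximizes the PPCC."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda g: ppcc(g, ordered))

def two_stage_shape(data, lo=0.5, hi=10.0, steps=50):
    """Stage 1: broad grid. Stage 2: fine grid around the stage 1 maximum."""
    ordered = sorted(data)
    coarse = grid_search(ordered, lo, hi, steps)
    width = (hi - lo) / steps
    fine = grid_search(ordered, max(coarse - width, lo),
                       min(coarse + width, hi), steps)
    return fine, ppcc(fine, ordered)
```

Because the correlation coefficient is unaffected by location and scale, the search needs only the shape parameter; location and scale are then read off the probability plot for the selected shape.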

Analyses for Specific Distributions

We analyzed the data using the approach described above for the following distributional models:

Normal distribution - from the normal probability plot above, the PPCC value was 0.980.
Exponential distribution - the exponential distribution is a special case of the Weibull with shape parameter equal to 1. If the Weibull analysis yields a shape parameter close to 1, then we would consider using the simpler exponential model.
Weibull distribution
Lognormal distribution
Gamma distribution
Power normal distribution
Power lognormal distribution

Summary of Results

The results are summarized below.

These results indicate that several of these distributions provide an adequate distributional model for the data. We choose the 3-parameter Weibull distribution as the most appropriate model because it provides the best balance between simplicity and goodness of fit.

Percent Point Estimates

The final step in this analysis is to compute percent point estimates for the 1%, 2.5%, 5%, 95%, 97.5%, and 99% percent points. A percent point estimate is an estimate of the strength below which a given percentage of units is expected to fall. For example, the 5% point is the strength at which we estimate that 5% of the units will be weaker.

To calculate these values, we use the Weibull percent point function with the appropriate estimates of the shape, location, and scale parameters. The Weibull percent point function can be computed in many general purpose statistical software programs, including Dataplot.

Dataplot generated the following estimates for the percent points:

 Estimated percent points using Weibull Distribution
  
 PERCENT POINT        POLISHED WINDOW STRENGTH
 0.010                17.86
 0.025                18.92
 0.050                20.10
 0.950                44.21
 0.975                47.11
 0.990                50.53
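
For readers without Dataplot, the 3-parameter Weibull percent point function is simple enough to sketch directly in Python. The parameter values in the usage lines are assumptions for illustration: the location 15.9 and the maximum likelihood shape and scale reported in the Weibull Anderson-Darling output later in this section. The table above may rest on different estimates of shape and scale, so the printed values need not reproduce it.

```python
import math

def weibull_ppf(p, shape, loc, scale):
    """Percent point (inverse CDF) of the 3-parameter Weibull distribution."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return loc + scale * (-math.log(1.0 - p)) ** (1.0 / shape)

# Illustrative parameter values (assumed, see the note above).
for p in (0.01, 0.025, 0.05, 0.95, 0.975, 0.99):
    strength = weibull_ppf(p, shape=2.237495, loc=15.9, scale=16.87868)
    print(f"{p:<6} {strength:.2f}")
```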

Quantitative Measures of Goodness of Fit

Although it is generally unnecessary, we can include quantitative measures of distributional goodness-of-fit. Three of the commonly used measures are:

Chi-square goodness-of-fit.
Kolmogorov-Smirnov goodness-of-fit.
Anderson-Darling goodness-of-fit.

In this case, the sample size of 31 precludes the use of the chi-square test since the chi-square approximation is not valid for small sample sizes. Specifically, the smallest expected frequency should be at least 5. Although we could combine classes, we will instead use one of the other tests.

The Kolmogorov-Smirnov test requires a fully specified distribution. Since we need to use the data to estimate the shape, location, and scale parameters, we do not use this test here.

The Anderson-Darling test is a refinement of the Kolmogorov-Smirnov test. We run this test for the normal, lognormal, and Weibull distributions.
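
To make the Anderson-Darling statistic concrete, the following Python sketch computes the unadjusted statistic for a normal distribution whose mean and standard deviation are estimated from the data. It uses the standard computational formula for the statistic; it does not apply a small-sample adjustment or supply critical values, so it is not a substitute for the Dataplot output. The function name is ours.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(data):
    """Unadjusted Anderson-Darling statistic for a fitted normal distribution.

    A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) *
          [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
    where F is the normal CDF with the sample mean and standard deviation.
    """
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    cdf = [fitted.cdf(v) for v in x]
    total = 0.0
    for i in range(n):  # index i corresponds to rank i + 1
        total += (2 * i + 1) * (math.log(cdf[i])
                                + math.log(1.0 - cdf[n - 1 - i]))
    return -n - total / n
```

For the lognormal case the same function can be applied to the logarithms of the data. Extreme outliers can drive a CDF value to exactly 0 or 1 in floating point, in which case the logarithm fails; the sketch does not guard against that.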

Normal Anderson-Darling Output

               ANDERSON-DARLING 1-SAMPLE TEST
               THAT THE DATA CAME FROM A NORMAL DISTRIBUTION
  
 1. STATISTICS:
       NUMBER OF OBSERVATIONS                =       31
       MEAN                                  =    30.81142
       STANDARD DEVIATION                    =    7.253381
  
       ANDERSON-DARLING TEST STATISTIC VALUE =   0.5321903
       ADJUSTED TEST STATISTIC VALUE         =   0.5870153
  
 2. CRITICAL VALUES:
       90         % POINT    =   0.6160000
       95         % POINT    =   0.7350000
       97.5       % POINT    =   0.8610000
       99         % POINT    =    1.021000
  
 3. CONCLUSION (AT THE 5% LEVEL):
       THE DATA DO COME FROM A NORMAL DISTRIBUTION.

Lognormal Anderson-Darling Output

               ANDERSON-DARLING 1-SAMPLE TEST
               THAT THE DATA CAME FROM A LOGNORMAL DISTRIBUTION
  
 1. STATISTICS:
       NUMBER OF OBSERVATIONS                =       31
       MEAN OF LOG OF DATA                   =    3.401242
       STANDARD DEVIATION OF LOG OF DATA     =   0.2349026
  
       ANDERSON-DARLING TEST STATISTIC VALUE =   0.3888340
       ADJUSTED TEST STATISTIC VALUE         =   0.4288908
  
 2. CRITICAL VALUES:
       90         % POINT    =   0.6160000
       95         % POINT    =   0.7350000
       97.5       % POINT    =   0.8610000
       99         % POINT    =    1.021000
  
 3. CONCLUSION (AT THE 5% LEVEL):
       THE DATA DO COME FROM A LOGNORMAL DISTRIBUTION.

Weibull Anderson-Darling Output

               ANDERSON-DARLING 1-SAMPLE TEST
               THAT THE DATA CAME FROM A WEIBULL DISTRIBUTION
  
 1. STATISTICS:
       NUMBER OF OBSERVATIONS                =       31
       MEAN                                  =    14.91142
       STANDARD DEVIATION                    =    7.253381
       SHAPE PARAMETER                       =    2.237495
       SCALE PARAMETER                       =    16.87868
  
       ANDERSON-DARLING TEST STATISTIC VALUE =   0.3623638
       ADJUSTED TEST STATISTIC VALUE         =   0.3753803
  
 2. CRITICAL VALUES:
       90         % POINT    =   0.6370000
       95         % POINT    =   0.7570000
       97.5       % POINT    =   0.8770000
       99         % POINT    =    1.038000
  
 3. CONCLUSION (AT THE 5% LEVEL):
       THE DATA DO COME FROM A WEIBULL DISTRIBUTION.

Note that for the Weibull distribution, the Anderson-Darling test is actually testing the 2-parameter Weibull distribution (based on maximum likelihood estimates), not the 3-parameter Weibull distribution. To give a more accurate comparison, we subtract the location parameter (15.9) as estimated by the PPCC plot/probability plot technique before applying the Anderson-Darling test.
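
The reason subtracting the location works can be written out briefly. The symbols below (shape γ, location μ, scale α) are notation assumed for this note. If X follows a 3-parameter Weibull distribution, then X - μ follows a 2-parameter Weibull distribution with the same shape and scale:

```latex
F_X(x) = 1 - \exp\!\left[-\left(\frac{x-\mu}{\alpha}\right)^{\gamma}\right],
\quad x \ge \mu
\qquad\Longrightarrow\qquad
P(X - \mu \le y) = 1 - \exp\!\left[-\left(\frac{y}{\alpha}\right)^{\gamma}\right],
\quad y \ge 0 .
```

This is consistent with the listings above: the mean in the Weibull output is the original mean less the location, 30.81142 - 15.9 = 14.91142, while the standard deviation, 7.253381, is unchanged by the shift.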

Conclusions

The Anderson-Darling test does not reject any of these three distributions at the 5% level. Note that the value of the Anderson-Darling test statistic is smallest for the Weibull distribution, with the value for the lognormal distribution only slightly larger. The test statistic for the normal distribution is noticeably higher than for either the Weibull or the lognormal.

This provides additional confirmation that either the Weibull or the lognormal distribution fits these data better than the normal distribution, with the Weibull providing a slightly better fit than the lognormal.
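
The comparison behind these conclusions can be reproduced from the numbers in the three output listings. The sketch below assumes that the adjusted test statistic is the value compared with the 95% critical value; the ordering and the decisions are the same if the unadjusted statistics are used instead.

```python
# (adjusted test statistic, 95% critical value), copied from the listings above.
results = {
    "normal": (0.5870153, 0.7350000),
    "lognormal": (0.4288908, 0.7350000),
    "Weibull": (0.3753803, 0.7570000),
}

def rejected(statistic, critical_value):
    """Reject the hypothesized distribution when the statistic is too large."""
    return statistic > critical_value

decisions = {name: rejected(stat, crit)
             for name, (stat, crit) in results.items()}

# Smaller statistics indicate closer agreement with the hypothesized model.
ranking = sorted(results, key=lambda name: results[name][0])

for name in ranking:
    stat, crit = results[name]
    verdict = "rejected" if decisions[name] else "not rejected"
    print(f"{name:<10} {stat:.4f}  (95% point {crit:.3f})  {verdict}")
```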