5. Process Improvement
5.6. Case Studies
5.6.3. Catapult Case Study
5.6.3.5. Model Selection Criteria
|
Criteria for Including Terms in the Model
|
The Eddy current probe sensitivity case
study gave a number of criteria for determining which terms
to include in the model. These criteria are valid for fractional
factorial models as well.
As with the Eddy current probe sensitivity
full factorial case study, the criterion for the order in
which terms are added to a fractional factorial model is the
same: add terms in the order of the ranked list of important
factors. Hence our model will include X4 (arm length), followed by
X3 (number of bands), followed by X1 (band height), and so on.
The question is: when does one stop adding terms? The
guidelines are the same as for full factorial designs, namely:
- Generate a half-normal probability plot of the absolute
values of the effects. Add the terms that stand out in the
half-normal probability plot.
- Minimum Engineering Significant Difference: add those
terms which are bigger than the engineer's pre-specified
minimum engineering significant difference (e.g., 10%
of the total range of the data = 10% of (126.5 - 8) =
about 12 inches = 1 foot). In fact, no such cutoff value was
pre-specified in this experiment, so this criterion
cannot be applied.
- Minimum Engineering Residual Standard Deviation: add
terms until the residual standard deviation gets
smaller than the engineer's pre-specified value for
how good he/she wants the model to be; that is, how
small he/she wants the residual standard deviation of
the fitted model to be (e.g., 5% of the total range =
5% of (126.5 - 8) = about 6 inches).
In fact, no such a priori value was pre-set, so
this criterion cannot be applied.
- Replication Standard Deviation: add terms until the
residual standard deviation is smaller than the
replication standard deviation. Since this
experiment has built-in replication by design, this
criterion may be applied. The replication standard
deviations were computed from the 2 pseudo-center
points and had the values 5.3 and 9.75, with
a pooled value of 8.162. The logic of this is that we
can demand that our model fit with at least
the precision of the replicated points, but no further
(else we would be fitting noise). Thus keep adding
terms to the model until the residual standard
deviation drops below 8.2.
- Generate a normal probability plot of the residuals.
Keep adding terms (and computing residuals) until the
normal probability plot of the residuals is "sufficiently"
linear.
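Criterion 1 can be sketched numerically. The following is a minimal illustration (not the software used in the handbook), assuming Blom-type plotting positions; the effect values are taken from the Yates table later in this section:

```python
from statistics import NormalDist

def half_normal_plot_points(effects):
    """Pair the ordered |effects| with half-normal quantiles.

    Points that rise well above the straight line through the
    bulk of the plot correspond to terms worth adding.
    """
    n = len(effects)
    ordered = sorted(abs(e) for e in effects)
    nd = NormalDist()
    # The half-normal quantile at probability p is the standard
    # normal quantile at 0.5 + 0.5*p; (i - 0.375)/(n + 0.25) is
    # a Blom-type plotting position (an assumption here).
    quantiles = [nd.inv_cdf(0.5 + 0.5 * (i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]
    return quantiles, ordered

# The 15 effect estimates (mean excluded) from the Yates table:
effects = [40.28125, 35.90625, 26.96875, 24.09375, -22.15625,
           15.21875, 9.40625, 9.28125, -6.34375, 6.28125,
           5.65625, -5.53125, 5.34375, -2.21875, 0.21875]
q, o = half_normal_plot_points(effects)
```

Plotting `o` against `q` (with any plotting tool) gives the half-normal probability plot; a roughly linear lower portion with a displaced upper tail flags the terms to include.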
The above five criteria may not all agree on which terms
to include in the model. Criterion 4 is an absolute lower
bound, so adding terms beyond it is senseless. Further,
since this is essentially a modeling exercise, criterion 3
(how small do you want the residual standard deviation to
be?) is the definitive criterion. In the absence of an
answer to that question, criteria 1 and 5 are often used
in combination.
For the experiment at hand, we use criteria 1, 4, and 5.
|
Normal and Half-Normal Probability Plots of Effects
|
The following plots show the normal probability plot of
the effects and the half-normal probability plot of the
absolute values of the effects (both without and with
the coded effect tags). The half-normal probability
plot is more informative than the normal probability
plot since the half-normal plot does not change
depending on how the original factor settings were coded.
For example, for factor 1, do we define 2.25 and 4.75
to be coded as -1 and +1, or conversely as +1 and -1?
Depending on the choice, the sign of the effect changes.
This changes the appearance of the normal probability
plot, but the half-normal plot is unchanged since it
focuses only on the magnitude.
|
Plots Not Conclusive
|
The plots are not conclusive. The normal probability plot is
roughly linear, which implies no factors are important. The
half-normal probability plot is also roughly linear, with the
same implication. In practice, such plots usually have a linear
sub-portion (consisting of factors we leave out of the model)
and then a displaced set of factors (which we include in the
model). Here the situation is not clear. From the half-normal
probability plot, we can extract the upper six factors as
(slightly) different from the lower linear nine factors, but
little justification exists for this, and less justification
exists for adding more terms beyond the six. From our ranked
list, the six terms are X4 (arm length), X3 (number of bands),
X1 (band height), X5 (stop angle), X2 (start angle), and X3*X4.
Carrying out the usual least squares fit for this model, we
obtain the fitted model (prediction equation).
Regarding the adequacy of this model, two approaches are carried out:
- Quantitatively: compute the residual standard deviation
and compare it to the replication standard deviation
(criterion 4 above);
- Graphically: generate a normal probability plot of the
residuals and check it for linearity (criterion 5 above).
|
Residual Standard Deviation and Normal Probability Plot of the
Residuals
|
Quantitatively, the residual standard deviation for this model
is 12.48. This is about 50% larger than the replication standard
deviation of 8.16, which suggests that the model is inadequate
(and so additional terms must be added). Graphically, the
normal probability plot of the residuals is presented below.
This plot by itself suggests that the model is
adequate, but the normal probability plot is a
secondary tool relative to the computed residual
standard deviation. The bottom line is that the
given model has a "typical residual" on the order of
12 inches, and so the usual +/- 2
standard deviations will yield a rough 95% error of prediction
of about 24 inches. This is probably too large in practice and
suggests that more terms need to be added to drive the error
of prediction down.
How many more terms need to be added? To determine
this we need the following table, which provides
effect estimates and residual standard deviations
for cumulatively more complicated models. One of
the saving graces of orthogonal designs (such as the
2^(5-1) fractional factorial design used here) is
that the effect estimates do not change as we add
additional terms. The following table thus serves
the two-fold purpose of giving the ranked list of
factors and also giving the goodness of fit of the
corresponding cumulative fitted models. Specifically, the
last column in the table is the residual standard deviation of
the model that includes the term on that line plus all the
terms above that line.
|
Yates Table
|
IDENTIFIER    EFFECT     T VALUE    RESSD:      RESSD:
                                    MEAN +      MEAN +
                                    TERM        CUM TERMS
----------------------------------------------------------
MEAN 55.29688 37.56807 37.56807
4 40.28125 3.5* 32.38174 32.38174
3 35.90625 3.1* 33.82029 27.06551
1 26.96875 2.3 36.11603 23.47657
1234 24.09375 2.1 36.69212 19.75246
2 -22.15625 -1.9 37.03936 15.25831
34 15.21875 1.3 38.02627 12.47985
14 9.40625 0.8 38.56024 11.44448
13 9.28125 0.8 38.56889 10.02313
23 -6.34375 -0.5 38.73853 9.50675
123 6.28125 0.5 38.74143 8.76873
124 5.65625 0.5 38.76894 8.00750
12 -5.53125 -0.5 38.77409 6.68584
134 5.34375 0.5 38.78160 3.15269
24 -2.21875 -0.2 38.86856 0.43301
234 0.21875 0.0 38.88647 0.00000
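The two RESSD columns can be reproduced from the effect estimates alone. In an orthogonal 16-run two-level design, an effect e accounts for a sum of squares of 16*(e/2)^2 = 4*e^2, so the residual sum of squares of any sub-model is the corrected total minus the sums of squares of the included terms. A minimal sketch (not the handbook's software; it matches the table's main entries to within rounding):

```python
import math

# Effect estimates from the Yates table (mean row excluded),
# in the ranked order shown above.
effects = [40.28125, 35.90625, 26.96875, 24.09375, -22.15625,
           15.21875, 9.40625, 9.28125, -6.34375, 6.28125,
           5.65625, -5.53125, 5.34375, -2.21875, 0.21875]

n = 16  # runs in the 2^(5-1) design
# Sum of squares attributable to each effect: n*(e/2)^2 = 4*e^2.
ss = [4.0 * e * e for e in effects]
ss_total = sum(ss)  # corrected total sum of squares

# Residual sd of the mean-only model (15 degrees of freedom).
sd_mean = math.sqrt(ss_total / (n - 1))

# "RESSD: MEAN + TERM": mean plus that single term (14 df).
single = [math.sqrt((ss_total - s) / (n - 2)) for s in ss]

# "RESSD: MEAN + CUM TERMS": mean plus the first k ranked
# terms (15 - k degrees of freedom).
cum, rss = [], ss_total
for k, s in enumerate(ss, start=1):
    rss -= s
    df = (n - 1) - k
    cum.append(math.sqrt(max(rss, 0.0) / df) if df > 0 else 0.0)
```

Here `sd_mean` reproduces 37.568, `single[0]` reproduces 32.382, and `cum[5]` reproduces the six-term residual standard deviation of 12.48 quoted in the text.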
|
Conclusions
|
As seen above, the model with a constant + six terms has
a residual standard deviation of 12.48. Adding two
additional terms (X1*X4 and X1*X3) brings the residual
standard deviation down to 10.02. It would take three more
terms beyond that (a constant + nine terms) to drive the
residual standard deviation below the replication
standard deviation of 8.16. The normal probability
plot of the residuals of this model (not shown) would
be acceptable (that is, linear), but the problem we have
is four-fold:
- Have we used the wrong form? If the form (linear
+ interaction terms) is wrong, then no amount of
additional terms will improve things (for example,
trying to fit an asymptotic exponential function
with a polynomial).
- Is the model too complicated? Such a model
(constant + nine terms) is always suspicious. This
suggests that perhaps another functional
approach/form should have been used (e.g.,
transforming before fitting).
- Were the underlying assumptions for least squares
fitting adhered to when estimating the
coefficients of this model? The answer is no,
because of the non-constant variance
problem (low responses have low variation, but
high responses have high variation). In that
case, the regression coefficients of the model
are incorrect. The corrective action for this is
twofold:
 - weighted least squares;
 - transformations of the response.
 This latter point will be developed and pursued
 in the next section.
- Does this model have good interpolatory
properties? Recall that the purpose of this
model is not as an end in itself, but to assist
in deriving good settings for Y = 30, 60, and 90.
In this context, we ask how well does this
model do at the pseudo-center points (such
interpolatory testing is why these points were
included in the design in the first place).
- At pseudo-center point 1: (0,0,-1,0,0), Y = 37.5
and 45.0, with a mean of 41.25. The fitted model
yields a predicted value of
Y = 55.30 + 0.5 (26.97*(-1)) = 41.82
which is excellent compared to the mean of 41.25.
- At pseudo-center point 2: (0,0,+1,0,0),
Y = 84.5 and 99, with a mean of 91.75.
The fitted model yields a predicted value of
Y = 55.30 + 0.5 (26.97*(+1)) = 68.79
which is terrible compared to the mean of 91.75.
We thus conclude that this model cannot be freely and
universally used for interpolation, in particular for
the X3 = +1 (that is, the number of rubber bands = 2) case.
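The interpolation check above reduces to simple arithmetic; the sketch below merely reproduces the handbook's two computations (at the pseudo-center points the other coded factors are 0 and drop out of the prediction equation):

```python
def predict_pseudo_center(x3):
    # Grand mean plus half the 26.97 effect applied to the X3
    # setting, exactly as in the text; all other coded factor
    # settings are 0 at the pseudo-center points.
    return 55.30 + 0.5 * (26.97 * x3)

low = predict_pseudo_center(-1)    # about 41.82 vs. observed mean 41.25
high = predict_pseudo_center(+1)   # about 68.79 vs. observed mean 91.75
```

The first prediction is excellent (error under an inch) while the second misses by roughly 23 inches, which is what motivates splitting the analysis on X3.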
We are thus led to the prospect of a model whose adequacy is
conditional on the setting of a discrete variable (X3). As a
matter of experience, this is a quite common occurrence when
discrete variables are involved in the analysis. When a factor
is important and discrete, and the two levels of the factor
behave drastically differently, the suggested corrective
action is to split the analysis and carry out two parallel
analyses, each with its own model and settings, based on the
two levels of the factor. In this case, one model would be
based on X3 = -1 and the other on X3 = +1.
Four added benefits of such an approach are:
- Simplified sub-models: the resulting two
sub-models are each frequently quite simple,
since all of the interaction terms
involving X3 (in this case) disappear with
the subsetting;
- Improved predicted values: the prediction
equations yield better (more accurate)
predicted values;
- Improved best settings: this follows from
the above. This line of analysis will be
pursued later;
- Improved insight: separating out a dominant
discrete term encourages a separation of
engineering cause-and-effect into two separate
camps, both devoid of complicated
interactions.
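The mechanics of such a split analysis can be sketched as follows. The data here are purely illustrative (synthetic, not the catapult results), and the single-regressor fit stands in for whatever sub-model each level warrants:

```python
from statistics import mean

# Illustrative (synthetic) observations: (x3, x4, y) triples in
# which the response behaves very differently at the two X3
# levels, mimicking the situation described in the text.
data = [(-1, -1, 20.0), (-1, +1, 45.0), (-1, -1, 22.0), (-1, +1, 43.0),
        (+1, -1, 60.0), (+1, +1, 120.0), (+1, -1, 64.0), (+1, +1, 116.0)]

def fit_line(points):
    """Least-squares fit of y = b0 + b1*x4 for one X3 subset."""
    xs = [x4 for _, x4, _ in points]
    ys = [y for _, _, y in points]
    xbar, ybar = mean(xs), mean(ys)
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    return ybar - b1 * xbar, b1

# Split on the discrete factor X3 and fit each half separately.
models = {level: fit_line([p for p in data if p[0] == level])
          for level in (-1, +1)}
```

Each entry of `models` is the (intercept, slope) pair for one X3 level; with real data, each sub-model would then supply its own best settings.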
|