Threats to Validity Involving Geographic Space



         THREATS TO VALIDITY INVOLVING GEOGRAPHIC SPACE1

                         ROLF R. SCHMITT
              National Transportation Policy Study 
               N.W., Washington, DC 20006, U.S.A.

      (Received 20 September 1977; revised 27 January 1978)

     Abstract-The measurement of changes in human behavior,
     socioeconomic organization, and the environment is a central
     task of planning and evaluation research.  The measurement
     of such changes is susceptible to numerous systematic
     biases which have been examined in a nonspatial context as
     threats to validity.  Consideration of threats to validity
     is extended in this paper to measurement problems involving
     geographic space. Such problems are common in studies of
     land use, transportation, environmental impacts, and related
     topics.



                          INTRODUCTION

Planning and evaluation research is largely directed at measuring
changes in human behavior, the organization of social and
economic activity, and the environment. For the measurement of
any change to be accepted with confidence, the validity of the
measure must be established. Both Kaplan[1] and Suchman[2] divide
this task into two basic questions:
     1.   Is this measurement reliable?
     2.   Does the measurement actually characterize the
phenomena in question?
For a measure to be reliable, it must replicate a result under
separate but identical situations. This requires the measurement
to be precise and adequately sensitive to change at the desired
level of detail. Reliability is a necessary but not sufficient
condition for validity. Systematic biases may consistently
distort a reliable measurement, which can eventually result in a
misguided evaluation. The issues of reliability and of systematic
biases must therefore be considered together in establishing the
validity of a measured change. 
     Cook and Campbell[3] suggest a four-step process to validate
a measured change. These steps are the basis of their four
generic classes of validity threats. The first step is to assure
that the policy action and the condition supposedly affected
actually covary. The mathematical problems of covariance,
particularly among variables drawn from samples, are classified
as threats to statistical validity. When magnitudes of the
measured change are thus corroborated, causality is attributed to
the policy inputs only after all other plausible explanations for
the change are hypothesized and tested in the second step2. The
many, extraneous factors which should generally be considered are
classified as threats to the internal validity of the
observation, experiment, or model. Once causal factors are
established, their interpretation is questioned in the third
step. Threats to construct validity occur when the problem
definition and its translation into an operational form do not
truly represent the social condition to be ameliorated or the
policy action taken. If these classes of validity threats are
adequately controlled, the measured change can be accepted with
confidence. It is useful, however, to know also whether the
measured change would be valid in other, external settings; thus,
a fourth set of threats, which affect external validity, is
examined. Within these four generic classes, Cook and Campbell
discuss 34 validity threats, listed in Table 1, any of which can
systematically bias measurements in planning and evaluation
research.

_________________________________________________________________
     1The impetus for this paper was provided by Richard S. P.
Weissbrod of The Johns Hopkins University.  Editorial suggestions
by David L. Greene of Oak Ridge National Laboratories are also
appreciated.
     2The rigorous consideration of such rival hypotheses to
establish causality was advocated in the last century by
Chamberlin[4].

     The Cook and Campbell typology of validity threats is based
on experience in education, criminal justice and industrial
management. Very little of this experience includes research in
which geographic space plays a major role. Such research includes
studies of land use, transportation, environmental impacts, and
related topics.

     Table 1. The Cook and Campbell typology of validity threats

THREATS TO STATISTICAL CONCLUSION VALIDITY

     Statistical power
     Fishing and error rate problem
     Reliability of measures
     Reliability of treatment implementation
     Random irrelevancies in the experimental setting
     Random heterogeneity of respondents

THREATS TO INTERNAL VALIDITY

     History
     Local history
     Maturation
     Testing
     Instrumentation
     Statistical regression
     Selection
     Mortality
     Interactions with selection
     Ambiguity about the direction of causality
     Diffusion or imitation of treatment
     Compensatory equalization of treatment
     Compensatory rivalry
     Resentful demoralization of respondents receiving
     less desirable treatment

THREATS TO CONSTRUCT VALIDITY

     Inadequate pre-operational explication of constructs
     Mono-operation bias
     Mono-method bias
     Hypothesis-guessing within experimental conditions
     Evaluation apprehension
     Experimenter expectancies
     Confounding levels of constructs and constructs
     Generalizing across time

THREATS TO EXTERNAL VALIDITY

     Interaction of the treatment and treatments
     Interaction of the treatment and testing
     Interaction of the treatment and selection
     Interaction of the treatment and setting
     Interaction of the treatment and history
     Generalizing across effect constructs

     Geographic space creates more than an additional context for
considering the validity threats in Table 1. Some threats are
inherently different or exist only in research which involves
geographic space. In this context, the Cook and Campbell
inventory is incomplete.
     Threats to validity involving geographic space are now
examined. In this paper, geographic space is taken to be any
functionally related area which is at least as large as an
average city block but not contained by physically linked
structures. Transportation planning is used as the substantive
context for illustrating the validity threats which arise in a
spatial context.


GEOGRAPHIC SPACE AND THE COOK AND CAMPBELL INVENTORY
     Many of the validity threats inventoried by Cook and
Campbell occur in a spatial context. These threats are generally
related to two themes: spatial differentiation and geographic
proximity.  
     Spatial differentiation is the partitioning of human
activity into relatively homogeneous areal units. People of
similar backgrounds tend to live in the same neighborhoods. 
Symbiotic enterprises are often located together. Zoning
ordinances in the United States generally allow only one type of
land use for a given set of adjoining properties. Whether
encouraged by social ties, economic linkages, localized
resources, or legal mandates, the result is a varied landscape in
which no two places are exactly alike, and in which most
localities are internally homogeneous. In short, spatial
differentiation is the basis of a complex stratification of
cultural and environmental phenomena.
     As with any method of stratification, spatial
differentiation is a source of selection biases and related validity
threats. An obvious example of selection bias is the estimation
of transit fare elasticities with interview data from patrons
after a fare change, ignoring past users who quit. More subtle
biases occur when monitoring sites are selected for previously
measured, extreme conditions. For example, the extreme accident
rate at an intersection measured over a short time period may be
due to local conditions or random variation. If the extreme rate
is due in part to the latter, then fewer accidents will probably
occur in a subsequent period, whether or not safety improvements
are made[5]. Comparisons of the accident rates are susceptible to
the bias of statistical regression, which can distort the
evaluation of ameliorative actions. Evaluations can also be
confused by multiple actions in the same locality, such as
effects of simultaneous changes in local transit service and
fares on ridership.
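
     The statistical regression bias in the accident example above
can be illustrated with a small simulation. The sketch below is not
taken from the cited study; the number of intersections, the daily
accident probability, and the selection of the worst fifty sites are
all hypothetical. Every site is given the same true risk, so an
extreme count in the first period is random variation only, and no
improvement of any kind is applied between periods.

```python
import random

def period_counts(n_sites, n_days, daily_prob, rng):
    """Accident counts for one period; every site has the same true daily risk."""
    return [sum(rng.random() < daily_prob for _ in range(n_days))
            for _ in range(n_sites)]

rng = random.Random(1978)                      # fixed seed so the run is repeatable
N_SITES, N_DAYS, DAILY_PROB = 500, 365, 0.02   # hypothetical values

before = period_counts(N_SITES, N_DAYS, DAILY_PROB, rng)
after = period_counts(N_SITES, N_DAYS, DAILY_PROB, rng)   # no improvements made

# Select the 50 sites with the most accidents in the first period,
# as a program aimed at "extreme" intersections might.
worst = sorted(range(N_SITES), key=lambda i: before[i], reverse=True)[:50]

mean_before = sum(before[i] for i in worst) / len(worst)
mean_after = sum(after[i] for i in worst) / len(worst)
expected = N_DAYS * DAILY_PROB                 # true mean count per site and period

print(f"selected sites, first period:  {mean_before:.1f}")
print(f"selected sites, second period: {mean_after:.1f}")
print(f"true mean for every site:      {expected:.1f}")
```

     Under these assumptions the selected sites fall back toward
the common mean in the second period, a decline that a naive
before-and-after comparison would credit to whatever action had
been taken at those sites.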
     Validity threats which stem from spatial differentiation
underlie the failure of many transportation impact studies to
generate useful insights[6]. In these studies, impacts are
measured from comparisons between sites adjacent to the new
facility and "control" sites of similar circumstances yet far
enough removed to supposedly be unaffected. Finding a distant yet
comparable monitoring site is difficult at best; finding a
distant control site in which local history does not cause
distortions during the study is even harder.
     To reduce the biases which stem from spatial
differentiation, areas in closer geographic proximity are often
selected for control sites. Social and economic interactions
between nearby areas reduce their differences. Unfortunately,
those same interactions aid the diffusion of impacts into the
control area. Impacts of fixed facilities or service changes can
physically diffuse or be imitated by compensatory activity in the
surrounding area.3 Whether labeled "externality", "indirect
impact", "John Henry Effect", or otherwise, the result is a
distorted comparison between treatment and control areas.

_________________________________________________________________
     3Compensatory activities are incorporated by the last four
threats to Internal Validity in Table 1.

     Geographic proximity also usually precludes the use of true
experimental designs for evaluating changes in spatially
contiguous services such as transportation. For example, it is
nearly impossible to assign at random individual eligibility for
a fixed-route transit service as it passes through the
neighborhood. Without random assignment, spatially contiguous
services can be evaluated only with predictive models or quasi-
experimental designs. The former often lack sensitivity to
innovations. The latter often require interarea comparisons, and
are less effective in controlling the biases inherent to spatial
differentiation.
     The role of geographic space in the Cook and Campbell
classification of validity threats can be summarized as a
conflict between the effects of spatial differentiation and of
geographic proximity. This conflict is illustrated by the
attempted evaluation of a door-to-door, advance reservation,
transit service for the elderly and handicapped in a six square
kilometer portion of Baltimore, Maryland[7]. Evaluators wished to
measure changes in the average frequency and length of travel
caused by the new service. To subtract out seasonal variations
and other extraneous factors, the traditional approach of
comparing changes in the service area to changes in a control
neighborhood was considered. The only comparable neighborhoods
were not far enough removed to preclude residents in the control
area from altering their travel behavior to match that of friends
in the service area. Changes attributable to the service would
then be indeterminate. More distant neighborhoods of similar
social and economic characteristics had vastly different
locational attributes, such as distance to downtown Baltimore and
other potential destinations of travel. Differences in such
attributes distort the comparison of local travel patterns. A
quasi-experimental design was eventually selected in which
separate samples of households are drawn from the service area
twice before and once after the service is implemented. The known
validity of this evaluation is restricted to the particular
locality and time. In general, external validity is assured
mainly through comparisons of areas, but such comparisons are
possible only if the conflicting validity threats stemming from
spatial differentiation and geographic proximity can be
controlled.
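
     The traditional comparison of a service area with a control
neighborhood, and the way diffusion undermines it, can be put as
simple arithmetic. The sketch below is illustrative only: the
baseline of 4.0 trips per person per week, the seasonal shift, the
true effect, and the share of the effect that leaks into the
control area are hypothetical figures, not results from the
Baltimore evaluation.

```python
def did_estimate(service_before, service_after, control_before, control_after):
    """Change in the service area minus change in the control area."""
    return (service_after - service_before) - (control_after - control_before)

def estimated_effect(true_effect, seasonal_shift, diffusion_share, baseline=4.0):
    """Estimate obtained when part of the effect diffuses into the control area."""
    service_after = baseline + seasonal_shift + true_effect
    control_after = baseline + seasonal_shift + diffusion_share * true_effect
    return did_estimate(baseline, service_after, baseline, control_after)

# Hypothetical values: a true effect of 1.0 trip per week and a
# seasonal shift of 0.5 trips per week common to both areas.
isolated = estimated_effect(1.0, 0.5, diffusion_share=0.0)
contaminated = estimated_effect(1.0, 0.5, diffusion_share=0.4)

print(f"control area unaffected:          {isolated:.2f}")
print(f"40% of effect reaches the control: {contaminated:.2f}")
```

     With an unaffected control area the seasonal shift cancels
and the true effect is recovered; when residents of the control
area partly imitate the change, the estimate is understated by the
share that diffuses.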
     The geographic aspects of validity discussed so far are
readily subsumed by the Cook and Campbell classification. Their
concern with the timing of observations and interpersonal
diffusion of the treatment are directly analogous to the
preceding concern with the degree of spatial separation among
monitoring sites. However, these aspects are only a portion of
the threats to validity which arise in a spatial context. The
Cook and Campbell inventory in Table 1 must be expanded to
include the additional threats.


ADDITIONAL VALIDITY THREATS IN GEOGRAPHIC SPACE
     The validation of planning and evaluation research which
involves geographic space should include the consideration of
eight threats beyond the Cook and Campbell inventory. These
additional threats are included in Table 2.


Table 2. Geographic validity threats not inventoried by Cook
                          and Campbell
---------------------------------------------------------
          1.   Boundary Distortions

               a.   Overextension
               b.   Truncation

          2.   Partition Distortions

               a.   Spurious location and diffusion
               b.   Spatial autocorrelation
               c.   Excessive heterogeneity within zones
               d.   Density bias

          3.   Scale Distortions
          4.   Interaction of Scale and Constructs
          5.   Interaction of Scale and Statistical Validity
          6.   Generalizability across Scales
          7.   Interaction of Space and Time
          8.   Confusion of Spatial and Aspatial Issues

Boundary distortions
     Boundary distortions, which affect both statistical and
internal validity, arise in the definition of the study area. 
Its boundaries can overextend and dilute the phenomena under
study, or the phenomena can be prematurely truncated. Measures of
density are particularly susceptible to this problem. Population
density, for example, can be altered merely by increasing or
decreasing the amount of surrounding, unsettled land encompassed
by the study area. Of course, many boundaries can be defined by
physical barriers or by discrete changes in the spatial incidence
or nature of a phenomenon4. Such is rarely the case, however, for
small-area studies.
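
     The sensitivity of density to the study boundary is easily
shown with arithmetic. In the sketch below, the population and the
areas of settled and unsettled land are hypothetical; only the
boundary changes from one case to the next.

```python
def population_density(population, settled_km2, unsettled_km2):
    """Persons per square kilometer for a study area of settled plus unsettled land."""
    return population / (settled_km2 + unsettled_km2)

POPULATION = 50_000      # hypothetical, unchanged in every case
SETTLED_KM2 = 20.0       # hypothetical settled area

# Only the amount of surrounding unsettled land inside the boundary varies.
densities = {extra: population_density(POPULATION, SETTLED_KM2, extra)
             for extra in (0.0, 20.0, 80.0)}

for extra, density in densities.items():
    print(f"{extra:5.0f} km2 of unsettled land included: {density:7.1f} persons per km2")
```

     The same settlement appears five times as dense under the
tightest boundary as under the loosest, although nothing on the
ground differs.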
     In transportation studies, boundary distortions are
especially difficult to avoid in the calibration of trip
distribution models. When characterizing local travel, trips
ending beyond the local area are usually classified as
statistical outliers and excluded from the calculations. The
number and length of these external trips will affect the values
of the model's parameters[9]. Biases can subsequently occur both
in predicted intra-area travel volumes and in comparisons of the
effects of distance on local trip frequencies.
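
     The effect of excluding external trips on a calibrated
distance parameter can be suggested with a deliberately simplified
case. The sketch below does not calibrate a full trip distribution
model. It assumes that trip lengths follow an exponential
distribution, for which the maximum-likelihood decay rate is the
reciprocal of the mean trip length, and it treats every trip longer
than a hypothetical boundary distance as an excluded external trip.
The decay rate, boundary distance, and sample size are assumptions
chosen for illustration.

```python
import random

def decay_rate(trip_lengths_km):
    """Reciprocal of mean trip length: the maximum-likelihood decay
    rate when lengths are exponentially distributed and untruncated."""
    return len(trip_lengths_km) / sum(trip_lengths_km)

rng = random.Random(9)   # fixed seed so the run is repeatable
TRUE_BETA = 0.2          # hypothetical decay rate per km (mean trip of 5 km)
BOUNDARY_KM = 8.0        # hypothetical length beyond which trips leave the area

all_trips = [rng.expovariate(TRUE_BETA) for _ in range(20000)]
internal_trips = [d for d in all_trips if d <= BOUNDARY_KM]

beta_all = decay_rate(all_trips)
# Applying the same formula after dropping external trips ignores the
# truncation, which is the naive calibration being illustrated.
beta_internal = decay_rate(internal_trips)

print(f"all trips retained:      beta = {beta_all:.3f}")
print(f"external trips excluded: beta = {beta_internal:.3f}")
```

     Dropping the longer trips shortens the mean trip length and
so exaggerates the apparent deterrent effect of distance, even
though the underlying travel behavior is unchanged.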
     Boundary distortions can be mitigated. If possible, the
study area should be defined by the region's functional linkages
or by a characteristic which has greater within-region variance
than between-region variance. If the use of less appropriate
boundaries is required by the data or the political context, then
a measure of the degree to which the desired and utilized
boundaries differ should be included with the study's results.
_________________________________________________________________
     4For example, the rapid change in land values and density of
development often defines a precise urban boundary[8].

Partition distortions
     Partition distortions are a potential threat to internal
validity whenever the study area is subdivided into analysis
zones. These distortions include spurious location or diffusion,
spatial autocorrelation, excessive heterogeneity within zones,
and the density bias.
     Spurious location and diffusion can occur when the spatial
incidence of a phenomenon is located by centroids of analysis
zones. If the phenomenon actually occurs peripherally in a large
analysis zone, its location is distorted by its arbitrary
assignment to the zone centroid. Should the phenomenon be divided
by the boundaries of large zones, its location is falsely split
and spuriously diffused among the distant zonal centroids. An
example is the use of census tracts to measure the attractiveness
of retail centers. Retail centers are usually partitioned by
major traffic arteries which frequently demarcate census tracts. 
Since census tracts are designed for population rather than
transportation or retail studies, the retail center is assigned
to several distant centroids. This problem is more severe in
suburban areas where census tracts are larger.
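
     The size of the locational error can be gauged with a simple
geometric case. The sketch below assumes, purely for illustration,
that a retail center sits at the corner shared by four square
census tracts of equal size and that its floor area is divided
evenly among them; the tract sizes and floor area are hypothetical.

```python
import math

def centroid_displacement_km(tract_side_km):
    """Distance from the corner shared by four square tracts to each tract centroid."""
    half = tract_side_km / 2
    return math.hypot(half, half)

def assigned_floor_area(total_floor_area, n_tracts=4):
    """Floor area credited to each centroid when the center is split evenly."""
    return total_floor_area / n_tracts

FLOOR_AREA_M2 = 40_000   # hypothetical retail center

for side in (1.0, 4.0):  # hypothetical central-city and suburban tract sizes
    print(f"tract side {side:.0f} km: {assigned_floor_area(FLOOR_AREA_M2):.0f} m2 "
          f"placed {centroid_displacement_km(side):.2f} km from the true site, four times over")
```

     One center is thus recorded as four smaller ones, and the
displacement of each grows in direct proportion to tract size,
which is why the problem is worse in suburban areas.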
     The obvious answer to the spurious location or diffusion
problem is to minimize the size of each analysis zone. 
Partitioning the study area into smaller zones, however,
magnifies computational headaches as the number of zones
increases. Decreasing zone size can also incur the problem of
spatial autocorrelation.
     Spatial autocorrelation occurs when functionally unified
areas are subdivided into zones; measurements of a phenomenon in
one analysis zone will be highly correlated to the measurements
taken in the adjacent zones. For example, data collected by
county for federally funded and state administered programs will
probably be highly correlated between counties within each state,
but not between counties in different states. While useful for
uncovering and predicting geographic patterns, spatial
autocorrelation can lead to biased estimators in linear
models[10].
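
     The degree of spatial autocorrelation in zonal data can be
summarized with an index. The sketch below uses Moran's I with
binary adjacency weights, a standard index that is not named in the
text above and is offered only as one possible choice. The eight
counties, their arrangement in a row, and their values are
hypothetical.

```python
def chain_neighbors(n):
    """Adjacency for n zones arranged in a row: each zone borders the next."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def morans_i(values, neighbors):
    """Moran's I with binary weights (1 for adjacent zones, 0 otherwise)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = sum(dev[i] * dev[j] for i, adj in neighbors.items() for j in adj)
    total_weight = sum(len(adj) for adj in neighbors.values())
    return (n / total_weight) * cross / sum(d * d for d in dev)

# Hypothetical program data for eight counties in a row: the first four
# lie in one state, the last four in another with a higher funding level.
by_state = [10, 11, 9, 10, 20, 21, 19, 20]
# Similar values arranged so that adjacent counties always differ.
alternating = [10, 20, 10, 20, 10, 20, 10, 20]

adjacency = chain_neighbors(8)
i_by_state = morans_i(by_state, adjacency)
i_alternating = morans_i(alternating, adjacency)

print(f"counties grouped by state: I = {i_by_state:.2f}")
print(f"alternating values:        I = {i_alternating:.2f}")
```

     A strongly positive value, as in the grouped case, indicates
that observations in adjacent zones are not independent, which is
the condition under which estimators in linear models become
suspect.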
     Complications also occur when partitions allow too much
heterogeneity within zones. If within-zone variance of a
phenomenon exceeds its between-zone variance, then the areal
units provide a basis for neither precise descriptions nor
adequately sensitive predictions. An example is the excessive
variance of travel behavior observed within census tracts [11,
12]. This variance has been attributed to the mismatch between
observational units and the spatial distribution of causal
factors[13]. Mismatches are more prevalent when an arbitrary grid
or constant area is used to define the subdivisions.
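
     Whether a partition suffers from excessive heterogeneity can
be checked by separating the variation within zones from the
variation between them. The sketch below compares sums of squares
as a simple stand-in for the variances discussed above; the zones
and the weekly household trip counts are hypothetical.

```python
def sums_of_squares(zones):
    """Return (within-zone, between-zone) sums of squares for {zone: [values]}."""
    observations = [v for values in zones.values() for v in values]
    grand_mean = sum(observations) / len(observations)
    within = between = 0.0
    for values in zones.values():
        zone_mean = sum(values) / len(values)
        within += sum((v - zone_mean) ** 2 for v in values)
        between += len(values) * (zone_mean - grand_mean) ** 2
    return within, between

# Hypothetical weekly trips per household. In the first partition each
# zone mixes very different households; in the second, zones are uniform.
mixed = {"A": [2, 9, 4, 11], "B": [3, 10, 5, 12], "C": [1, 8, 6, 12]}
uniform = {"A": [2, 3, 2, 3], "B": [6, 7, 6, 7], "C": [11, 12, 11, 12]}

mixed_within, mixed_between = sums_of_squares(mixed)
uniform_within, uniform_between = sums_of_squares(uniform)

print(f"mixed zones:   within = {mixed_within:.2f}, between = {mixed_between:.2f}")
print(f"uniform zones: within = {uniform_within:.2f}, between = {uniform_between:.2f}")
```

     In the mixed partition nearly all of the variation lies
inside the zones, so zonal averages describe almost nothing; in
the uniform partition the reverse holds.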
     When the size and shape of zones are allowed to vary and
more accurately reflect functional units, a density bias can
occur. For example, monitored increases in the zonal
concentration of new suburban activities may be overrepresented
by the larger sizes of census tracts outside the central
city[14].
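
     The density bias can be shown by holding the true density of
new activity constant while letting zone size vary. In the sketch
below the density of new establishments and the tract areas are
hypothetical.

```python
def zone_counts(density_per_km2, zone_areas_km2):
    """New establishments counted in each zone at a uniform density."""
    return [density_per_km2 * area for area in zone_areas_km2]

def per_km2(counts, zone_areas_km2):
    """Counts normalized by zone area."""
    return [count / area for count, area in zip(counts, zone_areas_km2)]

DENSITY = 5.0                        # hypothetical new establishments per km2
central_areas = [1.0, 1.0, 1.0]      # hypothetical small central-city tracts
suburban_areas = [10.0, 10.0, 10.0]  # hypothetical large suburban tracts

central = zone_counts(DENSITY, central_areas)
suburban = zone_counts(DENSITY, suburban_areas)

print("count per zone:", central[0], "central versus", suburban[0], "suburban")
print("count per km2: ", per_km2(central, central_areas)[0], "central versus",
      per_km2(suburban, suburban_areas)[0], "suburban")
```

     Counted by zone, the suburban tracts appear to attract ten
times the new activity; normalized by area, the two sets of tracts
are identical.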
     There is no panacea for partition distortions. The amount of
potential bias, requisite observational or model sensitivity, and
computational capabilities must all be considered if the number,
size, shape, and uniformity of subdivisions are not
predetermined. These considerations have been examined most
recently by Batty[15] and Coulson[16].

Scale distortions
     Several validity threats which arise in geographic space are
related to scale. In this context, scale refers to
the relative magnitude of the study. Micro-scale transportation
studies usually involve individuals or households in a
neighborhood setting. Macro-scale studies deal with larger
aggregates, such as interzonal travel flows throughout an entire
city or region.
     While scale is an issue for each of the four generic classes
of validity, scale distortions specifically affect internal
validity. Scale distortions occur when a measure is applied to
different scales without careful recalibration. Local conditions,
which are usually averaged out in aggregate studies, will often
cause parametric shifts in a measure of travel behavior.
     Scale distortions are an unnoticed yet relevant threat to a
commonly proposed measure of accessibility[17-19]. A zone's
accessibility increases with the attractiveness or size of
surrounding zones and decreases exponentially with distance to
each zone. The rate of exponential decay is taken from a gravity-
type trip distribution model calibrated from regionwide travel
surveys. The regionwide parameter is used to calculate the
accessibility of specific facilities to zones within
transportation corridors, ignoring the strong possibility that
regionwide travel behavior is not simply mirrored by local
residents. The effect of distance on local accessibility is thus
distorted because the measure is calibrated at a different scale. 
Likewise, the use of a locally calibrated measure for larger
aggregates of travel behavior is also susceptible to scale
distortions.
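
     The accessibility measure described above can be written as
the sum, over surrounding zones, of each zone's size multiplied by
exp(-beta x distance). The sketch below follows that verbal
description; the zone sizes, distances, and the two values of the
decay parameter beta are hypothetical and serve only to show how
the choice of calibration scale can alter the result.

```python
import math

def accessibility(attractions, beta):
    """Sum of attraction sizes discounted exponentially by distance.

    attractions: list of (size, distance_km) pairs for the surrounding zones.
    """
    return sum(size * math.exp(-beta * dist) for size, dist in attractions)

# Hypothetical zones: X has a small attraction nearby, Y a large one far away.
zone_x = [(100.0, 1.0)]
zone_y = [(400.0, 8.0)]

BETA_REGIONAL = 0.1   # hypothetical value from a regionwide calibration
BETA_LOCAL = 0.5      # hypothetical value if local residents are more distance-sensitive

for label, beta in (("regional", BETA_REGIONAL), ("local", BETA_LOCAL)):
    print(f"{label:8s} beta = {beta}: X = {accessibility(zone_x, beta):6.1f}, "
          f"Y = {accessibility(zone_y, beta):6.1f}")
```

     With the regionwide parameter, zone Y appears the more
accessible; with the local parameter the ranking is reversed. The
measure is only as valid as the match between the scale of
calibration and the scale of application.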

Interaction of scale and constructs
     More than parametric shifts can occur between scales;
completely different variables may assume importance. To
investigate this threat of interaction between scale and
constructs, the question must be asked: does the operational form
of the construct hold for varying distances, densities of
activity, or degrees of areal aggregation? Such questions of
construct validity would be relevant, for example, if a travel
demand model for interurban rail passenger service were applied
to an intraurban subway system. Availability of air
transportation would be an important variable in the former
application, but hardly relevant to the intraurban case.

Interaction of scale and statistical validity
     Scale is an important issue when establishing statistical
validity. In order to establish covariance between policy actions
and indicators of the condition to be ameliorated, the scale of
the analysis must not be reduced beyond the ability of the data
base to provide adequate inference. Discrepancies in the data are
magnified by increasing disaggregation because they are less
likely to be averaged out. Statistical validity depends on the
level of detail available in the data base. Data can be
aggregated above, but rarely disaggregated below, the scale at
which they are collected. Inferences about larger populations may
be drawn from representative samples, but inferences about
subgroups require their adequate representation in the sample as
well. Any of these factors will affect the consistency of
measured results; they are explored further by Alonso[20].
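
     The loss of precision under disaggregation can be gauged from
the standard error of a mean, which is the standard deviation
divided by the square root of the sample size. The sketch below
assumes simple random sampling and an equal split of the sample
among zones; the standard deviation, sample size, and number of
zones are hypothetical.

```python
import math

def standard_error(std_dev, sample_size):
    """Standard error of a sample mean under simple random sampling."""
    return std_dev / math.sqrt(sample_size)

STD_DEV = 4.0          # hypothetical standard deviation, trips per household per week
REGION_SAMPLE = 1000   # hypothetical regionwide sample of households
N_ZONES = 50           # hypothetical number of analysis zones

zone_sample = REGION_SAMPLE // N_ZONES   # households per zone if split equally

se_region = standard_error(STD_DEV, REGION_SAMPLE)
se_zone = standard_error(STD_DEV, zone_sample)

print(f"regionwide mean: n = {REGION_SAMPLE}, standard error = {se_region:.3f}")
print(f"single zone:     n = {zone_sample}, standard error = {se_zone:.3f}")
```

     A sample that supports a precise regionwide estimate yields
zonal estimates roughly seven times less precise in this case,
which may be too coarse to detect the change of interest.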

Generalizability across scales
     The final scale problem is one of external validity. The
conclusions reached for one scale may not be generalizable to
another.  For example, activity patterns in a medium-size city
are not always analogous to those in the largest metropolitan
areas. Similarly, a door-to-door transit service covering a 6
square kilometer area may not be comparable with one serving 100
square kilometers, even if the intended clientele have similar
characteristics.
     As with the other validity threats related to scale, control
of this threat is not readily accomplished with an analytical
device. The best "control" is an awareness of scale-related
problems and the need to avoid them by matching the scale of the
study to the scale of the particular research question.  The
expediency of transferring tools and results from one scale to
another will most likely cause more problems than it is worth.

Interaction of space and time
     The seventh validity threat in Table 2 is the interaction of
space and time. It should be obvious that "the use of space
involves movement, and movement consumes time"[21], yet this
point is occasionally forgotten in transportation studies. This
is particularly true for estimates of latent travel demand, in
which frequencies of travel are often compared without
consideration of the trip lengths. Trips of similar frequencies
but differing lengths do not represent equivalent amounts of
travel. Furthermore, similar trip lengths in physical distance
may be radically different in travel time and thus not truly the
same. In short, equivalent patterns of behavior must be equal in
their respective consumption of both time and space.
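
     The point can be made with elementary arithmetic. In the
sketch below the trip frequencies, trip lengths, and travel speeds
are hypothetical; travel is expressed both as distance and as time
consumed.

```python
def weekly_travel(trips_per_week, trip_length_km, speed_kmh):
    """Return (kilometers traveled, hours spent traveling) per week."""
    distance_km = trips_per_week * trip_length_km
    hours = distance_km / speed_kmh
    return distance_km, hours

# Equal trip frequencies, unequal trip lengths (hypothetical values).
short_trips = weekly_travel(10, 2, 20)
long_trips = weekly_travel(10, 8, 20)

# Equal frequencies and lengths, unequal speeds (hypothetical values).
fast_mode = weekly_travel(10, 5, 30)
slow_mode = weekly_travel(10, 5, 10)

print("same frequency, different lengths:", short_trips, long_trips)
print("same lengths, different speeds:   ", fast_mode, slow_mode)
```

     Neither pair represents equivalent behavior, although a
comparison of trip frequencies alone would treat all four cases as
identical.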

Confusion of spatial and aspatial issues
     The confusion of spatial and aspatial issues is the last and
perhaps most fundamental validity threat inherent to research at
geographic scales, and stems from the misattribution of a spatial
effect to a spatial rather than aspatial cause. Rapid
suburbanization in the late 1960s is an example of one spatial
effect that has been commonly attributed to a spatial cause
(investments in major transportation facilities)[22]. 
Consideration must also be given to aspatial causes, such as
housing subsidy programs, tax laws, and the national economic
climate, all of which have spatial expressions but are not
necessarily applied to specific, spatial domains[23].
     The excessive within-zone variance of travel behavior
previously mentioned may have its roots in the confusion of
spatial and aspatial issues. The use of areal units to explain
travel behavior assumes that a spatial process such as
residential differentiation affects the observed spatial behavior
(i.e. travel), although the effect is unclear. Yet excessive
within-zone variance of travel behavior is evidence of
heterogeneity within the areal units, stemming either from the
previously discussed partition distortions or from the aspatial
nature of causal factors. Kutter[24] argues the latter: that
travel behavior responds more to individual characteristics than
to spatial settings.
     The confusion of spatial and aspatial issues can often be
attributed to disciplinary turf. Overemphasis of spatial factors
is common for geographers and their allies, while economists and
their allies tend to underemphasize space (if only for
theoretical tractability). In either case, the construct validity
of a measured or predicted change is left in doubt unless both
spatial and aspatial interpretations are considered.

                           CONCLUSIONS
     The validation of measurements of change is a critical step
for establishing confidence in the results of planning and
evaluation research. Many threats to validity have
been inventoried by Cook and Campbell in an essentially aspatial
context. In this paper, their inventory has been re-examined and
expanded to accommodate more fully the validity threats which
arise in geographic space. These threats are inherent to studies
of land use, transportation, and environmental impacts.
     While a number of validity threats have been explicated,
methods for their control have been given only cursory attention. 
Such methods are numerous, and their use depends on the research
topic, purpose, setting, approach to measurement and availability
of resources. In the vast majority of cases, control of all
validity threats which are relevant to the particular study is
impossible. However, simple awareness of the possible biases in
specific findings is often sufficient to indicate what range of
the finding's applications is appropriate. Decisionmakers are
rightfully skeptical of results from a "black box" or study in
which the assumptions and difficulties are obscured. In the long
run, clear knowledge of the caveats encourages far greater
confidence in and better use of research findings.

                           REFERENCES

1.   A. Kaplan, The Conduct of Inquiry: Methodology for
     Behavioral Science.  Chandler, Scranton, Pennsylvania
     (1964).
2.   E. A. Suchman, Evaluative Research: Principles and Practice
     in Public Service and Social Action Programs.  Russell Sage
     Foundation, New York (1967).
3.   T. D. Cook and D. T. Campbell, The design and conduct of
     quasi-experiments and true experiments in field settings. 
     In Handbook of Industrial and Organizational Psychology
     (Edited by D. Dunnette), pp. 223-326. Rand McNally, Chicago
     (1976).
4.   T. C. Chamberlin, The method of multiple working hypotheses. 
     Science 15, 92-96 (1890).
5.   L. I. Griffin, B. Powers and C. Mullen, Impediments to the
     Evaluation of Highway Safety Programs. Highway Safety
     Research Center, University of North Carolina, Chapel Hill
     (1975).
6.   Charles River Associates, Measurement of the Effects of
     Transportation Changes. Urban Mass Transportation
     Administration, Washington, D.C. (1972).
7.   R. R. Schmitt and D. L. Greene, Evaluating transportation
     innovations with the intervening opportunities model.
     Northeast AIDS Proc. pp.  184-187 (1977).
8.   R. Barden and J. H. Thompson, The Urban Frontier.  Occ.  
     Paper No. 4, Syracuse University (c. 1970).
9.   T. J. Wilbanks, Measuring Accessibility: Progress Rep. I,
     Occ.  Paper No. 5, Syracuse University (1970).
10.  A. D. Cliff, P. Haggett, J. K. Ord, K. Bassett and R.
     Davies, Elements of Spatial Structure: A Quantitative
     Approach, Chap. 9. Cambridge University Press, New York
     (1975).
11.  G. M. McCarthy, Multiple regression analysis of household
     trip generation: a critique.  Highway Res.  Rec. 297, 31-43
     (1969).
12.  M. Wachs, Resource paper. In Highway Research Board, Urban
     Travel Demand Forecasting, Special Rep. No. 143, pp. 96-113. 
     Washington, D.C. (1973).
13.  E. Aldana, R. deNeufville and J. H. Stafford, Microanalysis
     of urban transportation demand. Highway Res. Rec. 446, 1-11
     (1973).
14.  D. L. Greene, Multinucleation in urban spatial structure. 
     Ph.D. dissertation, Johns Hopkins University (1977).
15.  M. Batty, Urban Modelling: Algorithms, Calibrations,
     Predictions. Cambridge University Press, New York (1976).
16.  M. R. C. Coulson, 'Potential for variation': a concept for
     measuring the significance in the size and shape of areal
     units for their use in descriptive and analytical studies. 
     Geografiska Annaler, in press (1978).
17.  U.S. Department of Transportation, Special Area Analysis,
     Final Manual, Washington, D.C. (1973).
18.  L. Sherman, B. Barber and W. Kondo, Method for evaluating
     metropolitan accessibility.  Transpn Res. Rec. 449, 70-87
     (1974).
19.  R. W. Vickerman, Accessibility, attraction and potential: a
     review of some concepts and their use in determining
     mobility.  Environment Plan. 6, 675-691 (1974).
20.  W. Alonso, Predicting best with imperfect data.  J. Am.
     Instit. Planners 34, 248-255 (1968).
21.  I. G. Cullen, Space, time and the disruption of behavior in
     cities. Environment Plan. 4, 459-470 (1972).
22.  U.S. Federal Highway Administration, Social and Economic
     Effects of Highways, Washington, D.C. (1976).
23.  D. W. Harvey, The political economy of urbanization in
     advanced capitalist societies: the case of the United
     States. In The Social Economy of Cities (Edited by G.
     Gappert and M. Rose), pp. 119-163. Sage Publications, Beverly
     Hills (1975).
24.  E. Kutter, A model for individual travel behavior. Urban
     Studies 10, 235-258 (1973).