Threats to Validity Involving Geographic Space¹

ROLF R. SCHMITT
National Transportation Policy Study N.W., Washington, DC 20006, U.S.A.

(Received 20 September 1977; revised 27 January 1978)

Abstract. The measurement of changes in human behavior, socioeconomic organization, and the environment is a central task of planning and evaluation research. The measurement of such changes is susceptible to numerous systematic biases, which have been examined in a nonspatial context as threats to validity. Consideration of threats to validity is extended in this paper to measurement problems involving geographic space. Such problems are common in studies of land use, transportation, environmental impacts, and related topics.

INTRODUCTION

Planning and evaluation research is largely directed at measuring changes in human behavior, the organization of social and economic activity, and the environment. For the measurement of any change to be accepted with confidence, the validity of the measure must be established. Both Kaplan[1] and Suchman[2] divide this task into two basic questions:

1. Is this measurement reliable?
2. Does the measurement actually characterize the phenomena in question?

For a measure to be reliable, it must replicate a result under separate but identical situations. This requires the measurement to be precise and adequately sensitive to change at the desired level of detail.

Reliability is a necessary but not sufficient condition for validity. Systematic biases may consistently distort a reliable measurement, which can eventually result in a misguided evaluation. The issues of reliability and of systematic biases must therefore be considered together in establishing the validity of a measured change.

Cook and Campbell[3] suggest a four-step process to validate a measured change. These steps are the basis of their four generic classes of validity threats. The first step is to assure that the policy action and the condition supposedly affected actually covary.
The mathematical problems of covariance, particularly among variables drawn from samples, are classified as threats to statistical validity. When magnitudes of the measured change are thus corroborated, causality is attributed to the policy inputs only after all other plausible explanations for the change are hypothesized and tested in the second step.² The many extraneous factors which should generally be considered are classified as threats to the internal validity of the observation, experiment, or model. Once causal factors are established, their interpretation is questioned in the third step. Threats to construct validity occur when the problem definition and its translation into an operational form do not truly represent the social condition to be ameliorated or the policy action taken. If these classes of validity threats are adequately controlled, the measured change can be accepted with confidence. It is useful, however, to know also whether the measured change would be valid in other, external settings; thus, a fourth set of threats, those which affect external validity, is examined.

¹ The impetus for this paper was provided by Richard S. P. Weissbrod of The Johns Hopkins University. Editorial suggestions by David L. Greene of Oak Ridge National Laboratories are also appreciated.
² The rigorous consideration of such rival hypotheses to establish causality was advocated in the last century by Chamberlin[4].

Within these four generic classes, Cook and Campbell discuss 34 validity threats, listed in Table 1, any of which can systematically bias measurements in planning and evaluation research.

Table 1. The Cook and Campbell typology of validity threats

THREATS TO STATISTICAL CONCLUSION VALIDITY
  Statistical power
  Fishing and error rate problem
  Reliability of measures
  Reliability of treatment implementation
  Random irrelevancies in the experimental setting
  Random heterogeneity of respondents

THREATS TO INTERNAL VALIDITY
  History
  Local history
  Maturation
  Testing
  Instrumentation
  Statistical regression
  Selection
  Mortality
  Interactions with selection
  Ambiguity about the direction of causality
  Diffusion or imitation of treatment
  Compensatory equalization of treatment
  Compensatory rivalry
  Resentful demoralization of respondents receiving less desirable treatment

THREATS TO CONSTRUCT VALIDITY
  Inadequate pre-operational explication of constructs
  Mono-operation bias
  Mono-method bias
  Hypothesis-guessing within experimental conditions
  Evaluation apprehension
  Experimenter expectancies
  Confounding levels of constructs and constructs
  Generalizing across time

THREATS TO EXTERNAL VALIDITY
  Interaction of the treatment and treatments
  Interaction of the treatment and testing
  Interaction of the treatment and selection
  Interaction of the treatment and setting
  Interaction of the treatment and history
  Generalizing across effect constructs

The Cook and Campbell typology of validity threats is based on experience in education, criminal justice and industrial management. Very little of this experience includes research in which geographic space plays a major role. Such research includes studies of land use, transportation, environmental impacts, and related topics.

Geographic space creates more than an additional context for considering the validity threats in Table 1. Some threats are inherently different or exist only in research which involves geographic space. In this context, the Cook and Campbell inventory is incomplete.

Threats to validity involving geographic space are now examined. In this paper, geographic space is taken to be any functionally related area which is at least as large as an average city block but not contained by physically linked structures.
Transportation planning is used as the substantive context for illustrating the validity threats which arise in a spatial context.

GEOGRAPHIC SPACE AND THE COOK AND CAMPBELL INVENTORY

Many of the validity threats inventoried by Cook and Campbell occur in a spatial context. These threats are generally related to two themes: spatial differentiation and geographic proximity.

Spatial differentiation is the compartition of human activity into relatively homogeneous areal units. People of similar backgrounds tend to live in the same neighborhoods. Symbiotic enterprises are often located together. Zoning ordinances in the United States generally allow only one type of land use for a given set of adjoining properties. Whether encouraged by social ties, economic linkages, localized resources, or legal mandates, the result is a varied landscape in which no two places are exactly alike, and in which most localities are internally homogeneous. In short, spatial differentiation is the basis of a complex stratification of cultural and environmental phenomena.

As with any method of stratification, spatial differentiation is a source of selection biases and related validity threats. An obvious example of selection bias is the estimation of transit fare elasticities with interview data from patrons after a fare change, ignoring past users who quit. More subtle biases occur when monitoring sites are selected for previously measured, extreme conditions. For example, the extreme accident rate at an intersection measured over a short time period may be due to local conditions or to random variation. If the extreme rate is due in part to the latter, then fewer accidents will probably occur in a subsequent period, whether or not safety improvements are made[5]. Comparisons of the accident rates are susceptible to the bias of statistical regression, which can distort the evaluation of ameliorative actions.
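The statistical regression bias described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: every intersection is given the same underlying accident rate, counts are drawn from a Poisson distribution, and the function names, the rate of 5 accidents per period, and the 5 percent selection rule are hypothetical choices made for illustration rather than values from the studies cited.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def regression_artifact(n_sites=2000, true_rate=5.0, worst_fraction=0.05, seed=1):
    """Select the worst sites in period 1 and re-measure them in period 2.

    Assumption: every site shares the same true accident rate and no safety
    improvement is made, so any change between periods is random variation.
    """
    rng = random.Random(seed)
    before = [poisson(rng, true_rate) for _ in range(n_sites)]
    after = [poisson(rng, true_rate) for _ in range(n_sites)]

    # Monitoring sites are chosen for their extreme period-1 counts.
    n_selected = max(1, int(n_sites * worst_fraction))
    worst = sorted(range(n_sites), key=lambda i: before[i], reverse=True)[:n_selected]

    mean_before = sum(before[i] for i in worst) / n_selected
    mean_after = sum(after[i] for i in worst) / n_selected
    return mean_before, mean_after


if __name__ == "__main__":
    mean_before, mean_after = regression_artifact()
    print(f"selected sites, period 1: {mean_before:.2f} accidents per site")
    print(f"selected sites, period 2: {mean_after:.2f} accidents per site")
```

Because nothing was changed at the selected sites, the apparent drop between periods is produced entirely by the selection rule; a before-and-after comparison would wrongly credit it to whatever improvement happened to be installed.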
Evaluations can also be confused by multiple actions in the same locality, such as the effects of simultaneous changes in local transit service and fares on ridership. Validity threats which stem from spatial differentiation underlie the failure of many transportation impact studies to generate useful insights[6]. In these studies, impacts are measured from comparisons between sites adjacent to the new facility and "control" sites of similar circumstances yet far enough removed to supposedly be unaffected. Finding a distant yet comparable monitoring site is difficult at best; finding a distant control site in which local history does not cause distortions during the study is even harder.

To reduce the biases which stem from spatial differentiation, areas in closer geographic proximity are often selected for control sites. Social and economic interactions between nearby areas reduce their differences. Unfortunately, those same interactions aid the diffusion of impacts into the control area. Impacts of fixed facilities or service changes can physically diffuse or be imitated by compensatory activity in the surrounding area.³ Whether labeled "externality", "indirect impact", "John Henry Effect", or otherwise, the result is a distorted comparison between treatment and control areas.

³ Compensatory activities are incorporated by the last four threats to Internal Validity in Table 1.

Geographic proximity also usually precludes the use of true experimental designs for evaluating changes in spatially contiguous services such as transportation. For example, it is nearly impossible to assign at random individual eligibility for a fixed-route transit service as it passes through the neighborhood. Without random assignment, spatially contiguous services can be evaluated only with predictive models or quasi-experimental designs. The former often lack sensitivity to innovations.
The latter often require interarea comparisons, and are less effective in controlling the biases inherent to spatial differentiation.

The role of geographic space in the Cook and Campbell classification of validity threats can be summarized as a conflict between the effects of spatial differentiation and of geographic proximity. This conflict is illustrated by the attempted evaluation of a door-to-door, advance reservation, transit service for the elderly and handicapped in a six square kilometer portion of Baltimore, Maryland[7]. Evaluators wished to measure changes in the average frequency and length of travel caused by the new service. To subtract out seasonal variations and other extraneous factors, the traditional approach of comparing changes in the service area to changes in a control neighborhood was considered. The only comparable neighborhoods were not far enough removed to preclude residents in the control area from altering their travel behavior to match that of friends in the service area. Changes attributable to the service would then be indeterminate. More distant neighborhoods of similar social and economic characteristics had vastly different locational attributes, such as distance to downtown Baltimore and other potential destinations of travel. Differences in such attributes distort the comparison of local travel patterns.

A quasi-experimental design was eventually selected in which separate samples of households are drawn from the service area twice before and once after the service is implemented. The known validity of this evaluation is restricted to the particular locality and time. In general, external validity is assured mainly through comparisons of areas, but such comparisons are possible only if the conflicting validity threats stemming from spatial differentiation and geographic proximity can be controlled.

The geographic aspects of validity discussed so far are readily subsumed by the Cook and Campbell classification.
Their concerns with the timing of observations and interpersonal diffusion of the treatment are directly analogous to the preceding concern with the degree of spatial separation among monitoring sites. However, these aspects are only a portion of the threats to validity which arise in a spatial context. The Cook and Campbell inventory in Table 1 must be expanded to include the additional threats.

ADDITIONAL VALIDITY THREATS IN GEOGRAPHIC SPACE

The validation of planning and evaluation research which involves geographic space should include the consideration of eight threats beyond the Cook and Campbell inventory. These additional threats are included in Table 2.

Table 2. Geographic validity threats not inventoried by Cook and Campbell

1. Boundary Distortions
   a. Overextension
   b. Truncation
2. Partition Distortions
   a. Spurious location and diffusion
   b. Spatial autocorrelation
   c. Excessive heterogeneity within zones
   d. Density bias
3. Scale Distortions
4. Interaction of Scale and Constructs
5. Interaction of Scale and Statistical Validity
6. Generalizability across Scales
7. Interaction of Space and Time
8. Confusion of Spatial and Aspatial Issues

Boundary distortions

Boundary distortions, which affect both statistical and internal validity, arise in the definition of the study area. Its boundaries can overextend and dilute the phenomena under study, or the phenomena can be prematurely truncated. Measures of density are particularly susceptible to this problem. Population density, for example, can be altered merely by increasing or decreasing the amount of surrounding, unsettled land encompassed by the study area. Of course, many boundaries can be defined by physical barriers or by discrete changes in the spatial incidence or nature of a phenomenon.⁴ Such is rarely the case, however, for small-area studies.
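The sensitivity of density to the study-area boundary can be shown with simple arithmetic. The sketch below is illustrative only: the function name and the figures (50,000 residents, 25 km² of settled land, 75 km² of surrounding unsettled land) are hypothetical and do not come from any study cited here.

```python
def population_density(population, settled_km2, unsettled_km2=0.0):
    """Persons per square kilometer for a study area.

    The study area is assumed to consist of settled land plus whatever
    unsettled land the chosen boundary happens to enclose.
    """
    area_km2 = settled_km2 + unsettled_km2
    if area_km2 <= 0:
        raise ValueError("study area must have positive extent")
    return population / area_km2


if __name__ == "__main__":
    # Same residents, same settlement; only the boundary differs.
    tight = population_density(50_000, settled_km2=25.0)
    overextended = population_density(50_000, settled_km2=25.0, unsettled_km2=75.0)
    print(f"boundary at the settled edge: {tight:.0f} persons per km2")
    print(f"boundary including open land: {overextended:.0f} persons per km2")
```

Nothing about the settlement changes between the two calls; the fourfold difference in measured density is produced by the boundary alone, which is the overextension case listed in Table 2.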
In transportation studies, boundary distortions are especially difficult to avoid in the calibration of trip distribution models. When characterizing local travel, trips ending beyond the local area are usually classified as statistical outliers and excluded from the calculations. The number and length of these external trips will affect the values of the model's parameters[9]. Biases can subsequently occur both in predicted intra-area travel volumes and in comparisons of the effects of distance on local trip frequencies.

Boundary distortions can be mitigated. If possible, the study area should be defined by the region's functional linkages or by a characteristic which has greater between-region variance than within-region variance. If the use of less appropriate boundaries is required by the data or the political context, then a measure of the degree to which the desired and utilized boundaries differ should be included with the study's results.

⁴ For example, the rapid change in land values and density of development often defines a precise urban boundary[8].

Partition distortions

Partition distortions are a potential threat to internal validity whenever the study area is subdivided into analysis zones. These distortions include spurious location or diffusion, spatial autocorrelation, excessive heterogeneity within zones, and the density bias.

Spurious location and diffusion can occur when the spatial incidence of a phenomenon is located by centroids of analysis zones. If the phenomenon actually occurs peripherally in a large analysis zone, its location is distorted by its arbitrary assignment to the zone centroid. Should the phenomenon be divided by the boundaries of large zones, its location is falsely split and spuriously diffused among the distant zonal centroids. An example is the use of census tracts to measure the attractiveness of retail centers.
Retail centers are usually partitioned by major traffic arteries, which frequently demarcate census tracts. Since census tracts are designed for population rather than transportation or retail studies, the retail center is assigned to several distant centroids. This problem is more severe in suburban areas, where census tracts are larger.

The obvious answer to the spurious location or diffusion problem is to minimize the size of each analysis zone. Partitioning the study area into smaller zones, however, magnifies the computational burden as the number of zones increases. Decreasing zone size can also incur the problem of spatial autocorrelation.

Spatial autocorrelation occurs when functionally unified areas are subdivided into zones; measurements of a phenomenon in one analysis zone will be highly correlated with the measurements taken in the adjacent zones. For example, data collected by county for federally funded and state administered programs will probably be highly correlated between counties within each state, but not between counties in different states. While useful for uncovering and predicting geographic patterns, spatial autocorrelation can lead to biased estimators in linear models[10].

Complications also occur when partitions allow too much heterogeneity within zones. If within-zone variance of a phenomenon exceeds its between-zone variance, then the areal units provide a basis for neither precise descriptions nor adequately sensitive predictions. An example is the excessive variance of travel behavior observed within census tracts[11, 12]. This variance has been attributed to the mismatch between observational units and the spatial distribution of causal factors[13]. Mismatches are more prevalent when an arbitrary grid or constant area is used to define the subdivisions.

When the size and shape of zones are allowed to vary and more accurately reflect functional units, a density bias can occur.
For example, monitored increases in the zonal concentration of new suburban activities may be overrepresented by the larger sizes of census tracts outside the central city[14].

There is no panacea for partition distortions. The amount of potential bias, requisite observational or model sensitivity, and computational capabilities must all be considered if the number, size, shape, and uniformity of subdivisions are not predetermined. These considerations have been examined most recently by Batty[15] and Coulson[16].

Scale distortions

Several validity threats which arise in geographic space are related to scale. In this context, scale refers to the relative magnitude of the study. Micro-scale transportation studies usually involve individuals or households in a neighborhood setting. Macro-scale studies deal with larger aggregates, such as interzonal travel flows throughout an entire city or region. While scale is an issue for each of the four generic classes of validity, scale distortions specifically affect internal validity.

Scale distortions occur when a measure is applied to different scales without careful recalibration. Local conditions, which are usually averaged out in aggregate studies, will often cause parametric shifts in a measure of travel behavior.

Scale distortions are an unnoticed yet relevant threat to a commonly proposed measure of accessibility[17-19]. A zone's accessibility increases with the attractiveness or size of surrounding zones and decreases exponentially with distance to each zone. The rate of exponential decay is taken from a gravity-type trip distribution model calibrated from regionwide travel surveys. The regionwide parameter is used to calculate the accessibility of specific facilities to zones within transportation corridors, ignoring the strong possibility that regionwide travel behavior is not simply mirrored by local residents.
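A short calculation shows how much the accessibility measure just described depends on the decay parameter. The sketch assumes one common reading of that description, in which accessibility is the sum over surrounding zones of attractiveness multiplied by exp(-beta × distance); the function names, the two-zone layout, and both beta values are hypothetical and are not taken from the cited references.

```python
import math


def accessibility(attractions, distances_km, beta):
    """Sum of zone attractiveness discounted exponentially by distance.

    Assumed form: A = sum_j S_j * exp(-beta * d_j), where beta is the
    decay rate borrowed from a gravity-type trip distribution model.
    """
    return sum(s * math.exp(-beta * d) for s, d in zip(attractions, distances_km))


def nearest_zone_share(attractions, distances_km, beta):
    """Fraction of total accessibility contributed by the closest zone."""
    nearest = min(range(len(distances_km)), key=lambda j: distances_km[j])
    near_term = attractions[nearest] * math.exp(-beta * distances_km[nearest])
    return near_term / accessibility(attractions, distances_km, beta)


if __name__ == "__main__":
    attractions = [100.0, 100.0]   # two equally attractive zones (hypothetical)
    distances_km = [1.0, 10.0]     # one nearby, one across the region
    for label, beta in (("regionwide", 0.1), ("local", 0.5)):
        total = accessibility(attractions, distances_km, beta)
        share = nearest_zone_share(attractions, distances_km, beta)
        print(f"{label} beta={beta}: accessibility={total:.1f}, nearest share={share:.2f}")
```

If local travelers are in fact more sensitive to distance than the regionwide parameter implies, the regionwide value credits the distant zone with far more accessibility than local behavior would support.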
The effect of distance on local accessibility is thus distorted because the measure is calibrated at a different scale. Likewise, the use of a locally calibrated measure for larger aggregates of travel behavior is also susceptible to scale distortions.

Interaction of scale and constructs

More than parametric shifts can occur between scales; completely different variables may assume importance. To investigate this threat of interaction between scale and constructs, the question must be asked: does the operational form of the construct hold for varying distances, densities of activity, or degrees of areal aggregation? These questions of construct validity would be relevant, for example, if a travel demand model for interurban rail passenger service were applied to an intraurban subway system. Availability of air transportation would be an important variable in the former application, but hardly relevant to the intraurban case.

Interaction of scale and statistical validity

Scale is an important issue when establishing statistical validity. In order to establish covariance between policy actions and indicators of the condition to be ameliorated, the scale of the analysis must not be reduced beyond the ability of the data base to provide adequate inferences. Discrepancies in the data are magnified by increasing disaggregation because they are less likely to be averaged out.

Statistical validity depends on the level of detail available in the data base. Data can be aggregated above, but rarely disaggregated below, the scale at which they are collected. Inferences about larger populations may be drawn from representative samples, but inferences about subgroups require their adequate representation in the sample as well. Any of these factors will affect the consistency of measured results; they are explored further by Alonso[20].

Generalizability across scales

The final scale problem is one of external validity. The conclusions reached for one scale may not be generalizable to another.
For example, activity patterns in a medium-size city are not always analogous to those in the largest metropolitan areas. Similarly, a door-to-door transit service covering a 6 km² area may not be comparable with one serving 100 km², even if the intended clientele have similar characteristics.

As with the other validity threats related to scale, control of this threat is not readily accomplished with an analytical device. The best "control" is an awareness of scale-related problems and the need to avoid them by matching the scale of the study to the scale of the particular research question. The expediency of applying tools and results from one scale to another will most likely cause more problems than it is worth.

Interaction of space and time

The seventh validity threat in Table 2 is the interaction of space and time. It should be obvious that "the use of space involves movement, and movement consumes time"[21], yet this point is occasionally forgotten in transportation studies. This is particularly true for estimates of latent travel demand, in which frequencies of travel are often compared without consideration of the trip lengths. Trips of similar frequencies but differing lengths do not represent equivalent amounts of travel. Furthermore, trips of similar length in physical distance may be radically different in travel time and thus not truly the same. In short, equivalent patterns of behavior must be equal in their respective consumption of both time and space.

Confusion of spatial and aspatial issues

The confusion of spatial and aspatial issues is the last and perhaps most fundamental validity threat inherent to research at geographic scales. It stems from the misattribution of a spatial effect to a spatial rather than aspatial cause. Rapid suburbanization in the late 1960s is an example of one spatial effect that has been commonly attributed to a spatial cause (investments in major transportation facilities)[22].
Consideration must also be given to aspatial causes, such as housing subsidy programs, tax laws, and the national economic climate, all of which have spatial expressions but are not necessarily applied to specific spatial domains[23].

The excessive within-zone variance of travel behavior previously mentioned may have its roots in the confusion of spatial and aspatial issues. The use of areal units to explain travel behavior assumes that a spatial process such as residential differentiation affects the observed spatial behavior (i.e. travel), although the effect is unclear. Yet excessive within-zone variance of travel behavior is evidence of heterogeneity within the areal units, stemming either from the previously discussed partition distortions or from the aspatial nature of causal factors. Kutter[24] argues the latter: that travel behavior responds more to individual characteristics than to spatial settings.

The confusion of spatial and aspatial issues can often be attributed to disciplinary turf. Overemphasis of spatial factors is common for geographers and their allies, while economists and their allies tend to underemphasize space (if only for theoretical tractability). In either case, the construct validity of a measured or predicted change is left in doubt unless both spatial and aspatial interpretations are considered.

CONCLUSIONS

The validation of measurements of change is a critical step for establishing confidence in the results of planning and evaluation research. Many threats to validity have been inventoried by Cook and Campbell in an essentially aspatial context. In this paper, their inventory has been re-examined and expanded to accommodate more fully the validity threats which arise in geographic space. These threats are inherent to studies of land use, transportation, and environmental impacts.

While a number of validity threats have been explicated, methods for their control have been given only cursory attention.
Such methods are numerous, and their use depends on the research topic, purpose, setting, approach to measurement and availability of resources. In the vast majority of cases, control of all validity threats which are relevant to the particular study is impossible. However, simple awareness of the possible biases in specific findings is often sufficient to indicate what range of applications of the findings is appropriate. Decisionmakers are rightfully skeptical of results from a "black box" or study in which the assumptions and difficulties are obscured. In the long run, clear knowledge of the caveats encourages far greater confidence in and better use of research findings.

REFERENCES

1. A. Kaplan, The Conduct of Inquiry: Methodology for Behavioral Science. Chandler, Scranton, Pennsylvania (1964).
2. E. A. Suchman, Evaluative Research: Principles and Practice in Public Service and Social Action Programs. Russell Sage Foundation, New York (1967).
3. T. D. Cook and D. T. Campbell, The design and conduct of quasi-experiments and true experiments in field settings. In Handbook of Industrial and Organizational Psychology (Edited by D. Dunnette), pp. 223-326. Rand McNally, Chicago (1976).
4. T. C. Chamberlin, The method of multiple working hypotheses. Science 15, 92-96 (1890).
5. L. I. Griffin, B. Powers and C. Mullen, Impediments to the Evaluation of Highway Safety Programs. Highway Safety Research Center, University of North Carolina, Chapel Hill (1975).
6. Charles River Associates, Measurement of the Effects of Transportation Changes. Urban Mass Transportation Administration, Washington, D.C. (1972).
7. R. R. Schmitt and D. L. Greene, Evaluating transportation innovations with the intervening opportunities model. Northeast AIDS Proc. pp. 184-187 (1977).
8. R. Barden and J. H. Thompson, The Urban Frontier. Occ. Paper No. 4, Syracuse University (c. 1970).
9. T. J. Wilbanks, Measuring Accessibility: Progress Rep. I, Occ. Paper No. 5, Syracuse University (1970).
10. A. D. Cliff, P. Haggett, J. K. Ord, K. Bassett and R. Davies, Elements of Spatial Structure: A Quantitative Approach, Chap. 9. Cambridge University Press, New York (1975).
11. G. M. McCarthy, Multiple regression analysis of household trip generation: a critique. Highway Res. Rec. 297, 31-43 (1969).
12. M. Wachs, Resource paper. In Highway Research Board, Urban Travel Demand Forecasting, Special Rep. No. 143, pp. 96-113. Washington, D.C. (1973).
13. E. Aldana, R. deNeufville and J. H. Stafford, Microanalysis of urban transportation demand. Highway Res. Rec. 446, 1-11 (1973).
14. D. L. Greene, Multinucleation in urban spatial structure. Ph.D. dissertation, Johns Hopkins University (1977).
15. M. Batty, Urban Modelling: Algorithms, Calibrations, Predictions. Cambridge University Press, New York (1976).
16. M. R. C. Coulson, 'Potential for variation': a concept for measuring the significance in the size and shape of areal units for their use in descriptive and analytical studies. Geografiska Annaler, in press (1978).
17. U.S. Department of Transportation, Special Area Analysis, Final Manual, Washington, D.C. (1973).
18. L. Sherman, B. Barber and W. Kondo, Method for evaluating metropolitan accessibility. Transpn Res. Rec. 449, 70-87 (1974).
19. R. W. Vickerman, Accessibility attraction and potential: a review of some concepts and their use in determining mobility. Environment Plan. 6, 675-691 (1974).
20. W. Alonso, Predicting best with imperfect data. J. Am. Instit. Planners 34, 248-255 (1968).
21. I. G. Cullen, Space, time and the disruption of behavior in cities. Environment Plan. 4, 459-470 (1972).
22. U.S. Federal Highway Administration, Social and Economic Effects of Highways, Washington, D.C. (1976).
23. D. W. Harvey, The political economy of urbanization in advanced capitalist societies: the case of the United States. In The Social Economy of Cities (Edited by G. Gappert and M. Rose), pp. 119-163. Sage Publications, Beverly Hills (1975).
24. E. Kutter, A model for individual travel behavior. Urban Studies 10, 235-258 (1973).