The Locus of Word Frequency Effects Revealed by Patterns of Task Interference


Roger W. Remington

NASA Ames Research Center

MS 262-2

Moffett Field, CA 94035

&

Robert S. McCann

Western Aerospace

NASA Ames Research Center

MS 262-2

Moffett Field, CA 94035




Paper presented at the annual meeting of the Psychonomic Society, Washington, DC., November 1993.

Abstract

Two Psychological Refractory Period experiments investigated contrasting hypotheses concerning the processing locus of the word frequency effect in lexical decision. The encoding hypothesis places the word frequency effect in a preattentive lexical access stage of visual word processing; the decision hypothesis places the effect in attentive decision processes. To examine these hypotheses, subjects performed a lexical decision task presented at varying intervals following a primary tone discrimination task. Response times in lexical decision showed the typical second task slowing at short presentation intervals. This slowing was additive with word frequency. We argue that this additivity favors the decision hypothesis, suggesting a central locus for the word frequency effect.

Introduction

The overt behaviors we observe and measure in the study of human cognition are generally not unitary phenomena, but instead are the products of a collection of more elementary mental operations. In identifying a visually presented word, for example, elementary visual features are extracted, letter identities computed, a pronunciation generated, a meaning retrieved, and the appropriate response selected. Ultimately these mental operations are themselves the products of neural computations. The current view is that such neural computations are localized to specific brain regions, and that the brain can be conceived as a massively distributed processing system with each region performing local computations (e.g., Ullman, 1984; Posner, Petersen, Fox, & Raichle, 1988; Seidenberg & McClelland, 1989; Minsky, 1986).

Though neural computation may be massively parallel, many of the mental operations that underlie the processing of even simple choice response time tasks are executed in sequence. This sequentiality arises naturally from considering how information would flow in a system in which the mental operations were independent, but highly interconnected. In such an interconnected system there are natural computational dependencies. The outputs of some mental operations are the inputs to others. In reading, for example, the processing of the elementary visual features provides information about the letter identities, the letter identities about the words, and the words combine to yield a meaning.

Sequentiality can occur in the absence of computational dependencies if we further assume that although different mental operations can compute in parallel, each operation is limited in its ability to process inputs. We consider the simplified case where each operation can process only one input at a time. When two or more inputs compete for the same mental operation, a bottleneck results. One of the competing inputs must then be queued up until the other has been processed. Evidence for the postponement of specific mental operations has been observed in cases where the stimulus-response mapping operations from two separate tasks compete (Pashler, 1984).

It follows naturally from this architecture that both parallel and serial processing are common. Parallel information flow is constrained only by computational dependencies and bottlenecks. This analysis of mixed parallel and serial processing has been developed extensively in stage models of the information flow in task processing. In stage modeling (e.g., Sternberg, 1969), the putative mental operations for specific tasks are grouped into a sequence of three independent, sequential processing stages: encoding, decision, and response. The encoding stage handles the routine aspects of stimulus processing including the identification of familiar stimuli. The decision stage handles non-routine, context-dependent processing, such as deciding to which arbitrary category an item belongs, or which arbitrary response should be selected. The response stage handles the programming and execution of responses. Processing is assumed to proceed in a linear sequence from encoding to decision to response.

The independence assumption asserts that processing time for a given stage is not affected by the processing in other stages. Independence implies that it should be possible to find factors that alter the processing time of one stage while leaving the times for other stages unaffected. This assumption also implies that pipelining should be possible; Task 1 could be in the response stage with Task 2 in the decision stage and Task 3 in the encoding stage. In addition, evidence supports the view that the encoding and response stages are themselves capable of significant parallel processing. That is, the encoding operations for two (or more) tasks can be done in parallel, as can their respective response operations. (This parallelism is subject to the constraint that the two tasks do not place competing demands on perceptual or motor resources.) It is commonly assumed, however, that the decision stage (central processor) can handle only one input at a time, and is thus the limited capacity stage responsible for bottlenecks.

Quantitative methods have been developed (discussed below) to determine whether an experimental factor affects processing before or after a bottleneck. By creating a bottleneck in central processing then, it is possible to order certain mental operations for a given task by determining whether factors that affect those operations occur before or after the central processing bottleneck.

We use this approach to examine the processing locus of the mental operations that underlie the word frequency effect in visual word processing, specifically in lexical decision. In the lexical decision task, subjects make a speeded choice response to indicate whether a visually presented letter string is a word or a nonword. The word frequency effect refers to the robust observation that "word" response times are faster for words that occur more frequently in English than for words that occur less frequently, or rarely. This effect occurs even when all the low-frequency words are known and are correctly recognized as words. It also occurs in the absence of any information in the letter sequence itself that would serve to distinguish low-frequency words from high-frequency words. Rather, the word frequency effect seems to point to something fundamental about how word-related knowledge is acquired, stored, and retrieved. Because of this, it is one of the handful of fundamental phenomena for which all theories of visual word processing must provide a satisfying account. We ask whether the effects of frequency stem from encoding or decision operations. The theoretical interpretation of word frequency will depend considerably on whether it is found to affect encoding processes specialized for lexical processing, or whether it affects decision processes.

Two contrasting hypotheses of the locus of word frequency effects in lexical decision are considered here. The encoding hypothesis asserts that lexical access is localized in one or more early parallel operations, prior to central processor bottlenecks. The word frequency effect occurs because preattentive lexical access takes longer for low frequency words than for high frequency words. Evidence for preattentive, automatic lexical access has been obtained (e.g., Friedrich, Henik, & Tzelgov, 1991; Mullin & Egeth, 1989; Meyer, Schvaneveldt, & Ruddy, 1975), and localizing the frequency effect to a parallel encoding stage is a feature of many accounts of word recognition (e.g., Morton, 1969; McClelland & Rumelhart, 1981; Monsell, Doyle, & Haggard, 1989; Monsell, 1990). The assignment of the word frequency effect to stimulus-driven, parallel operations appears even in models that deny the existence of a specific lexicon (Seidenberg & McClelland, 1989). A strong version of the encoding hypothesis would maintain that the measured increase in response time for low frequency words is a direct measure of this increased automatic encoding time.

The decision hypothesis, in contrast, localizes the word frequency effect in decision operations that require the limited capacity central processor. This hypothesis asserts that the word frequency effect occurs because more central processing time is required to decide the lexicality of a low frequency word than of a high frequency word. Increased decision time could occur, for example, if there were several sources of evidence used in deciding wordness, and the amount of evidence at any point in time (over some restricted time domain) was directly proportional to word frequency. Recent evidence suggests that some if not all of the word frequency effect can be attributed to central decision operations (Becker, 1976; Herdman & Dobbs, 1989; McCann, Folk, & Johnston, 1992; Balota & Chumbley, 1984; McCann, Besner, & Davelaar, 1988). Note that the decision hypothesis makes no claim about the specific locus (or existence) of lexical access. Lexical access could still be occurring during encoding. The claim is that the word frequency effect does not result from an increase in the time for any encoding process, whether it is lexical access or some other computation.

Logic of the Present Experiments

We investigate whether word frequency affects encoding or decision processes by looking for evidence that lexical decision computations affected by word frequency are carried out simultaneously with central, decision-level operations on a separate task. To do this, a primary task is presented shortly before the lexical decision task and subjects are instructed not to let the second task (lexical decision) affect response time to the primary task. The primary task should thus tie up central processing resources, delaying the availability of the central processor for the lexical decision task. During the delay, lexical processing that does not require central processing resources can proceed; processing that does require the central processor must wait until the central demands of the primary task have been completed. By delaying the availability of central processing resources we can apply the cognitive slack technique (Schweickert, 1978, 1980, 1983) to investigate the degree to which the data support the encoding and decision hypotheses.

The logic of the cognitive slack technique can be illustrated by way of the simple 3-stage model of task processing shown in Figure 1. Each of the three boxes represents a collection of operations, which constitute a stage of processing (Sternberg, 1969). The clear boxes indicate stages whose operations do not require limited capacity central resources and can be done in parallel; the hashed box denotes a processing stage whose set of operations do require limited capacity central processing. The three stages are executed in a linear sequence with the output of one stage serving to trigger activity in the subsequent stage.

The three-stage architecture of Figure 1 assumes that early processing (encoding) is done in parallel, followed by a limited capacity central processing stage that requires attention. This basic structure is a feature of a number of cognitive architectures (e.g., Newell, 1990; Pashler, 1984; Anderson, 1983; Sternberg, 1969; but see Meyer & Kieras, 1992 for an alternative). It is likely that encoding and motor operations can be done in parallel not because of massive general resources, but because there is a high degree of specialization. Thus, if one chooses stimuli that are very different, say an auditory and a visual stimulus, they can be encoded in parallel since they make demands on separate specialized systems. However, such specialized processing seldom satisfies all the demands of task processing. At some point in processing, a general resource is needed to deal with the situation-specific demands of a task. This more general resource in Figure 1 is the central processor, which, because it can only do one thing at a time, constitutes a processing bottleneck.


Figure 2 shows the encoding and decision hypotheses in terms of the stage model of Figure 1. According to the encoding hypothesis, low frequency words increase the time for the encoding stage. This is indicated by extending the clear rectangle that depicts the parallel encoding stage. According to the decision hypothesis, low frequency items extend the duration of the central processing stage. This is indicated by extending the hashed rectangle that depicts the central processing stage.


These contrasting assertions about the locus of the word frequency effect can be distinguished experimentally. Consider Figure 3, which depicts a condition from a typical psychological refractory period (PRP) experiment. In the PRP paradigm, two tasks are presented in close temporal succession. Because it is presented first, Task 1 gains access to the central processor before Task 2. This delays the availability of the central processor for Task 2 central processing, creating slack in Task 2 processing. By the independence assumption discussed above, encoding processes for Task 2 can proceed in parallel with central processing on Task 1, but central processing operations for Task 2 cannot proceed until Task 1 central processing is completed.

Because of the slack created by the central processing bottleneck, factors that lengthen the time for Task 2 encoding will produce different results than factors that lengthen the time for Task 2 central processing. This is shown in Figure 4. In the Hard "A" case, the Task 2 difficulty factor has increased the time for a central processing operation. In the Hard "B" case, it has increased the time for an early, parallel encoding stage. When Task 2 is done in isolation, the difficult levels of both factors produce equivalent increases in task completion times. However, in the presence of slack created by Task 1 central processing, Factor A yields additivity, while Factor B yields underadditivity. Factor A additivity obtains because the difficulty manipulation affects processing at or after the central processing bottleneck; there is no way to do any part of the extra work in the difficult condition during the slack. With Factor B, underadditivity obtains because some or all of the extra processing for the difficult condition can be done during the slack time.
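The bottleneck arithmetic behind Figure 4 can be sketched in a few lines of code. The stage durations below are purely illustrative assumptions, not estimates from these experiments:

```python
def rt2(soa, t2_enc, t2_central, t2_resp, t1_central_done):
    """Task 2 RT under a single-channel central bottleneck (times in ms).

    Task 2 encoding runs in parallel with Task 1, but Task 2 central
    processing cannot begin until both its own encoding and Task 1
    central processing are finished.
    """
    start = max(t1_central_done, soa + t2_enc)
    return start + t2_central + t2_resp - soa  # RT measured from Task 2 onset

# Hypothetical durations: Task 1 occupies the bottleneck until t = 300 ms.
base = dict(t2_enc=80, t2_central=150, t2_resp=100, t1_central_done=300)
easy_short, easy_long = rt2(100, **base), rt2(800, **base)

# Factor A: +60 ms of central processing -> fully additive with SOA.
a = dict(base, t2_central=210)
assert rt2(100, **a) - easy_short == 60
assert rt2(800, **a) - easy_long == 60

# Factor B: +60 ms of encoding -> absorbed into slack at the short SOA.
b = dict(base, t2_enc=140)
assert rt2(100, **b) - easy_short == 0   # underadditive: effect vanishes
assert rt2(800, **b) - easy_long == 60   # full effect when tasks do not overlap
```

With these hypothetical durations, the 60 ms difficulty effect survives intact at both SOAs when it loads the central stage, but disappears entirely at the short SOA when it loads the encoding stage.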


Thus, when the lexical decision task is presented as the second of two tasks, the encoding hypothesis predicts a different pattern of results than the decision hypothesis. Like Factor A, the decision hypothesis localizes the effects of word frequency in the central processor (decision). Like Factor B, the encoding hypothesis localizes the effects of word frequency in encoding operations occurring prior to central processor demands. As a result, the decision hypothesis predicts additive effects of word frequency and dual-task slowing; the encoding hypothesis predicts underadditive effects.



Experiment 1

The lexical decision task was presented as the second of two tasks (T2) in a psychological refractory period (PRP) paradigm, with word frequency as a variable. Task 1 (T1) in our experiments was a tone discrimination task in which subjects pressed one of two keys with the middle or index fingers of the left hand to indicate whether a high tone (900 Hz) or a low tone (500 Hz) was presented. The lexical decision task was presented at stimulus onset asynchronies (SOA) of 100, 200, 400, and 800 ms following tone onset. Subjects pressed one of two keys with the middle or index finger of the right hand to indicate whether or not the letter string was a word. By presenting the two tasks in different modalities, we eliminate or reduce conflicts for encoding resources, making it likely that central, decision-level stages are the primary sources of conflict.

The experimental logic is illustrated in Figure 5. Increases in the T1-T2 stimulus onset asynchrony (SOA) decrease the internal overlap of the two tasks, reducing the processing slack. At short T1-T2 SOAs, T1 processing is not complete and T1 and T2 make competing demands for limited-capacity central processing resources. Since T1 is presented first, it generally wins access, forcing T2 to wait for the completion of T1 central processing. Thus, T2 processing will be delayed by an amount proportional to the time required to complete T1 central processing. At longer SOAs, there is an increasing probability that T1 central processing will be complete, or almost complete, when T2 is presented, reducing the time T2 must wait for the central processor. At the longest SOA used here, 800 ms, it is highly likely that T1 processing will be complete when T2 is presented and the lexical decision and tones tasks should overlap little if at all at this SOA.


Now what happens to the word frequency effect as SOA is decreased? At long SOAs, say 800 ms, where there is little or no overlap of the two tasks, increasing the time of a mental computation will simply add to the total response time by an amount equal to the increase. This is true whether the affected computation occurs in encoding or decision.

At short SOAs, say 100 ms, it matters a lot where the increased computation time occurs. Here, the lexical decision task (T2) must queue up and wait for the decision process to finish with the tones task (T1). If word frequency affects a computation that occurs prior to the decision process and can be done in parallel, then all or some of the word frequency effect should disappear, since any extra computation can be done while T2 is waiting for the central processor.

Figure 6 shows the predictions for the two hypotheses as a function of SOA. The decision-level hypothesis predicts additive effects of SOA and word frequency. The encoding hypothesis, however, predicts underadditive effects; the word frequency effect should be reduced as SOA is reduced, since as task overlap increases, the waiting time or slack increases, allowing the extra encoding computation time required for low frequency words to finish.
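These two prediction curves can be generated from the bottleneck logic directly. The sketch below uses hypothetical stage durations (in ms; none are fitted to our data) and a hypothetical 60 ms frequency effect, placed either in the central stage or in the encoding stage:

```python
def rt2(soa, enc, central, resp=100, t1_done=300):
    # Task 2 central processing starts at the later of: its own encoding
    # finishing, or Task 1 releasing the bottleneck (hypothetical times, ms).
    return max(t1_done, soa + enc) + central + resp - soa

soas = (100, 200, 400, 800)
# Decision hypothesis: low frequency adds 60 ms of central processing time.
decision = [rt2(s, enc=80, central=210) - rt2(s, enc=80, central=150) for s in soas]
# Encoding hypothesis: low frequency adds 60 ms of parallel encoding time.
encoding = [rt2(s, enc=140, central=150) - rt2(s, enc=80, central=150) for s in soas]

print(decision)  # [60, 60, 60, 60] -- additive: constant effect across SOA
print(encoding)  # [0, 40, 60, 60]  -- underadditive: effect shrinks at short SOA
```

Under the decision-locus assumption the frequency effect is constant at every SOA; under the encoding-locus assumption it is progressively absorbed into slack as the tasks overlap more.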

In Experiment 1, we ran 24 volunteers from NASA staff and local universities in a 1-hr session. All subjects were native speakers of English between the ages of 18 and 40 who participated for fun or class credit.



Results

Figure 7 shows the response time to the tones task (T1) along with overall response time to nonwords and words (averaged over frequency). Responses to T1 were 19 ms faster at the 800 ms SOA than at the 100 ms SOA. This effect was significant (p < .02). Though it is desirable to have completely flat T1 RTs, small elevations of RT1 at short SOAs are commonly observed.

There was a significant PRP effect (p < .001), seen in the elevated response times for both words and nonwords at early SOAs. The average magnitude of the effect was 260 ms. The slope of the curve for words between the 100 and 200 ms SOAs was -0.86, and for nonwords -0.99, consistent with substantial T2 postponement at these delays. Response time for words was about 50 ms faster than for nonwords. This effect was highly significant (p < .001). The interaction of SOA and wordness (word vs. nonword) was not significant, despite a trend toward underadditivity.

Figure 8 shows the interaction of word frequency with SOA. Subjects took on average 66 ms longer to respond to low frequency words than to high frequency words. This was highly significant (p < .001). The effects of word frequency, though, were completely additive with SOA (p = .939). The word frequency effect was 62 ms at SOA-800, and 69 ms at SOA-100.

Our results are consistent with the decision-level hypothesis. We found no evidence that word frequency affects computations that can be done prior to the availability of central processing resources. However, responses to both tasks were made with the middle and index fingers of each hand. It is possible that such highly similar response mappings interfered with attempts to process any part of the two tasks in parallel. To keep T2 from disrupting T1, subjects may have chosen not to engage in T2 processing that could otherwise have gone on in parallel. Indeed, lexical decision response times seem rather high in this experiment.


Experiment 2

To overcome this potential confound, Experiment 2 used a joystick response for the tones task. Subjects used their right hand to move the joystick lever forward (up) for a high tone and back (down) for a low tone. In this way we eliminated at least the most obvious source of response interference and provided subjects with a highly compatible response mapping for T1. In all other respects, the experiments were identical.

Figure 9 shows the T1 latencies along with the word and nonword latencies collapsed across frequency and plotted as a function of SOA. The interaction of T1 latency with SOA was marginally significant (p = .07); there was a 9 ms difference between the maximum and minimum T1 latencies, and only a 7 ms advantage for SOA-800 over SOA-100. Clearly, subjects were able to keep RT1 essentially constant.

There was a 34 ms effect of wordness, which was highly significant (p < .001), and a marginally significant underadditive interaction of wordness with SOA (p = .06). The word-nonword difference was 50 ms at SOA-800, and 18 ms at SOA-100. The slope of the curve for words between SOA-100 and SOA-200 was -1.00, and for nonwords -0.79. As in Experiment 1, there was substantial postponement for words and nonwords at short SOAs.

Figure 10 shows that the effects of word frequency were again completely additive with SOA; there was no hint of an underadditive interaction (p = .49). High frequency words were responded to 65 ms faster than low frequency words -- a figure that compares well to the 66 ms effect in Experiment 1. This was highly significant (p < .001).

Discussion

We found no evidence that the magnitude of the word frequency effect reflects the increased time to perform preattentive lexical encoding of low frequency words. Our results rule out accounts in which the increased time to respond to low frequency words is a measure of the extra time needed for a preattentive lexical encoding. Instead, our results are consistent with a later, decision-level locus for word frequency. This conclusion is consistent with that of Herdman & Dobbs (1989), Balota & Chumbley (1984), and Becker (1976), who reached the same conclusion from examinations of different lexical phenomena.

It could be argued that unspecified factors in our experiments prevented subjects from processing information about the letter strings until they had completed cognitive work on Task 1. If subjects simply chose to do nothing on Task 2 until they had completed Task 1, the results of all Task 2 difficulty manipulations would be additive regardless of the locus of processing. Our results then could reflect strategic choices rather than architectural features of human information processing. It is clear from the response time data that the internal processing of the two tasks overlapped at short SOAs. If the tasks were done strictly in sequence, Task 2 delays should be equal to the Task 1 response times minus the SOA. It can be easily seen from examining Figure 7 and Figure 9 that the observed delays are not as large as predicted by this account, indicating an overlap in the processing of the two tasks. Could subjects still have chosen to postpone perceptual processing of the letter string until after central processing for the tones was complete? This is unlikely given the consistent trends toward an underadditive interaction of wordness with SOA. Though this trend was not significant, it was present in both experiments, falling just short of significance in Experiment 2. A plausible explanation for this underadditive trend is that early perceptual information was sufficient to identify some letter strings as nonwords. This suggests that subjects were in fact processing the letter string prior to the completion of Task 1 central processing.
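The strictly-sequential account rejected above reduces to a simple arithmetic check (the 400 ms Task 1 RT below is a hypothetical value, not a figure from these experiments):

```python
def sequential_slowing(rt1, soa):
    # If Task 2 processing started only after the Task 1 response, Task 2
    # RT would be inflated by exactly the remainder of Task 1, RT1 - SOA
    # (floored at zero once Task 1 finishes before Task 2 appears).
    return max(rt1 - soa, 0)

# With a hypothetical 400 ms Task 1 RT, strict sequencing predicts a
# 300 ms slowing at SOA = 100 and none at SOA = 800.
assert sequential_slowing(400, 100) == 300
assert sequential_slowing(400, 800) == 0
```

Observed Task 2 slowing smaller than this prediction at short SOAs indicates that the internal processing of the two tasks overlapped.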

Do our results imply that there are no preattentive computations in lexical encoding that are sensitive to word frequency? Not necessarily. Even if, as we have asserted, the locus of the word frequency effect is in decision, the increased decision time for low frequency words could reflect a richer, or more complete preattentive encoding for high frequency words. Thus, the evidence available to decision may be different for high and low frequency words and may accumulate at different rates. Our conclusion is really that the increased response time observed for low frequency words reflects an increase in decision time, not an increase in preattentive lexical encoding time. The additivity we found is inconsistent with accounts that attribute the word frequency effect to an increase in the duration of an encoding stage for low frequency words. Any such increase should have led to an underadditive interaction of word frequency with SOA in our experiments.

As to the nature of the decision processes responsible for the word frequency effect, we prefer at present to make only a general claim that there is greater decision noise for low frequency words than for high frequency words. One way for this to occur is to assume that there are multiple sources of information as to the lexicality of a letter string. Balota & Chumbley (1984) have suggested meaningfulness and visual familiarity, but there could well be others. Low frequency items would have reduced strength on one or more dimensions. On the other hand, it could be that lexical access time is the sole source of the word frequency effect. If so, our results clearly show that lexical access requires limited-capacity central processing.

References

Anderson, J.R. (1983). The Architecture of Cognition. Cambridge, MA: Harvard University Press.

Balota, D.A., & Chumbley, J.I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception and Performance, 10, 340-357.

Becker, C.A. (1976). Allocation of attention during visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 2, 556-566.

Friedrich, F.J., Henik, A., & Tzelgov, J. (1991). Automatic processes in lexical access and spreading activation. Journal of Experimental Psychology: Human Perception and Performance, 17, 792-806.

Herdman, C.M., & Dobbs, A.R. (1989). Attentional demands of visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 15, 124-132.

Marr, D. (1982). Vision. San Francisco: W.H. Freeman.

McCann, R.S., Besner, D., & Davelaar, E. (1988). Word frequency and identification: Do word-frequency effects reflect lexical access? Journal of Experimental Psychology: Human Perception and Performance, 14, 692-706.

McCann, R.S., Folk, C.L., & Johnston, J.C. (1992). The role of spatial attention in visual word processing. Journal of Experimental Psychology: Human Perception and Performance, 18, 1015-1029.

McClelland, J.L., & Rumelhart, D.E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.

Meyer, D.E., & Kieras, D.E. (1992). The PRP effect: Central bottleneck, perceptual-motor limitations, or task strategies? Paper presented at the annual meeting of the Psychonomic Society, St. Louis, MO, November 1992.

Meyer, D.E., Schvaneveldt, R.W., & Ruddy, M.G. (1975). Loci of contextual effects on visual word-recognition. In P.M.A. Rabbitt & S. Dornic (Eds.), Attention and Performance V. New York, NY: Academic Press.

Minsky, M. (1986). The Society of Mind. New York: Simon & Schuster.

Monsell, S. (1990). Frequency effects in lexical tasks: Reply to Balota and Chumbley. Journal of Experimental Psychology: General, 119, 335-339.

Monsell, S., Doyle, M.C., & Haggard, P.N. (1989). Effects of frequency on visual word recognition tasks: Where are they? Journal of Experimental Psychology: General, 118, 43-71.

Morton, J. (1969). Interaction of information in word processing. Psychological Review, 76, 165-178.

Mullin, P.A., & Egeth, H.E. (1989). Capacity limitations in visual word processing. Journal of Experimental Psychology: Human Perception and Performance, 15, 111-123.

Newell, A. (1990). Unified Theories of Cognition. Cambridge, MA: Harvard University Press.

Paap, K.R., McDonald, J.E., Schvaneveldt, R.W., & Noel, R.W. (1987). Frequency and pronounceability in visually presented naming and lexical decision tasks. In M. Coltheart (Ed.), Attention and Performance XII: The Psychology of Reading. Hillsdale, NJ: Lawrence Erlbaum Associates.

Pashler, H. (1984). Processing stages in overlapping tasks: Evidence for a central bottleneck. Journal of Experimental Psychology: Human Perception and Performance, 10, 358-377.

Schweickert, R. (1978). A critical path generalization of the additive factor method: Analysis of a Stroop task. Journal of Mathematical Psychology, 18, 105-139.

Schweickert, R. (1980). Critical-path scheduling of mental processes in a dual task. Science, 209, 704-706.

Schweickert, R. (1983). Latent network theory: Scheduling of processes in sentence verification and the Stroop effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 353-383.

Seidenberg, M.S., & McClelland, J.L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.

Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. Acta Psychologica, 30, 276-315.

Ullman, S. (1984). Visual routines. Cognition, 18, 97-159.