Copyright © 2006, The National Academy of Sciences
Psychology
A gain-control theory of binocular combination
§ To whom correspondence should be addressed. E-mail: sperling/at/uci.edu.
Contributed by George Sperling, November 18, 2005
Abstract
In binocular combination, light images on the two retinas are combined to form a single “cyclopean” perceptual image, in contrast to binocular rivalry, which occurs when the two eyes have incompatible (“rivalrous”) inputs and only one eye's stimulus is perceived. We propose a computational theory for binocular combination with two basic principles of interaction: in every spatial neighborhood, each eye (i) exerts gain control on the other eye's signal in proportion to the contrast energy of its own input and (ii) additionally exerts gain control on the other eye's gain control. For stimuli of ordinary contrast, when either eye is stimulated alone, the predicted cyclopean image is the same as when both eyes are stimulated equally, coinciding with an easily observed property of natural vision. The gain-control theory is contrast dependent: Very low-contrast stimuli to the left and right eyes add linearly to form the predicted cyclopean image. The intrinsic nonlinearity manifests itself only as contrast increases. To test the theory more precisely, a horizontal sinewave grating of 0.68 cycles per degree is presented to each eye. The gratings differ in contrast and phase. The predicted (and perceived) cyclopean grating is also a sine wave; its apparent phase indicates the relative contribution of the two eyes to the cyclopean image. For 48 measured combinations of phase and contrast, the theory with only one estimated parameter accounts for 95% of the variance of the data. Therefore, a simple, robust, physiologically plausible gain-control theory accurately describes an early stage of binocular combination.
Keywords: binocular vision, neural networks, perception, rivalry, vision
When different images are presented to the left and right eyes, only a single, combined “cyclopean” image is perceived. Let IL(x, y) and IR(x, y) be the images presented to the left and right eyes, respectively, and Î(x, y) be the perceived cyclopean image. The problem is to find a binocular combination functional Γ that maps two input images IL(x, y) and IR(x, y) into a single perceived cyclopean image Î(x, y), i.e.,

$$\hat{I}(x, y) = \Gamma\big(I_L(x, y),\, I_R(x, y)\big).$$
Model Constraints. We propose a solution for binocular combination Γ that satisfies three conditions.
The following presents a sequence of successively more complex models to illustrate the steps by which we arrived at a Γ that satisfies the above constraints.

Model 1: Linear Summation. The simplest case of binocular combination is linear summation. Suppose, as shown in Fig. 1a, that, within a narrow spatial frequency band, the cyclopean image is the sum of the images presented to the two eyes, i.e.,

$$\hat{I} = I_L + I_R.$$
The linear summation model also fails to account for experimental data. In the experiment described below, we find that the eye presented with a higher-contrast stimulus has more influence on the cyclopean image than would be predicted by simple linear summation.

Model 2. For left- and right-eye images IL and IR, model 2 proposes that each eye exerts gain control on the other (Fig. 1b) [e.g., Cogan's model (1) and the initial stage of Wilson's binocular rivalry model (2)]:
Suppose that identical images I are presented to each eye, so that the TCE for each eye is the same, εL(I) = εR(I). From Eq. 4, Γ(I, I) becomes a smaller and smaller fraction of Γ(I, 0) as TCE increases above 1. For example, consider a simple sine wave in each eye, for which ε is simply proportional to stimulus contrast. The prediction that the perceived cyclopean sine wave becomes increasingly weaker relative to a monocular sine wave as ε increases above 1 is an obvious violation of fact.

Model 3. Although Eq. 4, which describes model 2, obviously fails as written, replacing the gain-controlling terms εL(IL) and εR(IR) with terms that are normalized to 1 might remedy the difficulties. This observation motivates model 3 (Fig. 1d). In every neighborhood, each eye (i) exerts gain control on the other eye in proportion to the strength of its own input and (ii) exerts gain control on the other eye's gain control.
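The three models can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, and the algebraic forms used for models 2 and 3 are assumptions chosen to match the verbal descriptions (division by one plus the other eye's contrast energy, with model 3 additionally letting each eye attenuate the other eye's gain-control signal). They are not quoted from the paper's equations.

```python
def model1(i_left, i_right):
    """Model 1: plain linear summation of the two monocular signals."""
    return i_left + i_right


def model2(i_left, i_right, e_left, e_right):
    """Model 2 (assumed form): each eye's signal is divided by
    1 + the other eye's total contrast energy (TCE)."""
    return i_left / (1.0 + e_right) + i_right / (1.0 + e_left)


def model3(i_left, i_right, e_left, e_right):
    """Model 3 (assumed form): as model 2, but the gain-control signal
    each eye sends is itself divided by 1 + the receiving eye's TCE."""
    # Principle (ii): gain control acting on the other eye's gain control.
    control_on_left = e_right / (1.0 + e_left)
    control_on_right = e_left / (1.0 + e_right)
    # Principle (i): the attenuated control signal divides the other eye's input.
    return i_left / (1.0 + control_on_left) + i_right / (1.0 + control_on_right)


if __name__ == "__main__":
    for energy in (0.001, 1.0, 100.0):
        both = model3(1.0, 1.0, energy, energy)
        alone = model3(1.0, 0.0, energy, 0.0)
        print(f"TCE={energy}: binocular={both:.3f}, monocular={alone:.3f}")
```

Under these assumed forms, model 3 returns the monocular input unchanged when the other eye has no input, approaches the monocular value for equal high-energy inputs, and approaches the linear sum as the energies go to zero, whereas model 2 shrinks the binocular response as energy grows.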
Experiment 1
In all of the experiments reported herein, we take advantage of a simple mathematical fact: The arithmetic sum of two sine waves of the same wavelength is again a sine wave of the same wavelength whose amplitude and phase depend on the phases and amplitudes of the two component sine waves. It is both reasonable to assume and empirically observed that the cyclopean image of two parallel monocular sinewave gratings of the same wavelength is indeed, to a very close approximation, a sinewave grating of the same wavelength. Therefore, in this instance, predicting the combined cyclopean image is equivalent to predicting the apparent phase and amplitude of the cyclopean sine wave. The relative contribution of each eye to the cyclopean sine wave is easily determined from the perceived phase of the cyclopean sinewave grating. Fig. 2 illustrates our procedure for measuring the perceived phase of a cyclopean sinewave grating when two sinewave gratings of different contrast and different phase are presented to the two eyes.

Stimuli. A horizontal sinewave grating is presented to each eye. Eqs. 7 and 8 and Fig. 2 describe the stimuli to the left and right eyes, respectively,
Procedure. Every trial begins with a uniform field of luminance L0 presented to each eye, on which a black fixation cross with two dots is arranged so that, with correct vergence, a single cross with four symmetrically placed dots is perceived (Fig. 2a). Once a single cross with four symmetric dots is clearly perceived, the subject presses a key to continue the trial. The key press produces a blank screen (Fig. 2b) of luminance L0 for 0.5 s, then 1 s of sinewave gratings to the two eyes (Fig. 2c). The blank screen is then restored until the observer responds. The observer's task is to indicate the apparent location of the dark stripe in the perceived cyclopean sine wave relative to black horizontal reference lines adjacent to each edge (Fig. 2c). When the reference line is judged above the dark cyclopean stripe, a key press indicating “above” is made; otherwise the “below” key press is made (Fig. 2d). After the response, the cross-plus-four-dots fixation image for the next trial appears.

As shown in Fig. 3a, in all displays a sine wave is presented to one eye with phase shift θ/2 above the midline and to the other eye with phase shift –θ/2 below the midline, thereby producing a relative phase shift θ between the images in the two eyes. The higher-contrast sine wave has contrast m, 0 < m ≤ 1; the other sine wave has contrast δm, 0 ≤ δ ≤ 1. A “condition” is characterized by three parameters: θ, the phase difference between left- and right-eye sine waves; m, the contrast of the higher-contrast sine wave; and δ, the fractional reduction in contrast of the lower-contrast sine wave. For every condition, there are four different displays: The higher-contrast sine wave can be either above the midline in the left eye (α1) or right eye (α2), or it can be below the midline in the left eye (α3) or right eye (α4) (examples of display types α1 and α3 are shown in Fig. 3 a and b).
For each of the four displays (α1, α2, α3, and α4) comprising a condition, the perceived location of the cyclopean sine wave is determined by means of a psychophysical up–down tracking procedure. The perceived phase shift of the cyclopean grating for a condition (θ, m, δ) is obtained by combining the perceived locations from the four displays. This measure has the advantage of canceling slight position or eye biases should they occur. It has the property that, when one eye is closed (δ = 0), the location of the cyclopean sine wave is identical to that of the monocular sine wave, so the perceived phase shift equals θ. When the two eyes have the same stimulus (δ = 1), the perceived phase shift is 0. The perceived phase shift measures how far a particular contrast ratio δ pushes the cyclopean perception toward the maximum possible value θ. The perceived phase shift was measured for 48 conditions with values of m = {0.05, 0.10, 0.20, 0.40}, δ = {0.3, 0.5, 0.71, 0.86}, and θ = {45, 90, 135} degrees. All 192 display types were interleaved in a mixed-list design (i.e., 192 up–down staircases were run concurrently). Three observers were tested.

Results. Sample results for m = 0.05 and m = 0.40 of one observer are shown in Fig. 3 c and d, each of which shows 12 (of 48) conditions. The ordinate indicates the perceived phase shift and the abscissa indicates the contrast ratio δ. The dashed curves are predictions of the linear summation model (Fig. 1a):
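As a worked sketch of the linear summation prediction, consider only the modulated part of the two gratings (the mean luminance L0 is ignored), with the higher-contrast grating shifted by +θ/2 and the lower-contrast grating by –θ/2. The symbol \hat{\theta} for the perceived phase shift, and its normalization as twice the phase offset of the summed grating from the midline, are notational assumptions chosen so that \hat{\theta} = θ at δ = 0 and \hat{\theta} = 0 at δ = 1.

```latex
\begin{align}
I_{\mathrm{hi}}(y) &= m \sin\!\left(2\pi f_s y + \tfrac{\theta}{2}\right), \qquad
I_{\mathrm{lo}}(y) = \delta m \sin\!\left(2\pi f_s y - \tfrac{\theta}{2}\right), \\
I_{\mathrm{hi}}(y) + I_{\mathrm{lo}}(y)
  &= m\left[(1+\delta)\cos\tfrac{\theta}{2}\,\sin(2\pi f_s y)
          + (1-\delta)\sin\tfrac{\theta}{2}\,\cos(2\pi f_s y)\right], \\
\tan\frac{\hat{\theta}}{2} &= \frac{1-\delta}{1+\delta}\,\tan\frac{\theta}{2}.
\end{align}
```

The last line follows because A sin u + B cos u is a sine wave with phase offset arctan(B/A).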
The solid lines fitted to the data are generated by model 3. Even the lowest-contrast stimuli in this experiment are sufficiently strong that the total contrast energies satisfy εL(IL) ≫ 1 and εR(IR) ≫ 1. Given the estimated parameters, neglecting the 1 in the numerator and denominator of Eq. 6 changes the prediction by <1% and simplifies it to yield Eq. 10
The advantage of Eq. 10 over Eq. 6 is that, together with Eqs. 7 and 8, it yields a simple expression for the perceived phase shift
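The phase prediction can be explored numerically with phasor addition. This is a hedged sketch, not the paper's fitted model: it assumes that, at high contrast, each eye's grating is weighted by its share of the total contrast energy and that contrast energy grows as contrast raised to a power. The exponent name `gamma` and both function names are hypothetical. With `gamma = 0` the weights are equal and the sketch reduces to linear summation.

```python
import cmath
import math


def predicted_phase_shift(theta_deg, delta, gamma=0.0):
    """Perceived phase shift in degrees for contrast ratio delta.

    Assumption: each eye's weight is proportional to contrast**gamma,
    so the relative weight of the lower-contrast eye is delta**gamma.
    """
    half = math.radians(theta_deg) / 2.0
    weight_hi = 1.0
    weight_lo = delta ** gamma
    # Higher-contrast grating at +theta/2, lower-contrast grating at -theta/2.
    total = (weight_hi * cmath.exp(1j * half)
             + weight_lo * delta * cmath.exp(-1j * half))
    # The summed grating sits at phase(total); the shift is measured as twice that.
    return 2.0 * math.degrees(cmath.phase(total))


def closed_form_phase_shift(theta_deg, delta, gamma=0.0):
    """Same prediction written as tan(shift/2) = (1 - r)/(1 + r) * tan(theta/2),
    with r = delta**(1 + gamma)."""
    half = math.radians(theta_deg) / 2.0
    ratio = delta ** (1.0 + gamma)
    return 2.0 * math.degrees(math.atan((1.0 - ratio) / (1.0 + ratio) * math.tan(half)))


if __name__ == "__main__":
    for delta in (0.3, 0.5, 0.71, 0.86):
        linear = predicted_phase_shift(90.0, delta, gamma=0.0)
        weighted = predicted_phase_shift(90.0, delta, gamma=1.0)
        print(f"delta={delta}: linear={linear:.1f} deg, weighted={weighted:.1f} deg")
```

For any `gamma > 0` the weighted prediction lies above the linear one, which matches the qualitative finding that the higher-contrast eye has more influence than linear summation predicts.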
Experiment 2: Spatial Frequency Selectivity of Binocular Gain-Control
When binocular combination is being determined within one spatial frequency band (e.g., 0.68 cpd in experiment 1), how do the stimuli in other spatial frequency bands influence the combination, e.g., by contributing to gain control? Experiment 2 addresses this issue.

Procedure. The stimuli and procedure are generally similar to those in experiment 1 except that the contrast of the sinewave gratings presented to the two eyes is identical. In experiment 2, various 2D spatial-bandpass-filtered noises are added to one eye's grating to determine how the spatial frequency and contrast of the added noise affect that eye's weight in binocular combination. The icons in Fig. 4 illustrate added-noise stimuli. The left- and right-eye horizontal gratings are described by Eqs. 7 and 8 with mL = mR = m, θ = 90°, and fs = 0.68 cpd. One of six bandpass noises, each with a 2.4-octave bandwidth and with center spatial frequencies fs,N separated by 2 octaves, was added to one eye's grating. Each noise band was tested over the entire range of available contrasts for which sinewave location judgments were feasible. As in experiment 1, four displays determined a condition.

Results. Because the contrast of the sine waves being judged was identical for both eyes, we expect both eyes to make equal contributions to binocular combination. The counterintuitive result is that adding random noise to one eye's sinewave grating causes that grating to dominate the combination. The domination increases as the contrast of the noise increases. A logical process would suggest that noisy stimuli should be ignored, not preferred. The results of experiment 2 are easily understood in terms of model 3: random noise contributes to the TCE (Fig. 1c) that gain-controls the competing eye's contribution to the cyclopean image (Fig. 1d). The relative effectiveness for gain control of each bandpass noise is described by b(fs,N) (Fig. 1c), which can then be estimated by fitting model 3 to the experimental data. Fig. 4 shows the spatial frequency weights b(fs,N) for one observer. Note that bandpass noise is maximally effective in gain controlling the 0.68-cpd sine wave when it is four times that spatial frequency, fs,N = 2.72 cpd. An alternative interpretation suggested by the masking data of Yang and Blake (4) is that 3 cpd is centered in a particularly effective spatial frequency range for stereo masking.

Further Experiments to Refine the Model. To further investigate the effect of spatial frequency, temporal frequency, and spatial orientation on binocular combination, three additional experiments used superimposed sine waves as masking stimuli (as opposed to superimposed masking noise as in experiment 2). A fourth experiment investigated the effect of exposure duration. Again, adding a masking sine wave to one eye's stimulus causes it to dominate the combination; domination increases as the masking contrast increases. The spatial frequency modulation transfer function for sine waves is similar to that in Fig. 4 for bandpass noise. Both added noise and added sine waves are maximally effective at 2.72 cpd, four times the frequency whose phase is being judged.

We also conducted an experiment in which exposure duration was varied to study the temporal filter (TF in Fig. 1) in the gain-control path. The stimuli were identical to those in experiment 1 except that the stimulus exposure duration took the values 50, 100, 200, 400, and 1,000 ms instead of being fixed at 1,000 ms. As stimulus duration increases from 50 to 1,000 ms, contrast energy increases. At the shortest duration (50 ms), binocular combination is well approximated by model 1, linear addition. As duration increases, binocular combination becomes increasingly nonlinear. Model 3 gives a good fit to all these data by placing a temporal filter with an overall time constant of ≈110 ms in the gain-control path. (This filter was achieved as a Gamma function equivalent to five stages of exponential decay each with time constant 50 ms.)

Further experiments investigated the orientation tuning function when a masking sinewave grating was set at an angle relative to the grating being judged. The orientation tuning function showed that vertical and horizontal mask gratings were equally potent in gain controlling the signal in the opposing eye, and both were somewhat more effective than diagonal gratings. That there is a difference in gain control between gratings at different orientations means that the gain control is at least in part determined by orientation-specific processes. Because neurons in the lateral geniculate nucleus (LGN) are essentially indifferent to orientation, this means that some of the gain control is of cortical origin, i.e., arises beyond the LGN.
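The added-noise result of experiment 2 can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper's equations: the noise energy simply adds to the grating's energy in that eye's TCE, each eye's weight in model 3 is one plus its own TCE divided by one plus both TCEs, and the noisy eye's grating is the one shifted by +θ/2. The function name and the numerical values are hypothetical.

```python
import cmath
import math


def phase_shift_with_noise(theta_deg, grating_energy, noise_energy):
    """Phase shift (degrees) of the cyclopean grating toward the noisy eye.

    Both eyes see gratings of equal contrast; noise adds contrast energy
    to one eye only (assumed additive) and thereby raises that eye's weight.
    """
    half = math.radians(theta_deg) / 2.0
    energy_noisy = grating_energy + noise_energy
    energy_clean = grating_energy
    denominator = 1.0 + energy_noisy + energy_clean
    weight_noisy = (1.0 + energy_noisy) / denominator
    weight_clean = (1.0 + energy_clean) / denominator
    # Noisy eye's grating at +theta/2, clean eye's grating at -theta/2.
    total = weight_noisy * cmath.exp(1j * half) + weight_clean * cmath.exp(-1j * half)
    return 2.0 * math.degrees(cmath.phase(total))


if __name__ == "__main__":
    for noise in (0.0, 5.0, 50.0, 500.0):
        shift = phase_shift_with_noise(90.0, 10.0, noise)
        print(f"noise energy={noise}: shift toward noisy eye={shift:.1f} deg")
```

With no noise the two eyes contribute equally and the shift is zero; as noise energy grows, the cyclopean grating moves toward the noisy eye's grating, in line with the dominance described above.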
Discussion
Disclaimer. The stimuli used to judge binocular combination in this experiment were 0.68 cpd. This relatively low spatial frequency was used because the accuracy of judging the phase of a sinewave grating decreases in inverse proportion to its frequency. We do not know to what extent the properties observed in the spatial frequency channel centered at 0.68 cpd apply to other spatial frequency channels. Also, although we investigated how different spatial frequencies exert gain control on the 0.68-cpd signal, we did not study how correlated signals in different spatial frequencies combine. However, within the spatial frequency band studied, the gain-control model has some interesting properties and makes some counterintuitive predictions that we consider below.

At High Contrast, the Model's Output Depends Only on the Contrast Ratio. For superthreshold stimuli, εL(IL) ≫ 1 and εR(IR) ≫ 1, the 1's in the numerator and denominator of Eq. 6 become insignificant, yielding
Contrast-Weighted Summation for High-Contrast Sinewave Gratings. Consider sinewave gratings, such as those in experiment 1. Let the contrast modulation amplitudes bmL and bmR of the gratings presented to the left and right eyes be sufficiently high that bmL ≫ 1 and bmR ≫ 1. Eq. 13 (see also Eq. 15) becomes
Linear Brightness Summation at Low Contrast and for Ganzfelds. As the contrast energies, εL(IL) and εR(IR), of the input images are reduced, the gain-control model asymptotically approaches arithmetic summation, i.e., model 1. Model 3 reduces to model 1 (arithmetic stimulus summation) whenever there is negligible contrast energy for mutual inhibition. This is the case not only for near-threshold stimuli but also in Ganzfelds with quite intense stimuli. In a Ganzfeld, the entire visual field is covered with a uniform light intensity. A Ganzfeld has no contours and, therefore, zero contrast energy ε. When the two eyes are presented with two identical Ganzfeld stimuli (6), binocular brightness increases monotonically as monocular brightness increases from weak to strong. The perceived binocular brightness is simply the sum of the monocular brightnesses, as predicted by model 3.

Summation of Unequal Interocular Contrasts: Binocular Isocontrast Contours. In our binocular combination experiments, we measured only the phase, not the amplitude, of the cyclopean sine wave. To determine how well model 3 can predict amplitude as well as phase, we rely on an abundance of published data concerning the perceived brightnesses and contrasts of cyclopean images. Here we consider interocular sinewave stimuli of unequal contrast (as in our experiments). Let the stimuli to the left and right eyes, respectively, be IL = mL sin x and IR = mR sin x, which yield the corresponding contrast energies for gain control, εL(IL) and εR(IR). Consider the perceived contrast of the cyclopean sinusoidal grating when the above two sinusoidal gratings, IL and IR, are presented to the two eyes. From Eq. 6, we have
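The perceived-contrast behavior of model 3 for in-phase gratings can be sketched as follows. The code is an illustration under stated assumptions rather than the paper's expression: contrast energy is taken to be (b·m) raised to a power, with hypothetical values `b = 100` and `gamma = 1`, and each eye's weight is taken to be one plus its own energy divided by one plus both energies.

```python
def contrast_energy(contrast, b=100.0, gamma=1.0):
    """Assumed contrast energy of a grating: (b * contrast) ** gamma."""
    return (b * contrast) ** gamma


def cyclopean_contrast(m_left, m_right, b=100.0, gamma=1.0):
    """Perceived contrast of two in-phase gratings under the assumed model 3 weights."""
    e_left = contrast_energy(m_left, b, gamma)
    e_right = contrast_energy(m_right, b, gamma)
    denominator = 1.0 + e_left + e_right
    return ((1.0 + e_left) * m_left + (1.0 + e_right) * m_right) / denominator


if __name__ == "__main__":
    # Hold the left-eye contrast fixed and raise the right-eye contrast from zero.
    for m_right in (0.0, 0.05, 0.1, 0.2, 0.4):
        print(f"m_right={m_right}: cyclopean={cyclopean_contrast(0.4, m_right):.3f}")
```

Under these assumptions, raising the weaker eye's contrast from zero first lowers the predicted cyclopean contrast before it recovers toward the monocular value, and at very low contrasts the prediction approaches the arithmetic sum of the two contrasts.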
In Fechner's Paradox, one eye is presented a stimulus of moderate luminance, and the other is presented a zero-luminance stimulus. As the luminance of the second eye's stimulus is increased from zero, cyclopean brightness decreases. Fechner's Paradox in binocular brightness combination occurs in ordinary stimuli such as discs but not in Ganzfelds (6). Fechner's Paradox also occurs in judgments of contrast matching in binocularly viewed sine waves (5). Model 3, which predicts simple summation for Ganzfelds (because they produce no interocular contrast energy ε for inhibition), also makes quite accurate predictions of Fechner's Paradox for sine waves (because of their large ε).

Rivalry, Higher-Order Binocular Phenomena. Up to this point, we have dealt with “compatible” stimuli in the left and right eyes that can be binocularly combined: in our experiments, two parallel sine waves that differ in phase by at most 135°; in other experiments, disks of the same size but of different brightnesses; and so on. However, suppose the stimuli in the two eyes are incompatible, i.e., they cannot be interocularly combined, such as sine waves 180° out of phase (one is the negative of the other) or perpendicular sine waves. Model 3 still makes a prediction of the relative strength of the left- and right-eye stimuli in a combination process except that, for incompatible stimuli, the combination process is not addition but a binary choice that admits only one or the other to further processing, i.e., rivalry. In the case of rivalry, model 3 is interpreted as predicting the relative proportions of time that each eye's stimulus is dominant, i.e., admitted to further processing, as opposed to the present case, where model 3 determines the proportion of the cyclopean image that is determined by each eye. Dealing with incompatible binocular stimuli is inherently more complex than dealing with compatible stimuli and is beyond the scope of the present treatment.
Also beyond the scope of the present treatment are “higher-order” binocular interactions that involve global considerations, such as the perception of one part of a stimulus influencing how another part is perceived, top-down effects of attention, and similar instances where complex interpretations of the visual stimulus influence ocular dominance (e.g., ref. 9).
Conclusion
Model 3 is a simple, robust, physiologically plausible model that accurately describes an early stage of binocular combination.
Appendix: Computation of Total Visually Weighted Contrast Energy (TCE)
Fig. 1c illustrates the computation of visually weighted TCE for the left eye. Let IL be the input image to the left eye and IL,i be the output of the temporal filter hL,i(t) within the ith spatial frequency-and-orientation channel gL,i(x, y). We have
Notes
Conflict of interest statement: No conflicts declared.
Abbreviations: TCE, Total Visually Weighted Contrast Energy; cpd, cycles per degree.
Footnotes
¶ In Eq. 6, terms representing contrast gain control appear in both numerator and denominator. In this respect, it is similar to Grossberg and Kelly's (3) different and more complex equation 7 (p. 3804) proposed to describe binocular brightness perception.
References
1.
2.
3.
4.
5.
6.
7. Levelt, W. J. M. (1965) On Binocular Rivalry (Institute for Perception RVO-TNO, Soesterberg, The Netherlands).
8.
9. Blake, R. (2003) in The Visual Neurosciences, eds. Chalupa, L. M. & Werner, J. (MIT Press, Cambridge, MA).