Proc Natl Acad Sci U S A. 2002 January 8; 99(1): 479–482. Published online 2001 December 26. doi: 10.1073/pnas.012361999. PMCID: PMC117585
Copyright © 2002, The National Academy of Sciences

Neurobiology

Mice and humans perceive multiharmonic communication sounds in the same way

Günter Ehret*† and Sabine Riecke‡§

*Department of Neurobiology, University of Ulm, D-89069 Ulm, Germany; and ‡Department of Biology, University of Konstanz, D-78457 Konstanz, Germany

Received July 16, 2001.
Abstract

Vowels and voiced consonants of human speech and most mammalian
vocalizations consist of harmonically structured sounds. The
frequency contours of formants in the sounds determine their spectral
shape and timbre and carry, in human speech, important phonetic and
prosodic information to be communicated. Steady-state partitions of
vowels are discriminated and identified mainly on the basis of
harmonics or formants having been resolved by the critical-band filters
of the auditory system and then grouped together. Speech-analog
processing and perception of vowel-like communication sounds in
mammalian vocal repertoires has not been demonstrated so far. Here, we
synthesize 11 call models and a tape loop with natural wriggling calls
of mouse pups and show that house mice perceive this communication call
in the same way as we perceive speech vowels: they need the presence of
a minimum number of formants (three formants—in this case, at 3.8 +
7.6 + 11.4 kHz), they resolve formants by the critical-band mechanism,
group formants together for call identification, perceive the formant
structure rather continuously, may detect the missing fundamental of a
harmonic complex, and all of these occur in a natural communication
situation without any training or behavioral constraints. Thus,
wriggling-call perception in mice is comparable with unconditioned
vowel discrimination and perception in prelinguistic human infants and
points to evolutionarily old rules for handling speech sounds in the human
auditory system up to the perceptual level.

Keywords: auditory perception; evolution of speech perception; formant filtering and grouping; mouse; sound communication
Mouse pups produce so-called
wriggling calls when struggling in the nest, mainly when pushing for
the teats during suckling by the mother (1). These calls release three
types of maternal behavior (licking of pups, changes of
suckling position, and nest building), which shows that they are
important communication sounds (2). Wriggling calls usually consist of
a fundamental frequency near 4 kHz and several (a minimum of two)
overtones reaching to a maximum frequency of about 20 kHz, if overtones
within a 30-dB range are considered (Fig. 1). This basic harmonic structure may be
modified by frequency modulations of the harmonics and rapid amplitude
modulations leading to side bands of the harmonics or to rather noisy
partitions of the calls (Fig. 1).
Figure 1. Examples of natural wriggling calls of mouse pups aged 1–5 days. The
spectrograms indicate frequency contours and relative intensities of
the frequency components.
Mouse calls are structurally similar to vocalizations of many mammals
(3), including cries and other nonverbal sounds of humans, especially
infants (4). Because little is known about the perception of these
types of vocalizations, we compare the perceptual properties of
wriggling calls with the perception of vowels of human speech, which
have a frequency structure similar to that of wriggling calls of mice
(5, 6). We hypothesize that mice perceive the wriggling calls in the
frequency domain by following the same rules as humans in analyzing,
identifying, and grouping formants together into a vowel percept (7–9).
The term “formant”, which defines frequency contours of increased
intensity (resonance frequencies of the vocal tract) in human speech
vocalizations, will be adapted here to the main frequency components of
a mouse call.

To determine the frequency structure in the wriggling calls necessary
and sufficient to be perceived as a relevant stimulus by the mothers,
we synthesized 11 wriggling-call models (Fig. 2) and prepared a tape loop with natural
wriggling calls, all to be played back to the undisturbed mothers in a
nursing situation. The wriggling-call models were designed to test the
perceptual significance of the basic harmonic structure of the natural
calls, the number of their formants, resolved vs. nonresolved formants,
the pitch produced by the harmonics, and the frequency range to be
covered. Because the mother's own pups produce wriggling calls as
well, we were able to calibrate, separately for every observation
period, the mother's response rate to wriggling-call models or to
wriggling calls from the tape loop against her response rate to the
calls of live pups. Thus, our results reflect, as
do psychoacoustic measurements in humans, the analysis and processing
of complex frequency information in the auditory system up to the
perceptual level. To our knowledge, they are the first behavioral tests
to show rules of grouping together formants of a mammalian
communication call for the perception of its acoustical Gestalt.
Figure 2. Diagrams (frequency vs. time) of the frequency structure of the 11
synthesized wriggling calls (A–K) used
as stimuli. The 12th stimulus (L) consists of natural
wriggling calls from a tape loop. An example of such a call with three
main frequency ...
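The burst structure of the synthesized call models (100-ms bursts with 5-ms rise and fall times, 200-ms interburst intervals, bouts of five bursts, optional 1-kHz amplitude modulation at 50% depth, as given in Materials and Methods) can be sketched digitally. The original stimuli were produced with analog oscillators, so everything below is an assumption made for illustration: the function names, the 96-kHz sample rate, the linear ramp shape, and the reading of the interburst interval as a silent gap. The band-pass filtering applied to modulated signals is not modeled.

```python
import math


def synthesize_burst(freqs_hz, sample_rate=96_000, duration_s=0.100,
                     ramp_s=0.005, am_hz=None, am_depth=0.5):
    """Return one burst as a list of samples in [-1, 1].

    All sine components have equal amplitude and start at zero phase.
    The sample rate and the linear ramp shape are assumptions.
    """
    n = int(round(sample_rate * duration_s))
    n_ramp = int(round(sample_rate * ramp_s))
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Sum of equal-level formants, scaled so the peak stays within 1.
        value = sum(math.sin(2 * math.pi * f * t) for f in freqs_hz) / len(freqs_hz)
        if am_hz is not None:
            # Sinusoidal amplitude modulation, normalized to a peak of 1.
            value *= (1 + am_depth * math.sin(2 * math.pi * am_hz * t)) / (1 + am_depth)
        # Linear rise at the start and linear fall at the end of the burst.
        gate = min(1.0, i / n_ramp, (n - 1 - i) / n_ramp)
        samples.append(value * gate)
    return samples


def synthesize_bout(freqs_hz, n_bursts=5, gap_s=0.200,
                    sample_rate=96_000, **burst_kwargs):
    """Concatenate n_bursts identical bursts separated by silent gaps."""
    burst = synthesize_burst(freqs_hz, sample_rate=sample_rate, **burst_kwargs)
    silence = [0.0] * int(round(sample_rate * gap_s))
    bout = []
    for k in range(n_bursts):
        bout.extend(burst)
        if k < n_bursts - 1:
            bout.extend(silence)
    return bout


# Three-formant model (3.8 + 7.6 + 11.4 kHz), without and with modulation.
plain_bout = synthesize_bout([3800, 7600, 11400])
modulated_burst = synthesize_burst([3800, 7600, 11400], am_hz=1000)
```

Dividing the summed sines by the number of components keeps every formant at the same level, as in the call models, while preventing clipping; absolute playback level (70 dB SPL in the experiments) would still have to be set by calibration of the playback chain.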
Materials and Methods

Animals. Sixty primiparous lactating mice (Mus domesticus, outbred
strain NMRI), aged 9–12 weeks with their 1- to 5-day-old pups (litters
standardized to 14 pups), were housed in plastic cages (26.5 ×
20 × 14 cm) at 22°C and a 12-h light/12-h dark cycle
(light on at 7 h). Food and water were available ad libitum.

Recording and Synthesis of Call Models. Natural wriggling calls (Fig. 1) of 1- to 5-day-old pups were recorded
(condenser microphone 4133, measuring amplifier 2602, both from
Brüel & Kjaer Instruments, Marlborough, MA), filtered (Rockland
852, bandpass 2–30 kHz, 48 dB/octave; Rockland, Gilbertsville,
PA), stored on tape (Phillips Analog 714, Philips Electronic
Instruments, Mahwah, NJ), and an endless tape was prepared (Phillips
Analog 7, 38 cm/s). Playback was through a filter (Rockland 852,
bandpass 2–30 kHz) and power amplifier (Exact 170) to the loudspeaker
(Dynaudio D28, Dynaudio, Bensenville, IL). Artificial wriggling calls
were synthesized from sine waves of known frequency (three oscillators:
Exact 129, Wavetek 130, Wavetek 142, and counter) and from band-passed
white noise (General Radio 1390B, GenRad, Ismaning, Germany;
Rockland 852 and Kemo VBF/8 filters in series with a total of 96
dB/octave slopes). Signals were passed through a four-channel
adder (all outputs at initial zero phase) triggered by an electronic
switch, which formed bursts of 100-ms duration, including 5-ms rise and
fall times, and 200-ms interburst intervals. The bursts could be
sinusoidally amplitude-modulated (modulation frequency 1 kHz,
modulation depth 50%; Exact 129). Amplitude-modulated signals were
band-passed through a filter (Krohn-Hite 3500, 24 dB/octave) set
to the lowest sine wave minus 1 kHz and the highest sine wave plus 1
kHz, respectively, of the signal. The nonmodulated or modulated signals
were attenuated (Hewlett-Packard 350D), amplified (Exact 170), and sent
to the speaker. The frequency structure of the 11 synthesized calls is
shown in Fig. 2.

Playback of Call Models. Stimuli were presented in a sound-proof room under dim red light
between 9 and 12 h and between 14 and 19 h. After having given birth, the mother,
with her litter, was placed in a cage with a circular hole (9-cm
diameter) covered with a fine polyamide gauze in the center of
its bottom. Wood shavings served as nest material. The cage with mother
and litter was suspended in the room about 30 min before the
observation started. The cover grid of the cage was removed and its
height increased by a 6-cm-high plastic head-piece. The loudspeaker was
fixed independently about 1 cm underneath the hole of the cage. The
speaker had a flat (± 6 dB) frequency spectrum (Nicolet 466A
spectrum analyser) between 3 and 19 kHz, measured in the cage. Sounds were
presented at a 70-dB total sound pressure level (SPL) (relative to 20
μPa) at the nest area of the cage (Brüel & Kjaer Instruments
4133 plus 2606). In multiformant calls, each formant had the same
level, all adding up to 70 dB SPL. Most natural wriggling calls of pups
are heard by the mother at about 70 dB SPL (1). Artificial wriggling
calls were presented as bouts of five sound bursts (bouts of two to
five wriggling calls are most frequently produced by 1- to 5-day-old
pups; S.R., unpublished work). Above the cage, a microphone and a video
camera monitored sounds from the cage (wriggling calls from the litter
and playback sounds) and the behavior of the mother for later analysis.

Recording and Analysis of Maternal Behavior. Observations were made only while the mother was in a nursing position
on her litter. In a 45-min observation period, about 50 bouts of 1 of
the 11 types of artificial wriggling calls or about 50 bouts each
consisting of five natural calls from the tape loop were played back at
intervals of 20–120 s. The mother responded not only to the sounds
from the loudspeaker but also to wriggling calls of the litter. While
the pups were vocalizing, synthesized sounds were not presented. In the
video tapes, maternal responses to natural or synthesized sounds were
noted if the mother responded within 3 s after the onset of the
sounds with either “licking of pups”, “changing nursing
position”, or “nest building” (2). Nonresponses of the mother
and the number of bouts of wriggling calls produced by the litter also
were noted. When the litter produced wriggling calls just when a
playback of a bout of synthesized calls had been started, the response
of the mother was not considered. For every 45-min observation period,
a quality coefficient (Q) indicated the response to a given playback
signal relative to the response to wriggling calls of the litter.
Explicitly, Q was calculated as the ratio of the number of responses to
the bouts of the playback signal (A) and the number of bouts of the
signal played back (B) divided by the ratio of the number of responses
to bouts of wriggling calls from the litter (C) and the number of bouts
produced by the litter (D), or Q = (A/B)/(C/D) = AD/BC. Five mothers
having 1-, 2-, 3-, 4-, or 5-day-old pups were tested with a given
signal type, and individual Q values were calculated. Each mother was
tested only once.
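The calculation of Q for one observation period can be sketched as follows. The function name, argument names, and the counts in the example are hypothetical and serve only to illustrate the arithmetic; they are not data from the study.

```python
def quality_coefficient(playback_responses, playback_bouts,
                        litter_responses, litter_bouts):
    """Q = (A / B) / (C / D) = AD / BC for one observation period.

    A: responses to playback bouts, B: playback bouts presented,
    C: responses to litter bouts,   D: bouts produced by the litter.
    """
    if playback_bouts <= 0 or litter_bouts <= 0:
        raise ValueError("bout counts must be positive")
    if litter_responses <= 0:
        # Without any response to the live pups there is no baseline rate.
        raise ValueError("Q is undefined without responses to the litter")
    playback_rate = playback_responses / playback_bouts  # A / B
    litter_rate = litter_responses / litter_bouts        # C / D
    return playback_rate / litter_rate


# Hypothetical counts: 20 responses to 50 playback bouts,
# 30 responses to 60 bouts from the litter.
q_example = quality_coefficient(20, 50, 30, 60)  # 0.4 / 0.5
```

A value of Q = 1 means that the playback signal released maternal behavior at the same rate as the calls of the mother's own pups in that period; values below 1 indicate a less effective signal.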
Results and Discussion

General Responsiveness to Natural Calls and Call Models. Fig. 3 shows a significant increase in
the average number of bouts of wriggling calls produced by live pups of
increasing age (1–5 days old) in the 45-min observation period
(regression analysis, correlation coefficient r = 0.564;
P < 0.001, two-tailed, n = 60).
The average number of responses to the pup calls, however, increased
only weakly with the age of the pups (r = 0.308,
P < 0.05, two-tailed, n = 60), so that
the average percentage of pup calls that elicited a response decreased with increasing age
of the pups (r = −0.333, P < 0.01,
two-tailed, n = 60). Fig. 3 also shows that the average
number of spontaneous maternal actions decreased significantly with
increasing pup age (r = −0.510, P < 0.001,
two-tailed, n = 60). Together, these data
indicate that the mothers' motivation to act maternally decreases with
increasing age of the pups. This motivational decrease is
counterbalanced by a higher calling rate of older pups, so that the
average number of maternal acts remains rather constant over the first
5 days in the life of the pups. A compensation of decreasing
postparturient maternal motivation by increasing efforts of the young
to keep the rate of maternal behavior high is common to systems of
instinctive regulation of the amount of maternal care (10, 11).
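The link between the reported correlation coefficients and their significance levels can be illustrated with the standard t transformation for a correlation coefficient, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. This is a sketch that assumes a Pearson correlation tested in this way; the paper does not state the exact procedure, and the function name is illustrative.

```python
import math


def t_from_r(r, n):
    """Return (t, degrees_of_freedom) for a correlation r from n pairs."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("at least three data pairs are needed")
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df


# Correlation coefficients reported in the text, each with n = 60.
reported = {
    "pup calling rate vs. age": 0.564,
    "maternal responses vs. age": 0.308,
    "percent responded calls vs. age": -0.333,
    "spontaneous maternal acts vs. age": -0.510,
}
t_values = {label: t_from_r(r, 60)[0] for label, r in reported.items()}
```

With 58 degrees of freedom, |t| has to exceed about 2.00 for P < 0.05 (two-tailed), so the weakest correlation (r = 0.308, t ≈ 2.47) passes that threshold while the strongest (r = 0.564, t ≈ 5.20) lies far beyond it.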
Figure 3. (Ordinate, Left) Number of bouts of
wriggling calls produced by the mothers' own pups;
number of maternal responses to the bouts of wriggling calls of their
own pups; spontaneous maternal acts of the mothers.
(Means ...
The maternal responsiveness to natural wriggling calls or call models
and the spontaneous maternal actions did not habituate (systematically
decrease) over the 45-min observation period. However, the
responsiveness to the call models and to the natural calls of the live
pups produced in the observation periods for the respective call models
was rather variable. Bouts of wriggling calls of live pups elicited
maternal responses in an average of 39–67% of cases, with SDs up to
50% of the means. In addition, the proportions of responses vs.
nonresponses to calls were inhomogeneous both among the call models and
among the natural calls (χ² contingency
analysis, P < 0.001, two-tailed, 12 degrees of freedom in
each case). Against this background of variability in maternal motivation
to respond to wriggling calls of their own live pups, a valid ranking of the
responsiveness to call models according to their specific acoustic
properties needs a calibration procedure to eliminate motivation as a
variable. For this purpose, we calculated the quality coefficient Q
(see above).

Ranking of Call Models. Average Q values for all 12 playback signals are shown in Fig. 4. Three call models (Fig. 2 A, D, and K) have very little
effectiveness in releasing maternal behavior. Their Q values do not
differ significantly (Kruskal-Wallis H-test analysis of variance,
P > 0.2). The Q values of the next six call
models (Fig. 2 B, C, F–H,
and J) do not differ significantly among each other (H-test,
P > 0.2). The releasing capability of these call
models remains below 50% of the effectiveness of the natural calls of
live pups (Q = 1). The Q values of the first nine call models
(A, D, K, B, C, F, G, H, J) differ significantly (H-test,
P < 0.001), and the Q values of call models A,
D, K are all significantly different from all of the Q values of
call models B, C, F, G, H, J (U test, at least
P < 0.05, two-tailed). That is, the call models
B, C, F, G, H, and J are significantly more
effective in releasing maternal behavior than the call models A,
D, and K. Only two wriggling-call models (Fig. 2
E and I) and natural calls from tape release
maternal behavior at a rate of more than 75%, compared with the calls
of live pups. The Q values from these three signals do not differ
significantly (H-test, P > 0.1), but they are
different from Q values from all of the other call models (U test, at
least P < 0.05, two-tailed). Thus, we can state that a
sufficient spectral condition for wriggling-call perception by mouse
mothers is a structure of three harmonically related frequencies, the
first three formants in the calls, or 3.8 + 7.6 + 11.4 kHz.
Figure 4. Mean values of the quality coefficient (Q) expressing the relative
effectiveness of the stimuli (wriggling-call models as shown in Fig. 2
and natural calls from tape) to release maternal response behavior. SDs
are presented unilaterally for clarity. ...
How can this result be explained by mechanisms of analysis in the
auditory system, and how does it relate to human vowel perception?
First, spectral energy in the lower compared with the higher part of
the frequency range of natural wriggling calls (Fig. 1) is more
important for call perception. The mouse has its best hearing range
between 15 and 20 kHz (12) and, therefore, perceives the high-frequency
noise band (12–20 kHz) better than the low-frequency noise (3–12
kHz). However, the mice responded significantly better to the
low-frequency noise compared with the high-frequency noise (Fig. 4)
and, thus, indicated their preference for low-frequency spectral
energy. Similarly, for identification of most vowels in human speech,
the low-frequency formants below our best hearing range of 2–5 kHz
(12) have been shown to be most important (6, 13).

Second, critical band filters determined psychophysically (14) or
neurophysiologically (15) express the ability of the auditory system to
resolve frequency components in a sound. In mice of the same strain,
they have widths close to or slightly below 4 kHz in the frequency
range of 5–10 kHz (16), so that the formants of the call models
A–E and I (Figs. 2 and 4) can just be spectrally
resolved. The formants in the 4 + 6 + 8 kHz model and the first two
formants in the nonharmonic call (Figs. 2 and 4, H) are too
close together to be resolved. Hence, as in human vowel perception,
only resolved formants can be grouped together to produce an optimum
vowel percept (9, 17, 18). The mechanisms necessary for grouping
resolved formants together exist in combination-sensitive neurons
showing spectral facilitation in their responses. Combination-sensitive
neurons are created in the auditory midbrain (15, 19) and can also be
observed in the auditory cortex (20, 21).

Third, at least three resolved formants in the low-frequency range (3.8
+ 7.6 + 11.4 kHz; Fig. 4) of the wriggling calls are necessary for
nearly optimum call perception. One formant alone is very ineffective
(6 kHz, Fig. 4); two formants together (3.8 + 7.6 kHz or 3.8 + 11.4
kHz) are significantly more effective in releasing maternal behavior,
but only if the fundamental frequency (3.8 kHz) is present,
because 7.6 + 11.4 kHz (Fig. 4) is as ineffective as one formant
alone. Thus, the first of the three formants, which is also the
fundamental of the harmonic complex (3.8 + 7.6 + 11.4 kHz), is of
special importance. This finding is similar to speech vowel
perception in humans (6, 13) and dogs (22).

Fourth, the significantly increased responsiveness to the three
high-frequency harmonics (11.4 + 15.2 + 19 kHz) compared with the
two-harmonic complex (7.6 + 11.4 kHz; Fig. 4) suggests that our mice
perceived the pitch of the missing fundamental of 3.8 kHz (virtual
pitch). The more harmonics that are present, the stronger is the pitch
(23). Hearing the 3.8-kHz pitch in the 11.4 + 15.2 + 19 kHz complex
would explain the very similar perception of this complex compared with
3.8 + 11.4 kHz (Fig. 4). Virtual pitch perception in mice would be the
second demonstration of this phenomenon in a mammal (24) for
frequencies above the existence range reported for humans (below about
2 kHz; ref. 25).

Fifth, vowel transitions in human speech are perceived rather
continuously (not categorically), and there are no perfect spectral
boundaries for vowel classification (26, 27). From our present results,
a similar strategy may be predicted for the perception of the four
mouse calls differing in formant structure: the wriggling calls of
pups, pain or rough handling sounds of pups, distress sounds of adults,
and defensive calls of nonreceptive females (1, 28, 29). Here, we show
that wriggling-call models are not just categorized into relevant and
irrelevant sounds according to their frequency structure, but they are
perceived progressively better as the number of resolved formants
increases from one (call model A) to two (call models
B and C) to three (call models E and
I; Fig. 4).

Sixth, the somewhat (but not statistically significantly) better response
to natural calls from tape and to the amplitude-modulated three-formant
call (3.8 + 7.6 + 11.4 kHz + AM; Fig. 4) compared with that to the
same three-formant call without modulation suggests, as in humans (5),
that more natural sounding stimuli provide a better basis for
perception than plain synthesized stimuli, with just the minimum
structure for carrying the message.

A comparison of wriggling-call spectra (Fig. 1) with the spectral
condition for wriggling-call perception (Fig. 4) indicates that most
calls (Fig. 1 B–D, and F) carry the
decisive features for releasing maternal behavior, and thus must be
effective in communication. Thus, as in human vowel production and
perception, auditory mechanisms in mice acting as a “harmonic
sieve” on the resolved sound spectrum (9, 30) may ensure an
automatic normalization of vowels from different speakers into a
formant reference frame for perception (31). The automatism in formant
resolution, grouping, and vowel perception is stressed by the fact that
our mice were in no way trained or conditioned to perceive the
wriggling-call models. All of the behavior the animals demonstrated in
the tests followed from their natural tendency to respond with
different rates to the call models. Thus, the natural discrimination of
communication sound models in mice on the basis of their formant
structure is comparable to the unconditioned vowel discrimination in 4-
to 17-week-old human infants (32).

Our data, together with others on categorical perception (33, 34) and
left-hemisphere dominance of communication call perception in mice
(35), point to the same mechanisms for the analysis and perception of
communication sounds in mice and humans, with the consequence that the
handling of speech sounds in the mammalian auditory system up to the
perceptual level follows evolutionarily old rules.
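Two of the arguments in the discussion, spectral resolution of formants by critical bands and the missing fundamental of a harmonic complex, can be illustrated with a short sketch. It is a deliberate simplification and not a model of the auditory system: formants count as resolved when neighbors are at least one critical bandwidth apart, the bandwidth of 3800 Hz is an assumed value standing in for "close to or slightly below 4 kHz", and the fundamental is taken as the greatest common divisor of the component frequencies. All names are illustrative.

```python
from functools import reduce
from math import gcd


def are_resolved(formants_hz, critical_bandwidth_hz):
    """True if all neighboring formants are at least one bandwidth apart."""
    ordered = sorted(formants_hz)
    return all(high - low >= critical_bandwidth_hz
               for low, high in zip(ordered, ordered[1:]))


def implied_fundamental_hz(harmonics_hz):
    """Greatest common divisor of integer frequencies (in Hz)."""
    return reduce(gcd, harmonics_hz)


ASSUMED_CRITICAL_BANDWIDTH_HZ = 3800  # illustrative value, see lead-in

three_formant_model = [3800, 7600, 11400]
close_spaced_model = [4000, 6000, 8000]
high_harmonics = [11400, 15200, 19000]

resolved_three = are_resolved(three_formant_model, ASSUMED_CRITICAL_BANDWIDTH_HZ)
resolved_close = are_resolved(close_spaced_model, ASSUMED_CRITICAL_BANDWIDTH_HZ)
missing_fundamental = implied_fundamental_hz(high_harmonics)
```

Under these assumptions the 3.8 + 7.6 + 11.4 kHz model is just resolved, the 4 + 6 + 8 kHz model is not, and the 11.4 + 15.2 + 19 kHz complex implies a fundamental of 3.8 kHz. The divisor alone does not capture pitch strength: 7.6 + 11.4 kHz implies the same fundamental, yet that two-harmonic complex was far less effective, in line with the statement that more harmonics give a stronger pitch.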
Acknowledgments

This work was supported by the Deutsche
Forschungsgemeinschaft, Eh 53/8 and 17-1.
References

1. Ehret, G. Behaviour. 1975;52:38–56.
2. Ehret, G; Bernecker, C. Anim Behav. 1986;34:821–830.
3. Tembrock, G. Akustische Kommunikation bei Säugetieren. Darmstadt, Germany: Wissenschaftliche Buchgesellschaft; 1996.
4. Ostwald, P. Dev Med Child Neurol. 1972;14:350–361.
5. Flanagan, J L. Speech Analysis Synthesis and Perception. Berlin: Springer; 1972.
6. Peterson, G E; Barney, H L. J Acoust Soc Am. 1952;24:175–184.
7. Plomp, R. J Acoust Soc Am. 1964;36:1628–1636.
8. Plomp, R. Auditory Analysis and Perception of Speech. Fant G, Tatham M A A, editors. London: Academic; 1975. pp. 7–22.
9. Darwin, C J. The Auditory Processing of Speech. From Sounds to Words. Schouten M E H, editor. Berlin: de Gruyter; 1992. pp. 133–147.
10. Rosenblatt, J S; Siegel, H I. Parental Care in Mammals. Gubernick D J, Klopfer P H, editors. New York: Plenum; 1981. pp. 13–76.
11. Godfray, H C J. Nature (London). 1995;376:133–138.
12. Ehret, G. Naturwissenschaften. 1974;61:506–507.
13. Carlson, R; Fant, G; Grantström, B. Auditory Analysis and Perception of Speech. Fant G, Tatham M A A, editors. London: Academic; 1975. pp. 55–82.
14. Scharf, B. Foundations of Modern Auditory Theory, Vol. 1. Tobias J V, editor. New York: Academic; 1970. pp. 159–202.
15. Ehret, G; Merzenich, M M. Brain Res Revs. 1988;13:139–163.
16. Ehret, G. Biol Cybern. 1976;24:35–42.
17. Darwin, C J. J Acoust Soc Am. 1984;76:1636–1647.
18. ter Keurs, M; Festen, J M; Plomp, R. The Auditory Processing of Speech. From Sounds to Words. Schouten M E H, editor. Berlin: de Gruyter; 1992. pp. 283–288.
19. Mittmann, D H; Wenstrup, J J. Hear Res. 1995;90:185–191.
20. Suga, N. Auditory Function. Neurobiological Bases of Hearing. Edelmann G M, Gall W E, Cowan W M, editors. New York: Wiley; 1988. pp. 679–720.
21. Fitzpatrick, D C; Kanwal, J S; Butman, J A; Suga, N. J Neurosci. 1993;13:931–940.
22. Baru, A V. Auditory Analysis and Perception of Speech. Fant G, Tatham M A A, editors. London: Academic; 1975. pp. 91–101.
23. Terhardt, E. J Acoust Soc Am. 1974;55:1061–1069.
24. Preisler, A; Schmidt, S. Naturwissenschaften. 1995;82:45–47.
25. Ritsma, R J. J Acoust Soc Am. 1967;42:191–198.
26. Pisoni, D B. Percept Psychophys. 1973;13:253–260.
27. Centmayer, K. Auditory Analysis and Perception of Speech. Fant G, Tatham M A A, editors. London: Academic; 1975. pp. 143–152.
28. Haack, B; Markl, H; Ehret, G. The Auditory Psychobiology of the Mouse. Willott J F, editor. Springfield, IL: Thomas; 1983. pp. 57–97.
29. Whitney, G; Nyby, J. The Auditory Psychobiology of the Mouse. Willott J F, editor. Springfield, IL: Thomas; 1983. pp. 98–129.
30. Cohen, M A; Grossberg, S; Wyse, L L. J Acoust Soc Am. 1995;98:862–879.
31. Summerfield, A Q; Haggard, M P. Auditory Analysis and Perception of Speech. Fant G, Tatham M A A, editors. London: Academic; 1975. pp. 115–141.
32. Trehub, S E. Dev Psychol. 1973;9:91–96.
33. Ehret, G. Categorical Perception. The Groundwork of Cognition. Harnad S, editor. Cambridge, U.K.: Cambridge Univ. Press; 1987. pp. 301–331.
34. Ehret, G. Anim Behav. 1992;43:409–416.
35. Ehret, G. Nature (London). 1987;325:249–251.