Mice and humans perceive multiharmonic communication sounds in the same way

doi:10.1073/pnas.012361999

Proc Natl Acad Sci U S A. 2002 January 8; 99(1): 479–482.

Published online 2001 December 26. doi: 10.1073/pnas.012361999.

PMCID: PMC117585

Neurobiology

Günter Ehret^*^† and Sabine Riecke^‡^§

^*Department of Neurobiology, University of Ulm, D-89069 Ulm, Germany; and ^‡Department of Biology, University of Konstanz, D-78457 Konstanz, Germany

^†To whom reprint requests should be addressed. E-mail: guenter.ehret/at/biologie.uni-ulm.de.

^§Present address: Kappishalde 33, D-74199 Untergruppenbach, Germany.

Edited by Michael M. Merzenich, University of California, San Francisco, CA, and approved November 8, 2001

Received July 16, 2001.

Abstract

Vowels and voiced consonants of human speech and most mammalian vocalizations consist of harmonically structured sounds. The frequency contours of formants in the sounds determine their spectral shape and timbre and carry, in human speech, important phonetic and prosodic information to be communicated. Steady-state partitions of vowels are discriminated and identified mainly on the basis of harmonics or formants having been resolved by the critical-band filters of the auditory system and then grouped together. Speech-analog processing and perception of vowel-like communication sounds in mammalian vocal repertoires has not been demonstrated so far. Here, we synthesize 11 call models and a tape loop with natural wriggling calls of mouse pups and show that house mice perceive this communication call in the same way as we perceive speech vowels: they need the presence of a minimum number of formants (three formants, in this case at 3.8 + 7.6 + 11.4 kHz), they resolve formants by the critical-band mechanism, group formants together for call identification, perceive the formant structure rather continuously, may detect the missing fundamental of a harmonic complex, and all of these occur in a natural communication situation without any training or behavioral constraints. Thus, wriggling-call perception in mice is comparable with unconditioned vowel discrimination and perception in prelinguistic human infants and points to evolutionarily old rules of handling speech sounds in the human auditory system up to the perceptual level.

Keywords: auditory perception; evolution of speech perception; formant filtering and grouping; mouse; sound communication

Mouse pups produce so-called wriggling calls when struggling in the nest, mainly when pushing for the teats during suckling by the mother (1). These calls release three types of maternal behavior: namely, licking of pups, changes of suckling position, and nest building, and thus show that they are important communication sounds (2). Wriggling calls usually consist of a fundamental frequency near 4 kHz and several (a minimum of two) overtones reaching to a maximum frequency of about 20 kHz, if overtones within a 30-dB range are considered (Fig. 1). This basic harmonic structure may be modified by frequency modulations of the harmonics and rapid amplitude modulations leading to side bands of the harmonics or to rather noisy partitions of the calls (Fig. 1).

Figure 1

Examples of natural wriggling calls of mouse pups aged 1–5 days. The spectrograms indicate frequency contours and relative intensities of the frequency components.

Mouse calls are structurally similar to vocalizations of many mammals (3), including cries and other nonverbal sounds of humans, especially infants (4). Because little is known about the perception of these types of vocalizations, we will compare the perceptual properties of wriggling calls with the perception of vowels of human speech that also have a frequency structure similar to that of wriggling calls of mice (5, 6). We hypothesize that mice perceive the wriggling calls in the frequency domain by following the same rules as humans in analyzing, identifying, and grouping formants together to a vowel percept (7–9). The term “formant”, which defines frequency contours of increased intensity (resonance frequencies of the vocal tract) in human speech vocalizations, will be adapted here to the main frequency components of a mouse call.

To determine the frequency structure in the wriggling calls necessary and sufficient to be perceived as a relevant stimulus by the mothers, we synthesized 11 wriggling-call models (Fig. 2) and prepared a tape loop with natural wriggling calls, all to be played back to the undisturbed mothers in a nursing situation. The wriggling-call models were designed to test the perceptual significance of the basic harmonic structure of the natural calls, the number of their formants, resolved vs. nonresolved formants, the pitch produced by the harmonics, and the frequency range to be covered. Because the mother's own pups produce wriggling calls as well, we are in the fortunate situation to be able to calibrate separately for every observation period the mother's response rate to wriggling-call models or wriggling calls from the tape loop to her response rate to the calls of live pups. Thus, our results reflect, as do psychoacoustic measurements in humans, the analysis and processing of complex frequency information in the auditory system up to the perceptual level. To our knowledge, they are the first behavioral tests to show rules of grouping together formants of a mammalian communication call for the perception of its acoustical Gestalt.

Figure 2

Diagrams (frequency vs. time) of the frequency structure of the 11 synthesized wriggling calls (A–K) used as stimuli. The 12th stimulus (L) consists of natural wriggling calls from a tape loop. An example of such a call with three main frequency (more ...)

Materials and Methods

Animals.

Sixty primiparous lactating mice (Mus domesticus, outbred strain NMRI), aged 9–12 weeks with their 1- to 5-day-old pups (litters standardized to 14 pups), were housed in plastic cages (26.5 × 20 × 14 cm) at 22°C and a 12-h light/12-h dark cycle (light on at 7 h). Food and water were available ad libitum.

Recording and Synthesis of Call Models.

Natural wriggling calls (Fig. 1) of 1- to 5-day-old pups were recorded (condenser microphone 4133, measuring amplifier 2602, both from Brüel & Kjaer Instruments, Marlborough, MA), filtered (Rockland 852, bandpass 2–30 kHz, 48 dB/octave; Rockland, Gilbertsville, PA), stored on tape (Phillips Analog 714, Philips Electronic Instruments, Mahwah, NJ), and an endless tape was prepared (Phillips Analog 7, 38 cm/s). Playback was through a filter (Rockland 852, bandpass 2–30 kHz) and power amplifier (Exact 170) to the loudspeaker (Dynaudio D28, Dynaudio, Bensenville, IL). Artificial wriggling calls were synthesized from sine waves of known frequency (three oscillators: Exact 129, Wavetek 130, Wavetek 142, and counter) and from band-passed white noise (General Radio 1390B, GenRad, Ismaning, Germany; Rockland 852 and Kemo VBF/8 filters in series with a total of 96 dB/octave slopes). Signals were passed through a four-channel adder (all outputs at initial zero phase) triggered by an electronic switch, which formed bursts of 100-ms duration, including 5-ms rise and fall times, and 200-ms interburst intervals. The bursts could be sinusoidally amplitude-modulated (modulation frequency 1 kHz, modulation depth 50%; Exact 129). Amplitude-modulated signals were band-passed through a filter (Krohn-Hite 3500, 24 dB/octave) set to the lowest sine wave minus 1 kHz and the highest sine wave plus 1 kHz, respectively, of the signal. The nonmodulated or modulated signals were attenuated (Hewlett-Packard 350D), amplified (Exact 170), and sent to the speaker. The frequency structure of the 11 synthesized calls is shown in Fig. 2.
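The burst structure described above can be illustrated with a minimal digital sketch. It is not the analog signal chain used in the study: the sample rate, the linear shape of the rise and fall ramps, and the amplitude normalization are all assumptions made for illustration, and the function name is hypothetical. Only the burst duration, ramp duration, zero starting phase, and the 1-kHz, 50% amplitude modulation are taken from the text.

```python
import math


def synthesize_burst(freqs_hz, sample_rate=96000, duration_s=0.100,
                     ramp_s=0.005, am_hz=None, am_depth=0.5):
    """Return one burst as a list of samples in the range [-1, 1].

    freqs_hz    : sine-wave frequencies, all starting at zero phase
    sample_rate : assumed value, not from the paper
    ramp_s      : rise and fall time; a linear ramp shape is assumed
    am_hz       : optional sinusoidal amplitude-modulation frequency
    """
    n = round(sample_rate * duration_s)
    ramp_n = round(sample_rate * ramp_s)
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Equal-amplitude sum of the components, scaled to stay within [-1, 1].
        s = sum(math.sin(2 * math.pi * f * t) for f in freqs_hz) / len(freqs_hz)
        if am_hz is not None:
            # Sinusoidal AM, rescaled so the modulated peak does not exceed 1.
            s *= (1 + am_depth * math.sin(2 * math.pi * am_hz * t)) / (1 + am_depth)
        # Linear onset and offset ramps (assumed shape).
        envelope = min(1.0, i / ramp_n, (n - 1 - i) / ramp_n)
        samples.append(s * envelope)
    return samples


# Three-formant model, without and with 1-kHz amplitude modulation.
plain = synthesize_burst([3800.0, 7600.0, 11400.0])
modulated = synthesize_burst([3800.0, 7600.0, 11400.0], am_hz=1000.0)
```

At the assumed 96-kHz sample rate, each 100-ms burst contains 9600 samples, of which the first and last 480 fall within the ramps.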

Playback of Call Models.

Stimuli were presented in a sound-proof room under dim red light between 9–12 h and 14–19 h. After having given birth, the mother, with her litter, was placed in a cage with a circular hole (9-cm diameter) covered with a fine polyamide gauze in the center of its bottom. Wood shavings served as nest material. The cage with mother and litter was suspended in the room about 30 min before the observation started. The cover grid of the cage was removed and its height increased by a 6-cm-high plastic head-piece. The loudspeaker was fixed independently about 1 cm underneath the hole of the cage. The speaker had a flat ± 6-dB frequency spectrum (Nicolet 466A spectrum analyser) between 3–19 kHz, measured in the cage. Sounds were presented at a 70-dB total sound pressure level (SPL) (relative to 20 μPa) at the nest area of the cage (Brüel & Kjaer Instruments 4133 plus 2606). In multiformant calls, each formant had the same level, all adding up to 70 dB SPL. Most natural wriggling calls of pups are heard by the mother at about 70 dB SPL (1). Artificial wriggling calls were presented as bouts of five sound bursts (bouts of two to five wriggling calls are most frequently produced by 1- to 5-day-old pups; S.R., unpublished work). Above the cage, a microphone and a video camera monitored sounds from the cage (wriggling calls from the litter and playback sounds) and the behavior of the mother for later analysis.
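As a worked illustration of the equal-level condition for multiformant calls, the sketch below computes the level of each formant when n formants of equal level add up to a given total. It assumes that sinusoids of different frequencies add in power, so each component lies 10·log10(n) dB below the total; this is a standard acoustics relation used here as an assumption, not a description of the calibration procedure in the study. The function name is hypothetical.

```python
import math


def per_formant_level_db(total_db_spl, n_formants):
    """Level of each of n equal-level formants, assuming power summation."""
    if n_formants < 1:
        raise ValueError("n_formants must be at least 1")
    return total_db_spl - 10.0 * math.log10(n_formants)


# Levels for one-, two-, and three-formant models at 70 dB SPL total.
for n in (1, 2, 3):
    print(n, round(per_formant_level_db(70.0, n), 1))
```

Under this assumption, each formant of a three-formant model would be presented roughly 4.8 dB below the 70-dB total.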

Recording and Analysis of Maternal Behavior.

Observations were made only while the mother was in a nursing position on her litter. In a 45-min observation period, about 50 bouts of 1 of the 11 types of artificial wriggling calls or about 50 bouts each consisting of five natural calls from the tape loop were played back at intervals of 20–120 s. The mother responded not only to the sounds from the loudspeaker but also to wriggling calls of the litter. While the pups were vocalizing, synthesized sounds were not presented. In the video tapes, maternal responses to natural or synthesized sounds were noted if the mother responded within 3 s after the onset of the sounds with either “licking of pups”, “changing nursing position”, or “nest building” (2). Nonresponses of the mother and the number of bouts of wriggling calls produced by the litter also were noted. When the litter produced wriggling calls just when a playback of a bout of synthesized calls had been started, the response of the mother was not considered. For every 45-min observation period, a quality coefficient (Q) indicated the response to a given playback signal relative to the response to wriggling calls of the litter. Explicitly, Q was calculated as the ratio of the number of responses to the bouts of the playback signal (A) and the number of bouts of the signal played back (B) divided by the ratio of the number of responses to bouts of wriggling calls from the litter (C) and the number of bouts produced by the litter (D), or Q = AD/BC. Five mothers having 1-, 2-, 3-, 4-, or 5-day-old pups were tested with a given signal type, and individual Q values were calculated. Each mother was tested only once.
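The definition of Q given above can be written as a short function. The sketch follows the stated formula, Q = (A/B)/(C/D) = AD/BC; the function name and the counts in the usage example are hypothetical and are not data from the study.

```python
def quality_coefficient(a, b, c, d):
    """Quality coefficient Q = (A / B) / (C / D) = A*D / (B*C).

    a : responses to bouts of the playback signal (A)
    b : bouts of the playback signal presented (B)
    c : responses to bouts of wriggling calls from the litter (C)
    d : bouts of wriggling calls produced by the litter (D)
    """
    if b <= 0 or c <= 0 or d <= 0:
        raise ValueError("b, c, and d must be positive")
    return (a / b) / (c / d)


# Hypothetical counts: 20 responses to 50 playback bouts,
# 30 responses to 60 bouts from the litter.
q = quality_coefficient(20, 50, 30, 60)  # about 0.8
```

A value of Q = 1 means that the playback signal released maternal behavior at the same rate as the calls of the mother's own pups in the same observation period.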

Results and Discussion

General Responsiveness to Natural Calls and Call Models.

Fig. 3 shows a significant increase in the average number of bouts of wriggling calls produced by live pups of increasing age (1–5 days old) in the 45-min observation period (regression analysis, correlation coefficient r = 0.564; P < 0.001, two-tailed, n = 60). The average number of responses to the pup calls, however, increased only weakly with the age of the pups (r = 0.308, P < 0.05, two-tailed, n = 60), so that the average percentage of pup calls that elicited a response decreases with increasing age of the pups (r = −0.333, P < 0.01, two-tailed, n = 60). Fig. 3 also shows that the average number of spontaneous maternal actions decreases significantly with increasing pup age (r = −0.510, P < 0.001, two-tailed, n = 60). Together, these data indicate that the mothers' motivation to act maternally decreases with increasing age of the pups. This motivational decrease is counterbalanced by a higher calling rate of older pups, so that the average number of maternal acts remains rather constant over the first 5 days in the life of the pups. A compensation of decreasing postparturient maternal motivation by increasing efforts of the young to keep the rate of maternal behavior high is common to systems of instinctive regulation of the amount of maternal care (10, 11).

Figure 3

(Ordinate, Left) [open triangle], number of bouts of wriggling calls produced by the mothers' own pups; [open circle], number of maternal responses to the bouts of wriggling calls of their own pups; [down-pointing small open triangle], spontaneous maternal acts of the mothers. (Means (more ...)

The maternal responsiveness to natural wriggling calls or call models and the spontaneous maternal actions did not habituate (systematically decrease) over the 45-min observation period. However, the responsiveness to the call models and to the natural calls of the live pups produced in the observation periods for the respective call models was rather variable. Bouts of wriggling calls of live pups elicited maternal responses in an average of 39–67% of cases, with SDs up to 50% of the means. In addition, the proportions of responses vs. nonresponses to calls were inhomogeneous both among the call models and among the natural calls (χ² contingency analysis, P < 0.001, two-tailed, 12 degrees of freedom in each case). Against this background of variability in the mothers' motivation to respond to wriggling calls of their own live pups, a valid ranking of the responsiveness to call models according to their specific acoustic properties needs a calibration procedure to eliminate motivation as a variable. For this purpose, we calculated the quality coefficient Q (see above).

Ranking of Call Models.

Average Q values for all 12 playback signals are shown in Fig. 4. Three call models (Fig. 2 A, D, and K) have very little effectiveness in releasing maternal behavior. Their Q values do not differ significantly (Kruskal-Wallis H-test analysis of variance, P > 0.2). The Q values of the next six call models (Fig. 2 B, C, F–H, and J) do not differ significantly among each other (H-test, P > 0.2). The releasing capability of these call models remains below 50% of the effectiveness of the natural calls of live pups (Q = 1). The Q values of the first nine call models (A, D, K, B, C, F, G, H, J) differ significantly (H-test, P < 0.001), and the Q values of call models A, D, K are all significantly different from all of the Q values of call models B, C, F, G, H, J (U test, at least P < 0.05, two-tailed). That is, the call models B, C, F, G, H, and J are significantly more effective in releasing maternal behavior than the call models A, D, and K. Only two wriggling-call models (Fig. 2 E and I) and natural calls from tape release maternal behavior at a rate of more than 75%, compared with the calls of live pups. The Q values from these three signals do not differ significantly (H-test, P > 0.1), but they are different from Q values from all of the other call models (U test, at least P < 0.05, two-tailed). Thus, we can state that a sufficient spectral condition for wriggling-call perception by mouse mothers is a structure of three harmonically related frequencies, the first three formants in the calls, or 3.8 + 7.6 + 11.4 kHz.

Figure 4

Mean values of the quality coefficient (Q) expressing the relative effectiveness of the stimuli (wriggling-call models as shown in Fig. 2 and natural calls from tape) to release maternal response behavior. SDs are presented unilaterally for clarity. (more ...)

How can this result be explained by mechanisms of analysis in the auditory system, and how does it relate to human vowel perception? First, spectral energy in the lower compared with the higher part of the frequency range of natural wriggling calls (Fig. 1) is more important for call perception. The mouse has its best hearing range between 15 and 20 kHz (12) and, therefore, perceives the high-frequency noise band (12–20 kHz) better than the low-frequency noise (3–12 kHz). However, the mice responded significantly better to the low-frequency noise compared with the high-frequency noise (Fig. 4) and, thus, indicated their preference for low-frequency spectral energy. Similarly, for identification of most vowels in human speech, the low-frequency formants below our best hearing range of 2–5 kHz (12) have been shown to be most important (6, 13).

Second, critical band filters determined psychophysically (14) or neurophysiologically (15) express the ability of the auditory system to resolve frequency components in a sound. In mice of the same strain, they have widths close to or slightly below 4 kHz in the frequency range of 5–10 kHz (16), so that the formants of the call models A–E and I (Figs. 2 and 4) can just be spectrally resolved. The formants in the 4 + 6 + 8 kHz model and the first two formants in the nonharmonic call (Figs. 2 and 4, H) are too close together to be resolved. Hence, as in human vowel perception, only resolved formants can be grouped together to produce an optimum vowel percept (9, 17, 18). The mechanisms necessary for grouping resolved formants together exist in combination-sensitive neurons showing spectral facilitation in their responses. Combination-sensitive neurons are created in the auditory midbrain (15, 19) and can also be observed in the auditory cortex (20, 21).
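The resolution argument can be illustrated with a simple spacing check: formants count as resolved when every pair of neighbors is separated by at least one critical bandwidth. This is a deliberately crude sketch. It assumes a single fixed bandwidth, whereas critical bandwidths vary with frequency, and the default of 3500 Hz is only an illustrative value chosen to be "slightly below 4 kHz"; it is not a measured figure from ref. 16. The function name is hypothetical.

```python
def all_resolved(formants_hz, critical_bandwidth_hz=3500):
    """True if all neighboring formants are at least one bandwidth apart.

    critical_bandwidth_hz is an assumed, frequency-independent value.
    """
    ordered = sorted(formants_hz)
    return all(high - low >= critical_bandwidth_hz
               for low, high in zip(ordered, ordered[1:]))


print(all_resolved([3800, 7600, 11400]))  # harmonic three-formant model
print(all_resolved([4000, 6000, 8000]))   # 4 + 6 + 8 kHz model
```

With this assumed bandwidth, the 3.8-kHz spacing of the harmonic model passes the check and the 2-kHz spacing of the 4 + 6 + 8 kHz model does not, which matches the distinction drawn in the text.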

Third, at least three resolved formants in the low-frequency range (3.8 + 7.6 + 11.4 kHz; Fig. 4) of the wriggling calls are necessary for nearly optimum call perception. One formant alone is very ineffective (6 kHz, Fig. 4). Two formants together (3.8 + 7.6 kHz or 3.8 + 11.4 kHz) are significantly more effective in releasing maternal behavior, but only if the fundamental frequency (3.8 kHz) is present, because 7.6 + 11.4 kHz (Fig. 4) are as ineffective as one formant alone. Thus, the first of the three formants, which is also the fundamental of the harmonic complex (3.8 + 7.6 + 11.4 kHz), is of special importance. This finding is similar to speech vowel perception in humans (6, 13) and dogs (22).

Fourth, the significantly increased responsiveness to the three high-frequency harmonics (11.4 + 15.2 + 19 kHz) compared with the two-harmonic complex (7.6 + 11.4 kHz; Fig. 4) suggests that our mice perceived the pitch of the missing fundamental of 3.8 kHz (virtual pitch). The more harmonics that are present, the stronger is the pitch (23). Hearing the 3.8-kHz pitch in the 11.4 + 15.2 + 19 kHz complex would explain the very similar perception of this complex compared with 3.8 + 11.4 kHz (Fig. 4). Virtual pitch perception in mice would be the second demonstration of this phenomenon in a mammal (24) for frequencies above the existence range reported for humans (below about 2 kHz; ref. 25).
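The arithmetic behind the missing fundamental can be sketched as follows: for a set of exact harmonics, the implied fundamental is their greatest common divisor. This is a simplification for illustration only. It is not a model of virtual pitch perception, it requires integer frequencies in Hz, and it says nothing about pitch strength. The function name is hypothetical.

```python
from functools import reduce
from math import gcd


def implied_fundamental_hz(harmonics_hz):
    """Greatest common divisor of integer harmonic frequencies in Hz."""
    return reduce(gcd, harmonics_hz)


print(implied_fundamental_hz([11400, 15200, 19000]))  # 3800
print(implied_fundamental_hz([7600, 11400]))          # 3800
```

Both complexes imply the same 3.8-kHz fundamental by this arithmetic, yet only the three-harmonic complex was clearly effective, which is consistent with the statement that pitch strength grows with the number of harmonics present.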

Fifth, vowel transitions in human speech are perceived rather continuously (not categorically), and there are no perfect spectral boundaries for vowel classification (26, 27). From our present results, a similar strategy may be predicted for the perception of the four mouse calls differing in formant structure: the wriggling calls of pups, pain or rough handling sounds of pups, distress sounds of adults, and defensive calls of nonreceptive females (1, 28, 29). Here, we show that wriggling-call models are not just categorized into relevant and irrelevant sounds according to their frequency structure, but they are perceived continuously better as the number of the resolved formants is increased from one (call model A) to two (call models B and C) to three (call models E and I; Fig. 4).

Sixth, the somewhat (but not significantly) better response to natural calls from tape and to the amplitude-modulated three-formant call (3.8 + 7.6 + 11.4 kHz + AM; Fig. 4) compared with that to the same three-formant call without modulation suggests, as in humans (5), that more natural-sounding stimuli provide a better basis for perception than plain synthesized stimuli with just the minimum structure for carrying the message.

A comparison of wriggling-call spectra (Fig. 1) with the spectral condition for wriggling-call perception (Fig. 4) indicates that most calls (Fig. 1 B–D, and F) carry the decisive features for releasing maternal behavior, and thus must be effective in communication. Thus, as in human vowel production and perception, auditory mechanisms in mice acting as a “harmonic sieve” on the resolved sound spectrum (9, 30) may ensure an automatic normalization of vowels from different speakers into a formant reference frame for perception (31). The automatism in formant resolution, grouping, and vowel perception is stressed by the fact that our mice were in no way trained or conditioned to perceive the wriggling-call models. All of the behavior the animals demonstrated in the tests followed from their natural tendency to respond with different rates to the call models. Thus, the natural discrimination of communication sound models in mice on the basis of their formant structure is comparable to the unconditioned vowel discrimination in 4- to 17-week-old human infants (32).

Our data, together with others on categorical perception (33, 34) and left-hemisphere dominance of communication call perception in mice (35), point to the same mechanisms for the analysis and perception of communication sounds in mice and humans, with the consequence that the handling of speech sounds in the mammalian auditory system up to the perceptual level follows evolutionarily old rules.

Acknowledgments

This work has been supported by the Deutsche Forschungsgemeinschaft, Eh 53/8 and 17-1.

Footnotes

This paper was submitted directly (Track II) to the PNAS office.

References

1. Ehret, G. Behaviour. 1975;52:38–56.

2. Ehret, G; Bernecker, C. Anim Behav. 1986;34:821–830.

3. Tembrock, G. Akustische Kommunikation bei Säugetieren. Darmstadt, Germany: Wissenschaftliche Buchgesellschaft; 1996.

4. Ostwald, P. Dev Med Child Neurol. 1972;14:350–361.

5. Flanagan, J L. Speech Analysis Synthesis and Perception. Berlin: Springer; 1972.

6. Peterson, G E; Barney, H L. J Acoust Soc Am. 1952;24:175–184.

7. Plomp, R. J Acoust Soc Am. 1964;36:1628–1636.

8. Plomp, R. Auditory Analysis and Perception of Speech. Fant G, Tatham M A A, editors. London: Academic; 1975. pp. 7–22.

9. Darwin, C J. The Auditory Processing of Speech. From Sounds to Words. Schouten M E H, editor. Berlin: de Gruyter; 1992. pp. 133–147.

10. Rosenblatt, J S; Siegel, H I. Parental Care in Mammals. Gubernick D J, Klopfer P H, editors. New York: Plenum; 1981. pp. 13–76.

11. Godfray, H C J. Nature (London). 1995;376:133–138.

12. Ehret, G. Naturwissenschaften. 1974;61:506–507.

13. Carlson, R; Fant, G; Grantström, B. Auditory Analysis and Perception of Speech. Fant G, Tatham M A A, editors. London: Academic; 1975. pp. 55–82.

14. Scharf, B. Foundations of Modern Auditory Theory, Vol. 1. Tobias J V, editor. New York: Academic; 1970. pp. 159–202.

15. Ehret, G; Merzenich, M M. Brain Res Revs. 1988;13:139–163.

16. Ehret, G. Biol Cybern. 1976;24:35–42.

17. Darwin, C J. J Acoust Soc Am. 1984;76:1636–1647.

18. ter Keurs, M; Festen, J M; Plomp, R. The Auditory Processing of Speech. From Sounds to Words. Schouten M E H, editor. Berlin: de Gruyter; 1992. pp. 283–288.

19. Mittmann, D H; Wenstrup, J J. Hear Res. 1995;90:185–191.

20. Suga, N. Auditory Function. Neurobiological Bases of Hearing. Edelmann G M, Gall W E, Cowan W M, editors. New York: Wiley; 1988. pp. 679–720.

21. Fitzpatrick, D C; Kanwal, J S; Butman, J A; Suga, N. J Neurosci. 1993;13:931–940.

22. Baru, A V. Auditory Analysis and Perception of Speech. Fant G, Tatham M A A, editors. London: Academic; 1975. pp. 91–101.

23. Terhardt, E. J Acoust Soc Am. 1974;55:1061–1069.

24. Preisler, A; Schmidt, S. Naturwissenschaften. 1995;82:45–47.

25. Ritsma, R J. J Acoust Soc Am. 1967;42:191–198.

26. Pisoni, D B. Percept Psychophys. 1973;13:253–260.

27. Centmayer, K. Auditory Analysis and Perception of Speech. Fant G, Tatham M A A, editors. London: Academic; 1975. pp. 143–152.

28. Haack, B; Markl, H; Ehret, G. The Auditory Psychobiology of the Mouse. Willott J F, editor. Springfield, IL: Thomas; 1983. pp. 57–97.

29. Whitney, G; Nyby, J. The Auditory Psychobiology of the Mouse. Willott J F, editor. Springfield, IL: Thomas; 1983. pp. 98–129.

30. Cohen, M A; Grossberg, S; Wyse, L L. J Acoust Soc Am. 1995;98:862–879.

31. Summerfield, A Q; Haggard, M P. Auditory Analysis and Perception of Speech. Fant G, Tatham M A A, editors. London: Academic; 1975. pp. 115–141.

32. Trehub, S E. Dev Psychol. 1973;9:91–96.

33. Ehret, G. Categorical Perception. The Groundwork of Cognition. Harnad S, editor. Cambridge, U.K.: Cambridge Univ. Press; 1987. pp. 301–331.

34. Ehret, G. Anim Behav. 1992;43:409–416.

35. Ehret, G. Nature (London). 1987;325:249–251.
