Supplement to Auditory Perception

Speech Perception: Empirical and Theoretical Considerations

What are the objects of speech perception? Speaking involves the production of meaningful streams of sounds. At the physical level, a spectrogram reveals the patterns of frequency and amplitude that ground audible features. The stream sounds like a complex acoustic structure involving patterns of audible qualities over time. The stream, however, auditorily appears to be segmented (speech in an unfamiliar language, by contrast, often seems like an unsegmented stream). The most salient segments are words, the meaningful units. Also discernible in the stream are segments that correspond to something like syllables. These units or segments are not ascribed meaning, but instead combine to form words in a way loosely analogous to the way words combine to form sentences. Even syllables, however, comprise perceptually distinguishable sound types. For instance, though ‘dough’ has one syllable, it includes the sounds of /d/ and /oʊ/. The sound of the one-syllable spoken word ‘bad’ includes /b/, /æ/, and /d/. Those of ‘bat’ and ‘bash’ differ because the former contains /t/ and the latter contains /ʃ/. Such perceptible units, or phonemes, whose patterns form the basis for recognizing and distinguishing words, have been one primary focus of research into speech perception. Phonemes form a sort of “sound alphabet” from which audible words are built (Appelbaum 1999 critiques the “alphabetic” conception).
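The frequency and amplitude patterns a spectrogram displays can be computed with a short-time Fourier transform. A minimal sketch in Python (the frame length, hop size, and the 440 Hz test tone are arbitrary illustrative choices, not from the text):

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Short-time Fourier transform magnitudes: the frequency/amplitude
    patterns over time that a spectrogram displays."""
    frames = [signal[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # Rows: frequency bins; columns: successive time windows.
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

# A one-second 440 Hz tone sampled at 8 kHz: energy concentrates in one bin.
fs = 8000
t = np.arange(fs) / fs
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
# Bin width is fs / frame_len = 31.25 Hz, so 440 Hz falls in bin 14.
peak_bin = int(spec.mean(axis=1).argmax())
```

Each column of `spec` is the magnitude spectrum of one ~32 ms window; in real speech, formants and their transitions appear as bands of energy moving across successive columns.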

What is a phoneme? First, consider the universal class of phones, which contains all of the distinguishable types of speech sounds that may mark a semantic difference in some world language. Phonemes, in contrast, are specific to a particular language and may be understood as equivalence classes of phones: semantically significant sound types that constitute the spoken words of a given language. The boundaries between phonemes in a language mark sound differences that may be semantically significant for that language.

Phonemes thus may differ across languages. For instance, though certain phonemes are shared, the class of English phonemes differs from that of Japanese. English, for example, distinguishes the [l] and [r] sounds (phones) as distinct phonemes, while Japanese does not. Instead, Japanese treats them as allophones, or variants of a common phoneme. Standard Chinese distinguishes distinct phonemes that correspond to allophones of the single English phoneme /p/ (the aspirated /pʰ/ and unaspirated /p/). It is noteworthy that infants prior to language learning distinguish phones that are later subsumed under a single phonemic equivalence class (see, e.g., Werker 1995, Kuhl 2000 for review and commentary). In addition, certain languages make use of novel sounds, such as clicks, that others do not. So, when compared with each other, distinct languages may differ in which sounds they include or omit among their respective phonemes, and they may differ in which sound pairs they treat as distinct phonemes or as allophonic.
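The idea of phonemes as language-specific equivalence classes over a shared inventory of phones can be made concrete. A hypothetical sketch (the phone symbols and groupings below are illustrative fragments drawn from the examples above, not complete phonologies):

```python
# Each language maps phones onto its own phonemes; two phones are
# allophones in a language iff they map to the same phoneme there.
ENGLISH = {"l": "/l/", "r": "/r/",        # distinct phonemes
           "p": "/p/", "ph": "/p/"}       # aspirated [ph] is an allophone of /p/
JAPANESE = {"l": "/r/", "r": "/r/"}       # [l] and [r] collapse into one phoneme
MANDARIN = {"p": "/p/", "ph": "/ph/"}     # aspiration is contrastive

def same_phoneme(language, phone_a, phone_b):
    """True iff the language treats the two phones as allophonic."""
    return language[phone_a] == language[phone_b]

same_phoneme(ENGLISH, "l", "r")    # False: a minimal pair ('lip'/'rip') exists
same_phoneme(JAPANESE, "l", "r")   # True: allophonic variants
same_phoneme(MANDARIN, "p", "ph")  # False: aspiration marks a semantic difference
```

The same pair of phones can thus be contrastive in one language and allophonic in another, which is exactly the cross-linguistic difference the passage describes.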

The central puzzle of speech perception is that there is no obvious direct, consistent correspondence between the surface properties of a physical acoustic signal and the phonemes perceived when listening to speech.

This is manifested in a number of ways. Pioneers of speech perception research initially aimed to develop an automated reading machine for the blind that worked by replacing individual letters with specific sounds. The project failed miserably: at the rates of normal speech, listeners were unable to resolve the sequence of individual sounds required to detect words (see Liberman 1996).

Most importantly, there is no clear invariant property of a sound signal that corresponds to a given phoneme. What sounds like a single phoneme might have very different acoustic correlates depending not just upon the speaker or the speaker’s mood, but also upon the phonemic context. For instance, /di/ and /du/ audibly share the /d/ phoneme. However, the acoustic signal corresponding to /d/ differs greatly in these cases (see Liberman et al. 1967, 435, fig. 1). While /di/ includes a formant that begins at a higher frequency and rises, /du/ includes a formant that begins at a lower frequency and drops. Acoustically, nothing straightforward in the signal corresponds to the /d/ sound one auditorily experiences in both cases. Two different audible phonemes also might share acoustic correlates, again depending on context. The acoustic signal that corresponds to /p/ is nearly identical to that of /k/ in the contexts /pi/ and /ka/ (Cooper et al. 1952). Prima facie, phonemes thus are not identical with distinctive invariant acoustic structures.

Lack of invariance stems in large part from coarticulation . In contrast to how things seem auditorily, how a speaker articulates a given phoneme depends upon what precedes or follows that phoneme. Being followed by /i/ rather than /u/ impacts how one pronounces /d/, and being preceded by /d/ impacts the vowel. When pronouncing ‘dab’, the effects of pronouncing both /d/ and /b/ are evident in the acoustic signature of /a/. The articulatory consequences of phonemic context change the acoustic features of the signal and confound attempts to map phonemes to signals (which presents the difficulty for artificial speech production and recognition). Furthermore, due to coarticulation, the signal lacks the clear segmentation of categorically perceived phonemes, which have been likened to beads on a string (Bloomfield 1933). In effect, speakers pronounce two or more phonemes at a time, and transitions are fluid rather than discrete (see, e.g., Liberman 1970, 309, fig. 5, Diehl et al. 2004).
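The lack-of-invariance problem can be stated schematically: collect the acoustic cue associated with /d/ in each vowel context and check whether any single value is shared across contexts. The second-formant figures below are schematic stand-ins for the patterns Liberman et al. describe, not measured data:

```python
# Schematic (consonant, vowel) -> (F2 start Hz, F2 end Hz) transitions.
# Values are illustrative placeholders, not measurements.
F2_TRANSITION = {
    ("d", "i"): (2600, 2900),  # rising from a high locus before /i/
    ("d", "u"): (1200, 900),   # falling from a low locus before /u/
}

def invariant_cue(phoneme, cues):
    """Return the acoustic cue shared by a phoneme across all contexts,
    or None if no single cue is invariant."""
    values = {v for (consonant, _), v in cues.items() if consonant == phoneme}
    return values.pop() if len(values) == 1 else None

invariant_cue("d", F2_TRANSITION)  # None: no single signal property = /d/
```

The function returns None for /d/: the contexts yield different transitions, so no simple acoustic property can serve as the phoneme's signature.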

One response to this, compatible with realism about perceptible phonological features, is to search for more complex acoustic structures or higher-order acoustical properties that correspond to apparent phonemes (see, e.g., Blumstein and Stevens 1981, Diehl et al. 2004, Holt and Lotto 2008 for the general auditory approach). On the other hand, some philosophers instead conclude that phonological features are mere intentional objects, or ‘intentional inexistents’ (see Rey 2012). Pautz (2017, 27–28), for instance, maintains that differences in acoustical features cannot account for apparent categorical differences between phonemes.

Another type of realist approach appeals to aspects of the gestures used to pronounce phonemes—ways of moving one’s throat and mouth and tongue—which are reasonably invariant across contexts. For instance, pronouncing /d/ involves placing the tip of the tongue on the alveolar ridge directly behind the teeth. The alveolar consonants /d/ and /t/ differ in that /d/ is voiced, or accompanied by vocal fold vibration. Whether you say /di/ or /du/, your tongue touches the alveolar ridge and you voice the consonant. But, while you articulate the gestures associated with /d/, you anticipate and begin to articulate those associated with /i/ or /u/. This alters the overall acoustic signature of the gestures associated with /d/. Gestures, rather than the complex acoustic signals they produce, on this view make intelligible the perceptual individuation of phonemes. Some therefore hold that perceiving phonemes involves recovering information about articulatory gestures from the acoustic signal. The motor theory (Liberman et al. 1967, Liberman and Mattingly 1985) and direct realism (Fowler 1986) are very different versions of this approach. Articulatory gestures thus make plausible candidates for objects of phoneme perception. They are, however, imperfect candidates, since they do not entirely escape worries about the context dependence and lack of discrete segmentation stemming from fluid coarticulation (Appelbaum 1996, Remez and Trout 2009).
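By contrast with the acoustic case, the gestural proposal locates the invariant in articulation. A hypothetical sketch (the feature labels are illustrative, chosen to mirror the /d/-/t/ example above):

```python
# Each phoneme as a bundle of articulatory features that stays constant
# across vowel contexts, unlike its acoustic signature.
GESTURES = {
    "d": {"place": "alveolar", "voiced": True},
    "t": {"place": "alveolar", "voiced": False},
}

def gesture(phoneme, context_vowel):
    """The articulatory target is the same whatever vowel follows;
    the context shifts the resulting acoustics, not the gesture."""
    return GESTURES[phoneme]

gesture("d", "i") == gesture("d", "u")          # True: invariant across contexts
GESTURES["d"]["voiced"], GESTURES["t"]["voiced"]  # voicing distinguishes /d/, /t/
```

Where the acoustic cue for /d/ varied with the following vowel, the gestural description does not, which is the sense in which gestures are "reasonably invariant across contexts."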

Nonetheless, the claim is supported by the surprising finding that visual processes impact the auditory experience of speech. In the McGurk effect, for instance, seeing video of a speaker pronouncing /ga/ dubbed with audio of /ba/ leads to an auditory experience as of the /da/ phoneme (McGurk and MacDonald 1976). If perceiving speech involves perceiving gestures, it is not surprising that visual evidence for articulatory gestures should be weighed against auditory evidence.

Some researchers who hold that intended or actual gestures are the best candidates for the objects of phoneme perception argue that speech perception therefore is special. That is, speech perception’s objects differ in kind from the sounds and acoustic structures we hear in general audition (Liberman et al. 1967, Liberman and Mattingly 1985). Liberman and Mattingly (1985), furthermore, use the claim that audition has distinctive objects to motivate the claim that speech perception therefore involves distinctive perceptual processes . They even argue that although speech perception shares an end organ with auditory perception, it constitutes a functionally distinct modular perceptual system (Liberman and Mattingly 1985, 7–10, 27–30, see also 1989). Part of the motivation for their motor theory of speech perception, against auditory theories, is to integrate explanations of speech perception and speech production (1985, 23–5, 30–1, see also Matthen 2005, ch 9, which uses the Motor Theory to support a Codependency Thesis linking the capacities to perceive and produce phonemes, 221). On this account, a single modular system is responsible for both the production and perception of speech. This purported link between capacities for production and perception suggests that humans are unique in possessing a speech perception system. Humans, but not other creatures, are capable of discerning speech for many of the same reasons they are capable of producing the articulatory gestures that correspond to perceived phonemes. Other animals presumably hear just sounds (Liberman et al. 1967, Liberman and Mattingly 1985).

One might accept that perceived phonemes should be identified with articulatory gestures but reject that this makes speech special (see, e.g., Fowler 1986, Mole 2009). If auditory perception generally implicates environmental happenings or sound sources, then the gestures and activities associated with speech production are not entirely distinctive among objects of audition. If hearing even sounds is not merely a matter of hearing features of acoustic signals or structures, and if it is part of the function of auditory perception to furnish information about distal events on the basis of their audible characteristics, then speech is not entirely unique among things we hear (see also Rosenbaum 2004, O’Callaghan 2015).

The processes associated with speech perception therefore need not be understood as entirely distinct in function or in kind from those devoted to general audition, as Liberman and Mattingly contend. Given this, it is not surprising to learn that good evidence suggests humans are not special in possessing the capacity to perceptually individuate the sounds of speech (see, e.g., Lotto et al. 1997 for details).

Neither, however, need the processes associated with speech be entirely continuous with those of general audition. The overall claim is compatible with higher acuity or sensitivity for speech sounds, and it allows for special selectivity for speech sounds. Even if hearing speech marshals perceptual resources continuous with those devoted to hearing other sounds and events in one’s environment, it would be very surprising to discover that there were not processes and resources devoted to the perception of speech. Research in fact supports a special status for speech among the things we auditorily perceive. First, evidence suggests that human neonates prefer sounds of speech to non-speech (Vouloumanos and Werker 2007). Second, adults are able to distinguish speech from non-speech based on visual cues alone (Soto-Faraco et al. 2007). Third, infants can detect and distinguish different languages auditorily (Mehler et al. 1988, Bosch et al. 1997). Finally, infants aged approximately 4–6 months can detect, based on visual cues alone, when a speaker changes from one language to another, though all but those in bilingual households lose that ability by roughly 8 months (Weikum et al. 2007).

To review, no obvious acoustic correlates exist for phonetic segments heard in speech. Complex acoustic cues therefore must trigger perceptual experiences of phonemes. Articulatory gestures, however, are good (though imperfect) candidates for objects of speech perception. This does not imply that speech perception involves entirely different kinds of objects or processes from ordinary non-linguistic audition, nor does it imply that speech perception is a uniquely human capacity. Nevertheless, speech clearly is special for humans, in that we have special sensitivity for speech sounds. Speech perception promises to reward additional philosophical attention (see O’Callaghan 2015 for further development).

Copyright © 2020 by Casey O’Callaghan <casey.ocallaghan@wustl.edu>

The Stanford Encyclopedia of Philosophy is copyright © 2023 by The Metaphysics Research Lab , Department of Philosophy, Stanford University

Library of Congress Catalog Data: ISSN 1095-5054

Speech Perception, by Patrice Speeter Beddor. Last reviewed: 15 November 2022. Last modified: 19 March 2013. DOI: 10.1093/obo/9780199772810-0089

Speech perception as an experimental discipline has a roughly sixty-year history. In a very broad sense, much of the research in this field investigates how listeners map the input acoustic signal onto phonological units. Determining the nature of the mapping is an intriguing issue because the acoustic signal is highly variable, yet perception remains remarkably constant (and accurate) across many types of variation. Consequently, an overarching goal that unifies and motivates much of the work is to account for perceptual constancy, that is, to understand the perceptual mechanisms by which listeners arrive at stable percepts despite acoustic variation. Some theoretical approaches to speech perception postulate that invariant properties in the input signal underlie perceptual constancy, thereby defining a research program aimed at identifying the nature of the invariants. Other approaches do not assume invariants but either require principles that account for the necessarily more complex mapping between signal and phonological representation, or require more complex representations. As a result, theoretical approaches differ as well in their assumptions concerning the relevant phonological units (features, gestures, segments, syllables, words) and the structure of these units (e.g., abstract representations, categories consisting of traces of acoustic episodes). Within this overarching agenda, researchers also address many more specific questions. Is speech perception different from other types of auditory processing? How do listeners integrate multiple sources of information into a coherent percept? What initial perceptual capabilities do infants have? How does perception change with linguistic experience? What is the nature of perceptual influences on phonological structures? How do social categories and phonetic categories interact in perception? This bibliography is selective in several respects. 
“Speech perception” has traditionally referred to perception of phonetic and phonological information, distinct from recognition of spoken words. The division between these two perspectives on the listener’s task has long been a questionable one, and is in many respects an artificial one that does not reflect important current research questions and methods. Although ideally a bibliography would bridge these two approaches, the focus here is almost exclusively on speech perception. Moreover, within this focus, particular emphasis has been given to perceptual issues that are at the interface with other subdisciplines of linguistics—in particular, phonology, historical linguistics, and sociolinguistics. Another area, in addition to word recognition, that is underrepresented in this bibliography is perception of prosodic properties, although some of the edited collections cited here include reviews of both of these areas.

Several excellent overview articles by major figures in the field of speech perception have appeared in the past decade. Although all approach the main issues in the field from a perspective intended to be accessible to nonspecialists, they will all likely be challenging resources for undergraduates if they have little background in phonetics or psychology. Diehl et al. 2004 focuses exclusively on speech perception. Cleary and Pisoni 2001, Jusczyk and Luce 2002, and Samuel 2011 consider issues in word recognition as well. Fowler 2003 summarizes and assesses both the speech perception and production literatures.

Cleary, M., and D. B. Pisoni. 2001. Speech perception and spoken word recognition: Research and theory. In Blackwell handbook of sensation and perception . Edited by E. B. Goldstein, 499–534. Malden, MA: Blackwell.

Comprehensive review of major issues and findings in speech perception; offers more condensed coverage of theoretical approaches and of spoken word recognition.

Diehl, R. L., A. J. Lotto, and L. L. Holt. 2004. Speech perception. Annual Review of Psychology 55:149–179.

DOI: 10.1146/annurev.psych.55.090902.142028

Detailed presentation of three theoretical approaches: motor theory, direct realism, and general auditory and learning approaches. Provides critical assessment of the strengths and weaknesses of these approaches in light of selected classic perceptual phenomena. Available online for purchase or by subscription.

Fowler, C. A. 2003. Speech production and perception. In Handbook of psychology . Vol. 4, Experimental psychology . Edited by A. F. Healy, R. W. Proctor, and I. B. Weiner, 237–266. Hoboken, NJ: Wiley.

Presents key arguments and findings for acoustic (auditory) and gestural theories of perception; also assesses the literature on the influences of experience and learning on perception. Linguists may especially appreciate that the review frames issues of perception and production within the context of the relation between phonetic and phonological forms.

Jusczyk, P. W., and P. A. Luce. 2002. Speech perception and spoken word recognition: Past and present. Ear and Hearing 23:2–40.

DOI: 10.1097/00003446-200202000-00002

Overview of major issues and findings, with particular attention to developmental speech perception. Theoretically, gives greater consideration to models of spoken word recognition than to theories of speech perception. An especially helpful aspect of this review is its focus on the historical context in which the major issues emerged. Available online for purchase or by subscription.

Samuel, A. G. 2011. Speech perception. Annual Review of Psychology 62:49–72.

DOI: 10.1146/annurev.psych.121208.131643

The most recent survey of the field. Pulls together issues, theories, and findings in speech perception and spoken word recognition, including work on statistical and perceptual learning of speech. Available online for purchase or by subscription.

Speech Perception and Comprehension

Bernd J. Kröger and Trevor Bekolay

First online: 12 July 2019

In this chapter we explain how the recognition of sound features works on the acoustic-auditory level and how recognition of sound features leads to the activation of symbolic-cognitive variables such as sounds, syllables, and words. We describe how the speech information signal is compressed from a detailed acoustic-auditory representation to an efficient symbolic-cognitive representation. We also discuss why we perceive complex auditory stimuli like speech signals categorically, and why humans can easily extract invariant phonemic features from speaker-specific acoustic sound features. In addition, we introduce two theories of speech perception, namely the motor theory of speech perception and the dual-route theory of speech perception. Finally, we discuss the close interweaving of speech perception and speech production, and perception-related language and speech disorders.

Keywords: Comprehension · Motor theory · Dual-route theory · Speech perception · Auditory processing


Author information


Department of Phoniatrics, Pedaudiology and Communications Disorders, RWTH Aachen University, Aachen, Germany

Bernd J. Kröger

Applied Brain Research, Waterloo, ON, Canada

Trevor Bekolay



© 2019 Springer Nature Switzerland AG

About this chapter

Kröger, B.J., Bekolay, T. (2019). Speech Perception and Comprehension. In: Neural Modeling of Speech Processing and Speech Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-15853-8_3


Published : 12 July 2019


Print ISBN : 978-3-030-15852-1

Online ISBN : 978-3-030-15853-8



The Oxford Handbook of Philosophy of Perception


25 Speech Perception

Casey O'Callaghan is professor of philosophy at Washington University in St Louis. He is author of Sounds (Oxford 2007), Beyond Vision (Oxford 2017), and A Multisensory Philosophy of Perception (Oxford 2019).

Published: 13 January 2014

Is speech special? This chapter evaluates the evidence that speech perception is distinctive when compared with non-linguistic auditory perception. It addresses the phenomenology, contents, objects, and mechanisms involved in the perception of spoken language. According to the account it proposes, the capacity to perceive speech in a manner that enables understanding is an acquired perceptual skill. It involves learning to hear language-specific types of ethologically significant sounds. According to this account, the contents of perceptual experience when listening to familiar speech are of a variety that is distinctive to hearing spoken utterances. However, perceiving speech involves neither novel perceptual objects nor a unique perceptual modality. Much of what makes speech special stems from our interest in it.

Philosophers have devoted tremendous effort to explicating what it takes to understand language. The answers focus on things such as possessing concepts, mastering grammar, and grasping meanings and truth conditions. The answers thereby focus on extra-perceptual cognition. Understanding spoken language, however, also involves perception: grasping a spoken utterance requires hearing or seeing it. Perception's role in understanding spoken language has received far less philosophical attention. According to a simple view, understanding speech is just a matter of assigning meaning to the sounds you hear or to the gestures you see. If so, what perception contributes to understanding spoken language is not distinctive to the case of spoken utterances. Against this, however, is the prospect that speech is special. In this chapter, I present and evaluate the evidence that speech perception differs from non-linguistic auditory perception. In particular, I discuss the phenomenology, contents, objects, and mechanisms of speech perception. I make proposals about the ways in which speech is and is not perceptually special. According to the account I offer, the capacity to perceive speech in a manner that enables understanding is an acquired perceptual skill. It involves learning to hear language-specific types of ethologically significant sounds. According to this account, while the contents of perceptual experience when listening to familiar speech are of a variety that is distinctive to hearing spoken utterances, perceiving speech involves neither novel perceptual objects nor a unique perceptual modality. Much of what makes speech special stems from our fierce interest in it.

1 Is Speech Perceptually Special?

There is a thriving debate about whether the human capacity to use and understand language is special (see, e.g., Hauser et al., 2002; Pinker and Jackendoff, 2005). A key part of this wider debate is whether the capacity to speak and understand speech is special (see, e.g., Liberman, 1996; Trout, 2001; Mole, 2009). My concern here is with speech perception. Is the human capacity to perceive spoken language special?

To be special requires a difference. However, the debate about whether speech is special is not just about whether speech perception in some respect differs from other forms of perception. It concerns whether speech perception should be distinguished as a distinctive or a unique perceptual capacity. Put in this way, the question relies on a comparison. The most common contrast is with general audition. The question thus is whether speech perception differs or is a distinct perceptual capacity when compared with non-linguistic auditory perception . A separate contrast is with the capacities of non-human animals . Is speech perception uniquely human? The contrast between human and non-human responses to spoken language is frequently used to illuminate the contrast between human speech perception and non-linguistic audition.

A difference is a difference in some respect, and being distinctive or unique is being distinctive or unique in some way, for some reason. In what respects is speech special? It is helpful to divide the candidates into four broad classes.

The first concerns the phenomenology of speech perception. Does what it is like to perceptually experience spoken utterances contrast with what it is like to perceptually experience non-linguistic sounds and events? One way to make progress on this question is to ask whether the perceptual experience of hearing speech in a language you know differs phenomenologically from that of hearing speech in an unfamiliar language.

The second concerns the contents of speech perception. Does the perceptual experience of speech involve contents absent from non-linguistic auditory experience? Does understanding a language affect which properties perceptual experiences represent spoken utterances to have?

The third concerns the objects of speech perception. Are the objects of speech perception distinct from the objects of non-linguistic audition? Does speech perception share objects with non-linguistic audition?

The fourth concerns the mechanisms of speech perception. Does perceiving speech involve perceptual processes that differ from those involved in perceiving non-linguistic sounds and events? Does speech perception involve a special perceptual module ? Is speech perception the work of a distinct perceptual modality ?

Answering the question, ‘Is speech special?’ thus means addressing a number of different questions. This essay focuses on the contrast between speech perception and human non-linguistic auditory perception. I distinguish the various respects in which speech might be special when compared with non-linguistic audition. I assess the evidence and advance proposals about the respects in which speech perception is special.

2 Phenomenology

Is perceiving speech phenomenologically special? Is what it’s like, for the subject, to perceptually experience speech different, distinctive, or unique when compared with non-linguistic audition?

It is natural to think that the perceptual experience of listening to spoken language differs phenomenologically from the perceptual experience of listening to non-linguistic sounds, simply because speech sounds and non-linguistic sounds differ acoustically. Hearing the sound of a drop of water differs phenomenologically from hearing the sound of the spoken word ‘drop’ because the sounds differ in their basic audible qualities.

However, the perceptual experience of spoken language may also involve distinctive phenomenological features that are absent from non-linguistic auditory experience. Start with the experiential contrast between listening to non-linguistic sounds and listening to spoken language. Begin with the case of a language you know. The experience of listening to speech in a language you know differs noticeably from the experience of listening to ordinary, non-linguistic environmental sounds, even once we eliminate acoustical differences. The phenomenological shifts associated with sinewave speech support this claim. Sinewave speech is an artificial signal in which an acoustically complex human voice is replaced by several sinewaves that vary in frequency and amplitude with the primary formants of the original speech signal, while removing acoustical energy at other frequencies (Remez et al., 1981). At first, it is difficult to recognize the sounds of sinewave speech as speech sounds. Instead, they just sound like computer-generated noises. However, after hearing the original human speech from which the sinewave speech is derived, it is easy to hear what the sinewave speech says. The same stimulus is first experienced as non-speech sounds, and then it is experienced as speech. And this change is accompanied by a dramatic phenomenological shift.
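As a concrete illustration of the construction Remez et al. describe, a sinewave-speech-like signal can be sketched as a sum of a few sinusoids whose frequencies and amplitudes follow the formant tracks of an utterance. The sketch below is illustrative only: the formant trajectories are invented, and real sinewave speech is derived from formants measured in recorded speech.

```python
import math

def sinewave_synthesis(formant_tracks, sr=16000):
    """Sum a few time-varying sinusoids, one per formant track.

    formant_tracks: list of (freqs, amps) pairs, where freqs and amps are
    per-sample lists (Hz and linear amplitude). Toy interface, not the
    tooling used by Remez et al. (1981)."""
    n = len(formant_tracks[0][0])
    dt = 1.0 / sr
    out = [0.0] * n
    for freqs, amps in formant_tracks:
        phase = 0.0
        for i in range(n):
            # Integrate instantaneous frequency so the phase varies smoothly.
            phase += 2.0 * math.pi * freqs[i] * dt
            out[i] += amps[i] * math.sin(phase)
    return out

# Invented half-second trajectories, loosely like a vowel transition:
# F1 falls, F2 rises, F3 stays roughly steady.
sr = 16000
n = sr // 2
f1 = [700.0 + (300.0 - 700.0) * i / n for i in range(n)]
f2 = [1200.0 + (2200.0 - 1200.0) * i / n for i in range(n)]
f3 = [2800.0] * n
tracks = [(f1, [1.0] * n), (f2, [0.5] * n), (f3, [0.25] * n)]
signal = sinewave_synthesis(tracks, sr=sr)
```

Because all of the acoustic energy sits in three moving sinusoids, such a signal tends at first to sound like electronic whistles rather than a voice, which is what makes the recognition shift described above so striking.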

In the case just described, you come to comprehend the speech. Thus, understanding might suffice to explain the phenomenological difference when you are listening to speech in a language you know. You grasp meanings, so the experiential difference could in principle be explained in terms of cognitive, rather than perceptual, phenomenology. (This explanation is unavailable if you deny that extra-perceptual cognition has proprietary phenomenology.)

To control for any contribution from understanding, consider the experiential contrast between listening to non-speech sounds and listening to speech in a language you do not know. Is there any phenomenological difference? It is possible reliably to discriminate speech in a language you do not understand from ordinary environmental sounds. Neonates prefer speech sounds to non-speech sounds though they do not understand language. In addition, sinewave speech in a language you do not know may appear first as non-speech sounds and then as speech. Thus, we have evidence that perceptually experiencing a stimulus as speech rather than as non-speech sounds makes a phenomenological difference that does not depend on understanding.

Understanding spoken utterances need not, however, contribute exclusively to the phenomenology of extra-perceptual cognition. Knowing a language may also impact the phenomenal character of perceptual experience. Consider the phenomenological contrast between the perceptual experience of listening to speech in a language you know and of listening to speech in an unfamiliar language. Of course, languages differ acoustically in ways that affect how they sound. For instance, whether or not you know Hindi, it sounds different from German. To control for acoustical differences that affect phenomenology, fix the language. Contrast the experience of a person who knows the language with that of a person who does not know the language when faced with the same spoken utterance. Or, consider a person’s experience prior to and after learning the language. Many philosophers agree that knowing the language affects the phenomenological character of perceptual experience, even while they disagree about the diagnosis (see O’Callaghan, 2011 : 784-787).

What is the source of the difference? Speech in a language you know differs perceptually in several respects. Most obviously, your perceptual experience of its temporal characteristics differs. When you know the language, audible speech does not seem like an unbroken stream of sounds. It seems instead to include discernible gaps, pauses, and other boundaries between words, clauses, and sentences, and you are able perceptually to resolve qualitative features and contrasts at a much finer temporal grain. Familiar speech also appears in other respects to differ qualitatively from unfamiliar speech. For instance, when you have mastered a spoken language, you are able to detect subtle qualitative features and their contrasts, such as the difference between ‘s’ and ‘z’, or the dropped ‘g’ or ‘t’ of certain accents. The stimulus sounds different and more detailed when you recognize it as speech and you know the language.

The argument of the last paragraph, unlike the argument from sinewave speech, requires comparing phenomenology across subjects or across long stretches of time. Thus, it is more controversial. An alternative way to establish the point is to compare the shift that occurs with sinewave speech in a language you know with the shift that occurs with sinewave speech in a language you do not know. In each case, recognizing the sounds as speech leads to a shift in phenomenal character. The change, however, is far more dramatic when you know the language. The difference between the two phenomenological contrasts is the difference that accrues thanks to knowing the language.

These arguments indicate that one’s perceptual experiences may differ phenomenologically when listening to speech in a known language, when listening to speech in an unfamiliar language, and when listening to non-speech sounds. Moreover, such phenomenological differences can be evoked even when we have controlled for acoustical differences. This supports the following two claims: knowing a language impacts the phenomenal character of perceptual experience when listening to spoken utterances; and, speech perception has phenomenal features that are distinctive when compared with non-linguistic audition.

Content concerns how things are represented to be. Content thus concerns things that are perceptually experienced and the features they are perceptually experienced to have. One way to characterize the contents of perceptual experiences appeals to their accuracy or veridicality conditions. Some prefer to speak of the facts a given perceptual experience purports to present about the world, or of how things perceptually seem or appear. Some philosophers hold that perceptual experiences differ phenomenologically only if they differ in how they represent things as being. Some also hold that there is a variety of content such that perceptual experiences differ in content only if they differ phenomenologically. In either case, a difference in content may help to explain the sorts of phenomenological differences mentioned in Section 2. What we perceive when we perceive speech may, in this sense, differ from what we perceive when we perceive non-speech sounds. Speech perception may involve contents that are special or distinctive when compared with non-linguistic audition.

In what respects does the content of speech perception differ from that of non-linguistic audition? The characteristic sounds of human vocalization differ acoustically from the sounds of non-linguistic happenings such as blowing leaves, backfiring automobiles, and violins. The perceptual experience of speech reflects this. Such evident qualitative differences, which are underpinned by acoustical differences, are part of why sinewave speech at first sounds like meaningless computer noise, and why artificial speech often sounds inhuman. Perhaps, then, the perceptual experience of speech differs phenomenologically from the perceptual experience of non-linguistic sounds and happenings because its perceptually apparent features differ in a way that is recognizable and distinctive to spoken language.

This is compatible with an austere view of the types of features that one perceptually experiences when listening to speech or to non-speech sounds. The phenomenological difference between perceptually experiencing speech and non-speech may just stem from a difference in the patterns of low-level properties that the perceptual experiences represent. For instance, it may just stem from a difference in the apparent pattern of pitch, timbre, and loudness of a sound stream over time. Any further experiential differences may result from extra-perceptual cognition, such as thought or imagination.

This austere picture also suggests an attractive account of how perceptually experiencing speech in an unfamiliar language differs phenomenologically from perceptually experiencing speech in a language you know. As discussed in Section 2 , the audibly apparent temporal and qualitative features of spoken utterances in a language you know generally differ from those of speech in a language that is unfamiliar to you. Foreign language may sound like a continuous stream of indistinct babble, but familiar speech perceptually appears to be chunked into units that correspond to words and phrases and to include discernible gaps, pauses, and boundaries that distinguish such units from each other. Hearing familiar language also involves the capacity to perceptually experience sublexical features at a finer temporal grain, and to discern linguistically significant qualitative details and contrasts that you could not make out before. Conversely, it also involves failing to discern other qualitative contrasts that are linguistically irrelevant. Thus, in these ways, differences in the perceptually apparent pattern of individual sounds and low-level audible qualities such as pitch, timbre, and loudness over time may explain the phenomenological difference that knowing a language makes.

Nevertheless, such an austere account might not suffice. Some philosophers have claimed that grasping meanings or semantic properties contributes in a constitutive rather than merely causal manner to the phenomenal character of perceptual experience. They argue therefore that listening to spoken utterances when you know the language involves perceptually experiencing meanings or semantic properties (e.g., McDowell, 1998 ; Siegel, 2006 ; Bayne, 2009 ). According to such an account, perceptual experiences may represent or involve awareness not just as of low-level sensible features, such as pitch, timbre, loudness, and timing, but also as of high-level features, including semantic properties. Such an account supports a liberal view about what types of properties may be represented by episodes of perceptual experience (see, e.g., Siegel, 2006 ; Bayne, 2009 ).

The liberal view of speech perception’s contents faces an objection if it also must explain the phenomenological difference between the perceptual experience of listening to speech in a familiar language and of listening to speech in an unfamiliar language. The account requires that, for an utterance you understand, there is something distinctive it is like for you to perceptually experience its specific meaning. That is because nothing suggests you could not hear foreign utterances as meaningful, if that does not require hearing specific meanings. Hearing meaningfulness, if not specific meanings, for instance, could help to explain the phenomenological difference between hearing speech in an unfamiliar language and hearing non-linguistic sounds. Only perceptually experiencing specific meanings, however, could also account for the difference between hearing familiar and unfamiliar speech. Suppose, therefore, that you perceptually experience specific meanings, rather than just meaningfulness. Thus, differences in apparent meaning should impact the phenomenal character of perceptual experience for utterances in a known language. But consider homophonic utterances, which share pronunciation but not meaning. Homophonic utterances do not apparently cause perceptual experiences that differ in phenomenal character. For instance, even when they are embedded appropriately in meaningful linguistic contexts, the perceptual experience of hearing an utterance of ‘to’ does not clearly differ in phenomenal character from the perceptual experience of hearing an utterance of ‘too’ or ‘two’ (the same holds for homographic homophones). Complete sentences present a similar problem. Utterances of structurally ambiguous statements, such as, ‘Visiting relatives can be boring’, and those with scope ambiguities, such as, ‘Everyone chose someone’, may not, under their differing interpretations, lead to perceptual experiences that differ phenomenologically.
The argument from homophones thus casts doubt on the claim that specific meanings make a distinctive difference to the phenomenal character of perceptual experience ( O’Callaghan, 2011 ).

A moderate account denies that the perceptual experience of speech includes awareness as of meanings or high-level semantic properties. It nevertheless explains the phenomenological difference that accrues thanks to knowing a language using resources beyond the austere account’s low-level acoustical features. According to one such account, listening to speech in a familiar language involves the perceptual experience of language-specific but non-semantic properties of spoken utterances.

Phonological features, such as phones and phonemes, form the basis for recognizing and distinguishing spoken words. Phonological features in general are respects of discernible non-semantic similarity and difference among utterances that may make a semantic difference. Phonological features are like the basic perceptible vocabulary or ‘building blocks’ of spoken language. 1 To illustrate, consider utterances of ‘bad’, ‘imbue’, and ‘glob’. In one respect, these utterances are perceptibly similar. Each apparently shares with the others the ‘b’ sound—[b] in phonological notation. Next consider utterances of ‘lab’ and ‘lash’. They perceptibly match, except that the former contains the ‘b’ sound and the latter contains the ‘sh’ sound—[ʃ] in phonological notation. The phones [b] and [ʃ] are examples of features which may be shared among distinct spoken utterances, which may differ among otherwise indistinguishable utterances, and which may make a semantic difference. A phone thus is usefully understood as a type whose members make a common linguistic contribution in some human language. One phone is distinguished from another by some perceptually discernible difference that is or may be exploited by some spoken language to signal a semantically significant difference. Since phones are the minimal perceptible features that make a linguistic difference in some world language, they are in this sense the perceptible ‘building blocks’ of spoken language.

Specific spoken languages do not all make use of this basic stock of building blocks in the same manner. Some spoken languages, for instance, include clicks and buzzes, while others do not. Moreover, spoken languages may, even when they make use of the same basic stock, differ in which classes of utterances they treat as linguistically equivalent and in which classes of utterances they treat as distinct. For example, spoken English distinguishes [l] from [r], 2 but Japanese does not. Thus, the phones [l] and [r] correspond to distinct English phonemes, /l/ and /r/, but are allophones or linguistically equivalent variations of a single Japanese phoneme. Another example is that [p] and [pʰ] are allophones of the English phoneme, /p/, but Mandarin Chinese treats them as distinct phonemes, /p/ and /pʰ/. The difference between [p] and [pʰ] suffices to distinguish Chinese but not English words. So, some languages treat [p] and [pʰ] as allophones of a single phoneme, while others treat them as distinct phonemes that may suffice for a semantic difference.

Phonemes thus may usefully be understood in terms of language-specific classes whose members are treated as linguistically equivalent, or as allophonic , within the context of that spoken language, even if under certain conditions its members may be perceptually distinguishable. A language’s phonemes are distinguished from one another by perceptually discernible differences that are semantically significant. The lesson is that certain utterance pairs are treated as linguistically equivalent by some languages but as linguistically distinct by others. Thus, spoken languages yield differing families of equivalence classes of utterances that make a common semantic contribution. So, the way in which a basic stock of speech sounds, which have the potential to signal semantic difference, in fact is utilized by a particular language is specific to that language. A language’s stock of linguistically significant sound types is distinctive to that language.
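The equivalence-class picture can be made concrete with a toy sketch (the mappings below list only the phone pairs discussed above; they are illustrative fragments, not complete phonologies):

```python
# Each language maps phones onto its own phonemes. Phones mapped to the
# same phoneme are allophones within that language.
english  = {"[p]": "/p/", "[pʰ]": "/p/",   # aspiration is not contrastive
            "[l]": "/l/", "[r]": "/r/"}    # l and r: distinct phonemes
mandarin = {"[p]": "/p/", "[pʰ]": "/pʰ/"}  # aspiration is contrastive
japanese = {"[l]": "/r/", "[r]": "/r/"}    # l and r: allophones

def contrastive(lang, phone_a, phone_b):
    """Can this pair of phones mark a semantic difference in this language?"""
    return lang[phone_a] != lang[phone_b]

print(contrastive(english, "[p]", "[pʰ]"))   # False: allophones in English
print(contrastive(mandarin, "[p]", "[pʰ]"))  # True: distinct Mandarin phonemes
print(contrastive(english, "[l]", "[r]"))    # True
print(contrastive(japanese, "[l]", "[r]"))   # False
```

The same basic stock of phones thus yields different families of equivalence classes depending on which language's mapping is applied.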

Since phonemes differ across languages, discerning a language’s phonemes requires substantial exposure and learning. That such features may be perceptually experienced nonetheless helps to explain patterns of similarity and difference among utterances that are apparent to users of a given language. The capacity perceptually to discern such similarities and differences is critical to understanding spoken language. It is not, however, explained by the perceptual experience of low-level audible attributes alone.

What is noteworthy is that users of a given language commonly treat certain crucial pairs of sounds or utterances as perceptibly equivalent, while those who do not know that language treat them as perceptibly distinct. For example, auditory perceptual discrimination tasks in linguistic contexts reveal that the sounds corresponding to ‘t’ in utterances of ‘ton’ and ‘stun’ auditorily appear virtually the same to fluent monolingual English users, but appear noticeably to differ to fluent monolingual users of Chinese. Spoken utterances of ‘bed’ and ‘bad’ in linguistic contexts differ audibly to English speakers but not to Dutch speakers. Speakers of one language may discern a common linguistic sound across utterances that differ acoustically while speakers of another language do not. So, suppose we have two groups of language users. Suppose all are attentively listening, and that each is presented with two sounds uttered by the same talker in a linguistic context. Those in the first group do not notice a difference between the speech sounds. They judge that they are audibly equivalent, and they behave as if the sounds are equivalent. Those in the other group do notice a difference between the speech sounds. They judge that they audibly differ, and they behave as if the sounds are not audibly equivalent. In this case, for at least one of the speech sounds, it is plausible to say that the perceptual experience of a listener from the first group differs phenomenologically from the perceptual experience of a listener from the second group. If so, then for a large class of linguistic sounds, the perceptual experience of someone who knows a given language may differ from the perceptual experience of someone who does not. If only those who know a spoken language perceptually experience its language-specific phonological attributes, such as its phonemes, then this provides an attractive explanation for the difference.
For instance, having a perceptual experience that represents the English phoneme /l/, rather than /r/, may explain why hearing an utterance of ‘law’ differs phenomenally from hearing an utterance of ‘raw’. Having perceptual experiences as of a single English phoneme explains a monolingual English speaker’s failure to perceptually distinguish utterances of distinct Chinese words. A central part of the phenomenological difference that accrues thanks to knowing a language thus stems from the perceptual experience of attributes whose linguistic significance is specific to that language.

The perceptual experience of language-specific features explains apparent patterns of similarity and difference that to a noteworthy degree are independent of lower-level audible attributes, such as pitch, timbre, and loudness over time. For instance, the low-level audible qualities of an utterance of /p/ vary across phonological contexts, speakers, moods, and social contexts. The perceptual experience of a single phoneme explains this kind of perceptually apparent sameness in the face of differing lower-level audible qualities. On the other hand, the same acoustical signal may appear as a /p/ in some contexts and as a /b/ or /k/ in others. In different contexts, distinct apparent phonemes may accompany matching low-level audible qualities.

A moderate account of this sort finds converging support from three sources of evidence. First, developmental evidence shows that young infants discern a wide variety of phonetic differences that are linguistically significant in various languages. However, between five and twelve months, infants cease to discern phonetic differences that are not linguistically significant in the languages to which they have been exposed. Babies in Pittsburgh stop distinguishing utterances that differ with respect to [p] and [pʰ], and babies in Madrid stop distinguishing utterances that differ with respect to [s] and [z]. Such pruning requires regular exposure to the language, and it is part of learning to become perceptually responsive to the features that are distinctive to a spoken language. Children thus learn to hear the sounds of their language (see, e.g., Eimas et al., 1971; Jusczyk, 1997).

Second, adult perception of certain critical speech sounds, such as stop consonants, is categorical (see Chapter XX, this volume; Harnad, 1987 ). This means that, in critical cases, such as the perception of stop consonants, gradually varying the value of a diagnostic physical parameter leads to uneven perceptual variation. For example, suppose we start with a stimulus experienced as /ba/ and gradually increase its voice onset time. At first, this makes little difference. At some point, however, the stimulus abruptly appears to shift to a /pa/. In a dramatic case of categorical perception, the change seems perfectly abrupt. Thus, given a boundary that is diagnostic for a perceptual category, stimuli that differ by a certain physical magnitude may differ only slightly in perceptual appearance when each falls within that boundary; however, stimuli that differ by that same physical magnitude may differ greatly in perceptual appearance when one but not the other falls within the boundary.
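The final point can be sketched with a toy identification function (the logistic form, the 25 ms boundary, and the slope are invented placeholder values, not measured data): equal physical steps in voice onset time yield a tiny perceptual change within a category but a large change across the boundary.

```python
import math

def p_heard_as_pa(vot_ms, boundary=25.0, slope=1.5):
    """Toy probability that a /ba/-/pa/ stimulus is identified as /pa/."""
    return 1 / (1 + math.exp(-slope * (vot_ms - boundary)))

# Two stimulus pairs, each differing by the same 10 ms of voice onset time:
within = abs(p_heard_as_pa(5) - p_heard_as_pa(15))   # both well below boundary
across = abs(p_heard_as_pa(20) - p_heard_as_pa(30))  # straddling the boundary

print(round(within, 3))  # → 0.0
print(round(across, 3))  # → 0.999
```

The same 10 ms difference is nearly invisible to identification within the /ba/ category but almost maximally consequential across the /ba/–/pa/ boundary, which is the signature pattern of categorical perception.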

Patterns of categorical perception in fact vary across languages. Adult categorical perception of speech sounds corresponds to language-specific phonological categories, generally those of the listener’s first language (though there is some flexibility). Perceptual awareness of phonological features thus helps to explain both perceptually apparent patterns of similarity and difference among utterances within a language and variation in patterns of apparent similarity and difference across speakers of different languages.

Third, evidence from aphasias , language-related disorders, suggests that the capacity to understand spoken language normally requires the capacity to perceive language-specific attributes of speech that are not meanings. Moreover, the latter capacity affects the phenomenal character of auditory perceptual experience. Individuals with transcortical sensory aphasia (TSA) have a severely impaired capacity to grasp and to understand linguistic meanings, but they retain the capacities to hear, to generate, and to repeat spoken utterances. They commonly are unaware of their disorder. In contrast, individuals with pure word deafness (PWD) have intact semantic capacities but lack the capacity to perceive spoken language as such. Individuals with PWD are unable to hear sounds or utterances as spoken words or linguistic units. Their deficit is limited to auditory language perception. They may learn to use sign language or even read lips. And their hearing otherwise remains normal. They can hear and recognize barking dogs, cars, and even the sounds of familiar voices. Individuals with PWD say, however, that words fail to ‘come up’ and describe the auditory experience of spoken language as like hearing garbled sound or foreign language (see, especially, Poeppel 2001 : 681). These descriptions of TSA and PWD suggest that there is an important phenomenological difference in perceptual experience that stems from being able to discern and to recognize language-specific features but that does not require the capacity to discern and to recognize the meanings of spoken utterances. Auditorily experiencing language-specific features other than meanings therefore plausibly captures this difference. Phonological and other structural features of spoken utterances are good candidates. 3

Appealing to the content of perceptual experience thus helps to explain what is distinctive about the perceptual experience of listening to speech. In particular, two sorts of features help to account for the difference between the perceptual experience of listening to unfamiliar speech and of listening to speech in a language you know. When you know a language, the patterns of determinate low-level audible attributes you perceptually experience differ from when you do not know the language. This difference concerns the specific arrangement of low-level qualitative and temporal attributes, each of which you could, in principle, perceptually experience even non-linguistic sounds to bear. However, understanding speech also involves perceptually experiencing spoken utterances to bear language-specific attributes, including phonological properties such as phonemes. Developing the capacity to perceptually experience such language-specific features requires exposure and perceptual learning. Its exercise is part of any adequate explanation for the experiential difference that accrues thanks to knowing a language. While I have expressed doubt that meanings and high-level semantic properties are represented by perceptual experiences, I leave open whether and which additional language-specific features are among the contents of perceptual experience when listening to speech. For instance, you may perceptually experience morphemes, lexemes, or even grammatical properties when you listen to speech in a language you understand. Greater attention to the ways such features affect the phenomenal character of perceptual experience will inform broader debates about the richness of perceptual experience—that is, about the types of features awareness of which constitutively shapes the phenomenal character of perceptual experience. This, in turn, should impact how we understand the interface of perception with cognition.

The previous section argued that the perceptual experience of speech differs in content from non-linguistic audition. This section concerns whether the objects of speech perception differ from those of non-linguistic audition. There are two ways to understand the objects of perception. Construed broadly, the objects of perception simply are targets of perception, and may include particular individuals, their attributes, happenings, or states of affairs. In this broad sense, to be an object of perception is just to be perceived. According to some accounts, objects of perception in the broad sense are the components of content. In Section 3 , I proposed that the perceptual experience of speech involves awareness as of language-specific features. So, in the broad sense, the objects of speech perception are special when compared with those of non-linguistic audition.

Construed more narrowly, however, the objects of perception are the individuals that bear perceptible attributes. In this narrow sense, vision’s objects might include ordinary material objects that look to have attributes such as shape and colour, and audition’s objects plausibly include individual sounds that have pitch, timbre, and loudness. Further philosophical debates concern the natures of the objects of perception, including whether they are public or private. The phenomenological differences between speech perception and non-linguistic audition, especially since they are dramatic, might be taken to suggest that the objects of speech perception in this sense differ from those of non-linguistic audition. This discussion concerns whether the objects of speech perception are special in the narrow sense that includes only individuals.

In one respect, it is trivial that the objects of speech perception differ from those of non-linguistic audition. One case involves perceiving speech, and the other involves perceiving non-speech. At the very least, perceiving speech involves perceiving sounds of a kind to which non-linguistic sounds do not belong, and vice versa. Speech sounds and non-linguistic sounds differ in their causes, their sources, and their effects, as well as in their semantic and other linguistic properties.

The claim that speech perception and general audition have different objects typically is not just the claim that they involve hearing different kinds of sounds or sounds with distinctive features. Speech perception researchers have claimed that the objects of speech perception are not sounds at all. This is a claim about the sorts of individuals perceived when one perceives speech. In particular, it is the claim that while the individuals you perceive in non-linguistic auditory perception are sounds, the individuals that you perceive when you listen to speech are not sounds. The objects of speech perception instead are individuals of a wholly different sort.

Three main sorts of argument are offered. The first type of argument appeals to the mismatch between salient features of the objects of speech perception and features of the acoustic signal. We can reconstruct the argument in the following way. The objects of non-linguistic audition are sounds. The perceptible features of sounds correspond to aspects of the acoustic signal. But, the perceptible features of speech do not correspond to aspects of the acoustical signal. The perceptible features of speech are, thus, not perceptible features of sounds. So, the objects of speech perception differ from those of non-linguistic audition.

This argument can be illustrated using the case of apparent phonological features, such as phones or phonemes . The acoustic attributes that correspond to a perceived phonological feature vary greatly depending upon setting and context. Not only do they vary in expected ways, with speaker, mood, and accent, but they also depend locally upon the surrounding linguistic context. For example, phonological features are not uttered in discrete, isolated units. Instead, they are articulated in a continuous stream that flows gradually from one to the next. This has two noteworthy consequences. First, information about one phoneme is blended with information about surrounding phonemes. Because distinct speech sounds are coarticulated , when I utter ‘imbue’, the fact that the /i/ is followed by /m/ shapes how I pronounce the /i/. This differs from how I pronounce the /i/ when it is followed by /d/, as in ‘idiom’. In fact, no clear invariant acoustic signature corresponds to an utterance of a given phoneme in all of its perceptible instances. And a given acoustical configuration might contribute to distinct apparent phonemes in different contexts. Second, some have been inclined to say that perceptible speech appears to be segmented into discrete phonemes. However, the acoustic information by which you discern the presence of a given phoneme is present during the utterance of surrounding phonemes. For instance, the acoustical information corresponding to /æ/ in an utterance of ‘dab’ is present during the articulation of both the /d/ and the /b/ (and vice versa). Thus, no clear acoustic boundaries correspond to any segmentation that is apparent between adjacent phonemes. Therefore, there exists no consistent, context-independent, homomorphic mapping between apparent phonemes and straightforward features of the acoustic signal (see, e.g., Appelbaum, 1999 ; Remez and Trout, 2009 ). 4 This point should be evident to anyone who has laboured with speech recognition software. 
It leads some philosophers to anti-realism about phonological features. Rey (2012) , for instance, holds that phonemes are intentional inexistents (see also Smith, 2009 ).

In light of this, Liberman et al. (1967) and other early proponents of the Motor Theory famously proposed that the objects of speech perception are not sounds at all, but instead are something involved in the pronunciation of speech (see also the papers collected in Liberman, 1996 ). The core idea is that features of perceived speech do map in a homomorphic, invariant way onto types of gestures involved in the production of speech. For instance, pronouncing an instance of /d/ involves stopping airflow by placing the tongue at the front of the palate behind the teeth and then releasing it while activating the vocal folds. Pronouncing /b/ involves a voiced release of air from pursed lips. Such articulatory gestures , and the component configurations and movements they comprise, make the manner in which speech is perceptually experienced intelligible in a way that attention to the acoustic signal does not, since such gestures and their descriptions are less sensitive to context. 5 The claim was that the acoustical signal encodes information about articulatory gestures and their features. If articulatory gestures and their features rather than sounds and their attributes are the best candidates for what we perceive when we are perceptually aware of instances of phonemes, then articulatory gestures are the objects of speech perception. Thus, the objects of speech perception and of non-linguistic audition differ in kind. The former are articulatory gestures with phonological characteristics, and the latter are sounds with audible attributes.

These arguments do not establish that the bearers of phonological features are not bearers of non-linguistic audible attributes. Thus, they do not establish that the objects of speech perception include individuals of a wholly different kind from the objects of non-linguistic audition. On one hand, the mismatch argument relies on the presumption that ordinary auditory awareness does map in an invariant, homomorphic way onto features of the acoustic stimulus. However, even pitch, an apparently simple audible quality, has a complex relationship to frequency. In addition, context effects abound. For instance, varying the attack of a sound affects its timbre, and the apparent duration of a tone is affected by the duration of a tone presented earlier or even later. More generally, the apparent objects of auditory awareness in acoustically complex environments do not map clearly and in invariant ways onto straightforward features of the acoustic signal. Nothing obvious in an acoustical stream signals how to distinguish the sound of a guitar from the sound of a voice in a crowded bar. The central lesson of work on auditory scene analysis is that ordinary sounds are individuated—they are distinguished from each other at a time, and they are tracked and segmented over time—in the face of highly complex, interwoven acoustic information ( Bregman, 1990 ).

On the other hand, the argument also relies on the presumption that non-linguistic audition’s objects do not map in an illuminating way onto the events that produce acoustic information. However, audition’s vital function is to provide perceptual access to events in the environment. Accordingly, human audition carves up the acoustical scene in a way that is predicated upon an interest in identifying sound sources. In fact, the way in which sounds are individuated suggests that the objects of non-linguistic auditory perception include sound sources rather than mere acoustical events or sound streams. In the face of complex, entangled acoustical information, you distinguish the sound of the guitar from the sound of the voice because they have distinct sources. We attend to and identify sounds relative to sources, and this is reflected in our thought and talk about sounds, which concern, for instance, the sound of the car door , the sound of the dog , the sound of scratching . Many descriptive sound words are source oriented: rattle , bang , crack . So, just as articulatory gestures illuminate the manner in which the objects of speech perception are individuated and classified (see Matthen, 2005 ), considering the environmental happenings that make sounds illuminates the manner in which the objects of non-linguistic auditory perception are individuated and classified (see, e.g., Nudds, 2010 ). Audition’s objects thus fail to map in an invariant, homomorphic manner onto simple physical properties of an acoustic stimulus, and sound sources help to explain the manner in which audition’s objects are individuated and classified. In these respects, non-linguistic audition does not differ from speech perception. The mismatch argument fails.

The second type of argument is that cross-modal influences in the perception of speech reveal that the objects of speech perception differ in kind from the objects of non-linguistic audition (see, e.g., Trout, 2001 , for discussion). The McGurk effect is one powerful example ( McGurk and MacDonald, 1976 ). Subjects presented with audio of an utterance of the bilabial /ba/ along with video of a speaker uttering the velar /ga/ regularly report perceptually experiencing the alveolar /da/. Seeing the speaker impacts which phoneme perceptually appears to be uttered. In fact, visual information systematically affects which phoneme you perceptually experience, so both vision and audition provide information about the objects of speech perception. Moreover, Gick and Derrick (2009) demonstrate tactile influences on speech perception. The objects of speech perception are multi-modally accessible. Sounds, however, are neither visible nor multi-modally accessible. Therefore, since sounds are the objects of ordinary non-linguistic audition, the argument concludes that the objects of speech perception and non-linguistic audition must differ.

One objection stems from the reply to the first argument. If audition’s objects include sound sources, and sound sources are ordinary happenings like collisions and vibrations, then audition’s objects might include things that are visible. The other objection is that speech perception is not unique in being subject to influence from multiple senses. Cross-modal recalibrations and illusions are rampant. The ventriloquist illusion shows that vision impacts non-linguistic audition. The motion bounce effect and the sound-induced flash illusion show that non-linguistic audition alters visual experience. Visual capture and the rubber hand illusion show that vision affects touch and proprioception. And the touch-induced flash shows that touch alters vision. The examples multiply (for references and discussion, see, e.g., Spence and Driver, 2004 ; O’Callaghan, 2012 ; Bayne and Spence, Chapter 32 , this volume). In many such cases, the best explanation for some cross-modal effect is that perceptual modalities share common objects ( O’Callaghan, 2008 , 2012 ). Consider the sound-induced flash illusion. When presented with a single flash accompanied by two beeps, many subjects illusorily visually experience two flashes instead of one as a result of the two sounds. This illusion occurs because an apparent conflict between vision and audition is resolved in audition’s favour. Since even apparent conflict requires the assumption of a common subject matter, perceptual processes unfold as if a common environmental source produces both the visual and the auditory stimulation. Since, under such conditions, audition is more reliable for temporal features, the overall perceptual experience that results is as of two events rather than one. 
If cross-modal effects support the claim that multi-modal speech perception targets common objects of perception, then such effects equally may support the claim that there are common objects of perception in multi-modal cases that do not involve speech. Cross-modal effects thus offer additional support for the claim that non-linguistic audition reveals the sources of sounds, which also are visible. Multi-modality is not unique to speech.
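
The reliability-based resolution just described is commonly modelled as precision-weighted (maximum-likelihood) cue integration. The following toy sketch, with invented variances (not measured values), shows why the modality with the lower-variance estimate dominates the fused percept:

```python
# Toy precision-weighted (maximum-likelihood) cue combination.
# Illustrates why an apparent audio-visual conflict is resolved in
# favour of the modality with the more reliable (lower-variance)
# estimate -- here, audition for temporal features. The numbers are
# illustrative assumptions, not data.

def combine(est_a, var_a, est_v, var_v):
    """Optimally fuse two noisy estimates of the same quantity.

    Each estimate is weighted by its precision (inverse variance),
    so the more reliable cue dominates the fused percept.
    """
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)
    w_v = 1.0 - w_a
    return w_a * est_a + w_v * est_v

# Auditory evidence: two events; visual evidence: one event.
# Audition is far more precise about timing, so the fused
# estimate lies close to the auditory report of two events.
fused = combine(est_a=2.0, var_a=0.1, est_v=1.0, var_v=1.0)
print(round(fused, 3))  # close to 2: audition "captures" vision
```

On these assumed variances the auditory weight is 10/11, so the fused estimate sits near the auditory report, mirroring the sound-induced flash illusion.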

The third type of argument stems from the received view that speech perception is categorical . Some have argued that the categorical nature of phoneme perception (see Section 3 ) shows that its objects are not ordinary sounds, since ordinary sounds need not be perceived categorically (for discussion, see, e.g., Trout, 2001 ; Pinker and Jackendoff, 2005 ; for a critical perspective, see, e.g., Diehl et al., 2004 ). It is true that some attributes of sounds, such as loudness or pitch height (cf. pitch chroma), are not perceived categorically. Nevertheless, there are several lines of response to the argument from categorical perception. First, categorical perception may be limited to certain types of phonemes, such as stop consonants, so not all phoneme perception is categorical. Second, non-linguistic audition itself may involve categorical perception. Third, non-linguistic creatures, such as quail and monkeys, perceive some speech sounds categorically (see, e.g., Diehl et al., 2004 : 177). Finally, colour perception commonly is regarded as categorical, but this does not establish that the objects of colour vision differ from the objects of ordinary vision. Categorical perception for selected phonemes therefore does not show that the objects of speech perception and the objects of non-linguistic audition differ in kind.
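
The identification/discrimination signature of categorical perception can be illustrated with a toy model. The logistic identification function, boundary location, and slope below are invented for illustration, not fitted to data:

```python
# Toy illustration of categorical perception along a voice-onset-time
# (VOT) continuum, e.g. /b/ vs /p/. Identification follows a steep
# sigmoid, and discriminability of a fixed physical step peaks at the
# category boundary -- the classic categorical-perception signature.
# Parameter values are illustrative assumptions.
import math

BOUNDARY_MS = 25.0   # assumed /b/-/p/ boundary
SLOPE = 1.5          # assumed steepness of identification

def p_voiceless(vot_ms):
    """Probability of identifying the token as /p/ (logistic)."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))

def predicted_discrimination(vot_ms, step_ms=5.0):
    """Discriminability of two tokens one step apart, predicted
    purely from the difference in their category labels."""
    return abs(p_voiceless(vot_ms + step_ms) - p_voiceless(vot_ms))

# Same 5 ms physical step, very different perceptual consequences:
within = predicted_discrimination(5.0)     # both clearly /b/
across = predicted_discrimination(22.5)    # straddles the boundary
print(within < 0.01 < across)  # True: discrimination peaks at boundary
```

The point of the sketch is only that, on such a model, listeners discriminate stimuli well just when they fall into different categories, which is what the continuous attributes of ordinary sounds, such as loudness, need not exhibit.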

Arguments from mismatch, cross-modal influence, and categorical perception thus do not show that the objects of speech perception differ in nature from the objects of ordinary audition. Sounds are among the objects of auditory perception. But to deny that the objects of speech perception include sounds would require denying that spoken utterances may perceptually appear to have pitch, timbre, and loudness. Nonetheless, the considerations discussed above do support the claim that the objects of speech perception include events or happenings beyond sounds, such as the articulatory gestures of speakers. However, I have maintained that environmental happenings that make or have sounds are also among the objects of non-linguistic auditory perception. For instance, while you hear the crashing sound, you also may hear the collision that makes it. Thus, in speech perception and in general audition, both sounds and sound sources plausibly are among the objects of perceptual awareness.

Suppose one held that phonological features of perceptible speech, such as phones and phonemes, themselves were the objects of speech perception. Since phonological features are not individual sounds, one might be tempted to hold that the objects of speech perception differ from the objects of non-linguistic audition.

This would be a mistake. It conflates the broad and the narrow ways to understand the objects of perception. I have been discussing the narrow understanding of the objects of perception as individuals that bear perceptible attributes. Phonological features as I have characterized them may be among the objects of perception in the broad sense, but they are not objects of perception in the narrow sense.

The account I have offered denies that phones and phonemes are novel perceptible objects , understood as items or individuals , wholly distinct from audible sounds and articulatory events. It maintains instead that phonological features, including specific phones and phonemes, are perceptible properties or attributes of audible and multi-modally perceptible objects, such as sounds and articulatory events. Thus, for instance, a stream of utterances may perceptually appear to have, to bear, or to instantiate phonological attributes, such as [d]‌ or /d/ . Such perceptible linguistic features may be complex properties, and they may have complex relationships to simple acoustical, physical, or physiological properties. They may be common sensibles. One important virtue of this account is that it allows us to abandon the troublesome ‘beads on a string’ model of perceptible phonemes and to accommodate coarticulation. It does so because continuous sound streams or gestural events may perceptually appear at certain moments to instantiate multiple phonological attributes. Rather than perceptually appearing as discrete perceptible items or individuals arranged in a neatly segmented series (like typed letters in a written word), phonological properties of continuously unfolding spoken utterances may instead appear to be instantiated in connected, blended, or overlapping sequences by a common perceptible individual.

The objects of speech perception thus need not be wholly distinct from the objects of non-linguistic audition. Each may include sounds and happenings in the environment that ordinarily are understood to be the sources of sounds. In the specific case of speech, the objects of perception may include sounds of speech and gestures used to articulate spoken language. In a broad sense, they also may include phonological features.

5 Processes

What are the implications for questions about how humans perceive speech—about the means or mechanisms involved in speech perception? Does the perception of speech involve special processes, a special module, or perhaps even a special perceptual modality?

There is evidence that perceiving speech sounds does involve distinctive perceptual processes beyond those involved in hearing non-linguistic sounds. Duplex perception for dichotic stimuli shows that a single stimulus presented to one ear can, in conjunction with information presented to the other ear, contribute simultaneously to the perceptual experience as of both a non-linguistic sound and an apparently distinct speech sound ( Rand, 1974 ). The same acoustic cue is integrated into two distinct percepts. Duplex perception is thought by some to provide evidence for a special system or mode of listening for speech. That is because, under similar experimental conditions with only non-speech tones, masking rather than integration takes place. However, duplex perception does occur for complex non-linguistic sounds, such as slamming doors, so others have responded that speech perception does not involve dedicated perceptual processes distinct from general audition ( Fowler and Rosenblum, 1990 ). Nevertheless, the capacity to perceive non-linguistic sounds does differ developmentally from the capacity to perceive speech. Notably, for instance, the timing of critical periods for the development of linguistic and non-linguistic perceptual capacities differs. In addition, functional neuroimaging establishes that the patterns of brain activity associated with the perception of speech sounds do not match those associated with the perception of non-linguistic sounds. Most tellingly, however, perceptual capacities and disorders related to speech may dissociate from those related to non-linguistic audition. The example of pure word deafness discussed above puts this into relief. Individuals with PWD have intact abilities to hear and to recognize ordinary sounds but are unable to hear and recognize speech sounds as such. In addition, auditory agnosia concerning environmental sounds may leave linguistic capacities intact ( Saygin et al., 2010 ). 
This shows that one could auditorily perceive speech while lacking other commonplace auditory capacities. Thus, there is evidence to support the claim that there exist perceptual resources and processes devoted to the perception of speech.

Some have held on such grounds that, when compared with general, non-linguistic audition, speech perception is special in that it is modular (e.g., Fodor, 1983 ). Others even have claimed that it involves a special perceptual modality ( Liberman, 1996 ). I am reluctant to accept the strong view that speech perception involves a dedicated perceptual modality that is distinct from general audition and vision. Audition and vision may treat speech sounds and spoken utterances in a manner that differs from non-linguistic sounds and events, but this does not show that speech perception is a novel perceptual modality. Vision, for instance, devotes special resources and deals in different ways with the perception of objects, colour, motion, and shape. Still, there is considerable debate concerning how to count and individuate perceptual modalities. We might identify modalities by their distinctive objects, stimuli, physiology, function, or phenomenology, or by some combination of these criteria. In the case of the classic sense modalities, at least, the criteria tend to align. Some have maintained that we should be pluralists when individuating and counting sense modalities ( Macpherson, 2011 ). Maintaining that speech perception involves a novel perceptual modality nevertheless requires appealing to one or more of the criteria. None of these criteria, however, warrants positing a modality devoted to the perception of speech that is distinct from but on a par with the familiar examples of vision, hearing, smell, taste, and touch. For instance, speech perception does not involve awareness of novel perceptual objects, and it lacks proper sensibles inaccessible to other modalities. Speech perception lacks a distinguishing kind of proximal stimulus, and it lacks a dedicated sense organ and receptors. Its functional relations do not clearly mark it off as a wholly distinct way or manner of perceiving independent from audition or vision. 
And it is not apparent that its phenomenology has the type of proprietary, internally unified qualitative character that is distinctive to other perceptual modalities. For instance, while the phenomenology of other sensory modalities doubly dissociates, speech perception requires auditory or visual phenomenology and, thus, does not fully dissociate. Despite these indications, however, a more satisfactory theoretical understanding of the modalities of sensory perception will help to make progress on this question (see, e.g., Matthen ).

The weaker claim is that speech perception is modular. But good reasons also exist to doubt that a devoted perceptual module is responsible for the perception of speech. Appelbaum (1998) , for instance, argues forcefully against Fodor that domain-general, top-down influences impact the perception of speech sounds. If a process is modular only if it is informationally encapsulated, then speech perception is not modular.

Perhaps it is possible to make do with a minimal story about the sense in which the processes associated with speech perception are special without appealing to a perceptual modality or even a perceptual module devoted to the perception of spoken language. Such a story may be framed in terms of our perceptual treatment of speech and speech sounds. Humans do have a special or differential selectivity or sensitivity for the sounds of speech. The striking evidence is that even neonates distinguish and prefer speech to non-speech sounds ( Vouloumanos and Werker, 2007 ). The sounds of spoken utterances are of special interest to us, relative to other kinds of environmental sounds and events.

Humans are not, however, born able to perceive all of the attributes that are distinctive to specific languages. Infants must prune and cease to perceive audible differences that are not linguistically significant in their own languages. They also must learn perceptually to discern linguistic sameness in the face of variation across speakers, moods, and contexts. This is learning perceptually to ignore irrelevant differences and to attend to crucial similarities, and it alters the language-specific perceptual similarity space involving speech sounds. Understanding a language, as it is spoken in a variety of contexts, demands such learning. In coming to know a spoken language, we begin to perceive the relevant language-specific features of sounds and utterances. Humans thus have a propensity for learning perceptually to discern the appropriate language-specific types to which spoken utterances belong.
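
One way to picture this warping of the perceptual similarity space is as categorization against learned distributions. The two-category toy model below, with invented parameters, shows how equal acoustic differences become unequal perceptual differences once categories are acquired:

```python
# Toy sketch of perceptual warping by learned categories. Two tokens
# that straddle a learned category boundary end up with very different
# internal representations; two equally spaced tokens within one
# category end up nearly identical -- "ignoring irrelevant differences".
# The categories (two Gaussians along one acoustic dimension) and all
# numbers are invented for illustration.
import math

def gauss(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

CATS = {"A": (10.0, 3.0), "B": (30.0, 3.0)}  # learned means and sds

def represent(x):
    """Internal representation: posterior probability of category B."""
    likes = {c: gauss(x, mu, sd) for c, (mu, sd) in CATS.items()}
    return likes["B"] / sum(likes.values())

# Equal 6-unit acoustic steps:
within = abs(represent(7.0) - represent(13.0))    # both clearly "A"
across = abs(represent(17.0) - represent(23.0))   # straddles boundary
print(within < across)  # True: the boundary region is magnified
```

On this sketch, learning a language's categories compresses within-category differences and stretches between-category ones, which is one way to model the language-specific similarity space described above.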

6 What Makes Speech Special?

Perceiving the attributes that are distinctive to the speech sounds of a given language, I have argued, requires experience and learning. Learning a language thus is not simply a matter of learning a sound–meaning mapping. It involves acquiring the capacity perceptually to discern language-specific attributes of spoken utterances. In this sense, you learn to hear the sounds of your language. Learning a language is partly a matter of acquiring a perceptual skill.

Humans have a special propensity to learn to perceive language-specific attributes of speech sounds from birth, but this capacity develops later than other familiar perceptual capacities. For instance, young infants perceive individual objects and events, persistence, and sensible qualities, including colour, pitch, and loudness, prior to perceptually discerning types of sounds that are specific to a particular language. Perceptual awareness of spoken language may therefore be more like perceptual awareness of clapping hands, barking dogs, or fingernails scratching a chalkboard, each of which involves acquired perceptual skills.

As with other auditory phenomena, the manner in which language-specific sounds are perceptually individuated and classified is illuminated by taking into account the environmental happenings that generate sounds. In particular, articulatory gestures and talking faces make sense of why users of a given language discern and treat various speech sounds as standing in relations of similarity and difference that do not stem in straightforward ways from acoustical characteristics. Considered as such, perceiving speech is a matter of detecting and discerning biologically significant kinds of sounds and happenings, rather than just detecting abstract features of an acoustic signal.

How does perceiving speech differ from perceiving other biologically significant kinds of environmental sounds? Consider a family of perceptual capacities attuned to varieties of animacy . For instance, humans may sometimes perceptually experience a pattern of moving dots as running , or seem to be aware of one dot chasing another dot around a display ( Heider and Simmel, 1944 ; see Scholl and Tremoulet, 2000 ; Gao et al., 2009 ). Here we describe the perception of inanimate things and motion in terms applicable to animate creatures and activities. Since such effects require only very minimal cues, this suggests humans have a special propensity to perceive aspects of animate creatures and their activities. That is, we have differential sensitivity to certain kinds of activity that creatures engage in, in contrast to simple mechanical patterns of motion traced by inanimate things. Perceiving speech is similar to such perceptual capacities in that its concern is a type of animacy exhibited by living things to which we have special sensitivity. In the case of speech (as in the case of faces ) this perceptual capacity is directed predominantly at members of our own species.

Speech perception belongs to an even more special subclass. Speech sounds are generated by communicative intentions of other humans. Like some facial expressions and non-linguistic vocalic sounds, the sounds of spoken utterances are caused by and thus have the potential to reveal the communicative intentions of their animate sources. Speech perception is among a class of ethologically significant perceptual phenomena that serve to disclose intentional activities involved in communication. Perceiving speech is detecting and discerning language-specific kinds of biologically significant events: those generated by communicative intentions of fellow human talkers. We hear people talking. We hear them as interlocutors.

Acknowledgements

I have learned a great deal about the philosophical issues raised by speech perception from Matthen (2005) , Mole (2009) , Remez and Trout (2009) , Rey (2012) , and Smith (2009) . These works, and conversations with their authors, drew me from my more general concern with sounds, audition, and multi-modality to the philosophically and empirically rich subject matter whose focus is the perception of spoken language. I gratefully acknowledge their influence upon my approach to this topic. Thanks to Mohan Matthen for helpful comments on this chapter.

Appelbaum, I. ( 1998 ). ‘ Fodor, modularity, and speech perception ’. Philosophical Psychology, 11(3), 317–330.

Appelbaum, I. ( 1999 ). ‘ The dogma of isomorphism: A case study from speech perception ’. Philosophy of Science, 66, S250–S259.

Bayne, T. ( 2009 ). ‘ Perception and the reach of phenomenal content ’. The Philosophical Quarterly, 59(236), 385–404.

Bregman, A. S. ( 1990 ). Auditory Scene Analysis: The Perceptual Organization of Sound . Cambridge, MA: MIT Press.

Diehl, R. L. , Lotto, A. J. , and Holt, L. L. ( 2004 ). ‘ Speech perception ’. Annual Review of Psychology, 55, 149–179.

Eimas, P. D. , Siqueland, E. R. , Jusczyk, P. , and Vigorito, J. ( 1971 ). ‘ Speech perception in infants ’. Science, 171(3968), 303–306.

Fodor, J. ( 1983 ). The Modularity of Mind . Cambridge, MA: MIT Press.

Fowler, C. A. ( 1986 ). ‘ An event approach to the study of speech perception from a direct-realist perspective ’. Journal of Phonetics, 14, 3–28.

Fowler, C. A. and Rosenblum, L. D. ( 1990 ). ‘ Duplex perception: A comparison of monosyllables and slamming doors ’. Journal of Experimental Psychology: Human Perception and Performance, 16(4), 742–754.

Gao, T. , Newman, G. E. , and Scholl, B. J. ( 2009 ). ‘ The psychophysics of chasing: A case study in the perception of animacy ’. Cognitive Psychology, 59, 154–179.

Gick, B. and Derrick, D. ( 2009 ). ‘ Aero-tactile integration in speech perception ’. Nature, 462(7272), 502–504.

Harnad, S. ( 1987 ). Categorical Perception: The Groundwork of Cognition . Cambridge: Cambridge University Press.

Hauser, M. D. , Chomsky, N. , and Fitch, W. T. ( 2002 ). ‘ The faculty of language: What is it, who has it, and how did it evolve? ’ Science, 298, 1569–1579.

Heider, F. and Simmel, M. ( 1944 ). ‘ An experimental study of apparent behavior ’. The American Journal of Psychology, 57(2), 243–259.

Jusczyk, P. W. ( 1997 ). The Discovery of Spoken Language . Cambridge, MA: MIT Press.

Liberman, A. M. ( 1996 ). Speech: A Special Code . Cambridge, MA: MIT Press.

Liberman, A. M. and Mattingly, I. G. ( 1985 ). ‘ The motor theory of speech perception revised ’. Cognition, 21, 1–36.

Liberman, A. M. , Cooper, F. S. , Shankweiler, D. P. , and Studdert-Kennedy, M. ( 1967 ). ‘ Perception of the speech code ’. Psychological Review, 74(6), 431–461.

McDowell, J. ( 1998 ). Meaning, Knowledge, and Reality . Cambridge, MA: Harvard University Press.

McGurk, H. and MacDonald, J. ( 1976 ). ‘ Hearing lips and seeing voices ’. Nature, 264, 746–748.

Macpherson, F. ( 2011 ). ‘ Taxonomising the senses ’. Philosophical Studies, 153(1), 123–142.

Matthen, M. ( 2005 ). Seeing, Doing, and Knowing: A Philosophical Theory of Sense Perception . Oxford: Oxford University Press.

Mole, C. ( 2009 ). ‘The Motor Theory of speech perception’. In M. Nudds and C. O’Callaghan (eds), Sounds and Perception: New Philosophical Essays (pp. 211–233). Oxford: Oxford University Press.

Nudds, M. ( 2010 ). ‘ What are auditory objects? ’ Review of Philosophy and Psychology, 1(1), 105–122.

O’Callaghan, C. ( 2008 ). ‘ Seeing what you hear: Cross-modal illusions and perception ’. Philosophical Issues: A Supplement to Noûs, 18, 316–338.

O’Callaghan, C. ( 2011 ). ‘ Against hearing meanings ’. Philosophical Quarterly, 61, 783–807.

O’Callaghan, C. ( 2012 ). ‘ Perception and multimodality ’. In E. Margolis , R. Samuels , and S. Stich (eds), Oxford Handbook of Philosophy and Cognitive Science . Oxford: Oxford University Press.

Pinker, S. and Jackendoff, R. ( 2005 ). ‘ The faculty of language: What’s special about it? ’ Cognition, 95, 201–236.

Poeppel, D. ( 2001 ). ‘ Pure word deafness and the bilateral processing of the speech code ’. Cognitive Science, 25, 679–693.

Rand, T. C. ( 1974 ). ‘ Dichotic release from masking for speech ’. Journal of the Acoustical Society of America, 55, 678–680.

Remez, R. E. and Trout, J. D. ( 2009 ). ‘Philosophical messages in the medium of spoken language’. In M. Nudds , and C. O’Callaghan (eds), Sounds and Perception: New Philosophical Essays (pp. 234–264). Oxford: Oxford University Press.

Remez, R. E. , Rubin, P. E. , Pisoni, D. B. , and Carell, T. D. ( 1981 ). ‘ Speech perception without traditional speech cues ’. Science, 212, 947–950.

Rey, G. ( 2012 ). ‘Externalism and inexistence in early content’. In R. Schantz (ed.), Prospects for Meaning (pp. 503–530). New York: de Gruyter.

Saygin, A. P. , Leech, R. , and Dick, F. ( 2010 ). ‘ Nonverbal auditory agnosia with lesion to Wernicke’s area ’. Neuropsychologia, 48, 107–113.

Scholl, B. and Tremoulet, P. ( 2000 ). ‘ Perceptual causality and animacy ’. Trends in Cognitive Sciences, 4(8), 299–309.

Siegel, S. ( 2006 ). ‘Which properties are represented in perception?’ In T. S. Gendler and J. Hawthorne (eds), Perceptual Experience (pp. 481–503). New York: Oxford University Press.

Smith, B. ( 2009 ). ‘Speech sounds and the direct meeting of minds’. In M. Nudds and C. O’Callaghan (eds), Sounds and Perception: New Philosophical Essays (pp. 183–210). Oxford: Oxford University Press.

Spence, C. and Driver, J. (eds) ( 2004 ). Crossmodal Space and Crossmodal Attention . Oxford: Oxford University Press.

Trout, J. D. ( 2001 ). ‘ The biological basis of speech: What to infer from talking to the animals ’. Psychological Review, 108(3), 523–549.

Vouloumanos, A. and Werker, J. F. ( 2007 ). ‘ Listening to language at birth: evidence for a bias for speech in neonates ’. Developmental Science, 10(2), 159–164.

Here I am alluding to but not endorsing the notorious ‘beads on a string’ analogy. I do not accept that characterization of phonological attributes, because I believe neither that they are items or individuals nor that they occur in neat, discrete sequences. Instead, I believe they are properties whose instances overlap. Further discussion in Section 4 .

For readability, I use the upright rather than inverted ‘r’ for the alveolar approximant. The upright ‘r’ standardly (in the International Phonetic Alphabet) is used for the trill.

Indeed, individuals with PWD perform poorly on tasks that require categorical perception for language-specific attributes. Thanks to Bob Slevc for discussion.

Early text-to-speech methods failed to appreciate this context dependence, and thus failed. Early attempts assigned each letter a sound and played the sounds assigned to specific letters in sequences that mirrored written texts. The results were unintelligible.

One complication is that due to coarticulation the gestures pronounced in normal speaking also exhibit some lack of invariance. Liberman and Mattingly (1985) revised the Motor Theory to claim that intended motor commands are the objects of speech perception. See Mole (2009) for a convincing critique of the revised account. Fowler’s (1986) Direct Realism maintains that articulatory gestures are the objects of speech perception but rejects that gestural events differ in kind from the objects of non-linguistic audition.

Speech perception as categorization

Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization , in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition.

Spoken syllables may persist in the world for mere tenths of a second. Yet, as adult listeners, we are able to gather a great deal of information from these fleeting acoustic signals. We may apprehend the physical location of the speaker, the speaker's gender, regional dialect, age, emotional state, or identity. These spatial and indexical factors are conveyed by the acoustic speech signal in parallel with the linguistic message of the speaker ( Abercrombie, 1967 ). Although these factors are of much interest in their own right, speech perception (SP) most commonly refers to the perceptual mapping from acoustic signal to some linguistic representation, such as phonemes, diphones, syllables, words, and so forth.

Most of the research in the field of SP has focused on the mapping from the acoustic speech signal to phonemes, the smallest linguistic unit that changes meaning within a particular language (e.g., /r/ and /l/ as in rake vs. lake ), with the often implicit assumption that phoneme representations are a necessary step in the comprehension of spoken language. The transformation from acoustics to phonemes occurs so rapidly and automatically that it mostly escapes our notice ( Näätänen & Winkler, 1999 ). Yet this apparent ease masks the complexity of the speech signal and the remarkable challenges inherent in phoneme perception.

As a starting point, one might presume that phoneme perception is accomplished by detecting characteristics in the acoustic signal that correspond to each phoneme or by comparing a phoneme template in memory with segments of the incoming signal. In fact, this was the presumption in the early days of SP, starting in the 1940s (see Liberman, 1996 ), and it led to the hope that machine speech recognition was on the horizon. However, it became clear rather quickly that SP was not a simple detection or match-to-pattern task ( Liberman, Delattre, & Cooper, 1952 ). Although there has been a wealth of studies documenting the acoustic “cues” that can signal the identity of different phonemes (see Stevens, 2000 , for a review), there is significant variability in the relationship of these cues to the intended phonemes of a speaker and the perceived phonemes of a listener. The variability is due to a multitude of sources, including differences in speaker anatomy and physiology ( Fant, 1966 ), differences in speaking rate ( Gay, 1978 ; Miller & Baer, 1983 ), effects of the surrounding phonetic context ( Kent & Minifie, 1977 ; Öhman, 1966 ), and effects of the acoustic environment such as noise or reverberation ( Houtgast & Steeneken, 1973 ). The end result of all of these sources of variability is that there appear to be few or no invariant acoustic cues to phoneme identity ( Cooper, Delattre, Liberman, Borst, & Gerstman, 1952 ; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967 ; but see Blumstein & Stevens, 1981 , for a possible exception). This means that listeners cannot accomplish SP by simply detecting the presence or absence of cues.

In place of a simple match-to-sample or detection approach, SP is now often conceived of as a complex categorization task accomplished within a highly multidimensional space. One can conceptualize a segment of the speech signal as a point in this space representing values across multiple acoustic dimensions. In most cases, the dimensions of this space are continuous acoustic variables such as fundamental frequency, formant frequency, formant transition duration, and so forth. That is, speech stimuli are represented by continuous values, as opposed to binary values of the presence or absence of some feature. SP is the process that maps from this space onto representations of phonemes or linguistic features that subsequently define the phoneme ( Jakobson, Fant, & Halle, 1952 ). This is an example of categorization , in that potentially discriminable sounds are assigned to functionally equivalent classes ( Massaro, 1987 ).

An early example of such an acoustic space representation for phoneme classes is present in Peterson and Barney (1952), where vowel productions by adult males and females and children were displayed in terms of first and second formant (F1 and F2) frequencies. This simple distribution map demonstrates that exemplars of particular phonemes tend to cluster together in acoustic space (e.g., instances of the vowel /i/ as in heat tend to have low F1s and high F2s), but there is a tremendous amount of overlap among the distributions of different vowels owing to variability in speech productions (see also Hillenbrand, Getty, Clark, & Wheeler, 1995 , for an update on these vowel measures, and Lisker & Abramson, 1964 , for overlap in consonant voicing distributions). Presumably, listeners have to determine boundaries in order to parse these acoustic spaces and perceive the intended phonemes despite acoustic variability. Whereas there are a few auditory perceptual discontinuities that may aid in parsing acoustic space into categories in some cases ( Holt, Lotto, & Diehl, 2004 ; Pisoni, 1977 ; Steinschneider et al., 2005 ), for the vast majority of cases listeners must determine the boundaries among phoneme categories on the basis of their experience with the language.
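The cluster-and-overlap structure of such a vowel space can be illustrated with a toy simulation. The formant means and spreads below are invented for illustration, not Peterson and Barney's measurements:

```python
import random

random.seed(1)

# Illustrative (not measured) mean F1/F2 values in Hz for two vowels:
# /i/ as in "heat" (low F1, high F2) and /ae/ as in "hat".
MEANS = {"i": (300.0, 2300.0), "ae": (700.0, 1700.0)}
SPREAD = (150.0, 400.0)  # assumed within-category standard deviations

def sample_token(vowel):
    """Draw one variable production of a vowel as an (F1, F2) point."""
    m1, m2 = MEANS[vowel]
    return (random.gauss(m1, SPREAD[0]), random.gauss(m2, SPREAD[1]))

def classify(token):
    """Assign a token to the nearest category mean (a simple prototype rule)."""
    f1, f2 = token
    def distance(vowel):
        m1, m2 = MEANS[vowel]
        # Scale each dimension by its spread so F2's larger range doesn't dominate.
        return ((f1 - m1) / SPREAD[0]) ** 2 + ((f2 - m2) / SPREAD[1]) ** 2
    return min(MEANS, key=distance)

# Because the two distributions overlap, no boundary classifies every
# production correctly: accuracy is high but not perfect.
tokens = [(v, sample_token(v)) for v in MEANS for _ in range(500)]
accuracy = sum(classify(t) == v for v, t in tokens) / len(tokens)
print(round(accuracy, 2))
```

With tighter spreads the two categories would separate cleanly; widening them, as real productions do, forces classification errors near any boundary a listener could draw.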

Unfortunately, even a perceptual categorization approach to SP does not provide easy answers to many of the questions regarding phoneme perception. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds, as well as the development of our conceptualization of SP that has resulted from these challenges. Because it is not possible to exhaustively review 60+ years of research and theory here, we focus on issues and experiments that define open research questions.

Challenges of Speech Sound Categorization

A major problem of mapping from multidimensional acoustic distributions to phonemes is that some of the variability in the acoustic input space is relevant to the linguistic message, some of the variability is related to characteristics of the speaker, and some of the variability is noise. To further complicate things, variation on any particular acoustic dimension could be the result of any of these sources, depending on the context. The pitch (fundamental frequency, f0) of the vowel in the utterance /ba/, for example, may be linguistically insignificant as it varies with the sex and age of the speaker ( Klatt & Klatt, 1990 ), but relative pitch does serve as a linguistically reliable cue to /ba/ versus /pa/, with /pa/ having a higher pitch relative to /ba/ ( House & Fairbanks, 1953 ).

Voice pitch is one of as many as 16 cues that can distinguish /ba/ from /pa/ ( Lisker, 1986 ). Whereas any of these multiple cues may be informative for speech categorization, the perceptual effectiveness of each cue varies. For example, when categorizing consonants such as /b/, /d/, and /g/, American English listeners make greater use of differences in formant transitions as opposed to frequency information in the noise burst that precedes the transitions even though both cues reliably covary with the consonants ( Francis, Baldwin, & Nusbaum, 2000 ). Of significance, listeners' relative reliance on particular acoustic cues changes across development (see, e.g., Nittrouer, 2004 ) and varies depending on the listener's native language (e.g., Iverson et al., 2003 ). Thus, establishing the mapping from an acoustic input space to a perceptual space is a developmental process that depends on language experience.
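The idea that listeners weight multiple covarying cues differently can be pictured as a weighted combination of cue evidence feeding a logistic decision. The cue coding, values, and weights below are hypothetical:

```python
import math

def categorize_prob(formant_cue, burst_cue, w_formant, w_burst):
    """Probability of responding /d/ rather than /b/ given two cues.

    Cues are coded so that positive values favor /d/; the weights express
    how strongly each cue contributes to the decision (hypothetical values).
    """
    evidence = w_formant * formant_cue + w_burst * burst_cue
    return 1.0 / (1.0 + math.exp(-evidence))

# An ambiguous token: formant transitions weakly favor /d/ (+0.5),
# burst frequency weakly favors /b/ (-0.5).
token = (0.5, -0.5)

# A listener who weights formant transitions heavily tends to hear /d/ ...
p_formant_listener = categorize_prob(*token, w_formant=4.0, w_burst=1.0)
# ... while one who weights the burst heavily tends to hear /b/.
p_burst_listener = categorize_prob(*token, w_formant=1.0, w_burst=4.0)

print(round(p_formant_listener, 2), round(p_burst_listener, 2))
```

The same acoustic token thus maps to different category responses depending only on the listener's cue weights, which is one way to think about developmental and cross-language differences in cue reliance.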

For several months after birth, normal-hearing infants appear to parse the speech input space in the same manner (see Kuhl, 2004 , and Werker & Tees, 1999 , for reviews). No matter the linguistic environment in which they are developing, the basic characteristics of the human auditory system's response to speech signals dictate perception. Since speech sounds must be discriminable enough from one another to reliably convey meaning, languages have evolved inventories of speech sounds that exploit basic human auditory function ( Diehl & Lindblom, 2004 ; Lindblom, 1986 ). Thus, young infants tend to discriminate nearly any speech distinction with which they are presented ( Kuhl, 2004 ). However, by the first birthday, experience with the regularities of the native language restructures the perceptual space to which speech input maps ( Werker & Tees, 1984 ). By this time, infants developing in English-speaking environments perceive the same sounds differently, for example, than do infants developing in Swedish-speaking environments ( Kuhl, Williams, Lacerda, Stevens, & Lindblom, 1992 ). Infants appear to have parsed the perceptual space, finding regularity relevant to the native language amid considerable acoustic variability across other dimensions.

These changes have been described as a “warping” of the perceptual space ( Kuhl et al., 2008 ). If we imagine perceptual space as a multidimensional topography, the perceptual landscape can be described as relatively flat in early infancy, with any discontinuities arising from discontinuities in human auditory processing. Through experience with the native language environment, the perceptual space is warped to reflect the regularities of the native speech input space ( Kuhl, 2000 ; Spivey, 2007 ), and infants begin to perceive speech relative to the characteristics of the native language rather than solely according to psychoacoustic properties. The groundwork for reorganizing the perceptual space according to the regularities of the native language input thus begins in infancy (see Kuhl, 2000 ), although development of speech categories continues through childhood (see Walley, 2005 ). Although the development of speech categories is now widely documented, research is just beginning to uncover the learning mechanisms that guide this experience-dependent process.

A natural question that arises is, How does the initial categorization parsing based on one's native language affect the ability to learn a second language? A popular example of this issue comes from comparing English, which distinguishes /r/ from /l/, and Japanese, which does not use /r/ and /l/ to distinguish meaning and instead possesses a single lateral flap ( Ladefoged & Maddieson, 1996 ), which overlaps with /r/ and /l/ in an acoustic space defined by the onset frequencies of the second (F2) and third (F3) formants ( Lotto, Sato, & Diehl, 2004 ). Thus, English listeners must parse the perceptual space to best capture the linguistically relevant acoustic variability distinguishing /r/ from /l/, whereas Japanese listeners need not parse the space in quite the same manner, because variability in this region of the perceptual space is not relevant to Japanese ( Best & Strange, 1992 ). Once the perceptual system commits to a parse of the perceptual space, there are long-term consequences for SP; the experience that we have with the sounds of our native language fundamentally shapes how we hear speech. Specifically, between-category sensitivity (e.g., an English listener distinguishing the consonants of rock and lock ) is preserved, whereas within-category sensitivity (distinguishing two acoustically different instances of rock ) is attenuated ( Kuhl et al., 1992 ; Werker, 1994 ). This surely benefits our ability to communicate in a native language, but it has consequences for adults' perception and acquisition of nonnative speech categories.

An example of this is the difficulty native Japanese listeners have in perceiving English /r/ versus /l/ ( Goto, 1971 ; Miyawaki et al., 1975 ). Although Japanese adults can improve their English /r/–/l/ perception and production (e.g., Bradlow, Nygaard, & Pisoni, 1999 ; Bradlow & Pisoni, 1999 ; Logan, Lively, & Pisoni, 1991 ; McCandliss, Fiez, Protopapas, Conway, & McClelland, 2002 ), it may take decades of English experience for native Japanese listeners to approach native levels of perceptual performance with English /r/–/l/ ( Flege, Yeni-Komshian, & Liu, 1999 ), and even then, there are large individual differences in achievement (see, e.g., Slevc & Miyake, 2006 ). Native Japanese listeners' perceptual space has been tuned for the regularities of Japanese, and this organization is not entirely compatible with the speech input space of English.

The phenomenon of difficulty in perceiving nonnative speech categories demonstrates that speech is perceived through the lens of native language categories. Indeed, electrophysiological evidence suggests that the influence of categorization on SP is evident at very early stages of stimulus processing (e.g., Näätänen et al., 1997 ; Sharma & Dorman, 2000 ; Winkler et al., 1999 ; Zhang, Kuhl, Imada, Kotani, & Pruitt, 2001 ). The difficulties are greatest for nonnative sounds similar to native categories ( Best, 1994 ; Flege, 1995 ; Harnsberger, 2001 ), suggesting that the warping of the perceptual space by the first language especially influences SP of acoustically similar nonnative sounds. Although the difficulties appear to be related to the age of category acquisition ( Lenneberg, 1967 ), with adults having greater perceptual difficulty than younger listeners, much evidence suggests that this is related more to the length and degree of immersion in the second language environment than to maturation (e.g., Flege, 1995 ; Flege et al., 1999 ). Moreover, the perceptual changes introduced by parsing the perceptual space seem not to involve a loss of auditory sensitivity, since with sensitive measures adults can demonstrate an ability to distinguish difficult nonnative speech categories ( Werker & Tees, 1984 ).

Categorization, Not Categorical Perception

It is important to distinguish the description of SP as categorization from the notion that SP is categorical . Opening almost any perception or cognition textbook to the section on speech, one is likely to find an illustration displaying perhaps the best-known pattern of SP outside the field, categorical perception (CP; see Wolfe et al., 2008 ). In a typical CP experiment, a series of speech sounds varying in equal physical steps along some acoustic dimension is presented to listeners, whose task is to classify them as two or more phonemes. Typically, the proportion of each category response does not vary gradually with the change in acoustic parameters. Instead, there is an abrupt shift from consistent labeling of the stimuli as one phoneme to consistent labeling as a competing phoneme across a small change in the acoustics. This is one of three hallmarks of the phenomenon of CP. A second defining characteristic of CP is the pattern of discrimination across the acoustic speech series. When listeners discriminate pairs of stimuli along the series, the resulting function is discontinuous. Discrimination is nearly perfect for stimuli that lie on opposite sides of the sharp identification/categorization boundary, whereas discrimination is very poor for pairs of stimuli that are equally acoustically distinct but lie on the same side of the identification/categorization boundary. The final characteristic of CP is that identification/categorization performance predicts discrimination performance; speech sounds that are given the same label (e.g., “ba”) are difficult to discriminate, whereas those given different labels are discriminated with high accuracy (see Harnad, 1987 ; Studdert-Kennedy, Liberman, Harris, & Cooper, 1970 ).
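The link between labeling and discrimination in CP can be made concrete with the classic assumption that listeners discriminate two stimuli only insofar as they label them differently. The continuum steps, boundary, and slope values below are invented for illustration:

```python
import math

def p_label(step, boundary=3.0, slope=3.0):
    """Probability of labeling a continuum step as category B (logistic)."""
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))

def predicted_discrimination(step_a, step_b):
    """Chance-corrected accuracy for a pair, assuming listeners discriminate
    only via labels: a trial is correct when the two stimuli get different labels."""
    pa, pb = p_label(step_a), p_label(step_b)
    p_different = pa * (1 - pb) + pb * (1 - pa)
    return 0.5 + 0.5 * p_different

# Identification shifts abruptly across the boundary (steps 0..6) ...
ident = [round(p_label(s), 2) for s in range(7)]
# ... and predicted two-step discrimination peaks at the boundary pair.
within = predicted_discrimination(0, 2)   # same side of the boundary
across = predicted_discrimination(2, 4)   # straddles the boundary
print(ident)
print(round(within, 2), round(across, 2))
```

Under this assumption, within-category pairs are predicted to be discriminated near chance while the boundary-straddling pair approaches ceiling, reproducing the discontinuous discrimination function described above.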

CP was formerly thought to be a peculiarity of SP ( Liberman, 1957 ; Liberman, Harris, Hoffman, & Griffith, 1957 ) and was among several perceptual phenomena that have had great impact on speech theories. Its interpretation served to ignite debates over the objects of SP and the mechanisms that support their processing (see Diehl, Lotto, & Holt, 2004 , for a review). However, CP has since been observed for perception of human faces ( Beale & Keil, 1995 ) and facial expressions ( Bimler & Kirkland, 2001 ), music intervals (see Krumhansl, 1991 , for a review), and artificial stimuli that participants learn to categorize in laboratory tasks ( Livingston, Andrews, & Harnad, 1998 ). It is observed in the behavior of nonhuman animals as well (see Kluender, Lotto, & Holt, 2005 , for a review). Moreover, the prototypical pattern of CP is not observed for all speech sounds. Its patterns are much weaker for vowels than for stop consonants like /b/ and /p/, for example ( Pisoni, 1973 ), and sensitive methods for measuring discrimination or discrimination training can cause the peaks in discrimination at the boundaries to disappear even for consonants ( Carney, Widin, & Viemeister, 1977 ; Samuel, 1977 ). Rather than a speech-specific phenomenon, CP is a far more general characteristic of how perceptual systems respond to experience with regularities in the environment ( Damper & Harnad, 2000 ) and, perhaps, of how time-varying signals are accommodated in perceptual memory ( Mirman, Holt, & McClelland, 2004 ). Thus, the theoretical implications associated with CP (such as the proposition that it is a speech-specific phenomenon or that it is a qualitatively different sort of perceptual process) have not withstood empirical scrutiny.

However, although much of the controversy about the interpretation of CP has settled, CP has left an indelible mark on thinking about SP (perhaps especially among those outside the immediate field of SP). The sharp identification functions of CP are characterized not only by their steep boundary but also by the relative flatness of the function within categories, giving the appearance that, within a speech category, tokens are equivalent and that their acoustic variability is uninformative to the perceptual system. The classic CP pattern of responses suggests that the mapping from acoustics to speech label is discrete, such that acoustically variable instances of /ba/, for example, are mapped to “ba” irrespective of the acoustic nuances of a particular /ba/, its speaker, or its context.

Relatedly, one of the ways in which CP has left its mark is that descriptions of SP tend to describe speech identification instead of speech categorization . On the face of it, this seems a small difference, especially since these terms are often used interchangeably in the SP literature. However, identification (at least as it is used in other categorization literatures) is a decision about an object's unique identity that requires discrimination between similar objects. Categorization , on the other hand, reflects a decision about an object's type or kind requiring generalization across the perceptually discriminable physical variability of a class of objects ( Palmeri & Gauthier, 2004 ). Whereas CP, with its suggested insensitivity to intracategory variability, is consistent with identification , there is much evidence that the facts of SP are better captured by categorization .

For example, when one exploits measures more continuous than the binary responses typical of CP tasks (e.g., was that sound /ba/ or /da/?), listeners' behavior suggests the rich internal structure of speech categories. Listeners rate some exemplars as “better” instances of a speech category than others (e.g., Iverson & Kuhl, 1995 ; Kuhl, 1991 ; Volaitis & Miller, 1992 ). Eyetracking paradigms further reveal that fine-grained acoustic details of an utterance affect its categorization (e.g., McMurray, Aslin, Tanenhaus, Spivey, & Subik, 2008 ; McMurray, Tanenhaus, & Aslin, 2002 ). It seems that the appearance of phonetic homogeneity in CP is largely a result of the binary response labels of CP identification tasks ( Lotto & Holt, 2000 ). Furthermore, SP is affected by the familiarity of the voice that utters a token ( Nygaard & Pisoni, 1998 ), suggesting that fine-grained acoustic details are retained in addition to phonemic labels. This more detailed information persists to influence word-level knowledge ( Hawkins, 2003 ; McMurray et al., 2002 ) and memory ( Goldinger, 1996 , 1998 ). It appears that SP is not completely based on discrete, arbitrary labels such as phonemes ( Lotto & Holt, 2000 ). Therefore, it is likely to be more productive to consider the mapping from the multidimensional input space to a perceptual space that has been studied by SP research as categorization rather than as categorical .

If SP is really a case of perceptual categorization, then our understanding of speech communication could benefit from what we know about general categorization processes. In fact, many of the models that have been successful for visual categorization have been applied to speech sound categorization, including classic prototype ( Samuel, 1982 ), decision bound ( Maddox, Molis, & Diehl, 2002 ; Nearey, 1990 ), and exemplar ( Johnson, 1997 ) models. However, although perceptual categorization has long been studied in the cognitive sciences (see, e.g., Cohen & Lefebvre, 2005 , for a review), the categorization challenges presented by speech signals are somewhat different from those for the visual categories that are more often studied: The speech input space is composed of mostly continuous acoustic dimensions that must be parsed into categories; there is typically no single cue that is necessary or sufficient for defining category membership; speech category exemplars are inherently temporal in nature, thereby limiting side-by-side comparisons; and information for speech categories is spread across time, thus creating segmentation issues. The evidence that exists suggests that these differences matter in understanding SP ( Mirman et al., 2004 ). Unfortunately, the literature available to guide our understanding of the processes, abilities, and constraints of general auditory categorization is quite limited (but see Goudbeek, Smits, Swingley, & Cutler, 2005 ; Goudbeek, Swingley, & Smits, 2009 ; Guenther, Husain, Cohen, & Shinn-Cunningham, 1999 ; Holt & Lotto, 2006 ; Holt et al., 2004 ; Mirman et al., 2004 ; Wade & Holt, 2005a ). Further research in auditory cognition will be needed in order to discover how auditory categorization and learning, in general, advance and limit SP (see Holt & Lotto, 2008 ).
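The contrast between prototype and exemplar accounts mentioned above can be sketched in a few lines. The stored (F1, F2) tokens and the similarity parameter are invented for illustration; actual models of this kind (e.g., Johnson, 1997) are considerably richer:

```python
import math

# Hypothetical stored /i/ and /ae/ tokens as (F1, F2) pairs in Hz.
EXEMPLARS = {
    "i":  [(280, 2250), (310, 2400), (330, 2200), (290, 2350)],
    "ae": [(660, 1700), (700, 1650), (730, 1750), (680, 1720)],
}

def prototype_classify(token):
    """Prototype model: compare the token with each category's mean."""
    def dist_to_mean(cat):
        pts = EXEMPLARS[cat]
        mean = [sum(dim) / len(pts) for dim in zip(*pts)]
        return math.dist(token, mean)
    return min(EXEMPLARS, key=dist_to_mean)

def exemplar_classify(token, sensitivity=0.01):
    """Exemplar model: sum similarity to every stored token of each category."""
    def summed_similarity(cat):
        return sum(math.exp(-sensitivity * math.dist(token, e))
                   for e in EXEMPLARS[cat])
    return max(EXEMPLARS, key=summed_similarity)

token = (320, 2300)  # a new production, near the /i/ cluster
print(prototype_classify(token), exemplar_classify(token))
```

The two models agree on clear cases like this one; they come apart when a category's exemplars are distributed irregularly, which is exactly where speech data help adjudicate between accounts.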

The Adaptive Nature of Speech Categorization

The preceding description of SP as perceptual categorization illustrates some of the complexities in mapping from acoustics to phonemes. The reader may at this point find these complexities to be challenging but not particularly daunting. However, there is an additional level of complexity to phoneme categorization that has kept researchers busy for 60+ years. The problem was summed up well years ago by Repp and Liberman (1987) when they said that “phonetic categories are flexible” (p. 90). That is, phonetic categorization is extremely context sensitive.

One way in which context influences SP is that how speech sounds are labeled changes as a function of both the overall makeup of the stimulus set and the surrounding phonetic context. Even in classic CP tasks, the range of stimulus exemplars presented influences the observed position of the category boundary along the stimulus series ( Brady & Darwin, 1978 ; Rosen, 1979 ). The presence of comparison categories available in a task (/r/ and /l/ vs. /r/ and /l/ and /w/, for example) also influences the mapping to speech categories ( Ingvalson, 2008 ). Thus, identical signals may be categorized as different speech sounds, depending on the characteristics of the other signals in the set in which they appear.

Adjacent phonetic context also strongly influences how a particular acoustic speech signal is categorized. For example, a syllable may be perceived as a /ga/ when preceded by the syllable /al/, but as a /da/ when preceded by /ar/ ( Mann, 1980 ). Context dependence in SP is even observed “backward” in time, such that sounds that follow a target speech sound may influence how listeners categorize the target (e.g., Mann & Repp, 1980 ). The rate of speech ( Miller & Liberman, 1979 ; Summerfield, 1981 ) and the acoustic characteristics of the voice producing a preceding sentence also influence how speech is categorized. Ladefoged and Broadbent (1957) demonstrated that they could shift a perceived target word from “bit” to “bet” by changing the acoustics of a preceding carrier phrase (e.g., raising or lowering the F1 frequencies in the phrase “Please say what this word is”). Even nonspeech contexts that mimic spectral or temporal characteristics of speech signals, but are not perceived as speech, influence speech categorization (e.g., Holt, 2005 ; Lotto & Kluender, 1998 ; Wade & Holt, 2005b ). The fact that nonspeech signals shift the mapping from speech acoustics to perceptual space demonstrates that general auditory processes are involved in relating speech signals and their contexts. Effects of context also occur at multiple levels. SP can be shifted by phonotactic ( Pitt & McQueen, 1998 ; Samuel & Pitt, 2003 ), lexical ( Magnuson, McMurray, Tanenhaus, & Aslin, 2003 ; McClelland & Elman, 1986 ), and semantic ( Borsky, Tuller, & Shapiro, 1998 ; Connine, 1987 ) context, indicating the possibility of an influence of feedback from higher level representations onto speech categorization (see McClelland, Mirman, & Holt, 2006 , and Norris, McQueen, & Cutler, 2000 , for reviews and debate).

So what are the cues that allow listeners to reliably map from speech input to perception of native language categories? This is a difficult question to answer, because, as described above, the “cues” for SP change radically with task and context. This fact has long been acknowledged in the literature and studied as, for example, trading relations—examining how specific acoustic cues “trade” off one another to be more or less dominant in signaling particular speech categories (e.g., Oden & Massaro, 1978 ; Repp, 1982 ). However, our attempts to treat a set of cues as the definitive signals of speech categories ultimately may be misplaced, precisely because of the inherent flexibility of SP. Listeners have exquisite sensitivity to the regularity present in acoustic signals, including speech, and appear to dynamically adjust perception to characteristics of this regularity. Moreover, the nature of this regularity appears to be task dependent; the same speech stimulus set is perceived quite differently as the task varies. This suggests that the “cues” of speech categorization, to some extent, are determined online.

Perhaps the most convincing demonstrations of the flexibility of SP come from studies demonstrating that listeners can maintain veridical perception in the face of radical distortions of the speech signal. The upshot of this work is that there do not appear to be acoustic dimensions or features that are absolutely necessary for SP. Listeners can understand a signal of three sine waves following the center frequencies of the first three formants in so-called sine-wave speech, despite the loss of the harmonic structure and fine-grained acoustic detail ( Remez, Rubin, Pisoni, & Carrell, 1981 ). In this case, the spectral envelope defined by the formant frequencies and the temporal envelope defined by the changes in the overall amplitude of the signal across time are maintained. However, listeners can also maintain veridical SP when the spectral envelope and harmonic structure are distorted, as in the case of noise-vocoded speech ( Davis, Johnsrude, Hervais-Adelman, Taylor, & McGettigan, 2005 ; Hervais-Adelman, Davis, Johnsrude, & Carlyon, 2008 ; Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995 ). This distortion involves dividing the signal into a small number of frequency bands and replacing acoustic information in those bands with noise that maintains the slow amplitude changes (typically less than 50 Hz) of the frequency band.

Noise-vocoded speech is similar, in some aspects, to the signal presented to listeners with cochlear implants, particularly in its destruction of frequency resolution and harmonic detail. The amazing perceptual performance of some listeners with cochlear implants is one of the most remarkable demonstrations of SP flexibility. Despite the major differences in the signal conveyed by a cochlear implant versus ordinary auditory processing, some implanted listeners achieve normal-level SP for sounds presented in quiet (e.g., Wilson & Dorman, 2007 ). With some training, normal-hearing listeners can also achieve reasonably good SP performance with severely time-compressed ( Dupoux & Green, 1997 ; Pallier, Sebastian-Gallés, Dupoux, Christophe, & Mehler, 1998 ), spectrally shifted ( Fu & Galvin, 2003 ), or highly synthetic ( Greenspan, Nusbaum, & Pisoni, 1988 ) speech signals. One can even divide the signal into 50-msec chunks, reverse each of these chunks in time (so that the chunks maintain their order, but are each reversals of original chunks), and maintain nearly 100% intelligibility ( Saberi & Perrott, 1999 ). We can maintain normal conversations on phones with bandwidths between 300 and 3000 Hz, suggesting that all of the important information in speech is in this frequency band. But, listeners can achieve nearly 90% correct categorization performance for consonants when the signal is filtered to contain information only below 800 Hz and above 4000 Hz ( Lippmann, 1996 ).
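The locally time-reversed speech manipulation described above is simple to state precisely: keep the order of the chunks, but reverse the samples within each chunk. A minimal sketch over a list of samples:

```python
def reverse_chunks(samples, sample_rate, chunk_ms=50):
    """Saberi & Perrott-style distortion: chunks keep their order,
    but each chunk is individually reversed in time.

    `samples` is a list of amplitude values; chunks default to 50 ms.
    """
    chunk_len = int(sample_rate * chunk_ms / 1000)
    out = []
    for start in range(0, len(samples), chunk_len):
        chunk = samples[start:start + chunk_len]
        out.extend(reversed(chunk))  # local time reversal within the chunk
    return out

# Toy illustration with a 100 Hz "sample rate" so each 50-ms chunk is 5 samples.
signal = list(range(12))
print(reverse_chunks(signal, sample_rate=100))
# → [4, 3, 2, 1, 0, 9, 8, 7, 6, 5, 11, 10]
```

Applied to real audio (tens of thousands of samples per second), this scrambles fine temporal structure within every 50-msec window while preserving the coarse ordering of the signal, which is evidently enough for listeners to maintain intelligibility.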

What does this mean for SP? It is common in the literature to see a constellation of acoustic cues associated with a speech category. This makes sense in many cases, because the task is constant, acoustics are relatively unambiguous, context is neutral, and perception is consistent. However, given the flexibility of SP detailed above, it is clear that we cannot hope to provide a definitive a priori description of the acoustic cues and dimensions that will be mapped to particular phonemes. A major challenge for SP researchers is to determine what kinds of processes allow listeners to maintain consistent perceptual performance in the face of varying acoustics and listening conditions.

Speech Communication = Speech Categorization? Perhaps Not in the Wild

Most models of language presume a mapping from acoustics to phoneme, with phonemes mapping to higher level language representations such as words (e.g., McClelland & Elman, 1986 ; Norris et al., 2000 ). However, it is worth keeping in mind that the evidence for speech categorization as a necessary stage of processing in everyday speech communication is not especially strong. For example, Broca's aphasia (which is produced by diffuse damage to the left frontal regions of the brain, causing severe motor speech deficits while leaving speech recognition intact; Goodglass, Kaplan, & Barresi, 2001 ) may leave listeners impaired on classic SP tasks such as CP syllable identification and discrimination ( Blumstein, 1995 ), but this deficit doubly dissociates from impairments on speech recognition (e.g., comprehending words; Miceli, Gainotti, Caltagirone, & Masullo, 1980 ). Thus, the kinds of tasks that require listeners to make explicit use of phonetic information may tap differentially into processes such as attention, executive processing, or working memory in comparison with ordinary speech communication (see Hickok & Poeppel, 2007 ).

Spoken language possesses information and regularity at multiple levels. A single utterance of cupcake, for example, conveys indexical characteristics: the speaker's gender, whether she is familiar to the listener, her emotional state, and her sociolinguistic background. It conveys information for the phonetic categories /kʌpkeɪk/. Moreover, we recognize it as a real English word and link it to our semantic knowledge of cupcakes. This brief acoustic signal conveys much potential information.

It is important to remember, however, that the tasks we use to study SP differentially tap into this information. The kinds of identification and discrimination tasks that create canonical CP data highlight phonetic-level processing in identifying and differentiating /kʌ/ versus /gʌ/, whereas a lexical decision task highlights word-level knowledge of “cupcake.” Moreover, listeners make greater use of fine phonetic detail when nonwords outnumber words in a stimulus set, but lexical influences predominate when the task is biased toward word recognition with a greater proportion of words ( Mirman, McClelland, Holt, & Magnuson, 2008 ). In SP research, the kinds of tasks and stimulus sets that we present shape the perceptual processing that we observe.

Everyday speech perception "in the wild" is likely to tap into a broader set of processes than those captured in individual laboratory tasks. This is not to suggest that adult (or even infant or animal) listeners cannot categorize speech; there is abundant evidence that they can. Rather, these data suggest that the cognitive and perceptual processes involved in speech categorization and those involved in online perception of fluent speech may not be one and the same. Although this possibility is not always acknowledged in SP research, it matters for our ultimate understanding of how SP relates to spoken language more generally.

Open Questions: Speech Perception in an Auditory Cognitive Neuroscience Framework

At first blush, the caveat above might seem to diminish the importance of studying and understanding speech categorization. On the contrary, the 60+ year history of SP research and its documentation of the multidimensional acoustic cues that covary with speech categories have provided what may be an unparalleled understanding of a natural, complex, ecologically valid perceptual categorization space (Kluender, 1994). Even the perceptual dimensions of faces, another prominent ecologically relevant perceptual category space, have not been studied in this detail. What is more, categorization within the highly multidimensional "speech space" (compare the "face space" of visual face categorization; Valentine, 1991) is completely dependent on experience with a native language. Perhaps no other domain is so rich in its potential for understanding perceptual categorization.
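One minimal way to see what "experience-dependent, multidimensional categorization" means computationally is a nearest-prototype sketch: category centers are estimated from labeled experience in a two-cue space, and novel tokens are classified by distance. The (VOT, F0) values and the nearest-prototype rule are illustrative assumptions, not a model from the literature.

```python
# Toy illustration of experience-dependent categorization in a
# multidimensional cue space. All cue values are invented.

def centroid(tokens):
    """Mean of a list of equal-length cue tuples."""
    n = len(tokens)
    return tuple(sum(t[i] for t in tokens) / n for i in range(len(tokens[0])))

# Hypothetical (VOT ms, F0 Hz) tokens standing in for linguistic experience
experience = {
    "b": [(5, 100), (10, 105), (15, 95)],
    "p": [(50, 130), (60, 125), (55, 135)],
}
prototypes = {cat: centroid(toks) for cat, toks in experience.items()}

def classify(token):
    """Assign a token to the category with the nearest prototype."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda c: dist(token, prototypes[c]))

print(classify((12, 102)))  # near the /b/ prototype -> "b"
print(classify((58, 128)))  # near the /p/ prototype -> "p"
```

Because the prototypes are computed from the training tokens, different "linguistic experience" yields different category boundaries, which is the experience-dependence the text emphasizes, here reduced to its simplest possible form.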

There remains much to learn. Beyond informing our understanding of perceptual categorization and auditory processing generally, SP extends to many core areas of cognitive science. As categorization, SP offers a platform from which to investigate development (Kuhl, 2004), learning (Holt, Lotto, & Kluender, 1998), adult plasticity (McClelland, 2001), and the prospect of critical periods in human learning (Flege, 1995). The multiple sources of information that covary with the acoustic speech signal provide an opportunity for understanding cross-modal integration (Massaro, 1998) and the role of feedback in language processing (McClelland et al., 2006). Classic issues of cognitive science such as working memory (Frankish, 2008), attention (Francis & Nusbaum, 2002), and the interplay of production and perception (Galantucci, Fowler, & Turvey, 2006) are all pieces of the puzzle in understanding SP. Moreover, the special status of speech as a human communication signal provides an opportunity for further significant extensions. Research is just beginning to uncover how social cues support speech category acquisition (Kuhl, 2007) and how personality variables may predict the degree to which information in the speech signal is integrated (Stewart & Ota, 2008).

Studying SP also informs us about the general characteristics of auditory perception and cognition. Our understanding of auditory processing has come largely from studies of simple sounds such as tones, clicks, and noise bursts. By contrast, speech is much more like the complex sounds that our auditory systems have evolved to process (Lewicki, 2002; Smith & Lewicki, 2006). As such, it is perhaps even better situated to reveal the nature of relatively poorly understood (at least in comparison with vision) processes of auditory perception and cognition. Already, studying speech categorization has provided information about the kinds of processing that the auditory system must accomplish (e.g., Holt, 2005). SP, with its complex, multidimensional input space and experience-dependent perceptual space, can reveal characteristics of general auditory processing that are simply not apparent with simple acoustic stimuli.

SP is traditionally studied as the mapping from acoustics to phonemes. We have argued here that this process is best understood as one of perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition. The study of SP has long been relegated to the periphery of cognitive science as a "special" perceptual system that can tell us little about general issues of human behavior. The latest research, however, guides us away from this classic way of thinking: toward categorization rather than identification, toward the regularity that exists amid variable speech acoustics as a source of rich information, and toward the online, adaptive nature of speech categorization. These issues place SP in a central position in the cognitive and perceptual sciences.

Acknowledgments

The authors were supported by collaborative awards from the National Science Foundation (BCS0746067) and the National Institutes of Health (R01DC004674).

1 Speech is not conveyed solely by sound. SP research has studied the influence of other important sources of information, especially visual information from the face (for a review, see Colin & Radeau, 2003). Some have argued that SP is best considered amodal (Rosenblum, 2005), whereas others have fruitfully used speech as a means of investigating multimodal integration from separate sources of information (Massaro, 1998). Nonetheless, SP is possible when only acoustic information is present (e.g., over a telephone), and since the majority of SP research has focused on the acoustic mapping, we highlight it in this review.

  • Abercrombie D. Elements of general phonetics. Aldine; Chicago: 1967.
  • Beale JM, Keil FC. Categorical effects in the perception of faces. Cognition. 1995;57:217–239.
  • Best CT. The emergence of native-language phonological influences in infants: A perceptual assimilation model. In: Goodman JC, Nusbaum HC, editors. The development of speech perception: The transition from speech sounds to spoken words. MIT Press; Cambridge, MA: 1994. pp. 167–224.
  • Best CT, Strange W. Effects of phonological and phonetic factors on cross-language perception of approximants. Journal of Phonetics. 1992;20:305–330.
  • Bimler D, Kirkland J. Categorical perception of facial expressions of emotion: Evidence from multidimensional scaling. Cognition & Emotion. 2001;15:633–658.
  • Blumstein SE. The neurobiology of the sound structure of language. In: Gazzaniga MS, editor. The cognitive neurosciences. MIT Press; Cambridge, MA: 1995. pp. 915–929.
  • Blumstein SE, Stevens KN. Phonetic features and acoustic invariance in speech. Cognition. 1981;10:25–32.
  • Borsky S, Tuller B, Shapiro LP. "How to milk a coat": The effects of semantic and acoustic information on phoneme categorization. Journal of the Acoustical Society of America. 1998;103:2670–2676.
  • Bradlow AR, Nygaard LC, Pisoni DB. Effects of talker, rate, and amplitude variation on recognition memory for spoken words. Perception & Psychophysics. 1999;61:206–219.
  • Bradlow AR, Pisoni DB. Recognition of spoken words by native and non-native listeners: Talker-, listener- and item-related factors. Journal of the Acoustical Society of America. 1999;106:2074–2085.
  • Brady SA, Darwin CJ. Range effect in the perception of voicing. Journal of the Acoustical Society of America. 1978;63:1556–1558.
  • Carney AE, Widin GP, Viemeister NF. Noncategorical perception of stop consonants differing in VOT. Journal of the Acoustical Society of America. 1977;62:961–970.
  • Cohen H, Lefebvre C. Handbook of categorization in cognitive science. Elsevier; Amsterdam: 2005.
  • Colin C, Radeau M. Les illusions McGurk dans la parole: 25 ans de recherches [The McGurk illusions in speech: 25 years of research]. L'Année Psychologique. 2003;103:497–542. [With summary in English] doi:10.3406/psy.2003.29649.
  • Connine CM. Constraints on interactive processes in auditory word recognition: The role of sentence context. Journal of Memory & Language. 1987;16:527–538. doi:10.1016/0749-596X(87)90138-0.
  • Cooper FS, Delattre PC, Liberman AM, Borst JM, Gerstman LJ. Some experiments on the perception of synthetic speech sounds. Journal of the Acoustical Society of America. 1952;24:597–606.
  • Damper RI, Harnad SR. Neural network models of categorical perception. Perception & Psychophysics. 2000;62:843–867.
  • Davis MH, Johnsrude IS, Hervais-Adelman A, Taylor K, McGettigan C. Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences. Journal of Experimental Psychology: General. 2005;134:222–241.
  • Diehl RL, Lindblom B. Explaining the structure of feature and phoneme inventories. In: Greenberg S, Ainsworth W, Popper A, Fay R, editors. Speech processing in the auditory system. Springer; New York: 2004. pp. 101–162.
  • Diehl RL, Lotto AJ, Holt LL. Speech perception. Annual Review of Psychology. 2004;55:149–179.
  • Dupoux E, Green K. Perceptual adjustment to highly compressed speech: Effects of talker and rate changes. Journal of Experimental Psychology: Human Perception & Performance. 1997;23:914–927.
  • Fant G. A note on vocal tract size factors and non-uniform F-pattern scalings. Royal Institute of Technology; Stockholm: 1966. Speech Transmission Laboratory Quarterly Project Status Report No. 4, pp. 22–30.
  • Flege JE. Second language speech learning: Theory, findings, and problems. In: Strange W, editor. Speech perception and linguistic experience: Issues in cross-language research. York Press; Baltimore: 1995. pp. 233–277.
  • Flege JE, Yeni-Komshian GH, Liu S. Age constraints on second-language acquisition. Journal of Memory & Language. 1999;41:78–104.
  • Francis AL, Baldwin K, Nusbaum HC. Effects of training on attention to acoustic cues. Perception & Psychophysics. 2000;62:1668–1680.
  • Francis AL, Nusbaum HC. Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception & Performance. 2002;28:349–366.
  • Frankish C. Precategorical acoustic storage and the perception of speech. Journal of Memory & Language. 2008;58:815–836.
  • Fu Q-J, Galvin JJ, III. The effects of short-term training for spectrally mismatched noise-band speech. Journal of the Acoustical Society of America. 2003;113:1065–1072.
  • Galantucci B, Fowler CA, Turvey MT. The motor theory of speech perception reviewed. Psychonomic Bulletin & Review. 2006;13:361–377.
  • Gay T. Effect of speaking rate on vowel formant movements. Journal of the Acoustical Society of America. 1978;63:223–230.
  • Goldinger SD. Words and voices: Episodic traces in spoken word identification and recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition. 1996;22:1166–1183.
  • Goldinger SD. Echoes of echoes? An episodic theory of lexical access. Psychological Review. 1998;105:251–279.
  • Goodglass H, Kaplan E, Barresi B. The assessment of aphasia and related disorders. 3rd ed. Lippincott Williams & Wilkins; Philadelphia: 2001.
  • Goto H. Auditory perception by normal Japanese adults of the sounds "L" and "R." Neuropsychologia. 1971;9:317–323.
  • Goudbeek M, Smits R, Swingley D, Cutler A. Acquiring auditory and phonetic categories. In: Cohen H, Lefebvre C, editors. Handbook of categorization in cognitive science. Elsevier; Amsterdam: 2005. pp. 497–514.
  • Goudbeek M, Swingley D, Smits R. Supervised and unsupervised learning of multidimensional acoustic categories. Journal of Experimental Psychology: Human Perception & Performance. 2009;35:1913–1933.
  • Greenspan SL, Nusbaum HC, Pisoni DB. Perceptual learning of synthetic speech produced by rule. Journal of Experimental Psychology: Learning, Memory, & Cognition. 1988;14:421–433.
  • Guenther FH, Husain FT, Cohen MA, Shinn-Cunningham BG. Effects of categorization and discrimination training on auditory perceptual space. Journal of the Acoustical Society of America. 1999;106:2900–2912.
  • Harnad S. Categorical perception: The groundwork of cognition. Cambridge University Press; New York: 1987.
  • Harnsberger JD. On the relationship between identification and discrimination of non-native nasal consonants. Journal of the Acoustical Society of America. 2001;110:489–503.
  • Hawkins S. Roles and representations of systematic fine phonetic detail in speech understanding. Journal of Phonetics. 2003;31:373–405.
  • Hervais-Adelman A, Davis MH, Johnsrude IS, Carlyon RP. Perceptual learning of noise vocoded words: Effects of feedback and lexicality. Journal of Experimental Psychology: Human Perception & Performance. 2008;34:460–474.
  • Hickok G, Poeppel D. The cortical organization of speech processing. Nature Reviews Neuroscience. 2007;8:393–402.
  • Hillenbrand J, Getty LA, Clark MJ, Wheeler K. Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America. 1995;97:3099–3111.
  • Holt LL. Temporally nonadjacent nonlinguistic sounds affect speech categorization. Psychological Science. 2005;16:305–312.
  • Holt LL, Lotto AJ. Cue weighting in auditory categorization: Implications for first and second language acquisition. Journal of the Acoustical Society of America. 2006;119:3059–3071.
  • Holt LL, Lotto AJ. Speech perception within an auditory cognitive science framework. Current Directions in Psychological Science. 2008;17:42–46.
  • Holt LL, Lotto AJ, Diehl RL. Auditory discontinuities interact with categorization: Implications for speech perception. Journal of the Acoustical Society of America. 2004;116:1763–1773.
  • Holt LL, Lotto AJ, Kluender KR. Incorporating principles of general learning in theories of language acquisition. In: Gruber MC, Higgins D, Olson KS, Wysocki T, editors. Chicago Linguistic Society. Vol. 34. Chicago Linguistic Society; Chicago: 1998. pp. 253–268. The panels.
  • House AS, Fairbanks G. The influence of consonant environment upon the secondary acoustical characteristics of vowels. Journal of the Acoustical Society of America. 1953;25:105–113.
  • Houtgast T, Steeneken HJM. The modulation transfer function in room acoustics as a predictor of speech intelligibility. Journal of the Acoustical Society of America. 1973;54:557.
  • Ingvalson EM. Predicting F3 usage in /r–l/ perception and production by native Japanese speakers. Carnegie Mellon University; 2008. Unpublished doctoral dissertation.
  • Iverson P, Kuhl PK. Mapping the perceptual magnet effect for speech using signal detection theory and multidimensional scaling. Journal of the Acoustical Society of America. 1995;97:553–562.
  • Iverson P, Kuhl PK, Akahane-Yamada R, Diesch E, Tohkura Y, Kettermann A, Siebert C. A perceptual interference account of acquisition difficulties for non-native phonemes. Cognition. 2003;87:B47–B57.
  • Jakobson RC, Fant GM, Halle M. Preliminaries to speech analysis: The distinctive features and their correlates. MIT, Acoustics Laboratory; Cambridge, MA: 1952. Tech. Rep. No. 13.
  • Johnson K. Speech perception without speaker normalization: An exemplar model. In: Johnson K, Mullennix JW, editors. Talker variability in speech processing. Academic Press; San Diego: 1997. pp. 145–165.
  • Kent RD, Minifie FD. Coarticulation in recent speech production models. Journal of Phonetics. 1977;5:115–133.
  • Klatt DH, Klatt LC. Analysis, synthesis, and perception of voice quality variations among female and male talkers. Journal of the Acoustical Society of America. 1990;87:820–857.
  • Kluender KR. Speech perception as a tractable problem in cognitive science. In: Gernsbacher MA, editor. Handbook of psycholinguistics. Academic Press; San Diego: 1994. pp. 173–217.
  • Kluender KR, Lotto AJ, Holt LL. Contributions of nonhuman animal models to understanding human speech perception. In: Greenberg S, Ainsworth W, editors. Listening to speech: An auditory perspective. Oxford University Press; New York: 2005. pp. 203–220.
  • Krumhansl CL. Music psychology: Tonal structures in perception and memory. Annual Review of Psychology. 1991;42:277–303.
  • Kuhl PK. Human adults and human infants show a "perceptual magnet effect" for the prototypes of speech categories, monkeys do not. Perception & Psychophysics. 1991;50:93–107.
  • Kuhl PK. Language, mind, and brain: Experience alters perception. In: Gazzaniga MS, editor. The new cognitive neurosciences. 2nd ed. MIT Press; Cambridge, MA: 2000. pp. 99–115.
  • Kuhl PK. Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience. 2004;5:831–843.
  • Kuhl PK. Is speech learning "gated" by the social brain? Developmental Science. 2007;10:110–120.
  • Kuhl PK, Conboy BT, Coffey-Corina S, Padden D, Rivera-Gaxiola M, Nelson T. Phonetic learning as a pathway to language: New data and native language magnet theory expanded (NLM-e). Philosophical Transactions of the Royal Society B. 2008;363:979–1000.
  • Kuhl PK, Williams KA, Lacerda F, Stevens KN, Lindblom B. Linguistic experience alters phonetic perception in infants by 6 months of age. Science. 1992;255:606–608.
  • Ladefoged P, Broadbent DE. Information conveyed by vowels. Journal of the Acoustical Society of America. 1957;29:98–104.
  • Ladefoged P, Maddieson I. The sounds of the world's languages. Blackwell; Oxford: 1996.
  • Lenneberg EH. Biological foundations of language. Wiley; New York: 1967.
  • Lewicki MS. Efficient coding of natural sounds. Nature Neuroscience. 2002;5:356–363.
  • Liberman AM. Some results of research on speech perception. Journal of the Acoustical Society of America. 1957;29:117–123.
  • Liberman AM. Speech: A special code. MIT Press; Cambridge, MA: 1996.
  • Liberman AM, Cooper FS, Shankweiler DP, Studdert-Kennedy M. Perception of the speech code. Psychological Review. 1967;74:431–461.
  • Liberman AM, Delattre PC, Cooper FS. The role of selected stimulus-variables in the perception of the unvoiced stop consonants. American Journal of Psychology. 1952;65:497–516.
  • Liberman AM, Harris KS, Hoffman HS, Griffith BC. The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology. 1957;54:358–368.
  • Lindblom B. Phonetic universals in vowel systems. In: Ohala JJ, Jaeger JJ, editors. Experimental phonology. Academic Press; Orlando, FL: 1986. pp. 13–44.
  • Lippmann RP. Accurate consonant perception without mid-frequency speech energy. IEEE Transactions on Speech & Audio Processing. 1996;4:66.
  • Lisker L. "Voicing" in English: A catalogue of acoustic features signaling /b/ versus /p/ in trochees. Language & Speech. 1986;29:3–11.
  • Lisker L, Abramson AS. A cross-linguistic study of voicing in initial stops: Acoustical measurements. Word. 1964;20:384–422.
  • Livingston KR, Andrews JK, Harnad S. Categorical perception effects induced by category learning. Journal of Experimental Psychology: Learning, Memory, & Cognition. 1998;24:732–753.
  • Logan JS, Lively SE, Pisoni DB. Training Japanese listeners to identify English /r/ and /l/: A first report. Journal of the Acoustical Society of America. 1991;89:874–886.
  • Lotto AJ, Holt LL. The illusion of the phoneme. In: Billings SJ, Boyle JP, Griffith AM, editors. Chicago Linguistic Society. Vol. 35. Chicago Linguistic Society; Chicago: 2000. pp. 191–204. The panels.
  • Lotto AJ, Kluender KR. General contrast effects in speech perception: Effect of preceding liquid on stop consonant identification. Perception & Psychophysics. 1998;60:602–619.
  • Lotto AJ, Sato M, Diehl RL. Mapping the task for the second language learner: The case of Japanese acquisition of /r/ and /l/. In: Slifka J, Manuel S, Matthies M, editors. From sound to sense: 50+ years of discoveries in speech communication. MIT; Cambridge, MA: 2004. Online conference proceedings.
  • Maddox WT, Molis MR, Diehl RL. Generalizing a neuropsychological model of visual categorization to auditory categorization of vowels. Perception & Psychophysics. 2002;64:584–597.
  • Magnuson JS, McMurray B, Tanenhaus MK, Aslin RN. Lexical effects on compensation for coarticulation: The ghost of Christmash past. Cognitive Science. 2003;27:285–298.
  • Mann VA. Influence of preceding liquid on stop-consonant perception. Perception & Psychophysics. 1980;28:407–412.
  • Mann VA, Repp BH. Influence of vocalic context on perception of the [ʃ]-[s] distinction. Perception & Psychophysics. 1980;28:213–228.
  • Massaro DW. Categorical partition: A fuzzy-logical model of categorization behavior. In: Harnad S, editor. Categorical perception: The groundwork of cognition. Cambridge University Press; Cambridge: 1987. pp. 254–283.
  • Massaro DW. Perceiving talking faces: From speech perception to a behavioral principle. MIT Press, Bradford Books; Cambridge, MA: 1998.
  • McCandliss BD, Fiez JA, Protopapas A, Conway M, McClelland JL. Success and failure in teaching the [r]–[l] contrast to Japanese adults: Tests of a Hebbian model of plasticity and stabilization in spoken language perception. Cognitive, Affective, & Behavioral Neuroscience. 2002;2:89–108.
  • McClelland JL. Failures to learn and their remediation: A Hebbian account. In: McClelland JL, Siegler RS, editors. Mechanisms of cognitive development: Behavioral and neural perspectives. Erlbaum; Mahwah, NJ: 2001. pp. 97–122.
  • McClelland JL, Elman JL. The TRACE model of speech perception. Cognitive Psychology. 1986;18:1–86.
  • McClelland JL, Mirman D, Holt LL. Are there interactive processes in speech perception? Trends in Cognitive Sciences. 2006;10:363–369.
  • McMurray B, Aslin RN, Tanenhaus MK, Spivey MJ, Subik D. Gradient sensitivity to within-category variation in words and syllables. Journal of Experimental Psychology: Human Perception & Performance. 2008;34:1609–1631.
  • McMurray B, Tanenhaus MK, Aslin RN. Gradient effects of within-category phonetic variation on lexical access. Cognition. 2002;86:B33–B42.
  • Miceli G, Gainotti G, Caltagirone C, Masullo C. Some aspects of phonological impairment in aphasia. Brain & Language. 1980;11:159–169.
  • Miller JL, Baer T. Some effects of speaking rate on the production of /b/ and /w/. Journal of the Acoustical Society of America. 1983;73:1751–1755.
  • Miller JL, Liberman AM. Some effects of later-occurring information on the perception of stop consonant and semivowel. Perception & Psychophysics. 1979;25:457–465.
  • Mirman D, Holt LL, McClelland JL. Categorization and discrimination of nonspeech sounds: Differences between steady-state and rapidly-changing acoustic cues. Journal of the Acoustical Society of America. 2004;116:1198–1207.
  • Mirman D, McClelland JL, Holt LL, Magnuson JS. Effects of attention on the strength of lexical influences on speech perception: Behavioral experiments and computational mechanisms. Cognitive Science. 2008;32:398–417.
  • Miyawaki K, Strange W, Verbrugge R, Liberman AM, Jenkins JJ, Fujimura O. An effect of linguistic experience: The discrimination of /r/ and /l/ by native speakers of Japanese and English. Perception & Psychophysics. 1975;18:331–340.
  • Näätänen R, Lehtokoski A, Lennes M, Cheour M, Huotilainen M, Iivonen A, et al. Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature. 1997;385:432–434.
  • Näätänen R, Winkler I. The concept of auditory stimulus representation in cognitive neuroscience. Psychological Bulletin. 1999;125:826–859.
  • Nearey TM. The segment as a unit of speech perception. Journal of Phonetics. 1990;18:347–373.
  • Nittrouer S. The role of temporal and dynamic signal components in the perception of syllable-final stop voicing by children and adults. Journal of the Acoustical Society of America. 2004;115:1777–1790.
  • Norris D, McQueen JM, Cutler A. Merging information in speech recognition: Feedback is never necessary. Behavioral & Brain Sciences. 2000;23:299–370.
  • Nygaard LC, Pisoni DB. Talker-specific learning in speech perception. Perception & Psychophysics. 1998;60:355–376.
  • Oden GC, Massaro DW. Integration of featural information in speech perception. Psychological Review. 1978;85:172–191.
  • Öhman SEG. Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America. 1966;39:151–168.
  • Pallier C, Sebastian-Gallés N, Dupoux E, Christophe A, Mehler J. Perceptual adjustment to time-compressed speech: A cross-linguistic study. Memory & Cognition. 1998;26:844–851.
  • Palmieri TJ, Gauthier I. Visual object understanding. Nature Reviews Neuroscience. 2004;5:291–303.
  • Peterson GE, Barney HL. Control methods used in a study of the vowels. Journal of the Acoustical Society of America. 1952;24:175–184.
  • Pisoni DB. Auditory and phonetic memory codes in the discrimination of consonants and vowels. Perception & Psychophysics. 1973;13:253–260.
  • Pisoni DB. Identification and discrimination of the relative onset time of two component tones: Implications for voicing perception in stops. Journal of the Acoustical Society of America. 1977;61:1352–1361.
  • Pitt MA, McQueen JM. Is compensation for coarticulation mediated by the lexicon? Journal of Memory & Language. 1998;39:347–370.
  • Remez RE, Rubin PE, Pisoni DB, Carrell TD. Speech perception without traditional speech cues. Science. 1981;212:947–950.
  • Repp BH. Phonetic trading relations and context effects: New experimental evidence for a speech mode of perception. Psychological Bulletin. 1982;92:81–110.
  • Repp BH, Liberman AM. Phonetic category boundaries are flexible. In: Harnad S, editor. Categorical perception: The groundwork of cognition. Cambridge University Press; Cambridge: 1987. pp. 89–112.
  • Rosen SM. Range and frequency effects in consonant categorization. Journal of Phonetics. 1979;7:393–402.
  • Rosenblum LD. Primacy of multimodal speech perception. In: Pisoni DB, Remez RE, editors. The handbook of speech perception. Blackwell; Oxford: 2005. pp. 51–78.
  • Saberi K, Perrott DR. Cognitive restoration of reversed speech. Nature. 1999;398:760.
  • Samuel AG. The effect of discrimination training on speech perception: Noncategorical perception. Perception & Psychophysics. 1977;22:321–330.
  • Samuel AG. Phonetic prototypes. Perception & Psychophysics. 1982;31:307–314.
  • Samuel AG, Pitt MA. Lexical activation (and other factors) can mediate compensation for coarticulation. Journal of Memory & Language. 2003;48:416–434.
  • Shannon RV, Zeng F-G, Kamath V, Wygonski J, Ekelid M. Speech recognition with primarily temporal cues. Science. 1995;270:303–304.
  • Sharma A, Dorman MF. Neurophysiologic correlates of cross-language phonetic perception. Journal of the Acoustical Society of America. 2000;107:2697–2703.
  • Slevc LR, Miyake A. Individual differences in second-language proficiency: Does musical ability matter? Psychological Science. 2006;17:675–681.
  • Smith EC, Lewicki MS. Efficient auditory coding. Nature. 2006;439:978–982.
  • Spivey M. The continuity of mind. Oxford University Press; New York: 2007.
  • Steinschneider M, Volkov IO, Fishman YI, Oya H, Arezzo JC, Howard MA, III. Intracortical responses in human and monkey primary auditory cortex support a temporal processing mechanism for encoding of the voice onset time phonetic parameter. Cerebral Cortex. 2005;15:170–186.
  • Stevens KN. Acoustic phonetics. MIT Press; Cambridge, MA: 2000.
  • Stewart ME, Ota M. Lexical effects on speech perception in individuals with "autistic" traits. Cognition. 2008;109:157–162.
  • Studdert-Kennedy M, Liberman AM, Harris KS, Cooper FS. Motor theory of speech perception: A reply to Lane's critical review. Psychological Review. 1970;77:234–249.
  • Summerfield Q. Articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology: Human Perception & Performance. 1981;7:1074–1095.
  • Valentine T. A unified account of the effects of distinctiveness, inversion, and race in face recognition. Quarterly Journal of Experimental Psychology. 1991;43A:161–204.
  • Volaitis LE, Miller JL. Phonetic prototypes: Influence of place of articulation and speaking rate on the internal structure of voicing categories. Journal of the Acoustical Society of America. 1992;92:723–735.
  • Wade T, Holt LL. Incidental categorization of spectrally complex non-invariant auditory stimuli in a computer game task. Journal of the Acoustical Society of America. 2005a;118:2618–2633.
  • Wade T, Holt LL. Perceptual effects of preceding non-speech rate on temporal properties of speech categories. Perception & Psychophysics. 2005b;67:939–950.
  • Walley AC. Speech perception in childhood. In: Pisoni DB, Remez RE, editors. The handbook of speech perception. Blackwell; Oxford: 2005. pp. 449–468.
  • Werker JF. Cross-language speech perception: Development change does not involve loss. In: Goodman JC, Nusbaum HC, editors. The development of speech perception: The transition from speech sounds to spoken words. MIT Press; Cambridge, MA: 1994. pp. 93–120.
  • Werker JF, Tees RC. Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior & Development. 1984;7:49–63.
  • Werker JF, Tees RC. Influences on infant speech processing: Toward a new synthesis. Annual Review of Psychology. 1999;50:509–535.
  • Wilson B, Dorman MF. The surprising performance of present-day cochlear implants. IEEE Transactions on Biomedical Engineering. 2007;54:969–972.
  • Winkler I, Kujala T, Tiitinen H, Sivonen P, Alku P, Lehtokoski A, et al. Brain responses reveal the learning of foreign language phonemes. Psychophysiology. 1999;36:638–642.
  • Wolfe JM, Kluender KR, Levi DM, Bartoshuk LM, Herz RS, Klatzky RL, et al. Sensation and perception. 2nd ed. Sinauer Associates; Sunderland, MA: 2008.
  • Zhang Y, Kuhl PK, Imada T, Kotani M, Pruitt J. Brain plasticity in behavioral and neuromagnetic measures: A perceptual training study. Journal of the Acoustical Society of America. 2001;110:2687. Abstract.
