In communication acoustics, the communication chain consists of a sound source, a channel (acoustic and/or electric) and, finally, the receiver: the human auditory system, an intricate system that shapes the way sound is heard. When developing techniques in communication acoustics, such as speech, audio and aided-hearing technologies, it is therefore important to understand the time–frequency–space resolution of hearing. This book introduces the physical, signal-processing and psychophysical background to communication acoustics, helping the reader understand and develop speech and audio techniques based on knowledge of the auditory perceptual mechanisms. It then provides a detailed account of sound technologies in which a human listener is involved, including audio and speech techniques, sound quality measurement, hearing aids and audiology.
Key features:
Explains perceptually based audio: the authors take a detailed but accessible engineering perspective on sound and hearing, with a focus on the human place in the audio communication signal chain, from psychoacoustics and audiology to optimizing digital signal processing for human listening.
Presents a wide overview of speech, from the human production of speech sounds and the basics of phonetics to the major speech technologies, speech recognition and synthesis, and methods for speech quality evaluation.
Includes MATLAB examples that serve as an excellent basis for the reader's own investigations into communication acoustics.
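(The book's MATLAB code is not reproduced here. As a rough illustration of the kind of analysis those examples support, for instance the autocorrelation-based pitch analysis of Section 13.6.1, a minimal Python sketch follows. The function name and parameters are assumptions made for this illustration, not material from the book.)

import numpy as np

def estimate_pitch_autocorr(x, fs, fmin=50.0, fmax=500.0):
    # Estimate the fundamental frequency of a signal frame from the lag of the
    # strongest autocorrelation peak within the plausible pitch range.
    # The book's examples feed an auditory filter bank before the periodicity
    # analysis; this sketch works directly on the waveform.
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    lag_min = int(fs / fmax)                          # shortest period considered
    lag_max = int(fs / fmin)                          # longest period considered
    best_lag = lag_min + np.argmax(r[lag_min:lag_max])
    return fs / best_lag

# Usage: a 200-Hz tone sampled at 16 kHz should yield an estimate near 200 Hz.
fs = 16000
t = np.arange(0, 0.05, 1 / fs)
print(estimate_pitch_autocorr(np.sin(2 * np.pi * 200 * t), fs))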
By:
Ville Pulkki (Aalto University, Finland)
Matti Karjalainen (Aalto University, Finland)
Imprint: John Wiley & Sons Inc
Country of Publication: United States
Dimensions:
Height: 254mm,
Width: 180mm,
Spine: 28mm
Weight: 816g
ISBN: 9781118866542
ISBN 10: 1118866541
Pages: 464
Publication Date: 30 January 2015
Audience:
Professional and scholarly,
Undergraduate
Replaced By: 9781394367276
Format: Hardback
Publisher's Status: Active
Table of Contents:
About the Authors xix Preface xxi Preface to the Unfinished Manuscript of the Book xxiii Introduction 1
1 How to Study and Develop Communication Acoustics 7 1.1 Domains of Knowledge 7 1.2 Methodology of Research and Development 8 1.3 Systems Approach to Modelling 10 1.4 About the Rest of this Book 12 1.5 Focus of the Book 12 1.6 Intended Audience 13 References 14
2 Physics of Sound 15 2.1 Vibration and Wave Behaviour of Sound 15 2.1.1 From Vibration to Waves 16 2.1.2 A Simple Vibrating System 16 2.1.3 Resonance 18 2.1.4 Complex Mass–Spring Systems 19 2.1.5 Modal Behaviour 20 2.1.6 Waves 21 2.2 Acoustic Measures and Quantities 23 2.2.1 Sound and Voice as Signals 23 2.2.2 Sound Pressure 24 2.2.3 Sound Pressure Level 24 2.2.4 Sound Power 25 2.2.5 Sound Intensity 25 2.2.6 Computation with Amplitude and Level Quantities 25 2.3 Wave Phenomena 26 2.3.1 Spherical Waves 26 2.3.2 Plane Waves and the Wave Field in a Tube 27 2.3.3 Wave Propagation in Solid Materials 29 2.3.4 Reflection, Absorption, and Refraction 31 2.3.5 Scattering and Diffraction 32 2.3.6 Doppler Effect 33 2.4 Sound in Closed Spaces: Acoustics of Rooms and Halls 34 2.4.1 Sound Field in a Room 34 2.4.2 Reverberation 36 2.4.3 Sound Pressure Level in a Room 37 2.4.4 Modal Behaviour of Sound in a Room 38 2.4.5 Computational Modelling of Closed Space Acoustics 39 Summary 41 Further Reading 41 References 41
3 Signal Processing and Signals 43 3.1 Signals 43 3.1.1 Sounds as Signals 43 3.1.2 Typical Signals 45 3.2 Fundamental Concepts of Signal Processing 46 3.2.1 Linear and Time-Invariant Systems 46 3.2.2 Convolution 47 3.2.3 Signal Transforms 48 3.2.4 Fourier Analysis and Synthesis 49 3.2.5 Spectrum Analysis 50 3.2.6 Time–Frequency Representations 53 3.2.7 Filter Banks 54 3.2.8 Auto- and Cross-Correlation 55 3.2.9 Cepstrum 56 3.3 Digital Signal Processing (DSP) 56 3.3.1 Sampling and Signal Conversion 56 3.3.2 Z Transform 57 3.3.3 Filters as LTI Systems 58 3.3.4 Digital Filtering 58 3.3.5 Linear Prediction 59 3.3.6 Adaptive Filtering 62 3.4 Hidden Markov Models 62 3.5 Concepts of Intelligent and Learning Systems 63 Summary 64 Further Reading 64 References 64
4 Electroacoustics and Responses of Audio Systems 67 4.1 Electroacoustics 67 4.1.1 Loudspeakers 67 4.1.2 Microphones 70 4.2 Audio System Responses 71 4.2.1 Measurement of System Response 71 4.2.2 Ideal Reproduction of Sound 72 4.2.3 Impulse Response and Magnitude Response 72 4.2.4 Phase Response 74 4.2.5 Non-Linear Distortion 75 4.2.6 Signal-to-Noise Ratio 76 4.3 Response Equalization 76 Summary 77 Further Reading 78 References 78
5 Human Voice 79 5.1 Speech Production 79 5.1.1 Speech Production Mechanism 80 5.1.2 Vocal Folds and Phonation 80 5.1.3 Vocal and Nasal Tract and Articulation 82 5.1.4 Lip Radiation Measurements 84 5.2 Units and Notation of Speech used in Phonetics 84 5.2.1 Vowels 86 5.2.2 Consonants 86 5.2.3 Prosody and Suprasegmental Features 88 5.3 Modelling of Speech Production 90 5.3.1 Glottal Modelling 92 5.3.2 Vocal Tract Modelling 92 5.3.3 Articulatory Synthesis 94 5.3.4 Formant Synthesis 95 5.4 Singing Voice 96 Summary 96 Further Reading 97 References 97
6 Musical Instruments and Sound Synthesis 99 6.1 Acoustic Instruments 99 6.1.1 Types of Musical Instruments 99 6.1.2 Resonators in Instruments 100 6.1.3 Sources of Excitation 102 6.1.4 Controlling the Frequency of Vibration 103 6.1.5 Combining the Excitation and Resonant Structures 104 6.2 Sound Synthesis in Music 104 6.2.1 Envelope of Sounds 105 6.2.2 Synthesis Methods 106 6.2.3 Synthesis of Plucked String Instruments with a One-Dimensional Physical Model 107 Summary 108 Further Reading 108 References 108
7 Physiology and Anatomy of Hearing 111 7.1 Global Structure of the Ear 111 7.2 External Ear 112 7.3 Middle Ear 113 7.4 Inner Ear 115 7.4.1 Structure of the Cochlea 115 7.4.2 Passive Cochlear Processing 117 7.4.3 Active Function of the Cochlea 119 7.4.4 The Inner Hair Cells 122 7.4.5 Cochlear Non-Linearities 122 7.5 Otoacoustic Emissions 123 7.6 Auditory Nerve 123 7.6.1 Information Transmission using the Firing Rate 124 7.6.2 Phase Locking 126 7.7 Auditory Nervous System 127 7.7.1 Structure of the Auditory Pathway 127 7.7.2 Studying Brain Function 129 7.8 Motivation for Building Computational Models of Hearing 130 Summary 131 Further Reading 131 References 131
8 The Approach and Methodology of Psychoacoustics 133 8.1 Sound Events versus Auditory Events 133 8.2 Psychophysical Functions 135 8.3 Generation of Sound Events 135 8.3.1 Synthesis of Sound Signals 136 8.3.2 Listening Set-up and Conditions 137 8.3.3 Steering Attention to Certain Details of An Auditory Event 137 8.4 Selection of Subjects for Listening Tests 138 8.5 What are We Measuring? 138 8.5.1 Thresholds 138 8.5.2 Scales and Categorization of Percepts 140 8.5.3 Numbering Scales in Listening Tests 141 8.6 Tasks for Subjects 141 8.7 Basic Psychoacoustic Test Methods 142 8.7.1 Method of Constant Stimuli 143 8.7.2 Method of Limits 143 8.7.3 Method of Adjustment 143 8.7.4 Method of Tracking 144 8.7.5 Direct Scaling Methods 144 8.7.6 Adaptive Staircase Methods 144 8.8 Descriptive Sensory Analysis 145 8.8.1 Verbal Elicitation 147 8.8.2 Non-Verbal Elicitation 148 8.8.3 Indirect Elicitation 148 8.9 Psychoacoustic Tests from the Point of View of Statistics 149 Summary 149 Further Reading 150 References 150
9 Basic Function of Hearing 153 9.1 Effective Hearing Area 153 9.1.1 Equal Loudness Curves 155 9.1.2 Sound Level and its Measurement 156 9.2 Spectral Masking 156 9.2.1 Masking by Noise 157 9.2.2 Masking by Pure Tones 159 9.2.3 Masking by Complex Tones 159 9.2.4 Other Masking Phenomena 161 9.3 Temporal Masking 161 9.4 Frequency Selectivity of Hearing 163 9.4.1 Psychoacoustic Tuning Curves 164 9.4.2 ERB Bandwidths 166 9.4.3 Bark, ERB, and Greenwood Scales 167 Summary 169 Further Reading 169 References 169
10 Basic Psychoacoustic Quantities 171 10.1 Pitch 171 10.1.1 Pitch Strength and Frequency Range 171 10.1.2 JND of Pitch 172 10.1.3 Pitch Perception versus Duration of Sound 173 10.1.4 Mel Scale 174 10.1.5 Logarithmic Pitch Scale and Musical Scale 175 10.1.6 Detection Threshold of Pitch Change and Frequency Modulation 176 10.1.7 Pitch of Coloured Noise 176 10.1.8 Repetition Pitch 177 10.1.9 Virtual Pitch 178 10.1.10 Pitch of Non-Harmonic Complex Sounds 178 10.1.11 Pitch Theories 178 10.1.12 Absolute Pitch 179 10.2 Loudness 179 10.2.1 Loudness Determination Experiments 179 10.2.2 Loudness Level 180 10.2.3 Loudness of a Pure Tone 180 10.2.4 Loudness of Broadband Signals 182 10.2.5 Excitation Pattern, Specific Loudness, and Loudness 183 10.2.6 Difference Threshold of Loudness 185 10.2.7 Loudness versus Duration of Sound 187 10.3 Timbre 188 10.3.1 Timbre of Steady-State Sounds 189 10.3.2 Timbre of Sound Including Modulations 189 10.4 Subjective Duration of Sound 189 Summary 191 Further Reading 191 References 191
11 Further Analysis in Hearing 193 11.1 Sharpness 193 11.2 Detection of Modulation and Sound Onset 195 11.2.1 Fluctuation Strength 195 11.2.2 Impulsiveness 197 11.3 Roughness 198 11.4 Tonality 200 11.5 Discrimination of Changes in Signal Magnitude and Phase Spectra 201 11.5.1 Adaptation to the Magnitude Spectrum 201 11.5.2 Perception of Phase and Time Differences 202 11.6 Psychoacoustic Concepts and Music 206 11.6.1 Sensory Consonance and Dissonance 206 11.6.2 Intervals, Scales, and Tuning in Music 208 11.6.3 Rhythm, Tempo, Bar, and Measure 211 11.7 Perceptual Organization of Sound 212 11.7.1 Segregation of Sound Sources 213 11.7.2 Sound Streaming and Auditory Scene Analysis 214 Summary 216 Further Reading 217 References 217
12 Spatial Hearing 219 12.1 Concepts and Definitions for Spatial Hearing 219 12.1.1 Basic Concepts 219 12.1.2 Coordinate Systems for Spatial Hearing 221 12.2 Head-Related Acoustics 222 12.3 Localization Cues 226 12.3.1 Interaural Time Difference 227 12.3.2 Interaural Level Difference 228 12.3.3 Interaural Coherence 231 12.3.4 Cues to Resolve the Direction on the Cone of Confusion 232 12.3.5 Interaction Between Spatial Hearing and Vision 234 12.4 Localization Accuracy 235 12.4.1 Localization in the Horizontal Plane 235 12.4.2 Localization in the Median Plane 236 12.4.3 3D Localization 237 12.4.4 Perception of the Distribution of a Spatially Extended Source 238 12.5 Directional Hearing in Enclosed Spaces 239 12.5.1 Precedence Effect 239 12.5.2 Adaptation to the Room Effect in Localization 240 12.6 Binaural Advantages in Timbre Perception 241 12.6.1 Binaural Detection and Unmasking 241 12.6.2 Binaural Decolouration 243 12.7 Perception of Source Distance 243 12.7.1 Cues for Distance Perception 244 12.7.2 Accuracy of Distance Perception 245 Summary 246 Further Reading 246 References 246
13 Auditory Modelling 249 13.1 Simple Psychoacoustic Modelling with DFT 250 13.1.1 Computation of the Auditory Spectrum through DFT 250 13.2 Filter Bank Models 255 13.2.1 Modelling the Outer and Middle Ear 255 13.2.2 Gammatone Filter Bank and Auditory Nerve Responses 256 13.2.3 Level-Dependent Filter Banks 256 13.2.4 Envelope Detection and Temporal Dynamics 258 13.3 Cochlear Models 260 13.3.1 Basilar Membrane Models 260 13.3.2 Hair-Cell Models 261 13.4 Modelling of Higher-Level Systemic Properties 263 13.4.1 Analysis of Pitch and Periodicity 263 13.4.2 Modelling of Loudness Perception 265 13.5 Models of Spatial Hearing 265 13.5.1 Delay-Network-Based Models of Binaural Hearing 265 13.5.2 Equalization Cancellation and ILD Models 268 13.5.3 Count-Comparison Models 268 13.5.4 Models of Localization in the Median Plane 270 13.6 Matlab Examples 270 13.6.1 Filter-Bank Model with Autocorrelation-Based Pitch Analysis 270 13.6.2 Binaural Filter-Bank Model with Cross-Correlation-Based ITD Analysis 272 Summary 274 Further Reading 274 References 274
14 Sound Reproduction 277 14.1 Need for Sound Reproduction 277 14.2 Audio Content Production 279 14.3 Listening Set-ups 280 14.3.1 Loudspeaker Set-ups 280 14.3.2 Listening Room Acoustics 282 14.3.3 Audiovisual Systems 283 14.3.4 Auditory-Tactile Systems 284 14.4 Recording Techniques 284 14.4.1 Monophonic Techniques 285 14.4.2 Spot Microphone Technique 285 14.4.3 Coincident Microphone Techniques for Two-Channel Stereophony 286 14.4.4 Spaced Microphone Techniques for Two-Channel Stereophony 286 14.4.5 Spaced Microphone Techniques for Multi-Channel Loudspeaker Systems 287 14.4.6 Coincident Recording for Multi-Channel Set-up with Ambisonics 287 14.4.7 Non-Linear Time–Frequency-domain Reproduction of Spatial Sound 290 14.5 Virtual Source Positioning 293 14.5.1 Amplitude Panning 293 14.5.2 Amplitude Panning in a Stereophonic Set-up 294 14.5.3 Amplitude Panning in Horizontal Multi-Channel Loudspeaker Set-ups 295 14.5.4 3D Amplitude Panning 295 14.5.5 Virtual Source Positioning using Ambisonics 296 14.5.6 Wave Field Synthesis 296 14.5.7 Time Delay Panning 297 14.5.8 Synthesizing the Width of Virtual Sources 298 14.6 Binaural Techniques 298 14.6.1 Listening to Binaural Recordings with Headphones 299 14.6.2 HRTF Processing for Headphone Listening 299 14.6.3 Virtual Listening of Loudspeakers with Headphones 300 14.6.4 Headphone Listening to Two-Channel Stereophonic Content 301 14.6.5 Binaural Techniques with Cross-Talk-Cancelled Loudspeakers 301 14.7 Digital Audio Effects 302 14.8 Reverberators 303 14.8.1 Using Room Impulse Responses in Reverberators 304 14.8.2 DSP Structures for Reverberators 305 Summary 306 Further Reading and Available Toolboxes 306 References 307
15 Time–Frequency-domain Processing and Coding of Audio 311 15.1 Basic Techniques and Concepts for Time–Frequency Processing 311 15.1.1 Frame-Based Processing 311 15.1.2 Downsampled Filter-Bank Processing 313 15.1.3 Modulation with Tone Sequences 315 15.1.4 Aliasing 316 15.2 Time–Frequency Transforms 317 15.2.1 Short-Time Fourier Transform (STFT) 318 15.2.2 Alias-Free STFT 320 15.2.3 Modified Discrete Cosine Transform (MDCT) 321 15.2.4 Pseudo-Quadrature Mirror Filter (PQMF) Bank 323 15.2.5 Complex QMF 323 15.2.6 Sub-Sub-Band Filtering of the Complex QMF Bands 325 15.2.7 Stochastic Measures of Time–Frequency Signals 325 15.2.8 Decorrelation 327 15.3 Time–Frequency-Domain Audio-Processing Techniques 328 15.3.1 Masking-Based Audio Coding 328 15.3.2 Audio Coding with Spectral Band Replication 328 15.3.3 Parametric Stereo, MPEG Surround, and Spatial Audio Object Coding 329 15.3.4 Stereo Upmixing and Enhancement for Loudspeakers and Headphones 330 Summary 332 Further Reading 332 References 332
16 Speech Technologies 335 16.1 Speech Coding 336 16.2 Text-to-Speech Synthesis 338 16.2.1 Early Knowledge-Based Text-to-Speech (TTS) Synthesis 339 16.2.2 Unit-Selection Synthesis 340 16.2.3 Statistical Parametric Synthesis 342 16.3 Speech Recognition 345 Summary 346 Further Reading 347 References 347
17 Sound Quality 349 17.1 Historical Background of Sound Quality 350 17.2 The Many Facets of Sound Quality 351 17.3 Systemic Framework for Sound Quality 352 17.4 Subjective Sound Quality Measurement 353 17.4.1 Mean Opinion Score 353 17.4.2 MUSHRA 354 17.5 Audio Quality 356 17.5.1 Monaural Quality 356 17.5.2 Perceptual Measures and Models for Monaural Audio Quality 356 17.5.3 Spatial Audio Quality 359 17.6 Quality of Speech Communication 360 17.6.1 Subjective Methods and Measures 361 17.6.2 Objective Methods and Measures 362 17.7 Measuring Speech Understandability with the Modulation Transfer Function 363 17.7.1 Modulation Transfer Function 363 17.7.2 Speech Transmission Index STI 367 17.7.3 STI and Speech Intelligibility 368 17.7.4 Practical Measurement of STI 369 17.8 Objective Speech Quality Measurement for Telecommunication 370 17.8.1 General Speech Quality Measurement Techniques 371 17.8.2 Measurement of the Perceptual Effect of Background Noise 372 17.8.3 Measurement of the Perceptual Effect of Echoes 373 17.9 Sound Quality in Auditoria and Concert Halls 374 17.9.1 Subjective Measures 374 17.9.2 Objective Measures 375 17.9.3 Percentage of Consonant Loss 377 17.10 Noise Quality 377 17.11 Product Sound Quality 378 Summary 380 Further Reading 380 References 380
18 Other Audio Applications 383 18.1 Virtual Reality and Game Audio Engines 383 18.2 Sonic Interaction Design 386 18.3 Computational Auditory Scene Analysis, CASA 387 18.4 Music Information Retrieval 387 18.5 Miscellaneous Applications 389 Summary 390 Further Reading 390 References 390
19 Technical Audiology 393 19.1 Hearing Impairments and Disabilities 393 19.1.1 Key Terminology 394 19.1.2 Classification of Hearing Impairments 395 19.1.3 Causes for Hearing Impairments 396 19.2 Symptoms and Consequences of Hearing Impairments 396 19.2.1 Hearing Threshold Shift 397 19.2.2 Distortion and Decrease in Discrimination 398 19.2.3 Speech Communication Problems 400 19.2.4 Tinnitus 400 19.3 The Effect of Noise on Hearing 401 19.3.1 Noise 401 19.3.2 Formation of Noise-Induced Hearing Loss 402 19.3.3 Temporary Threshold Shift 402 19.3.4 Hearing Protection 404 19.4 Audiometry 405 19.4.1 Pure-Tone Audiometry 405 19.4.2 Bone-Conduction Audiometry 406 19.4.3 Speech Audiometry 406 19.4.4 Sound-Field Audiometry 407 19.4.5 Tympanometry 407 19.4.6 Otoacoustic Emissions 408 19.4.7 Neural Responses 409 19.5 Hearing Aids 409 19.5.1 Types of Hearing Aids 409 19.5.2 Signal Processing in Hearing Aids 410 19.5.3 Transmission Systems and Assistive Listening Devices 414 19.6 Implantable Hearing Solutions 414 19.6.1 Cochlear Implants 414 19.6.2 Electric-Acoustic Stimulation 416 19.6.3 Bone-Anchored Hearing Aids 416 19.6.4 Middle-Ear Implants 416 Summary 416 Further Reading 417 References 417
Index 419
Ville Pulkki, Department of Signal Processing and Acoustics, Aalto University, Finland. Professor Pulkki is currently affiliated with the Department of Signal Processing and Acoustics at Aalto University, Finland, where he leads the Spatial Sound Research Group. He is an AES Fellow. Professor Pulkki was General Chair of the AES 45th International Conference on Applications of Time-Frequency Processing in Audio (2012), and he is an Associate Technical Editor of the Journal of the Audio Engineering Society.
Matti Karjalainen, Department of Signal Processing and Acoustics, Aalto University, Finland. Professor Karjalainen was previously Head of the Laboratory of Acoustics and Audio Signal Processing at Helsinki University of Technology, which now forms part of the Department of Signal Processing and Acoustics at Aalto University, Finland. Professor Karjalainen had long-term cooperation with companies such as Nokia and the loudspeaker manufacturer Genelec. For his scientific and educational merits in audio signal processing, he received the Audio Engineering Society Fellowship in 1999, the AES Silver Medal in 2006, and the IEEE Fellowship in 2009. He published over 350 scientific and technical publications. Professor Karjalainen passed away in May 2010.