Effects of Sound Level on Audiovisual Speech Perception in Multi-Talker Listening Environments
Yonghee Oh, PhD
Problem statement: Listeners often have to understand speech in noisy environments where multiple auditory signals compete with one another. Adding visual cues, such as a talker's face or lip movements, to the auditory signal can improve speech intelligibility in these suboptimal listening environments, an effect referred to as audiovisual benefit. This study aimed to delineate the signal-to-noise ratio (SNR) conditions under which visual presentation has its most significant impact on speech perception.
Methods: Twenty young adults with normal hearing were recruited. Participants were presented with spoken sentences in babble noise in either auditory-only or auditory-visual conditions at SNRs of -7, -5, -3, -1, and 1 dB. The visual stimulus was a sphere whose size varied in synchrony with the amplitude envelope of the target speech signal. Participants were asked to transcribe the sentences they heard.
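The two stimulus manipulations described in the Methods can be illustrated with a minimal sketch. It is not the study's actual stimulus-generation code. The function names, the frame-wise RMS envelope, the 60 Hz frame rate, and the radius range are all assumptions made for illustration; the abstract does not specify how the envelope was extracted or how it was mapped to sphere size.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then add.

    Assumes the noise is at least as long as the speech (hypothetical helper).
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve 10*log10(p_speech / (gain**2 * p_noise)) = snr_db for gain.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise, gain

def envelope_to_radius(speech, fs, frame_rate=60, r_min=0.2, r_max=1.0):
    """Map a frame-wise RMS envelope linearly onto a sphere radius per video frame.

    frame_rate, r_min, and r_max are illustrative values, not taken from the study.
    """
    speech = np.asarray(speech, dtype=float)
    hop = int(fs // frame_rate)              # audio samples per video frame
    n_frames = len(speech) // hop
    frames = speech[: n_frames * hop].reshape(n_frames, hop)
    env = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = env.max()
    if peak > 0:
        env = env / peak                     # normalize envelope to [0, 1]
    return r_min + (r_max - r_min) * env

# Usage with synthetic signals standing in for a sentence and babble noise.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 200 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
noise = rng.standard_normal(fs)
mixed, gain = mix_at_snr(speech, noise, snr_db=-7)
radius = envelope_to_radius(speech, fs)
```

Scaling the noise rather than the speech keeps the target level, and therefore the visual envelope, the same across SNR conditions; whether the study fixed the target or the masker level is not stated in the abstract.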
Results: On average, at the extreme SNRs (-7 dB and 1 dB), mean word recognition accuracy improved by approximately 15 to 17% when a synchronized visual cue was provided. At the intermediate SNRs (-5, -3, and -1 dB), by contrast, performance in the auditory-only and auditory-visual conditions overlapped.
Conclusion: These results provide strong evidence that visual representations of the speech amplitude envelope can be integrated online in speech perception, and that optimal audiovisual benefit is observed only within specific SNR ranges.