Project

Look to Hear

Sound is a powerful force, capable of enriching our lives and disrupting them in equal measure. Unwanted noise harms our health and intrudes on our work and leisure, creating a constant battle for auditory focus. While the remarkable cocktail party effect allows us to selectively attend to desired sounds in noisy environments, this ability is not universal, and even listeners with acute auditory attention can struggle in particularly challenging situations.

This research envisions a future where we transcend these limitations, augmenting our auditory perception with intelligent, AI-driven systems that endow us with unprecedented sound control. Imagine a world where we can effortlessly quiet distracting noises, focus on specific conversations in crowded spaces, or even expand our hearing capabilities to perceive subtle sonic details. Such a world promises clearer communication in bustling restaurants, enhanced focus in busy work environments, and richer enjoyment of musical performances.

The emergence of audio computers—powerful devices controlled primarily through natural language interaction and capable of sophisticated audio manipulation—hints at this future. These systems, currently in their early stages of development, demonstrate the potential for intuitive and personalized sound control without relying on visual interfaces.

However, this research seeks to investigate a critical question: What unique role can vision play in enhancing these predominantly audio-centric systems? While our ears are remarkably adept at localizing sounds, vision provides valuable contextual information that can refine our auditory perception and guide our attention.

This project explores the integration of visual cues into audio user interfaces, investigating how head and eye tracking, combined with speech commands, can enable more intuitive sound selection, enhance source separation, and facilitate dynamic audio processing. We will explore this through a proof-of-concept system that manipulates pre-recorded and live video, using off-the-shelf microphones, cameras, and motion sensors to simulate the capabilities of future wearable devices.
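To make the interaction concrete, the sketch below shows one way gaze could drive sound selection over already-separated sources: pick the source whose estimated direction lies closest to where the user is looking, then remix the stems to favor it. It assumes a source-separation front end and a head/eye tracker that reports a gaze azimuth; all function and parameter names (select_source, apply_gains, max_error_deg, and so on) are hypothetical placeholders, not components of the actual system.

```python
import numpy as np

def select_source(gaze_azimuth_deg, source_azimuths_deg, max_error_deg=15.0):
    """Pick the separated source whose estimated direction of arrival lies
    closest to the user's gaze, within an angular tolerance.

    gaze_azimuth_deg     -- head/eye-tracker estimate of where the user is looking
    source_azimuths_deg  -- per-source direction-of-arrival estimates (degrees)
    max_error_deg        -- reject the selection if nothing is near the gaze ray
    Returns the index of the chosen source, or None if no source is close enough.
    """
    diffs = np.asarray(source_azimuths_deg) - gaze_azimuth_deg
    errors = np.abs((diffs + 180.0) % 360.0 - 180.0)  # wrap to [-180, 180)
    best = int(np.argmin(errors))
    return best if errors[best] <= max_error_deg else None

def apply_gains(separated, target_idx, boost=2.0, attenuate=0.25):
    """Remix separated stems: amplify the attended source, duck the rest."""
    gains = np.full(len(separated), attenuate)
    if target_idx is not None:
        gains[target_idx] = boost
    return sum(g * s for g, s in zip(gains, separated))

# Example: three separated stems, user gazing toward roughly +42 degrees.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stems = [rng.standard_normal(16000) for _ in range(3)]  # 1 s of audio each
    directions = [-60.0, 40.0, 110.0]                       # estimated DOAs
    idx = select_source(gaze_azimuth_deg=42.0, source_azimuths_deg=directions)
    mix = apply_gains(stems, idx)
    print(f"Selected source index: {idx}, output length: {len(mix)} samples")
```

In a full system, the selection would also be conditioned on the spoken command (for example, "focus on her" versus "mute that"), and the gains would be smoothed over time rather than switched instantaneously.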

To create a robust foundation for this research, we will draw upon the tools of information theory, control theory, and machine learning. Information theory, as exemplified in recent work on auditory attention decoding, provides a framework for quantifying the information content of sensory signals and understanding the limits of perception. Control theory offers methods for designing systems that respond dynamically to inputs and achieve desired outcomes. Machine learning enables us to create algorithms that can learn from data and adapt to complex auditory environments.
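As one illustration of the information-theoretic framing (the variables here are ours, not drawn from the cited work on auditory attention decoding): if S denotes the sound source a listener is attending to and R a measured response such as gaze direction or a decoded attention signal, the mutual information

\[
I(S;R) \;=\; H(S) - H(S \mid R) \;=\; \sum_{s,r} p(s,r)\,\log\frac{p(s,r)}{p(s)\,p(r)}
\]

quantifies how many bits the response reveals about the attended source, and therefore bounds how reliably any downstream system could infer the user's auditory target from that signal alone.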

This research will not only develop a practical system but also strive to establish a sound mathematical description of the problem. By combining theoretical insights with experimental investigation, we aim to gain a deeper understanding of the interplay between vision, audition, attention, and control, pushing the boundaries of intuitive and personalized sound manipulation.