Extrapolation, Localisation, and Enhancement of Spatial Audio Recordings for the Reproduction of Real-World Scenes
Abstract
This thesis extends spatial audio toward the goal of recording real-world scenes and reproducing them in a manner that is perceptually indistinguishable from reality. Technological constraints, such as the limited spatial resolution of commercial spherical microphone arrays, still prevent the complete recording, analysis, and reproduction of large sound scenes in which the listener can perceptually explore the re-synthesised environment. Hence, interest in advancing each facet of spatial audio continues to grow, especially for techniques supporting immersive spatial sound in augmented and virtual reality applications.
To this end, we develop novel methods for sound field reproduction, source localisation, and signal enhancement by exploiting the underlying spatial characteristics of acoustic environments. Our approach uses the spherical harmonic solution to the classical wave equation, together with spatial features such as the relative transfer function exploited through machine learning algorithms, in conjunction with commercially available spherical microphone arrays to realise practical solutions.
We propose a building block, denoted the mixedwave source distribution, for extrapolating a recorded sound scene and representing it in an intermediate form for spatial processing before reproduction. This approach has two outcomes: (i) it enables six degrees-of-freedom reproduction of sound field recordings made with a single spherical microphone array, and (ii) it enhances the binaural reproduction of recordings by estimating bilateral ambisonics and rendering with ear-aligned HRTFs.
We also advance the popular MUSIC source localisation method, attaining robust direction and distance localisation by exploiting the reverberation characteristics of the recording environment in a spatial model. In addition, we propose a machine learning algorithm that learns the acoustic path of an interfering noise source so that the noise signal can be removed, giving a spatial method for enhancing recordings with low SNR (below −10 dB).
In the future, the solutions in this thesis may complement each other in larger spatial audio exercises. For example, when recording a real-world scene and reproducing it binaurally, our spatial signal enhancement can first remove an unwanted noise source from the recording; primary sound sources such as people or instruments can then be localised with our spatial model; and these sources and the environment can then be modelled by a mixedwave source distribution, thereby extending the perceptual range of motion for the listener in a six degrees-of-freedom spatial audio reproduction. Furthermore, this can be achieved while maintaining the practical feasibility of using commercial spatial microphone technologies.