Canadian Musician - January/February 2021 | Page 28

COLUMNS

By Hugo Larin

Spatial Audio for Streaming Live Events

Part Two: Creating Binaural Audio Content & the Object-Based Mixing Workflow

The concept of spatial or binaural audio can sound quite geeky and complicated, so let’s demystify the subject here and now.

In traditional stereo mixing, audio engineers send audio signals to two channels, left and right, which we all know as a stereo master bus. This is called channel-based mixing. But the problem is that stereo does not reflect the reality of how we hear.
Nowadays, we are becoming increasingly familiar with the concept of object-based mixing, which simply means that each source (object) is positioned in a virtual environment (a virtual room) along with its spatial parameters. Object-based audio (OBA) represents a breakthrough in live production, with next-generation codecs enabling the mixer to represent the soundfield (the scene) as an immersive image instead of just two channels.
The second piece of this puzzle is to take this immersive sound image (all the audio objects) and render it to the desired format for playback. A binaural rendering gives us what we need to deliver the audio over headphones, while a channel-based rendering gives us what we need to deliver the audio through loudspeakers.
Object-based mixing is far from new; it’s been used in movie productions for many years. The multi-channel audio experience you hear in a cinema is usually composed of multiple audio objects that have been positioned and moved within a virtual environment by a mixing engineer. Unlike binaural, this type of multi-channel rendering is designed for a multi-speaker system using various panning techniques. The speaker arrangements in a movie theatre, or in your home entertainment system, are just various channel-based diffusion system formats.
Think Atmos, Auro-3D, DTS, IMAX, and all the other common surround sound formats.
For live sound engineers, moving to object-based mixing is quite simple in essence. Individual audio tracks that were previously balanced (panned) between two stereo channels are now declared as objects and defined by their position. This workflow makes these mixes completely agnostic of the rendering type or the format arrangement.
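To make the idea of a position-defined object concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular renderer works: the `AudioObject` class, the azimuth convention (negative is left, positive is right), and the constant-power pairwise panning law are all choices made for this example. The point is that the same object list can be rendered to different speaker layouts without touching the mix.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A source defined by its position rather than by a channel assignment."""
    name: str
    azimuth_deg: float  # hypothetical convention: negative = left, positive = right


def speaker_gains(azimuth_deg, speaker_azimuths):
    """Constant-power panning between the two speakers that surround the object.

    Assumes at least two speakers at distinct azimuths along a frontal arc.
    """
    speakers = sorted(speaker_azimuths)
    # Clamp the object to the arc covered by the speakers.
    az = min(max(azimuth_deg, speakers[0]), speakers[-1])
    gains = {s: 0.0 for s in speakers}
    for left, right in zip(speakers, speakers[1:]):
        if left <= az <= right:
            frac = (az - left) / (right - left)
            gains[left] = math.cos(frac * math.pi / 2)
            gains[right] = math.sin(frac * math.pi / 2)
            break
    return gains


def render(objects, speaker_azimuths):
    """Return the per-speaker gains for every object in the scene."""
    return {obj.name: speaker_gains(obj.azimuth_deg, speaker_azimuths) for obj in objects}


scene = [AudioObject("vocal", 0.0), AudioObject("guitar", -20.0)]
stereo = render(scene, [-30.0, 30.0])          # two speakers
lcr = render(scene, [-30.0, 0.0, 30.0])        # left / centre / right
```

In the stereo render the vocal is shared equally between both speakers, while in the three-speaker render it lands entirely on the centre speaker. The scene description itself never changed.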
A mix is now based around a sound image that can be manipulated with a real-time renderer or exported as a multichannel audio file with a standardized metadata model such as the Audio Definition Model (ADM). From a portability and deliverability perspective, these exports are ideal, as they aren’t limited to a specific speaker arrangement or channel count, and can be rendered in the desired format. For engineers, moving to object-based mixing is truly a game changer, opening the door to any format or stream type for a mix.
Binaural audio differs from stereo in that it is a synthesis that virtualizes every object and delivers the mix over headphones in two conventional audio channels. Let’s look at some of the challenges in delivering immersive binaural content.
Picture this: If we stick a microphone in each of your ears and record what you hear, then play it back for another individual, will it sound the same to them as it did to you? The answer is yes, to some extent. You see, for each of us, our body (ears, upper torso, etc.) plays an important role in how we perceive sound. So, while another listener will get some sense of localization, it’s like making them hear with your ears.
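One small, calculable example of why listening through someone else's ears is not quite right is the interaural time difference: the tiny delay between a sound reaching the near ear and the far ear. The Python sketch below uses Woodworth's classic spherical-head approximation, which is a textbook simplification and not something a production renderer would rely on alone. The head radii and the speed of sound are assumed values chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed room-temperature value


def woodworth_itd(azimuth_deg, head_radius_m):
    """Interaural time difference in seconds for a rigid spherical head.

    Woodworth's approximation: ITD = (r / c) * (theta + sin(theta)),
    intended for azimuths between 0 (straight ahead) and 90 degrees (to the side).
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / SPEED_OF_SOUND) * (theta + math.sin(theta))


# Two hypothetical listeners hearing the same source at 90 degrees to one side.
smaller_head = woodworth_itd(90.0, 0.080)
larger_head = woodworth_itd(90.0, 0.095)
difference_us = (larger_head - smaller_head) * 1e6  # microseconds
```

Even in this crude model, the two listeners receive a noticeably different timing cue from the same source position, which is one reason a recording made in one person's ears only partly translates to another's.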
As sound strikes the listener, a number of factors influence how the sound is perceived, including the size and shape of the head and ears, ear canal, nasal cavities, and more. The combined effect of these factors is what we refer to as a head-related transfer function (HRTF), which is applied as a filter. The most commonly used tool for measuring a generic HRTF is what we refer to as a generic dummy head. Humans can adapt and compensate for a generic HRTF filter that isn’t the perfect signature for their own hearing.
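In practice, a binaural renderer applies an HRTF by filtering each object with a pair of impulse responses, one per ear, and summing the results into two headphone channels. The Python sketch below shows that principle only. The impulse responses are made-up toy values, not measured data from any dummy head, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(objects):
    """Mix objects to two headphone channels.

    `objects` is a list of (signal, left_ear_ir, right_ear_ir) tuples, where
    the impulse responses stand in for the HRTF at that object's position.
    """
    length = max(len(sig) + max(len(ir_l), len(ir_r)) - 1 for sig, ir_l, ir_r in objects)
    left = [0.0] * length
    right = [0.0] * length
    for sig, ir_l, ir_r in objects:
        for n, v in enumerate(convolve(sig, ir_l)):
            left[n] += v
        for n, v in enumerate(convolve(sig, ir_r)):
            right[n] += v
    return left, right


# Toy example: a click placed to the listener's left. The far (right) ear
# hears it two samples later and at half the level. Values are invented.
click = [1.0, 0.0]
toy_ir_left = [1.0]
toy_ir_right = [0.0, 0.0, 0.5]
left, right = render_binaural([(click, toy_ir_left, toy_ir_right)])
```

Real HRTF data sets contain hundreds of samples per ear and per direction, and they also encode the spectral colouring of the outer ear, but the filter-and-sum structure is the same. Swapping a generic set for a personal one changes only the impulse responses.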
When streaming and recording material, many will rely on generic HRTFs such as KEMAR, or the Neumann KU 100, which is popular in 360/VR pipelines. Ideally, for a mixing engineer wanting to work in binaural audio, the ultimate way of getting the most truthful and reliable monitoring experience, with a far more natural sense of space and direction, is by having their own individual HRTF. Unfortunately, a personalized HRTF is prohibitively expensive for most engineers.
That said, services do exist for creating your own personal HRTF, such as Genelec Aural ID. Starting with a video of your head and shoulder region from your mobile phone camera, the Aural ID process builds an accurately scaled 3D model of your head and upper torso dimensions, and from this delivers your personal HRTF file. A simple import of your personal HRTF file into your binaural renderer or monitoring tool gives you a sonic reproduction adapted specifically to you.
As mentioned earlier, headphones break the link to the natural mechanisms we have acquired over our lifetime, making it harder to localize sounds, since sounds from headphones seem to reside ‘inside’ our heads rather than all around us. Your personal HRTF captures how your own head, external ears, and upper body affect and colour the audio you hear, which is what restores that link.
Hugo Larin is the director of business development at FLUX::, as well as the principal and head of business development at LS Media. He can be reached at hugolarin@lsmediapro.ca. www.lsmediapro.ca. www.flux.audio.