FMOD Engine User Manual 2.03

5. Core API Spatializing Sounds

This chapter will introduce you to using 3D sound with the Core API. With it, you can easily implement interactive 3D audio and have access to features such as 5.1 or 7.1 speaker output, automatic attenuation, doppler, and more advanced psychoacoustic 3D audio techniques.

For information specific to the Studio API and FMOD Studio events, see the Studio API 3D Events chapter.

5.0.1 Speaker and Output Modes

You do not need to set the speaker mode for FMOD. Any sound using FMOD_3D is automatically positioned in a surround speaker system. As long as the player's sound card supports it, and their operating system speaker settings are correct, their audio device will be able to output the sound in 5.1 or 7.1.

5.0.2 Loading Sounds as 3D

When loading a sound or sound bank, the sound must be created with System::createSound or System::createStream using the FMOD_3D flag. For example:

result = system->createSound("../media/drumloop.wav", FMOD_3D, 0, &sound);
if (result != FMOD_OK)
{
    HandleError(result);
}

It is generally best not to switch between 3D and 2D at all. If you need to, you can change the Sound or Channel's mode to FMOD_3D_HEADRELATIVE at runtime, which positions the sound relative to the listener. The sound then follows the listener as the listener moves around, so it effectively sounds 2D.

5.0.3 Distance Models

A major part of spatialization is attenuating the volume of a channel based on its distance from the listener. The FMOD Engine supports multiple different models for how this should occur.

Inverse

This is the default FMOD 3D distance model. All sounds naturally attenuate (fade out) in the real world using an inverse distance attenuation. The flag for this mode is FMOD_3D_INVERSEROLLOFF, but you don't need to set it when loading a sound because it is the default. It is mainly useful for resetting the mode back to the original if you set it to FMOD_3D_LINEARROLLOFF at some later stage.

When FMOD uses this model, the 'mindistance' of a Sound or Channel is the distance at which the sound starts to attenuate. This can simulate the sound being smaller or larger. By default, each time the distance doubles from mindistance, the sound volume halves. This roll-off rate can be changed with System::set3DSettings.

As an example of relative sound sizes, we can compare a bee and a jumbo jet. At only a meter or two away from a bee we will probably not hear it any more. In contrast, a jet will be heard from hundreds of meters away. In this case we might set the bee's mindistance to 0.1 meters. After a few meters it should fall silent. The jumbo jet's mindistance could be set to 50 meters. This could take many hundreds of meters of distance between listener and sound before it falls silent. In this case we now have a more realistic representation of the loudness of the sound, even though each wave file has a fully normalized 16-bit waveform within (i.e., if you played them in 2D they would both be the same volume).

The 'maxdistance' does not affect the rate of roll-off; it is simply the distance at which the sound stops attenuating. Don't set the maxdistance to a low number unless you want the sound to artificially stop attenuating, which is usually not wanted. Leave it at its default of 10000.0.
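The inverse model can be sketched numerically. The following is a minimal illustration and not FMOD's internal code: it assumes the gain is simply mindistance divided by distance, ignores the roll-off scale, and uses a hypothetical helper name, inverseRolloffGain.

```cpp
#include <algorithm>
#include <cassert>

// Illustrative inverse roll-off gain. Assumption: gain = mindistance / distance,
// with the roll-off scale left at 1. Hypothetical helper, not an FMOD function.
float inverseRolloffGain(float distance, float minDistance, float maxDistance)
{
    assert(minDistance > 0.0f && maxDistance >= minDistance);

    // Closer than mindistance: full volume. Beyond maxdistance: attenuation stops.
    float d = std::max(minDistance, std::min(distance, maxDistance));
    return minDistance / d;
}
```

Under these assumptions, a bee with a mindistance of 0.1 heard from 2 meters has a gain of 0.05, while a jet with a mindistance of 50 heard from 200 meters has a gain of 0.25.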

Inverse Tapered

This is a combination of the inverse and linear-square roll-off models. At shorter distances where inverse roll-off would provide greater attenuation, it functions as inverse roll-off mode; then at greater distances where linear-square roll-off mode would provide greater attenuation, it uses that roll-off mode instead. For this roll-off mode, distance values greater than mindistance are scaled according to the rolloffscale. Inverse tapered roll-off mode approximates realistic behavior while still guaranteeing the sound attenuates to silence at maxdistance.
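One way to picture the tapered model is as the smaller of two gains. The sketch below is an assumption-laden illustration, not FMOD's implementation: it takes the inverse gain to be mindistance / distance, takes the linear-square gain to be the square of the linear fade between mindistance and maxdistance, and ignores the roll-off scale. The name inverseTaperedGain is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Illustrative inverse tapered gain: whichever of the two models attenuates more.
// Assumed formulas; the roll-off scale is ignored. Not an FMOD function.
float inverseTaperedGain(float distance, float minDistance, float maxDistance)
{
    assert(minDistance > 0.0f && maxDistance > minDistance);

    float d = std::max(minDistance, std::min(distance, maxDistance));

    float inverse = minDistance / d;                                  // dominates near the source
    float linear = (maxDistance - d) / (maxDistance - minDistance);   // 1 at min, 0 at max
    float linearSquared = linear * linear;                            // dominates near maxdistance

    // The lower gain is the greater attenuation.
    return std::min(inverse, linearSquared);
}
```

With a mindistance of 1 and a maxdistance of 100, this returns the inverse value 0.5 at a distance of 2, and reaches exactly 0 at a distance of 100.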

Linear and Linear Squared

These are alternative distance models, also available in the FMOD Engine. To use them, pass the FMOD_3D_LINEARROLLOFF or FMOD_3D_LINEARSQUAREROLLOFF flag to System::createSound or Sound::setMode / ChannelControl::setMode. While less realistic, these models are more game programmer-friendly, as they result in the attenuation fading linearly between 'mindistance' and 'maxdistance'. In these modes, the mindistance is the same as it is in the inverse model (i.e., the distance at which the sound starts to attenuate), but the maxdistance is the point where the volume reaches 0 due to 3D distance. The attenuation between those two points is linear or linear squared, depending on which model is selected.
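The two linear models can be sketched as follows. This is an illustration only: it assumes the linear gain falls in a straight line from 1 at mindistance to 0 at maxdistance, and that the linear squared gain is the square of that value. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Illustrative linear roll-off: 1 at mindistance, 0 at maxdistance.
// Hypothetical helper, not an FMOD function.
float linearRolloffGain(float distance, float minDistance, float maxDistance)
{
    assert(maxDistance > minDistance);

    float d = std::max(minDistance, std::min(distance, maxDistance));
    return (maxDistance - d) / (maxDistance - minDistance);
}

// Illustrative linear squared roll-off (assumed to be the linear gain squared).
float linearSquaredRolloffGain(float distance, float minDistance, float maxDistance)
{
    float g = linearRolloffGain(distance, minDistance, maxDistance);
    return g * g;
}
```

For a mindistance of 10 and a maxdistance of 110, a source at a distance of 60 has a linear gain of 0.5 and a linear squared gain of 0.25.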

Custom

Custom roll-off lets you set a FMOD_3D_ROLLOFF_CALLBACK that calculates how the volume rolls off. If a callback is not convenient, the Core API also accepts an array of points, linearly interpolated to form a 'curve', via ChannelControl::set3DCustomRolloff.
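To show what linear interpolation between curve points means in practice, here is a minimal sketch. CurvePoint and sampleRolloffCurve are hypothetical stand-ins for illustration, not FMOD types or functions, and the sketch assumes the points are sorted by increasing distance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one point on a custom roll-off curve.
struct CurvePoint
{
    float distance;
    float volume;   // 0.0 (silent) to 1.0 (full volume)
};

// Linearly interpolates the volume at 'distance'.
// Assumes 'curve' is non-empty and sorted by increasing distance.
float sampleRolloffCurve(const std::vector<CurvePoint> &curve, float distance)
{
    assert(!curve.empty());

    // Outside the curve, hold the first or last volume.
    if (distance <= curve.front().distance) return curve.front().volume;
    if (distance >= curve.back().distance)  return curve.back().volume;

    for (std::size_t i = 1; i < curve.size(); ++i)
    {
        if (distance <= curve[i].distance)
        {
            const CurvePoint &a = curve[i - 1];
            const CurvePoint &b = curve[i];
            float t = (distance - a.distance) / (b.distance - a.distance);
            return a.volume + t * (b.volume - a.volume);
        }
    }
    return curve.back().volume;
}
```

For a curve with points (0, 1.0), (10, 0.5) and (100, 0.0), sampling at a distance of 5 gives 0.75 and sampling at 55 gives 0.25.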

5.0.4 Speaker Channel Formats

If the player's sound card supports it, any sound using FMOD_3D is automatically positioned in a surround speaker system, so you do not need to set the speaker mode for FMOD. Provided the player has correctly set their operating system's speaker settings, their audio device will be able to output the audio in 5.1 or 7.1.

5.0.5 Advanced Global 3D Settings

There are three configurable settings in the FMOD Engine that affect all 3D sounds. These are:

Doppler factor. This is used to exaggerate or minimize the doppler effect.
Distance factor. This multiplies the length of the distance units used by the FMOD Engine, allowing you to use distance units that match those used in your game (e.g., centimeters, feet, meters, yards).
Roll-off scale. This affects 3D sounds that use roll-off modes other than FMOD_3D_CUSTOMROLLOFF, and controls how quickly such sounds attenuate as distance increases.

All three settings can be set with System::set3DSettings. In most games, there is no need to set them.

5.0.6 Advanced 3D Techniques

While spatialization is often enough on its own, some games benefit from more complex 3D behavior. Here are a few ideas.

Occlusion. A Sound's underlying Channels or ChannelGroups can have lowpass filtering applied to them to simulate sound going through walls or being muffled by large objects.
3D Reverb Zones for reverb panning. For more information, see the 3D Reverbs section of the Advanced Core API Topics chapter. Reverb can also be occluded to not go through walls or objects.
Polygon based geometry occlusion. Add polygon data to FMOD's geometry engine, and FMOD will automatically occlude sound in realtime using raycasting. See more about this in the 3D Polygon based geometry occlusion section of the Advanced Core API Topics chapter.
Morphing between 2D and 3D with multi-channel audio formats. Channels can be a point source, or be morphed by the user into 2D audio, which is great for distance based envelopment. The closer a Channel is, the more it can spread into the other speakers, rather than flipping abruptly between speakers as it pans from one side to the other. See ChannelControl::set3DLevel for the function that lets the user change this mix.
Stereo and multi-channel audio formats can be used for 3D audio. Typically, a mono audio format is used for 3D audio, but multi-channel audio formats can be used to give extra impact. By default, multi-channel sample data is collapsed into a mono point source. To 'spread' the multiple channels use ChannelControl::set3DSpread. This can give a more spatial effect for a sound that is coming from a certain direction. A subtle spread of sound in the distance may give the impression of being more effectively spatialized as if it were reflecting off nearby surfaces, or being 'big' and emitting different parts of the sound in different directions.
Spatialization plug-in support. 3rd party VR audio plug-ins can be used to give more realistic panning over headphones.

5.1 Controlling a Spatializer DSP

Controlling a spatializer DSP using the Core API requires setting the data parameter associated with 3D attributes for the Channel. This is a data parameter of type FMOD_DSP_PARAMETER_DATA_TYPE_3DATTRIBUTES or FMOD_DSP_PARAMETER_DATA_TYPE_3DATTRIBUTES_MULTI. When using the Core API System, you must set this DSP parameter explicitly. To do this, use ChannelControl::set3DAttributes with the handle that was returned from System::playSound for the channel. If you are positioning a ChannelGroup in 3D instead, set the ChannelGroup to be 3D once with ChannelControl::setMode, then call ChannelControl::set3DAttributes for that channel group.

Because the effect of a spatializer DSP depends on the position of the channel or channel group relative to the listener, it is also necessary to update the 3D attributes of the listener once per frame with System::set3DListenerAttributes.

Call System::update once per frame so the 3D calculations can update based on the positions and other attributes.

This method works with our pan DSP, the object panner DSP, the Resonance Source and Soundfield spatializers, and any other third party plug-ins that make use of the FMOD spatializers.

Attributes must use a coordinate system with the positive Y axis being up and the positive X axis being right (left-handed coordinate system). FMOD converts passed in coordinates from right-handed to left-handed for the plug-in if the System is initialized with the FMOD_INIT_3D_RIGHTHANDED flag.

The absolute data for the FMOD_DSP_PARAMETER_3DATTRIBUTES is straightforward; however, the relative part requires some work to calculate.

/*
    This code supposes the availability of a maths library with basic support for 3D and 4D vectors and 4x4 matrices:

    // 3D vector
    class Vec3f
    {
    public:
        float x, y, z;

        // Initialize x, y & z from the corresponding elements of FMOD_VECTOR
        Vec3f(const FMOD_VECTOR &v);
    };

    // 4D vector
    class Vec4f
    {
    public:
        float x, y, z, w;

        Vec4f(const Vec3f &v, float w);

        // Initialize x, y & z from the corresponding elements of FMOD_VECTOR
        Vec4f(const FMOD_VECTOR &v, float w);

        // Copy x, y & z to the corresponding elements of FMOD_VECTOR
        void toFMOD(FMOD_VECTOR &v);
    };

    // 4x4 matrix
    class Matrix44f
    {
    public:
        Vec4f X, Y, Z, W;
    };

    // 3D Vector cross product
    Vec3f crossProduct(const Vec3f &a, const Vec3f &b);

    // 4D Vector addition
    Vec4f operator+(const Vec4f &a, const Vec4f &b);

    // 4D Vector subtraction
    Vec4f operator-(const Vec4f& a, const Vec4f& b);

    // Matrix multiplication m * v
    Vec4f operator*(const Matrix44f &m, const Vec4f &v);

    // 4x4 Matrix inverse
    Matrix44f inverse(const Matrix44f &m);
*/

void calculatePannerAttributes(const FMOD_3D_ATTRIBUTES &listenerAttributes, const FMOD_3D_ATTRIBUTES &emitterAttributes, FMOD_DSP_PARAMETER_3DATTRIBUTES &pannerAttributes)
{
    // pannerAttributes.relative is the emitter position and orientation transformed into the listener's space:

    // First we need the 3D transformation for the listener.
    Vec3f right = crossProduct(listenerAttributes.up, listenerAttributes.forward);

    Matrix44f listenerTransform;
    listenerTransform.X = Vec4f(right, 0.0f);
    listenerTransform.Y = Vec4f(listenerAttributes.up, 0.0f);
    listenerTransform.Z = Vec4f(listenerAttributes.forward, 0.0f);
    listenerTransform.W = Vec4f(listenerAttributes.position, 1.0f);

    // Now we use the inverse of the listener's 3D transformation to transform the emitter attributes into the listener's space:
    Matrix44f invListenerTransform = inverse(listenerTransform);

    Vec4f position = invListenerTransform * Vec4f(emitterAttributes.position, 1.0f);

    // Setting the w component of the 4D vector to zero means the matrix multiplication will only rotate the vector.
    Vec4f forward = invListenerTransform * Vec4f(emitterAttributes.forward, 0.0f);
    Vec4f up = invListenerTransform * Vec4f(emitterAttributes.up, 0.0f);
    Vec4f velocity = invListenerTransform * (Vec4f(emitterAttributes.velocity, 0.0f) - Vec4f(listenerAttributes.velocity, 0.0f));

    // We are now done computing the relative attributes.
    position.toFMOD(pannerAttributes.relative.position);
    forward.toFMOD(pannerAttributes.relative.forward);
    up.toFMOD(pannerAttributes.relative.up);
    velocity.toFMOD(pannerAttributes.relative.velocity);

    // pannerAttributes.absolute is simply the emitter position and orientation:
    pannerAttributes.absolute = emitterAttributes;
}

When using FMOD_DSP_PARAMETER_3DATTRIBUTES_MULTI, you must call calculatePannerAttributes for each listener, filling in the appropriate listener attributes.

Pass the data into the DSP using DSP::setParameterData with the index of the DSP's 3D attributes parameter, which is of type FMOD_DSP_PARAMETER_DATA_TYPE_3DATTRIBUTES or FMOD_DSP_PARAMETER_DATA_TYPE_3DATTRIBUTES_MULTI. You will need to check with the author of the DSP for the parameter index.

The following is an example of a typical game's audio loop that uses System::update to update the 3D attributes of channels and listeners, as well as the FMOD channel management system, once per frame.

do
{
    UpdateGame();       // here the game is updated and the sources would be moved with channel->set3DAttributes.

    system->set3DListenerAttributes(0, &listener_pos, &listener_vel, &listener_forward, &listener_up);     // update 'ears'

    system->update();   // needed to update 3d engine, once per frame.

} while (gamerunning);

Most games usually take the position, velocity and orientation from the camera's vectors and matrix.

5.1.1 Velocity

Velocity is only required if you want doppler effects. If you do not, you can pass 0 or NULL to both System::set3DListenerAttributes and ChannelControl::set3DAttributes for the velocity parameter, and no doppler effect will be heard.

It is important that the velocity passed to the FMOD Engine is in meters per second and not meters per frame. To get the correct velocity vector, use a method such as calculating it using vectors from your game's physics code. Don't just subtract the last frame's position from the current position, as this is affected by framerate, meaning that the higher the framerate the smaller the position deltas and thus the smaller the doppler effect, which is incorrect.

If the only way you can get the velocity is to subtract this and last frame's position vectors, then remember to time adjust them from meters per frame back up to meters per second. This is done simply by scaling the difference vector obtained by subtracting the two position vectors, by one over the frame time delta.

Here is an example.

velx = (posx-lastposx) * 1000 / timedelta;
vely = (posy-lastposy) * 1000 / timedelta;
velz = (posz-lastposz) * 1000 / timedelta;

timedelta is the time since the last frame in milliseconds. This can be obtained with functions such as timeGetTime(). So at 60fps, the timedelta would be 16.67 ms. If the source moved 0.1 meters in this time, the actual velocity in meters per second would be:

vel = 0.1 * 1000 / 16.67 = 6 meters per second.

Similarly, if we only have half the framerate, 30fps, then subtracting the positions gives us twice the distance that it would at 60fps (so the source would have moved 0.2 meters this time).

vel = 0.2 * 1000 / 33.33 = 6 meters per second.
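The same time adjustment can be wrapped in a small helper. This is a minimal sketch under stated assumptions: Vec3 and velocityFromPositions are hypothetical names, and the frame time is taken in seconds rather than milliseconds, so the difference vector is divided by the frame time directly.

```cpp
#include <cassert>

// Hypothetical minimal 3D vector for illustration.
struct Vec3
{
    float x, y, z;
};

// Converts a per-frame position change into meters per second.
// 'deltaSeconds' is the time since the last frame, in seconds.
Vec3 velocityFromPositions(const Vec3 &pos, const Vec3 &lastPos, float deltaSeconds)
{
    assert(deltaSeconds > 0.0f);

    // Scale the difference vector by one over the frame time delta.
    float invDelta = 1.0f / deltaSeconds;
    return { (pos.x - lastPos.x) * invDelta,
             (pos.y - lastPos.y) * invDelta,
             (pos.z - lastPos.z) * invDelta };
}
```

A source that moves 0.1 meters in one 60fps frame and one that moves 0.2 meters in one 30fps frame both come out at roughly 6 meters per second, so the result does not depend on framerate.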

5.1.2 Orientation and left-handed vs right-handed coordinate systems

Getting the correct orientation set up is essential if you want the source to move around you in 3D space.

By default, FMOD uses a left-handed coordinate system. If you are using a right-handed coordinate system then FMOD must be initialized by passing FMOD_INIT_3D_RIGHTHANDED to System::init. In either case FMOD requires that the positive Y axis is up and the positive X axis is right. If your coordinate system uses a different convention, you must rotate your vectors into FMOD's space before passing them to FMOD.

Note for plug-in writers: FMOD always uses a left-handed coordinate system when passing 3D data to plug-ins. This coordinate system is fixed to use +X = right, +Y = up, +Z = forward. When the system is initialised to use right-handed coordinates FMOD will flip the Z component of vectors before passing them to plug-ins.
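As an illustration of bringing vectors into FMOD's space, the sketch below converts from one possible game convention. The convention is an assumption chosen for the example: a right-handed, Z-up system with +X right, +Y forward and +Z up. Float3 and zUpRightHandedToFmod are hypothetical names. Because the output is already in FMOD's default left-handed space (+X right, +Y up, +Z forward), this sketch assumes FMOD_INIT_3D_RIGHTHANDED is not used.

```cpp
// Hypothetical minimal 3D vector for illustration.
struct Float3
{
    float x, y, z;
};

// Assumed game convention: right-handed, +X right, +Y forward, +Z up.
// FMOD default convention:  left-handed,  +X right, +Y up,      +Z forward.
// Swapping Y and Z maps one onto the other.
Float3 zUpRightHandedToFmod(const Float3 &v)
{
    return { v.x, v.z, v.y };
}
```

The same function would be applied to every vector passed to FMOD (position, velocity, forward and up), so that all of them end up in the same space.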

5.1.3 Split Screen / Multiple Listeners

Some games have a split screen mode, where different sections of the screen represent cameras in different locations. As the listener is almost always positioned in the same location as the camera, this means that the FMOD Engine has to be able to handle more than one listener at once. This is handled by using System::set3DNumListeners and System::set3DListenerAttributes.

For example, if you have two player split screen, System::set3DNumListeners would be set to two. When updating the positions of the listener, for each 'camera' or 'listener' call System::set3DListenerAttributes with 0 as the listener number of the first camera, and 1 for the listener number of the second camera.

When using multiple listeners in the Core API, 3D Channels have the following behavior:

All doppler is disabled. This is because one listener might be going towards the sound, and another listener might be going away from the sound. To avoid confusion, the doppler is simply turned off.
All audio is mono. If a sound should come out of the left speaker for one listener and out of the right speaker for another, there is a conflict, so all sounds are simply panned to the middle.
Each sound is played only once as it would with a single player game, instead of a different instance of the sound being played for each listener. This saves voice and CPU resources. The sound's effective audibility is determined by the closest listener to the sound, which makes sense, as the sound should be the loudest to the nearest listener, and more distant listeners would not have any impact on the volume.
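The closest-listener rule can be pictured with a short sketch. This is only an illustration of the idea, not FMOD's implementation; Point3 and distanceToNearestListener are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical minimal 3D point for illustration.
struct Point3
{
    float x, y, z;
};

// Returns the distance from 'source' to whichever listener is closest.
// In this sketch, that distance is what would drive the sound's attenuation.
float distanceToNearestListener(const Point3 &source, const std::vector<Point3> &listeners)
{
    assert(!listeners.empty());

    float nearest = std::numeric_limits<float>::max();
    for (const Point3 &listener : listeners)
    {
        float dx = source.x - listener.x;
        float dy = source.y - listener.y;
        float dz = source.z - listener.z;
        float distance = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (distance < nearest)
        {
            nearest = distance;
        }
    }
    return nearest;
}
```

With a source at the origin and listeners at (3, 4, 0) and (0, 0, 10), the nearest distance is 5, so the more distant listener has no effect on the volume.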

5.1.4 Stereo and Multi-channel Audio

A stereo sound, when played as 3D, is split into two mono voices internally which are separately 3D positionable. Multi-channel audio formats are also supported, so an eight channel sound (for example) allocates eight mono voices internally in FMOD. To rotate the left and right part of a stereo 3D sound in 3D space, use the ChannelControl::set3DSpread function. By default, the subchannels position themselves in the same place, therefore sounding 'mono'.

5.2 Spatial Audio

Historically, audio spatialization (the process of taking an audio file and making it sound "in the world") has been all about positioning sound in speakers arranged on a horizontal plane. This arrangement is often seen in the form of 5.1 or 7.1 surround. With the advancement of VR technology, however, more emphasis has been put on making sound as immersive as the visuals. This is achieved by more advanced processing of the audio signals for the traditional horizontal plane as well as the introduction of height spatialization. This has given rise to the term "spatial audio", which focuses on this more realistic approach to spatialization.

Within FMOD there are several ways you can achieve a more immersive spatialization experience. Depending on your target platform, some may not apply. The following sections outline a few general approaches with specific implementation details contained within.

5.2.1 Channel based approach

The most traditional way to approach spatialization is by panning signal into virtual speakers, so with the introduction of 7.1.4 (7 horizontal plane speakers, 1 sub-woofer, 4 roof speakers) you can do just this.

Set your FMOD::System to the appropriate speaker mode by calling System::setSoftwareFormat(0, FMOD_SPEAKERMODE_7POINT1POINT4, 0).
Select an output mode capable of rendering 7.1.4 content by calling System::setOutput(FMOD_OUTPUTTYPE_WINSONIC).

You can now use System::createSound and System::playSound to load and play content authored as 7.1.4. If you have the necessary sound system setup (e.g., Dolby Atmos) you will hear the sound play back including the ceiling speakers. If you have a headphone based setup (e.g., Windows Sonic for Headphones or Dolby Atmos for Headphones) you will hear an approximation of ceiling speakers.

To take an existing horizontal plane signal and push it into the ceiling plane you can create an FMOD spatializer and adjust the height controls.

Create the spatializer with System::createDSPByType(FMOD_DSP_TYPE_PAN).
Add it to a Channel or ChannelGroup with ChannelControl::addDSP.
Control the height by setting FMOD_DSP_PAN_2D_HEIGHT_BLEND via DSP::setParameterFloat.

Not only will this let you blend to the 0.0.4 ceiling speakers by setting the value between 0.0 and 1.0, it will also let you blend from the 0.0.4 ceiling speakers to the ground plane 7.1.0 by setting the value between 0.0 and -1.0.

The FMOD_OUTPUTTYPE_WINSONIC plug-in supports 7.1.4 output on Windows, UWP, Xbox One and Xbox Series X|S. Also, the FMOD_OUTPUTTYPE_PHASE plug-in supports 7.1.4 output for iOS devices. Other platforms will fold 7.1.4 down to 7.1.

5.2.2 Object based approach

To get more discrete spatialization of an audio signal you can use the FMOD object spatializer, so named because the audio signal is packaged with the spatialization information (position, orientation, etc.) and sent to an object mixer. It is often used to highlight important sounds with strong localization to add interest to a scene, usually in conjunction with the channel based approach, be that 7.1.4 or simply 5.1 / 7.1.

Set your FMOD::System to an object ready output plug-in by calling System::setOutput(FMOD_OUTPUTTYPE_WINSONIC) or System::setOutput(FMOD_OUTPUTTYPE_AUDIO3D) or System::setOutput(FMOD_OUTPUTTYPE_AUDIOOUT) or System::setOutput(FMOD_OUTPUTTYPE_PHASE).
Create an object spatializer with System::createDSPByType(FMOD_DSP_TYPE_OBJECTPAN).
Provide 3D position information with FMOD_DSP_OBJECTPAN_3D_POSITION via DSP::setParameterData.

There is no limit to how many FMOD_DSP_TYPE_OBJECTPAN DSPs you can create, however there is a limit to how many can be processed at a time. This limit is flexible, and varies from platform to platform. When there are more object spatializers in use than there are available resources for, FMOD virtualizes the least significant sounds by processing with a traditional channel based mix.

An important consideration, when using object spatializers, is signal flow. Unlike most DSPs, after the signal enters an object spatializer DSP it is sent out to the object mixer. Regardless of whether the object mixer is a software library or a physical piece of hardware, the result is that you no longer have access to that signal. Any processing you would like to perform on that signal must therefore be accomplished before it enters the object spatializer DSP. Despite this, to assist mixing, the object spatializer automatically applies any "downstream" ChannelGroup volume settings.

Object spatialization is available via the following output plug-ins:

FMOD_OUTPUTTYPE_WINSONIC for Windows, UWP, Xbox One and Xbox Series X|S
FMOD_OUTPUTTYPE_AUDIO3D for PS4 with PS VR breakout box
FMOD_OUTPUTTYPE_AUDIOOUT for PS5

Other output plug-ins will emulate object spatialization using traditional channel based panning.

5.2.3 Third Party Plug-ins

In addition to the built-in channel and object based approaches there are third party plug-ins available that can assist too. The FMOD DSP plug-in API (see FMOD_DSP_DESCRIPTION) allows any developer to produce an interface for their spatial audio technology and provide it across all FMOD platforms. Additionally the FMOD output plug-in API (see FMOD_OUTPUT_DESCRIPTION) allows developers to implement a renderer for the FMOD object spatializer extending the functionality to more platforms and more technologies.

Some examples of publicly-available third-party plug-ins:

Resonance Audio Spatializer. The Resonance Audio cross-platform suite of plug-ins comes bundled with FMOD. Resonance Audio offers a "Source" plug-in which behaves much like the FMOD object spatializer, in that audio is sent out to an object mixer, however the final signal returns as binaural output at the "Listener" plug-in. Resonance Audio also offers a "Soundfield" plug-in for playing back first order Ambisonic sound fields. For more details about the usage of Resonance Audio please check out the user guide.
Oculus Spatializer. Another cross-platform suite of spatial audio plug-ins is that offered by Oculus as part of their Audio SDK. You can find instructions and downloads for these available on their website.
Steam Audio. Valve Software offers another cross-platform suite of spatial audio plug-ins as part of their Steam Audio SDK. You can find getting started information available on their website with downloads on GitHub.