Ambisonic Enlightenment
On ways to think Ambisonic, a Novice's guide
 

Ambisonics is a holophonic soundfield sampling and synthesis technique.

Introduction

Jérôme Daniel, in his landmark paper introducing NFC-HOA, describes Ambisonics as "a very versatile approach for the spatial encoding and rendering of sound fields," and lists the following advantages of the technique:[1]

________________

Stepping onto our path towards enlightenment we'll begin by considering Ambisonics in the context of pair-wise panorama laws. We'll observe how the angular component of Ambisonics is similar to, but an optimized form of a panning technique with which we're familiar.

We'll then consider the meaning of Ambisonic order, the spatial resolution of the technique. We'll see how order relates to:

Our discussion then closes with a brief review of the Near-Field Controlled Ambisonic Soundfield Model. This is perhaps Daniel's most important contribution to the art, and moves the radius of the basic wave from infinity (classic, Gerzonic Ambisonics) to the mid-field.

We build a visualisation using a collection of virtual loudspeakers (secondary sources) and a virtual microphone (soundfield sampler). We then review three different travelling waves, observing the resulting encoding coefficients and returned encoded signals.

Panorama Laws

A panorama law, aka panning law, is a rule detailing how a loudspeaker array synthesizes a spatial sound image. This rule may act by creating amplitude, phase and time differences between loudspeakers to synthesize the desired phantom image. In practice, not all of these aspects are always touched, and different panning laws may emphasize one aspect over another.

In the discussion here we'll compare pair-wise panning laws with those returned by Ambisonics. Also, we'll restrict the Ambisonic laws to basic panning. I.e., sources to be panned and target loudspeakers are at the reference radius.

NOTE: When we do this, we are reviewing the angular component of Ambisonic panning laws.

We'll review radial aspects later.

Stereo with Pan2

Let's begin with the two channel stereophonic sine-cosine panning law,[2] as this is the panning law used by SuperCollider's Pan2 UGen. From the help, we see this is described as a "Two channel equal power panner". In other words, the panorama effect is a result of acting on the amplitude scaling of an input signal, scaling in an equal power distribution between two loudspeakers.

If we look at the source code, we can see the function used is sine.

Let's make a plot to visualize...
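As a quick sketch of the law itself (illustrative Python, not the SuperCollider code; the mapping of pan position to angle is an assumption of this sketch):

```python
import math

def pan2_gains(pos):
    # pos in [-1, 1]: -1 is hard left, 1 is hard right
    theta = (pos + 1) * math.pi / 4  # map position to [0, pi/2]
    return math.cos(theta), math.sin(theta)  # (left, right) amplitude scales

# At center, both channels scale by sqrt(2)/2, so that
# power (left^2 + right^2) stays constant across positions.
left, right = pan2_gains(0.0)
```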

What we see is that we have a rule to govern how much signal is passed to the left and right to synthesize a phantom image.

Quad with PanAz

Reviewing the help for PanAz, we see it described as a "Multichannel equal power panner." When we peek at the source code, we can see that sine appears.

With the settings listed just below, PanAz will return the exact same rule as Pan2:

________________

Given the default arguments, and setting numChans to four:

will return a pair-wise equal power quadraphonic panning rule.

Let's go ahead and test this panner with DC and plot the results. We're starting at the left speaker and panning counter-clockwise all the way around:

What we see here is the amplitude scaling rule for all four speakers in order to pan a sound in a counter-clockwise rotation around the array. We can see that no more than two loudspeakers are active at once.
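The shape of this rule is easy to sketch outside of SuperCollider. The following illustrative Python (a sketch under the assumption of cosine shaped pair-wise windows, not the PanAz UGen itself) builds the four windows and confirms the two properties described above:

```python
import numpy as np

def pairwise_window(pan_azim, spkr_azim, width=np.pi/2):
    """One loudspeaker's cosine window: full scale on-axis,
    zero beyond +/- width (the neighbouring speakers)."""
    delta = np.angle(np.exp(1j * (pan_azim - spkr_azim)))  # wrap to [-pi, pi]
    gain = np.cos(delta / width * np.pi / 2)
    return np.where(np.abs(delta) <= width, gain, 0.0)

azims = np.linspace(-np.pi, np.pi, 361)
quad = np.stack([pairwise_window(azims, a)
                 for a in [0.0, np.pi/2, np.pi, -np.pi/2]])

# no more than two of the four windows are non-zero at any angle...
active = (quad > 1e-9).sum(axis=0)
# ...and the law is equal power: squared gains sum to 1
power = (quad**2).sum(axis=0)
```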

Also, note that the rule can be described as a collection of windows in space or spatial windows.

Keep this plot open, as we're going to compare this rule with Ambisonics.

Quad with PanB2 & DecodeB2

Here we'll start with two of SuperCollider's FOA built-ins, PanB2 and DecodeB2, to build a quadraphonic panner.[3] The first UGen is a basic 2D encoder, and the second is a controlled opposites, aka cardioid, 2D decoder. Following an Ambisonic encoder with an Ambisonic decoder returns a panning law:
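The resulting law can be sketched in illustrative Python (not the PanB2/DecodeB2 UGens; normalisation conventions are simplified assumptions here). Each loudspeaker listens to the encoded soundfield with a virtual cardioid microphone:

```python
import numpy as np

def foa2d_encode(azim):
    # planewave encoding of the three 2D first order components
    # (normalisation simplified for this sketch)
    return np.array([np.ones_like(azim), np.cos(azim), np.sin(azim)])

def cardioid_decode(b_sig, spkr_azims):
    # each loudspeaker listens with a virtual cardioid:
    # gain = 0.5 + 0.5 * cos(angle between source and speaker)
    rows = [np.array([0.5, 0.5 * np.cos(phi), 0.5 * np.sin(phi)]) @ b_sig
            for phi in spkr_azims]
    return np.array(rows)

azims = np.linspace(-np.pi, np.pi, 181)
quad_law = cardioid_decode(foa2d_encode(azims),
                           [0.0, np.pi/2, np.pi, -np.pi/2])
# smooth windows: never clipped, never negative
```

Unlike the pair-wise windows, these cardioid windows are smooth everywhere, with a single null directly opposite each loudspeaker.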

NOTE: We've split the law between PanB2 & DecodeB2!

In comparing the laws for Quad with PanAz and Quad with PanB2 & DecodeB2 we'll notice two things immediately. The spatial windows for:

  1. Quad with PanAz are sharply clipped
  2. Quad with PanB2 & DecodeB2 are very smooth

PanAz offers a parameter to adjust the amount of clipping by changing its width argument. We can modify the law, so it looks a bit more like what we see with PanB2 and DecodeB2:

NOTE: While reduced, there are still sharp edges in the windows!

In time domain signal processing, sharp window shapes are associated with frequency domain aliasing.[4]

In the spatial domain, sharp windows are associated with spatial domain aliasing.

Optimized Quad with HOA1

The original architects of classic first order Ambisonics were deeply concerned about the spatial domain aliasing found in the quad recordings of the Age of Quadraphonic Sound. One of their goals was to reduce or remove the spatial distortions found in these recordings.

Their solution was to offer three different panning laws for finishing off the rule. These choices are equivalent to PanAz's width parameter, but instead of being an ad hoc choice, the different laws for Ambisonics are defined against optimization criteria.

The ATK uses the parameter name beam shape within the HOA toolset.[5]

Three standard spatial windows are offered:

keyword      beam shape            localisation vector       virtual microphone
\basic       strict soundfield     maximum velocity rV       hyper-cardioid
\energy      energy optimised      maximum energy rE         super-cardioid
\controlled  controlled opposites  minimum diametric energy  cardioid
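The three virtual microphone patterns in the table can be sketched with the familiar first order pattern formula, a + (1 - a)·cos(delta). The a values below are standard first order figures for hyper-cardioid (0.25) and cardioid (0.5); the super-cardioid value (~0.37) is one commonly quoted figure, taken as an assumption for this illustration:

```python
import math

# first order virtual microphone patterns: pattern(delta) = a + (1 - a)*cos(delta)
patterns = {
    'basic / hyper-cardioid': 0.25,
    'energy / super-cardioid': (math.sqrt(3) - 1) / 2,  # ~0.366, assumed figure
    'controlled / cardioid': 0.5,
}

def response(a, delta):
    return a + (1 - a) * math.cos(delta)

# response directly to the rear (delta = pi): the "tail" of each pattern
rear = {name: response(a, math.pi) for name, a in patterns.items()}
```

The hyper- and super-cardioid rear responses are negative (the inverted tails discussed below), while the cardioid only just reaches zero.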

In the codeblock immediately below you'll notice that the HOA toolset code for making an Ambisonic equivalent panner for quad is much more verbose. As a result, we have much greater control.

We'll use the ATK's projection decoder, HoaMatrixDecoder: *newProjection, to create the quad decoder. newProjection is a very simple, but powerful decoder. It quickly calculates the matrices required for decoders where space has been sampled equally. To design a 2D decoder, we just supply the vertices of a regular polygon.[6]

NOTE: In practice, we'd usually use HoaMatrixDecoder: *newPanto to return a quadraphonic, or other regular polygon, decoder, as it designs the required polygon internally. For ease of comparison we've used newProjection for the following examples, so as to directly map to the output ordering PanAz returns.
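The idea behind a projection decoder can be sketched in illustrative Python (a simplified sketch with assumed normalisation, not the ATK's own matrix design): sample the circular harmonics at the loudspeaker directions, then scale by the number of loudspeakers.

```python
import numpy as np

def circular_harmonics(order, azim):
    # real 2D harmonics: [1, cos, sin, cos 2a, sin 2a, ...]
    azim = np.asarray(azim, dtype=float)
    h = [np.ones_like(azim)]
    for m in range(1, order + 1):
        h += [np.cos(m * azim), np.sin(m * azim)]
    return np.array(h)

def projection_decoder(order, spkr_azims):
    y = circular_harmonics(order, spkr_azims)  # (nharm, nspkr)
    return y.T / len(spkr_azims)               # (nspkr, nharm) decoding matrix

# first order quad: encode a source at 45 degrees, decode to speaker gains
quad = projection_decoder(1, [0.0, np.pi/2, np.pi, -np.pi/2])
gains = quad @ circular_harmonics(1, np.pi/4)
```

For this regular (equally sampled) quad, the decode returns smooth cardioid-family windows, with all loudspeakers contributing to the image.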

Go ahead and try each of the three window choices.

With basic and energy, we see the scaling function drops below zero in places. If plotted in a polar form, we'd see the familiar tails of first order hyper-cardioid and super-cardioid microphones.

Look closely to find where these tails appear in the windows. Of particular interest: by dropping below zero, they are inverted in polarity. Each tail peaks when there is a peak at the loudspeaker opposite. We can say, where one loudspeaker pushes, the opposite pulls.

NOTE: In Ambisonics, the loudspeakers all work together to create the panorama.

(Feel free to close the open plots.)

Octa with PanAz

Let's try a pair-wise octaphonic rule with PanAz.

For convenience, we'll use an array where the first loudspeaker is at front center, and we'll start the test from directly behind, so that the plot returns the first window centered. As before, the panning angle will rotate counter-clockwise.

This plot really gives a clear sense that panning laws are spatial windows. We see each window offset in space. (Keep this plot open.)

Now let's do the same analysis, but just keep the window for the first loudspeaker:

(And, keep this plot open, too!)

Optimized Octa with HOA3

Go ahead and try each of the three window choices.

(After inspection, feel free to close these.)

And, another plot, keeping just the front center loudspeaker:

(After inspection, feel free to close these.)

Let's do one more plot, where we compare the window shape of pair-wise octaphonic with HOA3 strict soundfield:

What we're seeing here is that in the main lobe of the two windows, the octaphonic pair-wise law is similar to the HOA3 strict soundfield law. That's interesting, in that it indicates that pair-wise octaphonic panning gives something in the neighborhood of Ambisonics![7]

(go ahead and quit the server)

(and close the open plot windows, except for the last one comparing pair-wise and basic HOA3)

Spatial Nyquist filters

This isn't completely obvious, and seems counterintuitive, but an expert in filter windowing will see the two plots as related. The HOA3 law looks like a smoothed version of the pair-wise law.

Let's do a little experiment.

The pair-wise window for the sine-cosine panning law is actually a zero-padded sine window.[8]

When we compare the sine window with a windowed sinc, we see some remarkable similarities with our previous plot:

A windowed sinc is a lowpass filter. Frequency domain anti-aliasing filters are often designed by starting with a windowed sinc.

For more insight, let's review the frequency response of these two:
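As an illustrative sketch of this comparison in Python (sizes, cutoff, and window choices here are arbitrary assumptions, not the tutorial's own code):

```python
import numpy as np

n, size = 64, 256
sine_win = np.zeros(size)
sine_win[96:160] = np.sin(np.pi * (np.arange(n) + 0.5) / n)  # zero-padded sine window

m = np.arange(size) - (size - 1) / 2
sinc_lp = np.sinc(m / 8) * np.hanning(size)  # windowed sinc lowpass

def mag_db(x):
    spec = np.abs(np.fft.rfft(x / x.sum()))  # normalise for 0 dB at DC
    return 20 * np.log10(np.maximum(spec, 1e-12))

sine_db, sinc_db = mag_db(sine_win), mag_db(sinc_lp)
# windowed sinc: flat passband, then a fast roll off into a deep stopband
# sine window: the response falls away immediately, leaking into sidelobes
```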

What we are seeing here is that the windowed sinc is a fairly well behaved lowpass filter with a flat top and a smooth roll off. This isn't the case with the sine window.

Because we can, let's directly view the frequency response of the HOA3 strict soundfield panning law.

What we're seeing is that the HOA3 basic (strict) panning law has a well behaved lowpass response in the frequency domain when viewed as a time domain window.

________________

In the spatial domain, the Ambisonic panning law acts as a spatial lowpass filter. Its role is as a spatial anti-aliasing filter, aka a spatial Nyquist filter.

Let's see how this works in practice by going back to quad, comparing a pair-wise quad law with an HOA3 quad law:

Remarkably, when we go back to quad from HOA3, we see that the panning law window has opened up again!

This opening up is spatial smoothing, aka lowpass filtering in the spatial domain.

If we bother to do a check, we'll find that the quad law for HOA3 (when using the projection decoder) is the same as the one for HOA1.

This is a result of the Ambisonic laws applying a spatial anti-aliasing filter.

Also, by inspecting the window frequency response, we can see that the spatial cutoff is higher for the octaphonic array. The octaphonic array has a higher spatial sampling rate. For HOA3 with the quadraphonic array, the spatial anti-aliasing filter rejects spatial detail that would otherwise alias.

In contrast, the pair-wise laws are very leaky. They have higher cutoffs, but significantly more spatial aliasing.

________________

Important takeaways

(feel free to close any open plots)

Isotropy

Maintaining isotropy is one of the more important concerns in the design of Ambisonic panning laws.

Let's directly compare the panning laws of pair-wise sine-cosine quad with those of HOA basic quad.

The example code below makes a single window for each law. The directional amplitude and power response of the two arrays are then simulated. The plots returned illustrate these two measures for both arrays.

Here's what we see when we inspect these plots:

  1. both the pair-wise and the HOA basic laws are equal power
  2. only the HOA basic law is equal amplitude

The HOA quad law is isotropic for both of these measures.

NOTE: For this review, we've made these measures of the HOA law in a brute force manner. The HOA decoder tools offer the usual formalized measures via a convenient interface. See: HoaMatrixDecoder: Analysis.

Ambisonic Order

From the ATK-Glossary:

Ambisonic order 
Specifies the maximum Associated Legendre degree, ℓ, of a given signal set.

Ambisonic order indicates the Associated Legendre degree to which the detail of an Ambisonic soundfield is known.

There are a number of ways to consider the meaning of Ambisonic order. As Ambisonics is a holophonic technique, we'll begin by considering the effective radius of soundfield resynthesis. We'll consider practical aspects of spatial sampling in the spherical and angular domains, and then end with a brief discussion of localisation measures.

The ATK includes a class, HoaOrder, which can offer formalized understandings of these various aspects of an Ambisonic soundfield. We'll use this lens in much of the discussion that follows.

Effective radius & frequency

When we recall the OUTRS tetrahedral recording experiment, the origins of Ambisonics as a soundfield sampling technique become clear. The soundfield is sampled at a single point with a measurement array. We exactly know the soundfield at this point.[9]

Surprisingly, we also know the soundfield further away from the sampling point, in a frequency dependent way. This is the effective radius:

effective radius
Radius of the volume (or area) of exact soundfield reconstruction.

Let's plot the effective radius against Ambisonic order:

Ambisonic order is on the x-axis and effective radius in meters is on the y-axis. We're measuring at 700 Hz (or 1000 Hz, if you choose). This plot illustrates: as Ambisonic order increases, the region of exact soundfield reproduction also increases.

In particular, at fifth order, we can expect a region of radius nearly 0.4 meter to be exactly reconstructed for frequencies at and below 700 Hz.
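This figure follows from the common rule of thumb kr = N, where k = 2·pi·f/c is the wavenumber and N the Ambisonic order. A sketch, assuming c = 343 m/s:

```python
import math

def effective_radius(order, freq, c=343.0):
    # rule of thumb kr = N, i.e. r = N * c / (2 * pi * f)
    return order * c / (2 * math.pi * freq)

r5 = effective_radius(5, 700)  # fifth order at 700 Hz: ~0.39 m
```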

Let's try another plot:

As with our previous plot, Ambisonic order is on the x-axis. The y-axis is frequency, but on a log scale of decimal octaves. For instance:

This plot illustrates: as Ambisonic order increases, the cutoff frequency of exact soundfield reproduction also increases.

In particular, at third order, we can expect a region radius = 0.25 meter to be exactly reconstructed below 5.3333 decimal octaves:

Knowing the effective radius and effective frequency helps us decide which Ambisonic panning law to use. If the target for playback is a large audience, choosing the strict soundfield law is not necessarily ideal. The energy optimised or controlled opposites laws are better choices.

________________

Frequency dependent laws

Classic FOA employs the psycho-acoustic shelf filter[10] to select the strict law at low frequencies and the energy law at highs. The ATK's HOA toolset includes a filter kernel designer to do the job.[11] Frequency dependent laws have traditionally been advised for studio and near-field listening. For example:

A single listener can expect a third order soundfield to be reproduced exactly up to 1820 Hz. Above this point, the energy optimised law is the better choice, as the soundfield isn't exactly reconstructed.
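The 1820 Hz figure is the kr = N rule of thumb inverted. For a single listener we might take r ~ 0.09 m, roughly a head radius; that radius and c = 343 m/s are assumptions of this sketch:

```python
import math

def effective_freq(order, radius, c=343.0):
    # kr = N inverted: f = N * c / (2 * pi * r)
    return order * c / (2 * math.pi * radius)

f3 = effective_freq(3, 0.09)  # third order, head-sized region: ~1820 Hz
```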

Spherical basis functions

From the ATK-Glossary:

spherical harmonics (SH)  
A complete set of orthogonal, Fourier basis functions on the sphere. For Ambisonics, a set of real form harmonics truncated to a highest Associated Legendre degree, i.e., a given Ambisonic order, encodes a soundfield. See Spherical harmonics.

Open the following pages:

The first of these illustrates Spherical Harmonics (SH) up to degree 5; these are the SH for fifth order. We can understand these bubble shapes as illustrating the 3D polar response patterns of each SH. If we like, we can think of these as virtual microphones.

The second illustrates up to degree 4, so these are for fourth order. (We convert a fifth order soundfield to fourth by discarding the SH of degree 5.) These are illustrated as heat maps. Only one side of the "tree" is shown. The symmetries of the sectoral and tesseral SH are shown via the rotating SH.

More from the ATK-Glossary:

sectoral modes  
Spherical modes where AL degree and AL index are related ℓ = |m|. These modes encode azimuth. See Visualization of the spherical harmonics and Sectorial Harmonic.
tesseral modes  
Spherical modes not included as sectoral or zonal, encoding both azimuth and elevation. See Visualization of the spherical harmonics and Tesseral Harmonic.
zonal modes  
Spherical modes where AL index m = 0. These modes encode elevation. See Visualization of the spherical harmonics and Zonal Harmonic.

The spherical harmonics are the basis functions against which we measure the shape of a soundfield.

A zero-th order soundfield is a soundfield without any shape; it has energy only in degree zero.

Spherical & angular SSR

It becomes immediately clear that Ambisonic order can be directly understood as a kind of spherical domain spatial sampling rate. The higher the order, the more spherical harmonics.

Let's explore some details. We'll begin by considering:

In 3D, aka Periphonic

How resolved, in terms of numbers of harmonics, are each of these?

We see that as order increases, so does the number of SH in the spherical domain. We can think of Ambisonic order as directly indicating a spatial sampling rate in the spherical domain.

For translations of soundfields to the angular domain, the ATK uses spherical t-designs. We can find the minimum size design required for each order by observing the returned value:

3D Soundfield Spatial Sampling Rates

The table below compares the number of coefficients required for the spherical and angular domains:[12]

order  spherical SR  angular SR
1      4             4
3      16            24
5      36            60

One way we can read the table immediately above is to understand that spherical harmonics are a fairly efficient way to represent a soundfield. For fifth order, we need only 36 harmonics, but in the angular domain, 24 more spatial samples are required for the job.
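The spherical column follows a simple rule, (N + 1)^2 harmonics for order N; the angular column lists the minimal t-design sizes, taken as given from the table:

```python
def sph_sr_3d(order):
    # number of spherical harmonics for a 3D soundfield of order N
    return (order + 1) ** 2

# minimal t-design sizes, as tabulated above
ang_sr_3d = {1: 4, 3: 24, 5: 60}
```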

In 2D, aka Pantophonic

How resolved, in terms of numbers of harmonics, are each of these?

The sectoral harmonics, aka modes, encode the 2D soundfield. You can see we need significantly fewer harmonics here.

2D Soundfield Spatial Sampling Rates

The usual practice is to consider the angular sampling rate for 2D to be +1 that of the spherical, as doing so returns more stable image synthesis.[13]

order  spherical SR  angular SR
1      3             4
3      7             8
5      11            12

The rule for 2D arrays is:
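As a sketch of the 2D rules (illustrative Python, standing in for the original SuperCollider expression):

```python
def sph_sr_2d(order):
    # sectoral harmonics only: 2N + 1
    return 2 * order + 1

def ang_sr_2d(order):
    # usual practice: one more angular sample than spherical, 2N + 2
    return 2 * order + 2
```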

Array resolution

As we saw above with Spatial Nyquist filters, an actual loudspeaker array has a spatial Nyquist frequency. For instance, a quad decoder will only be able to synthesize a first order Ambisonic soundfield. This becomes apparent when we evaluate the rule of thumb immediately above.

For a regular polygon, 2D, we can re-write the rule as:[14]

The same principle is true for 3D loudspeaker arrays.[15] If we are designing an isotropic (equal in space) decoder, the degree of resolution is limited by the number of loudspeakers available. For instance, a cube can only be first order:
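Inverting the sampling rules above gives a sketch of the maximum isotropic decoding order an array supports (a rule-of-thumb sketch, not the ATK's own checks):

```python
import math

def max_order_3d(num_spkrs):
    # largest N with (N + 1)^2 <= number of loudspeakers
    return math.isqrt(num_spkrs) - 1

def max_order_2d(num_spkrs):
    # largest N with 2N + 2 <= number of loudspeakers
    return (num_spkrs - 2) // 2
```

A cube (8 loudspeakers) supports only first order in 3D, while the same 8 loudspeakers as a 2D octagon support third order.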

Localisation measures

Another way we can understand Ambisonic order, and panning law choices (beam shapes) is to consider the localisation measures Ambisonics is designed to optimize:

velocity localisation vector (rV)
Vector quantity offering an estimate of the perceived localisation of a phantom source at low frequency, predicting imaging up to around 1.5 kHz. Can be found as the real part of acoustic admittance, the active acoustic admittance.
energy localisation vector (rE)
Vector quantity offering an estimate of the perceived localisation of a phantom source in terms of energy, expected to predict imaging between 500 and 5000 Hz.

The strict soundfield option maximizes rV, while the energy optimised option maximizes rE. For off center listeners, rE is usually preferred.

Let's try a plot:

What we see here is that for a third order 2D array, the energy localisation measure for a synthesized Ambisonic image is more than 90% that of a real sound. We expect this energy optimised 2D array to be well defined in terms of energy.

Try:

For the controlled opposites law, we require fifth order to get above the 90% threshold.

Ambisonic Soundfield Model

Classic, aka Gerzonic, Ambisonics has always included the Near-Field Effect (NFE) within its theoretical framework. This inclusion, however, hasn't tended to be especially visible to users on the encoding side of the panning laws. As a result many users are only familiar with basic encoding, where the encoding coefficients are real.

In classic Ambisonics, basic encoding is planewave encoding.

________________

Daniel's Near-Field Compensated Higher Order Ambisonics (NFC-HOA)[16] introduces the Near-Field Effect (NFE) reference radius into the Ambisonic framework to formalize what we might call the Near-Field Controlled Ambisonic Soundfield Model (NFC-ASM).

In practice, we can view this model as a collection of virtual loudspeakers at the reference radius with a virtual microphone at the center.

In theory, this isn't quite the whole story. Recall from our discussion of Panorama Laws that we should view the loudspeakers as a collection of spatial window functions, or basis functions, with look directions. Similarly we should view the microphone as another collection of spatial basis functions, the spherical harmonics. The number of each of these is governed by the principles outlined above.

The soundfield can be represented in both angular and spherical forms.

________________

We'll start with constructing a visualisation of the model. Then we'll consider encoding three different travelling waves. We'll finish up with synthesizing the associated waveforms, directly from the calculated encoding coefficients.

In designing the encoding coefficients for these different travelling waves, you'll see that the encoding law is split between angular and radial encoding. Radial encoding is what allows us to move either side of the reference radius, and is where our near-field control is found.

Virtual loudspeakers & microphone

We'll start building our model by distributing a number of points evenly over the surface of a sphere. As discussed above, we'll find a spherical t-design which has an angular spatial sampling rate high enough to meet the spherical sampling rate of a selected order:

Given this spherical design, we'll now explicitly collect Spherical coordinate instances, setting the radius of these to the reference radius.

Let's now use PointView to view this array of virtual loudspeakers at the reference radius:

Go ahead and touch the GUI with your mouse or pointer to re-orient the display.

Now, let's add a virtual soundfield microphone:

This is it!

We can imagine the NFC-ASM to be a collection of virtual loudspeakers evenly distributed across the surface of a sphere. The radius of the sphere is the reference radius. At the origin of the sphere is a virtual soundfield microphone.

Easy, peasy!!

When we're done inspecting:

Near-field travelling wave

The radial part of Ambisonic encoding (the start of the panning law) is frequency dependent, so for this demonstration we'll need to specify a frequency:

NOTE: Usually we'd use HoaEncodeDirection to encode a travelling wave when building a UGen graph. When using this UGen we don't need to specify a frequency. The included frequency dependent radial filter DegreeCtrl does the job for us.

Let's now specify a near-field source, encoded at half the reference radius. We'll use a shorthand of naming a travelling wave within the reference radius as a near-field source.

We can see this source is within the virtual loudspeaker array.

Now let's design the encoding coefficients. You'll see we design the angular and radial coefficients separately, and then bring them together for the final encoding law:

The designed coefficients are Complex. We have both real and imaginary parts for each coefficient!

When we inspect the magnitude and phase of the encoding coefficients, we're reviewing the magnitude and phase changes that are required to synthesize Ambisonic encoding of a sinusoid at the frequency we specified above:
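A common formulation in the NFC-HOA literature writes the degree-n radial coefficient for a spherical travelling wave as a ratio of spherical Hankel functions, evaluated at the source radius and the reference radius. The sketch below (illustrative Python, not the ATK's code; the frequency, reference radius, and normalisation are assumptions) shows why the coefficients come out complex, and why near-field magnitudes grow with degree:

```python
import math

def sph_h2(n, x):
    # spherical Hankel function of the second kind, h2_n(x) = j_n(x) - i*y_n(x),
    # built by upward recurrence from the degree 0 and 1 closed forms
    j = [math.sin(x) / x, math.sin(x) / x**2 - math.cos(x) / x]
    y = [-math.cos(x) / x, -math.cos(x) / x**2 - math.sin(x) / x]
    for m in range(1, n):
        j.append((2 * m + 1) / x * j[m] - j[m - 1])
        y.append((2 * m + 1) / x * y[m] - y[m - 1])
    return complex(j[n], -y[n])

freq, c, ref_radius = 100.0, 343.0, 1.5   # arbitrary choices for this sketch
k = 2 * math.pi * freq / c
near, far = 0.5 * ref_radius, 1.5 * ref_radius

# radial coefficients: ratio of Hankel functions at source and reference radii
near_coeffs = [sph_h2(n, k * near) / sph_h2(n, k * ref_radius) for n in range(4)]
far_coeffs = [sph_h2(n, k * far) / sph_h2(n, k * ref_radius) for n in range(4)]
# near-field: magnitudes grow steeply with degree (the near-field boost);
# far-field: magnitudes fall away with degree instead
```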

Let's plot these values:

NOTE: Leave this plot window open. We'll compare with other travelling wave encodings, below.

Far-field travelling wave

Let's now specify a far-field source. Like above, we'll use a shorthand of naming a travelling wave beyond the reference radius as a far-field source.

This source is at one and a half times the reference radius (more like far-ish, actually[17]):

Now we can see both the near-field and the far-field source.

And, the far-field encoding coefficients:

We can inspect:

And plot:

NOTE: Leave this plot window open. We'll continue to compare with other travelling wave encodings.

When we compare the magnitude plots of the near and far-field travelling waves, we notice the two are substantially different. In particular, we see the near-field source has high gains in high harmonics, while in the far-field source we see the gains rolling off.

We can also notice that the phases are rotated in opposite directions on comparison. E.g., positive phases for near-field are negative for far-field.

Let's compare the angular and radial coefficients for this pair:

So, yes, this test confirms that our two travelling waves have the same angular encoding. They have the same look direction.

Basic travelling wave

Recall from our discussion above: basic panning, aka basic encoding, encodes a source at the reference radius.

Let's specify this:

Now we can see near-field, far-field and basic sources.

NOTE: These are all spherical travelling waves!

Now let's synthesize the coefficients of the basic source:

Inspect:

Notice, for the basic travelling wave, the phase of the encoding coefficients is either 0 or 180 degrees.

This corresponds to the coefficients for basic encoding having no imaginary components:

Plot, and compare with our other plots:

One thing we can see is that as a source moves away from the reference radius, this change is encoded in both magnitude and phase changes.

Let's directly test the encoding coefficients:

So, yes, the angular coefficients are the same. The differences are in the radial coefficients.

NOTE: When you're ready, feel free to close the travelling wave coefficient plots.

Travelling waveforms

As we work with Ambisonic signals, we'll become accustomed to reviewing encoded waveforms. Let's now take the opportunity to synthesize and review a single cycle of our three sources.

For ease of viewing, we'll truncate our coefficients from HOA3 to HOA1:

Synthesize and plot:

When we cycle through these three plots, it becomes apparent that the first channel, degree zero, remains the same for all three travelling waves.

We see that the space of the sound is to be found in the higher degrees, and is encoded in both magnitude and phase.

[1] - J. Daniel, "Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format," Paper 16, 23rd International Conference: Signal Processing in Audio Recording and Reproduction (2003 May.). Permalink: http://www.aes.org/e-lib/browse.cfm?elib=12321
[3] - Building a panner by directly connecting an encoder and a decoder is known as Ambisonic equivalent panning, aka AEP.
[5] - The FOA toolset uses the name k.
NOTE: We could have called this parameter spatial window or even panning law. The term beam shape appears to be a preferred name in the HOA technical literature.
[6] - This is what SuperCollider's DecodeB2 is doing under the hood.
[7] - Maybe that's why people like octaphonic sound?
[8] - Surprised?
[9] - Neglecting measurement errors having to do with the actual spatio-frequency response of the microphone. E.g., the spatial aliasing limit of the microphone, and other factors.
[12] - Using the energy criteria for t-designs as described by Zotter, et al.

Zotter, F., Frank, M., & Sontacchi, A. (2010). The Virtual T-Design Ambisonics-Rig Using VBAP. EAA Euroregio Ljubljana 2010.

Zotter, Franz, and Frank, Matthias. Ambisonics. Springer, 2019.

[13] - "In playback, to get a perfectly panning-invariant loudness measure E of the continuous panning function and also the perfectly oriented rE vector of constant spread arccos(|rE|), the parameter t must be t ≥ 2N + 1. In 2D, all regular polygons are t-designs with L = t + 1 points.

We can use the smallest set of 2N + 2... as optimal 2D layout."

Zotter, Franz, and Frank, Matthias. Ambisonics. Springer, 2019. (p. 60)

[14] - This will return the number of sectoral harmonics required.
[15] - There are some specific caveats if we're willing to accept a design that is not isotropic.
[16] - I prefer to name the technique as Near-Field Controlled, so as not to confuse with the usage of the term Near-Field Compensated in classic Ambisonics.
[17] - 10 meters is a better choice, as the encoding approaches that of a planewave.