Rendering spatial audio on a desktop: The SSR and APF libraries

I’ve been playing around with the SSR library in order to render a binaural soundscape for a project I’m working on. While it’s a great library, I struggled to get to grips with it for a few days. It makes heavy use of templates (a powerful C++ paradigm, but one that can be awkward to read) and is written using some design patterns I had never encountered before.

I started by reading some of the source code, trying to compile the header-only library and get some test code working. After spending hours getting the library to compile, I was ready to try it out with some sample assets from Audio Defence. Compiler errors began to arise upon simply instantiating the ssr::BinauralRenderer, so I had to dig a little deeper. Digging through the repository, I came across some examples that were included with it. I also emailed one of the authors of the library to ask for help with integrating the SSR as a library into my existing framework to handle the binaural rendering.

After a week of bashing my head against the wall, I’ve now managed to pump out some spatial sound from the renderer, sending it out to the speakers and dumping it to a stereo file. I’ve decided to jot down some notes here to help me remember how the thing works!

  • The ssr::BinauralRenderer is a subclass of apf::MimoProcessor, an abstract multiple-input/multiple-output processor that lets the programmer implement the processing callback while it handles the threading and access control of the samples held in the processor’s buffer. The renderer is instantiated by passing an apf::parameter_map instance, a key-value dictionary of configuration settings for the renderer. The main settings required are the sample rate, the block size and the location (full path) of the HRIR file that contains the impulse responses for convolution.
  • The processor functions through use of the Policy design pattern. This pattern dictates that a class can have a number of different policies for responding to similar situations (ie. a class may have different policies for printing data to a file or to a TCP stream). In my case, the policies the renderer is concerned with are its interface (how its data buffers are processed at each audio cycle, or rather how to ‘use’ and interface with it) and how it behaves in a threaded manner. To specify which policies to use, you simply include the header file of the policy and define a macro called APF_MIMOPROCESSOR_INTERFACE_POLICY for the interface policy and APF_MIMOPROCESSOR_THREAD_POLICY for the thread policy. The library comes with a default thread header which uses a single-threaded policy (ie. not implemented) on Windows and the POSIX thread library on *nix and OS X systems.
  • In order to use the binaural renderer, you need to specify the policies you want to use. The renderer relies on two policies in its implementation: an interface policy and a threading policy. To use the renderer as a standalone module, the pointer policy must be used. This then opens up the audioCallback function to be called manually by the application programmer when they want to process some data. The function accepts three arguments: the block size of the frame to be processed, a pointer to a series of inputs and a pointer to a pair of outputs (the binaural renderer being an N-input, 2-output processor for stereo binaural output).
  • The processor requires its input to be a pointer to a list of channels. These channels can be implemented as a series of vectors. The renderer’s output is likewise expected to be a series of vectors representing the audio channels. The inputs should form an N * BLOCK_SIZE matrix, where BLOCK_SIZE is the number of frames to be processed as a block during each run of the audio cycle and N is the number of input channels. The outputs should be a 2 * BLOCK_SIZE matrix, indicating stereo output. The renderer expects a 1-to-1 mapping of input channels to sources, with the sources ordered as the channels (ie. Channels[0] is the first source, Channels[1] the second, etc…)
  • Finally, for dumping to a file, you need to transpose the channels, as libsndfile reads in row-wise order, interleaving the channels as it writes to the file. This was the major source of confusion for me and it took some fiddling and multiple reads of the source repo to understand how it worked.
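The Policy pattern at the heart of all this is easier to see in isolation. Here’s a minimal sketch of my own (not APF code — the names are made up for illustration) showing a processor whose behaviour is swapped out by a compile-time policy parameter, much as the MimoProcessor is parameterised on its interface and thread policies:

```cpp
#include <string>

// A policy is an ordinary class satisfying an implicit compile-time
// interface -- here, how a processor reports what it has done.
struct VerbosePolicy
{
  static std::string report(int frames)
  { return "processed " + std::to_string(frames) + " frames"; }
};

struct SilentPolicy
{
  static std::string report(int) { return ""; }
};

// The host class selects its behaviour with a template parameter;
// APF does the equivalent selection through the two macros rather
// than an explicit template argument at the call site.
template<typename ReportPolicy>
class Processor
{
  public:
    std::string process(int frames)
    {
      // ...do the actual work, then defer to the policy...
      return ReportPolicy::report(frames);
    }
};
```

So Processor&lt;VerbosePolicy&gt;().process(512) yields "processed 512 frames", while the SilentPolicy variant compiles the reporting away entirely — no virtual dispatch, all resolved at compile time.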

So to recap: the binaural renderer can be instantiated after specifying the policies required for it to do its thing. Next you need to generate a parameter map, a key-value dictionary containing the configuration (block size, HRIR file path etc.) for the renderer. To use the renderer, you pass a pointer to a list of arrays representing the channels of the audio (best to use the apf::fixed_matrix container that comes with the APF framework).
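In code, the setup looks roughly like this in my project. Treat it as a sketch rather than gospel: the header paths and the exact parameter keys ("sample_rate", "block_size", "hrir_file") are taken from my copy of the sources and may differ in your version.

```cpp
// The policies must be chosen *before* the renderer is included.
// The pointer policy exposes the manual audioCallback for pull-style
// processing; the POSIX thread policy covers *nix and OS X.
#include "apf/pointer_policy.h"
#include "apf/posix_thread_policy.h"
#define APF_MIMOPROCESSOR_INTERFACE_POLICY apf::pointer_policy<float*>
#define APF_MIMOPROCESSOR_THREAD_POLICY apf::posix_thread_policy

#include "binauralrenderer.h"  // path as in my project layout

int main()
{
  // Key-value configuration for the renderer.
  apf::parameter_map params;
  params.set("sample_rate", 44100);
  params.set("block_size", 512);
  params.set("hrir_file", "/full/path/to/hrirs.wav");

  ssr::BinauralRenderer renderer(params);

  // ...add sources, fill the input channel list, then drive the
  // processor manually each audio cycle:
  // renderer.audioCallback(block_size, inputs, outputs);
}
```

The key point is the ordering: the two macros have to be visible before the MimoProcessor header is pulled in, otherwise you get the default (or no) policies and the compiler errors I ran into.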

A blunder was in dumping the output to a file; this wasn’t the SSR’s fault, as libsndfile expects reads and writes in row-wise order for (de)interleaving. All I had to do here was keep a second output buffer holding the transposed matrix of the output channel list. You can then call the writef() function of the SndfileHandle object in the libsndfile C++ API to write stereo output to the file.
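The transpose itself is plain C++. This sketch interleaves a channel-major 2 * BLOCK_SIZE matrix into the frame-ordered buffer that writef() wants (the SndfileHandle call is left as a comment, since it needs libsndfile and the file/format arguments are just examples):

```cpp
#include <cstddef>
#include <vector>

// Interleave channel-major data (channels[ch][frame]) into the
// frame-major order libsndfile expects: L0 R0 L1 R1 ...
std::vector<float> interleave(const std::vector<std::vector<float>>& channels)
{
  const std::size_t n_channels = channels.size();
  const std::size_t n_frames = channels.empty() ? 0 : channels[0].size();
  std::vector<float> out(n_channels * n_frames);
  for (std::size_t frame = 0; frame < n_frames; ++frame)
    for (std::size_t ch = 0; ch < n_channels; ++ch)
      out[frame * n_channels + ch] = channels[ch][frame];
  return out;
}

// With libsndfile, the interleaved buffer then goes out in one call:
//   SndfileHandle file("out.wav", SFM_WRITE,
//       SF_FORMAT_WAV | SF_FORMAT_FLOAT, 2, 44100);
//   file.writef(interleaved.data(), n_frames);
```

For the stereo case this is exactly the transpose described above: two rows of BLOCK_SIZE samples become BLOCK_SIZE frames of two samples each.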

Having figured all of this out I can finally move onto the image processing aspect of my project. More on this to come later.
