How balenaSound inspired the audio block

In this blog post, we’ll explain how the audio block came to be. We’ll put on our IoT fleet owner shoes, dive deep into balenaSound’s history, and show you how one of our most popular projects helped build the very first balenaBlock.

As you may already know, our mission statement at balena is to “reduce friction for fleet owners and unlock the power of physical computing.” Creating IoT applications should be easy; however, that is often not the case. At balena, we work hard to solve these problems so that you, the user, don’t have to. For example, we announced balenaBlocks, a set of basic building blocks to help you jumpstart IoT applications.

Read on to learn how we noticed a pattern in the way we were improving and maintaining balenaSound, and how that pattern showed us a way to simplify our application builds.

Some background on balenaSound

Since its release in early 2019, balenaSound has been a huge success for us, giving our users an easy way to create an audio streaming device while introducing a lot of new people to the world of IoT and embedded systems. Evidence of this is the large number of external contributions to the project, as well as the number of bugs and issues that have been reported and fixed.

It didn’t take long to realize that the problems we were solving for balenaSound were not unique to its particular use case, but could instead apply to any project that required audio manipulation.

Spotting the maintenance pattern

A thorough inspection of the project’s source code at the time, and of the repository’s list of issues, confirmed our suspicion: the vast majority of our time was spent developing and maintaining code that was not specific to balenaSound, but rather related to audio hardware configuration and audio routing.

We felt that if we extracted generic functionality into a reusable application fragment (aka a block), we could simplify the work needed when creating new audio applications in balenaOS. Also, any bug fixes or improvements made to this block would benefit not only balenaSound but any other project using it.

Creating the audio block

Setting the boundaries

With this in mind, we looked at balenaSound through the “blocks” lens and analyzed its internals. To define what an audio block should do, we asked ourselves: what core features could we extract that were independent of balenaSound?

Here are the notes we made:

Inputs and outputs

Let’s think of the audio block as a black box with inputs and outputs:

  • Applications that generate audio streams are the inputs (for example: Spotify, a microphone stream, Bluetooth audio, an MP3 file being played)
  • Audio interfaces on a device are the outputs (for example: audio jack, HDMI, DAC)

User Story: Charlie the Unicorn is a developer working on an audio application running on a Raspberry Pi 4. He wants to play an MP3 file over the HDMI1 port (not HDMI0! The Pi 4 has two HDMI outputs). What he doesn’t want is to learn or worry about audio routing configuration or embedded audio intricacies. Ideally, he would just connect inputs to outputs, without giving much thought to what’s going on under the hood, and it would just work.
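To make the black-box idea concrete, here is a toy model of it. `AudioRouter`, `connect`, and `output_for` are hypothetical names invented for illustration, not the audio block’s API; the point is only that Charlie declares which input goes to which output and the block takes care of the rest.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AudioRouter:
    """Toy model of the audio block as a black box (hypothetical, not the block's API)."""

    outputs: set[str]  # audio interfaces available on the device
    routes: dict[str, str] = field(default_factory=dict)  # input name -> output name

    def connect(self, source: str, output: str) -> None:
        # Reject outputs the device doesn't have instead of failing silently later.
        if output not in self.outputs:
            raise ValueError(f"unknown output {output!r}; available: {sorted(self.outputs)}")
        self.routes[source] = output

    def output_for(self, source: str) -> str | None:
        # Returns None when the input hasn't been routed anywhere yet.
        return self.routes.get(source)


# Charlie's Raspberry Pi 4: one audio jack and two HDMI ports.
router = AudioRouter(outputs={"headphones", "hdmi0", "hdmi1"})
router.connect("mp3-player", "hdmi1")
```

All Charlie expresses is intent (“this input, that output”); everything about how the stream actually reaches HDMI1 stays inside the box.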

Audio processing

Besides connecting inputs with outputs, there are some audio operations happening inside the black box that can be useful for a broad range of applications. Audio resampling, mixing, and equalization are some common ones, but there are other audio processing effects such as echo cancellation, channel remapping, and LADSPA filtering that, albeit niche, could also be desirable.

User Story: Charlie wants his MP3 files to play. He doesn't want to reprocess them just because they use a sample rate or sample format that the HDMI1 port does not support. The audio block should account for that too.
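Resampling is exactly the kind of job Charlie should never have to think about. The sketch below shows the idea with linear interpolation, converting a 44.1 kHz stream to 48 kHz. It is illustrative only: PulseAudio ships its own, higher-quality resamplers, and `resample_linear` is a hypothetical helper, not part of the block.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample a mono signal by linear interpolation (illustrative, not production quality)."""
    if not samples:
        return []
    n_out = round(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # how far we advance through the input per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the final input sample
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# 10 ms of audio: 441 samples at 44.1 kHz become 480 samples at 48 kHz.
ten_ms = resample_linear([0.0] * 441, 44_100, 48_000)
```

Linear interpolation keeps the sketch short, but it lets some aliasing through; that trade-off is one reason to leave resampling to the sound server.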

Programmatic control

Another important feature to consider is external control of the audio block at runtime. We need to be able to dynamically modify the configuration of both the black box and its inputs and outputs, so we need some sort of programmatic API to interface with the block.

User Story: Charlie wants to impress his customers by adding a new feature: a physical knob to adjust the output volume level. We've got Charlie's back, so we should provide the means for him to write some code that does the job with a simple interface.
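Here is a minimal sketch of what Charlie’s knob code could look like, assuming a 10-bit ADC reading (0 to 1023) and assuming his service can reach the block’s PulseAudio server. It shells out to `pactl set-sink-volume`, a standard PulseAudio command, and selects the server through the `PULSE_SERVER` environment variable. The address `tcp:audio:4317` is an assumption (a service named `audio` listening on port 4317); check the block’s documentation for the real value. `knob_to_percent`, `volume_command`, and `set_volume` are hypothetical helpers.

```python
import os
import subprocess

# Assumption: the audio block's PulseAudio server is reachable at this address.
AUDIO_SERVER = "tcp:audio:4317"


def knob_to_percent(raw: int, raw_max: int = 1023) -> int:
    """Map a raw knob reading (e.g. from a 10-bit ADC) to a 0-100 volume percentage."""
    raw = max(0, min(raw, raw_max))  # clamp out-of-range or noisy readings
    return round(raw * 100 / raw_max)


def volume_command(percent: int, sink: str = "@DEFAULT_SINK@") -> list[str]:
    """Build a pactl command line that sets the volume of a sink."""
    return ["pactl", "set-sink-volume", sink, f"{percent}%"]


def set_volume(percent: int, server: str = AUDIO_SERVER) -> None:
    """Run pactl against the given PulseAudio server (requires pactl to be installed)."""
    env = {**os.environ, "PULSE_SERVER": server}  # PULSE_SERVER tells the client which server to use
    subprocess.run(volume_command(percent), env=env, check=True)


# In Charlie's polling loop, something like: set_volume(knob_to_percent(read_knob()))
# where read_knob() is his own (hypothetical) ADC-reading function.
```

Keeping the command builder separate from the call that runs it makes the mapping easy to test on a laptop, away from the actual hardware.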

Centralized configuration

Lastly, a key aspect of the audio block should be to centralize hardware configuration and audio routing in one place. In a multi-container application, this removes the need to apply the same configuration to multiple services. This is key for complex audio projects with multiple sources of audio streams (balenaSound, for instance).

User Story: Charlie doesn’t have the time to set up a complex configuration in all his application services. The audio block should be the one and only place where audio configuration happens, and it should require very little user intervention to set up.

Note that while we derived these features directly from balenaSound, where they fit that project’s needs perfectly, they are most likely what any audio application developer will be looking for. If we missed something, please feel free to reach out and let us know :)

Building the block

So, how do we build this black box? Being a Linux-based system, balenaOS uses ALSA at the kernel level to provide an API for sound device drivers. ALSA’s libraries and utilities are extremely powerful tools that allow your application to interact with audio hardware. balenaSound was already making heavy use of them, so the first proposal for the audio block was obvious: let’s create an image that uses ALSA utilities to do the audio configuration heavy lifting.
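For a taste of that ALSA-level plumbing, the sketch below lists playback devices by parsing the output of `aplay -l` (from alsa-utils) and builds an ALSA device string such as `hw:CARD=vc4hdmi1,DEV=0` for each one. The sample output is illustrative of a Raspberry Pi 4; card names and numbering vary with the kernel and OS version, and `parse_aplay_list` is a hypothetical helper.

```python
import re
import subprocess

# Matches lines such as:
# card 2: vc4hdmi1 [vc4-hdmi-1], device 0: MAI PCM i2s-hifi-0 [MAI PCM i2s-hifi-0]
CARD_LINE = re.compile(r"^card (\d+): (\S+) \[(.*?)\], device (\d+): (.*?) \[(.*?)\]$")

# Illustrative `aplay -l` output from a Raspberry Pi 4; yours may differ.
SAMPLE_APLAY_OUTPUT = """\
**** List of PLAYBACK Hardware Devices ****
card 0: Headphones [bcm2835 Headphones], device 0: bcm2835 Headphones [bcm2835 Headphones]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
card 1: vc4hdmi0 [vc4-hdmi-0], device 0: MAI PCM i2s-hifi-0 [MAI PCM i2s-hifi-0]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: vc4hdmi1 [vc4-hdmi-1], device 0: MAI PCM i2s-hifi-0 [MAI PCM i2s-hifi-0]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""


def parse_aplay_list(text: str) -> list[dict]:
    """Extract card/device pairs from `aplay -l` output."""
    devices = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if not match:
            continue  # skip the header and subdevice lines
        card, card_id, card_name, device, _, _ = match.groups()
        devices.append({
            "card": int(card),
            "id": card_id,
            "name": card_name,
            "device": int(device),
            # ALSA accepts hardware devices addressed by card ID and device number.
            "alsa": f"hw:CARD={card_id},DEV={device}",
        })
    return devices


def list_playback_devices() -> list[dict]:
    """Run `aplay -l` on the device itself (requires alsa-utils)."""
    result = subprocess.run(["aplay", "-l"], capture_output=True, text=True, check=True)
    return parse_aplay_list(result.stdout)
```

Finding Charlie’s HDMI1 port is then a matter of picking the `vc4hdmi1` entry; writing the ALSA configuration that routes audio to it reliably is where things get harder.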

However, there was one problem with this approach. ALSA is not easy to work with because it is a low-level layer that lives mostly in kernel space. The more complex your application gets, the harder it is to configure ALSA correctly. Configuration woes often have unpleasant results: high-pitched noises, hisses, stuttering audio, distortion, or even no sound at all. So while it is possible to use ALSA for any type of audio application, no matter the complexity, we felt there was a better solution.

ALSA’s difficulties can be avoided by using a sound server. Sound servers are software layers built on top of ALSA: higher-level abstractions that provide simpler audio tooling at the cost of some flexibility. Audio on Linux has historically been a complex puzzle to solve; with no single standard, many sound servers emerged, including PulseAudio, JACK, aRts, ESD, PipeWire, and some lesser-known ones.

Documentation, feature set, and adoption were some of our main concerns when selecting which sound server would drive the audio block. After careful consideration, PulseAudio looked like it checked all the boxes, so that was our pick (side note: the Raspberry Pi Foundation seems to agree with our choice, since in December 2020 they made PulseAudio the default sound server in their popular Raspberry Pi OS).

Creating the block was then a matter of optimizing PulseAudio to run on balenaOS, choosing the right defaults for its many configuration options, and making sure the interface with ALSA and Bluetooth devices was smooth. Wiring inputs to outputs is probably the number one thing users look for, so we made sure to provide sane defaults while leaving enough breathing room for more experienced users to customize things to their liking. For very basic use cases, the block should be good to go with little to no configuration, and environment variables allow some degree of customization.
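As an illustration of the environment-variable approach, here is a sketch of how a block might read its settings with safe fallbacks. The variable names `AUDIO_OUTPUT` and `AUDIO_VOLUME`, the allowed output values, and the defaults are all assumptions for this example; the block’s README documents the real ones.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical values for illustration; check the block's README for the real ones.
VALID_OUTPUTS = {"AUTO", "HEADPHONES", "HDMI0", "HDMI1", "DAC"}
DEFAULT_OUTPUT = "AUTO"
DEFAULT_VOLUME = 75


@dataclass(frozen=True)
class AudioSettings:
    output: str
    volume: int  # percent, 0-100


def load_settings(env: Mapping[str, str] = os.environ) -> AudioSettings:
    """Read hypothetical AUDIO_* variables, falling back to sane defaults."""
    output = env.get("AUDIO_OUTPUT", DEFAULT_OUTPUT).upper()
    if output not in VALID_OUTPUTS:
        output = DEFAULT_OUTPUT  # ignore unknown outputs rather than crash

    try:
        volume = int(env.get("AUDIO_VOLUME", DEFAULT_VOLUME))
    except ValueError:
        volume = DEFAULT_VOLUME  # non-numeric value, keep the default
    volume = max(0, min(volume, 100))  # clamp to a valid percentage

    return AudioSettings(output=output, volume=volume)
```

Falling back to defaults instead of failing is the design choice that makes “little to no configuration” work: an empty environment still produces a usable setup, and a typo degrades gracefully.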

Advanced users can fully customize the PulseAudio installation using PulseAudio’s configuration files. If you are interested in the implementation details of the audio block, check out the GitHub repository; we’d be happy to share our thoughts (and receive PRs too!).

How the audio block goes beyond balenaSound

We built the audio block based on our experience with balenaSound, so it’s only natural to go back and redesign the project to make use of it. balenaSound v3.0 was the first project built on the audio block; here is how it is architected:

With the audio block at its center, balenaSound can be seen as an audio multiplexer, both for inputs (plugins) and outputs (audio hardware). This is a very clear depiction of what the project really is about: a multi-room audio streaming device with support for multiple audio sources.
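Under the hood, multiplexing several plugins into one output comes down to mixing. PulseAudio does this for you, but the core idea fits in a few lines: sum the streams sample by sample and clip to the 16-bit range. This `mix` helper is a hypothetical, simplified sketch (no gain control or dithering), not how PulseAudio implements it.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix(streams: list[list[int]]) -> list[int]:
    """Mix 16-bit PCM streams by summing sample-wise and clipping (illustrative only)."""
    length = max((len(s) for s in streams), default=0)
    out = []
    for i in range(length):
        # Shorter streams simply stop contributing once they run out of samples.
        total = sum(s[i] for s in streams if i < len(s))
        out.append(max(INT16_MIN, min(INT16_MAX, total)))  # clip instead of wrapping around
    return out


# Two plugins playing at once, e.g. Spotify and Bluetooth audio.
mixed = mix([[1000, 2000, -3000], [500, -500]])
```

Clipping is the crude part: two loud sources can saturate the output, which is why real mixers also manage per-stream volume.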

It also highlights how blocks can empower developers to build applications outside their comfort zone; you don’t need to be an embedded audio wizard to develop something like balenaSound! If you were starting this project from scratch, you could just drop in the audio block without knowing how it works internally (much like installing a dependency). The only requirement is to read the documentation to see how to interface with it.

How to try it out

If you are interested in an in-depth explanation of how balenaSound makes use of the audio block, please check out our architecture guide. It’s quite involved, so we won’t go into it in this post, but we do have a few simpler examples you can check out instead:

Until next time

Building IoT applications can be difficult: from hardware to software, it requires proficiency in a wide range of technologies and skills. balenaBlocks aim to lower the barrier to entry and allow more developers to start building projects.

Hopefully this blog post is a good showcase of how you can solve a hard problem in a way that is useful to a broad spectrum of users and applications. balenaHub hosts the blocks we developed at balena with this intent in mind, but there are a lot of hard problems out there in the IoT space. Are you working on one of them? If so, we encourage you to contribute: the solutions to the problems you are facing might help out other developers.

Terms of Service | Privacy Statement | Master agreement | Copyright 2019 Balena | All Rights Reserved