16 December 2020 / Last updated: 25 Jan 2021

How to use Nvidia Jetson devices on balena

Access the full capabilities of Jetson devices in balenaCloud. Recent versions of balenaOS provide easy access to JetPack components, giving your containers access to the GPU’s power. Read on to learn how to build a project that opens up a world of AI tools, including CUDA and OpenCV.
How to use edge AI tools with Nvidia and balenaOS

Hardware refresher

Nvidia Jetson is a series of embedded computing boards from Nvidia designed to accelerate machine learning (ML) applications with relatively low power consumption. These boards use various versions of Nvidia’s Tegra System on a Chip (SoC), which integrates an ARM CPU, a GPU, and a memory controller on a single chip.
The Jetson products supported by balenaOS include the Nano, TX2, and Xavier. These products are actually modules intended to be integrated into end products and devices. Nvidia also produces developer kits, meant for non-production software testing, which mount a Jetson module on a carrier board that provides power, networking, video, and other external connections. Third-party companies, such as Aetina and CTI, make production carrier boards for the Jetson modules.
A Jetson Nano module vs. a developer kit (the Nano is located under the large heatsink).
There may be some small differences between a production module and the related developer kit. For example, the Jetson Nano is a commercial compute module with onboard eMMC storage, while the Jetson Nano Developer Kit includes a version of the module with an SD card slot instead; otherwise, it has the same computational performance.
These differences in storage method require different versions of balenaOS, which appear as separate options when you choose a default device type for your application.
For a Nano-based board with an SD card slot and no onboard eMMC (such as the Jetson Nano Developer Kit), the “SD-CARD (NEW)” balenaOS image should be downloaded, expanded, and written to an SD card using a tool such as balenaEtcher. The resulting SD card is bootable and serves as the device’s primary storage.
For a Nano-based board with eMMC, a custom flashing utility such as this one is required to flash the eMMC since it is not exposed as external storage. The flasher tool invokes Nvidia’s proprietary software to properly partition the eMMC and place the required balenaOS software in the necessary location to make it bootable.
The two Nano images described above are referred to as “non-flasher” because they do not themselves flash the eMMC storage. However, the Jetson TX2 uses what’s known as a “flasher” image. Once written to an SD card and then booted from that card, the image itself “flashes” the onboard eMMC with balenaOS.
Nvidia device type | Balena machine name | Flasher tool required?
D3 TX2 | srd3-tx2 | no
Jetson Nano SD-CARD (NEW) | jetson-nano | no
Jetson Nano eMMC (NEW) | jetson-nano-emmc | yes
Jetson TX1 (NEW) | jetson-tx1 | no
Jetson TX2 | jetson-tx2 | no
Jetson Xavier (NEW) | jetson-xavier | yes
Jetson Xavier NX Devkit SD-CARD (NEW) | jetson-xavier-nx-devkit | no
Jetson Xavier NX Devkit eMMC (NEW) | jetson-xavier-nx-devkit-emmc | yes
Blackboard TX2 (COMMUNITY) | blackboard-tx2 | no

Nvidia Software

The board support package for the Jetson series is named L4T, which stands for Linux4Tegra. It includes Linux kernel 4.9, a bootloader, NVIDIA drivers, flashing utilities, a sample filesystem based on Ubuntu 18.04, and more for the Jetson platform. L4T can be downloaded on its own or as part of a larger bundle of Jetson software known as JetPack. In addition to the L4T package, JetPack includes deep learning tools such as TensorRT, cuDNN, CUDA, and others. As of this writing, the latest version of L4T is 32.4.4 and the latest version of JetPack is 4.4.1.
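To check which L4T release a system actually has, one option is the /etc/nv_tegra_release file that Nvidia’s L4T packages install. The sketch below assumes that file exists at that path (on the host, or in a container where the L4T core package is installed; verify this for your own setup) and that it starts with a line like “# R32 (release), REVISION: 4.4, ...”. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumed format of the release file, e.g.
# "# R32 (release), REVISION: 4.4, GCID: ..., BOARD: t210ref, EABI: aarch64, DATE: ..."
_RELEASE_RE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)")


def parse_l4t_version(text: str) -> Optional[str]:
    """Return an L4T version such as '32.4.4' from the release file contents, or None."""
    match = _RELEASE_RE.search(text)
    if match is None:
        return None
    major, revision = match.groups()
    return f"{major}.{revision}"  # "R32" + "REVISION: 4.4" -> "32.4.4"


def read_l4t_version(path: str = "/etc/nv_tegra_release") -> Optional[str]:
    """Read and parse the release file; return None if it is missing or unrecognized."""
    try:
        with open(path, encoding="utf-8") as f:
            return parse_l4t_version(f.read())
    except OSError:
        return None
```

Returning None rather than raising keeps the check usable on machines that are not Jetson devices at all.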
There are a few ways the JetPack can be installed on a Jetson board:
  • For the Jetson Nano and Xavier, Nvidia provides SD card images.
  • JetPack is part of the Nvidia SDK Manager, which also includes software for setting up a development environment.
  • JetPack can also be installed using Debian packages.
Using the SD card images or installing JetPack will also install a desktop version of Ubuntu 18.04 on your device. Since we want to use the minimal, Yocto-based balenaOS on our Jetson device, we won’t use either of those tools. It’s possible to extract the L4T drivers from the SDK Manager, but there are easier ways to obtain the drivers, as discussed below.

Using Nvidia GPUs with Docker

Before we get into loading the L4T drivers, let’s quickly review the history of Nvidia GPUs with Docker. (Feel free to jump ahead to the next section if you don’t want a slightly technical history lesson!)
Being platform and hardware-agnostic, Docker containers do not natively support Nvidia GPUs. One workaround is to install all of the necessary Nvidia drivers inside a container. A downside to this solution is that the version of the host driver must exactly match the version of the driver installed in the container, which reduces container portability.
To address this, in 2016 Nvidia introduced Nvidia Docker, which transparently provisioned a container with the necessary components to execute code on the GPU. Specifically, the two most critical components were:
  • Driver-agnostic CUDA images
  • A Docker command line wrapper that mounts the user mode components of the driver and the GPUs (character devices) into the container at launch.
In the first version of Nvidia Docker, an nvidia-docker daemon was used to replace the docker daemon to run the GPU image. In version 2, Nvidia Docker no longer required wrapping the Docker CLI and didn’t need a separate daemon.
In 2019, Nvidia Docker 2 was deprecated (except for certain use cases such as Kubernetes) and replaced by the Nvidia Container Toolkit. This toolkit is a plugin for the Docker daemon that allows containers to communicate with the GPU drivers on the host system, providing full access to all Nvidia GPU devices. Nvidia Docker relied on the custom Nvidia Container Runtime to invoke the necessary setup code when starting containers; however, since Docker 19.03, no separate container runtime is necessary.
All that is required now is to install the Container Toolkit alongside Docker and run one of Nvidia’s base images with the --gpus flag. The first version of the Container Runtime compatible with the Jetson platform was introduced in 2020.

balenaOS on Jetson

As we just learned, allowing containers to access the GPU and its drivers on the host requires a newer version of Docker (>= 19.03) as well as the Nvidia Container Toolkit. balenaEngine (the Docker-compatible container engine in balenaOS) contains the necessary features of Docker >= 19.03; however, you can’t simply install the Container Toolkit on balenaOS, because that functionality needs to be “baked in” to the OS. The difficulty is the sheer size of the files needed to support the Toolkit, which is not ideal for a minimal OS designed for embedded devices. To address this, we are working on an update to balenaOS that allows inclusion of the Container Toolkit without increasing the size of the core OS. Although there is no firm ETA for this feature, we hope to release it in the first half of 2021.
This leads us back to the option mentioned earlier: installing all of the necessary Nvidia drivers into the container and making sure they match the driver versions on the host. The “host” in our case will be a Jetson device running balenaOS. Each release of balenaOS is built using a specific version of the Nvidia L4T package:
Device type | balenaOS version | L4T version
jetson-nano | 2.56.0+rev1, 2.51.1+rev1 | 32.4.2
jetson-nano | 2.47.1+rev3 | 32.3.1
jetson-tx2 | 2.56.0+rev4, 2.47.0+rev2 | 32.4.2
jetson-tx2 | 2.46.1+rev1, 2.45.1+rev3 | 28.3
jetson-xavier | 2.51.1+rev3, 2.43.0+rev4, 2.43.0+rev3 | 32.4.2
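If you maintain Dockerfiles for several device types, it can help to keep this table in code. A minimal sketch that simply transcribes the rows of the table above (the dictionary and host_l4t_version are hypothetical names; only the listed releases are covered, so anything else returns None):

```python
from typing import Optional

# Transcription of the table: device type -> {balenaOS version: L4T version}.
L4T_BY_OS = {
    "jetson-nano": {
        "2.56.0+rev1": "32.4.2",
        "2.51.1+rev1": "32.4.2",
        "2.47.1+rev3": "32.3.1",
    },
    "jetson-tx2": {
        "2.56.0+rev4": "32.4.2",
        "2.47.0+rev2": "32.4.2",
        "2.46.1+rev1": "28.3",
        "2.45.1+rev3": "28.3",
    },
    "jetson-xavier": {
        "2.51.1+rev3": "32.4.2",
        "2.43.0+rev4": "32.4.2",
        "2.43.0+rev3": "32.4.2",
    },
}


def host_l4t_version(device_type: str, os_version: str) -> Optional[str]:
    """Return the L4T version a balenaOS release was built with, or None if it is not listed."""
    return L4T_BY_OS.get(device_type, {}).get(os_version)
```

The returned version tells you which L4T packages to install in the container and which version to require in a contract.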
Now that we know which versions of L4T are running on the host, we can install the same version in the containers that require GPU access. What happens if there is a mismatch between these versions? Well, if the L4T in the OS is very old and the packages in the base image are very new, nothing will work. If the difference is smaller, such as 32.3 in the OS and 32.4.4 in the container, Nvidia apps will work, but there could be glitches that need to be investigated on a case-by-case basis to see whether the version mismatch is the root cause. So the bottom line is to have them match exactly whenever possible to avoid potentially difficult-to-find issues.
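The rule of thumb above can be written down as a small check. This is a sketch under one stated assumption: it treats a different major L4T release (for example 28.x vs 32.x) as the “nothing will work” case and any other difference as the “may work, investigate glitches” case. That threshold is a heuristic, not something Nvidia or balena defines, and the function name is hypothetical.

```python
from typing import Tuple


def _parse(version: str) -> Tuple[int, ...]:
    """Turn '32.4.0' into (32, 4) so versions compare numerically and trailing zeros don't matter."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def classify_l4t_mismatch(host: str, container: str) -> str:
    """Classify the difference between host and container L4T versions.

    Assumption: different major releases are treated as incompatible; a
    difference within the same major release is flagged for testing.
    """
    h, c = _parse(host), _parse(container)
    if h == c:
        return "match"
    if h[0] == c[0]:
        return "minor mismatch: may work, but test for glitches"
    return "major mismatch: expect GPU applications to fail"
```

For example, a host on 32.3.1 and a container on 32.4.4 is reported as a minor mismatch, matching the 32.3 vs 32.4.4 case described above.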
balenaOS includes a feature (see “Contracts” below) to help you make sure that any differences between the host and container drivers do not negatively affect your applications. First, let’s go through an example of how to load the Nvidia drivers in our container.
Back in February, we posted a tutorial about how to include typical Jetson project dependencies such as L4T, CUDA, OpenCV and cuDNN in your Dockerfiles. That tutorial was a lengthy, multi-step process that included downloading and installing the Nvidia SDK Manager in addition to carefully copying and renaming files to get them into your project.
We now present a more streamlined process to accomplish the same goal but in basically one step instead of many!

So what changed?

All of our Nvidia base images for the Jetson family of devices now include Nvidia package repositories (currently based on JetPack 4.3) in their apt sources list. This means that installing CUDA is as simple as issuing the following command or including it in your Dockerfile:
apt-get update && apt-get install -y cuda-toolkit-10-0
(This assumes you are using one of our Nvidia base images dated after September 20, 2020.)
To search for available packages, you can use the apt-cache search tool. Using apt-cache search, you can search for any package using a keyword related to its name or description. It will then output all the packages matching the search criteria. For example, apt-cache search cuda displays all of the packages available to install that have the word “cuda” in their name or description. To see more information about a package, use apt-cache show and then the name of the package.
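If you want to filter search results programmatically (for example, to list only the cuda-toolkit-* packages), a small wrapper around apt-cache search could look like the sketch below. It assumes a Debian/Ubuntu-based container with apt-cache on the PATH and relies on the usual “name - description” output format; the function names are illustrative.

```python
import subprocess
from typing import List, Tuple


def parse_apt_cache_search(output: str) -> List[Tuple[str, str]]:
    """Split `apt-cache search` output ("name - description" per line) into (name, description) pairs."""
    results = []
    for line in output.splitlines():
        name, sep, description = line.partition(" - ")
        if sep:  # skip blank or malformed lines
            results.append((name.strip(), description.strip()))
    return results


def search_packages(keyword: str) -> List[Tuple[str, str]]:
    """Run `apt-cache search KEYWORD` and parse the result (raises if apt-cache fails)."""
    completed = subprocess.run(
        ["apt-cache", "search", keyword],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text output; also works on the Python 3.6 shipped with Ubuntu 18.04
        check=True,
    )
    return parse_apt_cache_search(completed.stdout)
```

For example, [n for n, _ in search_packages("cuda") if n.startswith("cuda-toolkit")] lists the toolkit packages. Because apt-cache reads the local package lists, run apt-get update first if they have been cleaned out of the image.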

The Jetson Nano sample app revisited

In our previous tutorial, we included a repository with Dockerfiles for CUDA samples and OpenCV samples. For this tutorial, we have created a multi-container application that contains updated versions of both containers, which you can deploy to balenaCloud with one click.
You’ll be prompted to create a free balenaCloud account if you don’t already have one, and then the application will begin building in the background. At that point, click “add device” and then download the balenaOS disk image and burn it to an SD card using Etcher. After a few minutes, your Jetson Nano should show up on your balenaCloud dashboard.
You can also use the balena CLI and push the application to balenaCloud manually. You’ll first need to clone this repository to your computer. For detailed steps on using this method, check out our getting started tutorial.
Once your application has finished building (it could take a while!) and your Jetson Nano is online, you should see the application’s containers in a running state on the dashboard.
At this point, make sure you have a monitor plugged into the Nano’s HDMI port and that it is powered on.

CUDA examples

CUDA is a parallel computing platform and programming model that helps developers more easily harness the computational power of Nvidia GPUs. The cuda container has some sample programs that use CUDA for real-time graphics. To run these samples, you can either SSH into the app container using the balena CLI with balena ssh <ip address> cuda (where <ip address> is the IP address of your Nano), or use the terminal built into the dashboard, selecting the cuda container.
First, start the X11 window system, which will provide graphics support on the display attached to the Jetson Nano:
X &
(Note the & causes the process to run in the background and return our prompt, although you may need to hit enter more than once to get the prompt back.)
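If you script these steps (for example, in a container start command), starting X in the background becomes a race: the demo may launch before X is ready. One way to avoid that is to wait for the X server’s Unix socket before starting the demo. This Python sketch assumes Python 3 is available in the container and that X creates its usual socket at /tmp/.X11-unix/X0 for display :0; the function names are hypothetical.

```python
import os
import subprocess
import time
from typing import List


def wait_for_x_socket(display: int = 0, timeout: float = 10.0,
                      socket_dir: str = "/tmp/.X11-unix") -> bool:
    """Poll until the X server's socket for display :<display> exists; False on timeout."""
    path = os.path.join(socket_dir, f"X{display}")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(0.2)
    return os.path.exists(path)


def run_with_x(command: List[str], display: int = 0) -> int:
    """Start X in the background (like `X &`), wait for it, then run `command` on that display."""
    subprocess.Popen(["X", f":{display}"])
    if not wait_for_x_socket(display):
        raise RuntimeError("X server did not create its socket in time")
    env = dict(os.environ, DISPLAY=f":{display}")
    return subprocess.run(command, env=env).returncode
```

For example, run_with_x(["./smokeParticles"]) roughly replaces typing X & followed by ./smokeParticles. Setting DISPLAY in the child’s environment mirrors the export DISPLAY=:0 step used in the OpenCV section.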
The most visually impressive demo is called “smokeParticles” and can be run by typing the following command:
./smokeParticles
It displays a ball in 3D space with trailing smoke. At first glance it may not seem that impressive until you realize that it’s not a video playback, but rather a graphic generated in real time. The Jetson Nano’s GPU is calculating the smoke particles (and sometimes their reflection on the “floor”) on the fly. To stop the demo you can hit CTRL + C. Below are the commands to run a few other demos, some of which just return information to the terminal without generating graphics on the display.
./deviceQuery
./simpleTexture3D
./simpleGL
./postProcessGL
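deviceQuery is the quickest way to confirm that the container can see the GPU. If you would rather do the same check from your own code, one option is to call the CUDA runtime’s cudaGetDeviceCount function through Python’s ctypes. This is a sketch, not part of the sample app: it assumes Python 3 is installed in the container and that libcudart.so is either on the loader path or under /usr/local/cuda/lib64 (the usual CUDA install location).

```python
import ctypes
from typing import Optional

# Assumed library locations; adjust to where CUDA is installed in your container.
CUDART_CANDIDATES = ("libcudart.so", "/usr/local/cuda/lib64/libcudart.so")


def cuda_device_count() -> Optional[int]:
    """Return the number of CUDA devices, or None if the runtime can't be loaded or the call fails."""
    for name in CUDART_CANDIDATES:
        try:
            cudart = ctypes.CDLL(name)
        except OSError:
            continue  # not found here, try the next candidate
        count = ctypes.c_int(0)
        # cudaGetDeviceCount(int *count) returns cudaSuccess (0) on success.
        status = cudart.cudaGetDeviceCount(ctypes.byref(count))
        return count.value if status == 0 else None
    return None
```

On a working Jetson Nano setup you would expect a count of 1; None means either the CUDA runtime library was not found or the runtime reported an error, such as no visible device.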
You can use the Dockerfile in our CUDA sample as a template for building your own containers that need CUDA support. (The steps that install and build the CUDA samples can be removed to save space and build time.)

OpenCV examples

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library that can take advantage of Nvidia GPUs using CUDA. Let’s walk through some noteworthy aspects of the opencv Dockerfile.
Since OpenCV will be accelerated by the GPU, the opencv container has the same CUDA components as the cuda container. In addition, you’ll notice a number of other utilities and prerequisites for OpenCV being installed. We then download (using wget) zip files for version 4.0.1 of OpenCV and build it using CMake. Building OpenCV from source in this manner is the recommended way of installing OpenCV. The benefits are that it will be optimized for our particular system and we can select the exact build options we need. (The build options are in the long line with all of the -D flags.) The downsides to building OpenCV from scratch are deciding which options to include and the time the build takes. There are also file size considerations due to all of the extra packages that usually need to be installed.
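That long line of -D flags is easier to review when each option sits on its own line with a comment. As a sketch, you could keep the options in a Python dictionary and render them as CMake flags. The option names below are standard OpenCV CMake options, but the selection and the contrib path are illustrative assumptions, not copied from the sample’s Dockerfile; the function name is hypothetical.

```python
from typing import Dict, List, Union


def cmake_define_flags(options: Dict[str, Union[str, bool]]) -> List[str]:
    """Render build options as CMake -D flags; booleans become ON/OFF."""
    flags = []
    for key, value in options.items():
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        flags.append(f"-D{key}={value}")
    return flags


# Illustrative options only (not the sample's exact configuration).
NANO_OPENCV_OPTIONS = {
    "CMAKE_BUILD_TYPE": "RELEASE",
    "WITH_CUDA": True,
    "CUDA_ARCH_BIN": "5.3",  # compute capability of the Jetson Nano's GPU
    "OPENCV_EXTRA_MODULES_PATH": "../opencv_contrib-4.0.1/modules",  # hypothetical unpack location
    "BUILD_EXAMPLES": True,  # builds example binaries such as the ximgproc demos
}
```

Joining ["cmake", *cmake_define_flags(NANO_OPENCV_OPTIONS), ".."] with spaces produces the same kind of single cmake line the Dockerfile uses.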
Our example solves the size problem by using a multistage build. Notice that after we install many packages and build OpenCV, we basically start over with a new, clean base image. A multistage build lets us simply copy over the OpenCV runtime files we built in the first stage, leaving behind everything that was only needed to perform the build.
To see the OpenCV demos, you can either SSH into the app container using the balena CLI with balena ssh <ip address> opencv (where <ip address> is the IP address of your Nano), or use the terminal built into the dashboard, selecting the opencv container.
Enter the following command to direct our output to the first display:
export DISPLAY=:0
Now enter this command to start the X11 window system in the background:
X &
Finally, type one of the following lines to see the samples on the display:
./example_ximgproc_fourier_descriptors_demo
./example_ximgproc_paillou_demo corridor.jpg
If all goes well, you will see example images on the monitor plugged into the Jetson Nano’s HDMI port. While these examples are not particularly exciting, they are meant to confirm that OpenCV is indeed installed.
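The demos confirm that OpenCV runs, but not that it was built with CUDA. OpenCV’s build summary includes a line such as “NVIDIA CUDA: YES (ver 10.0, ...)”, which you can check. This sketch assumes the OpenCV Python bindings are available for check_installed_opencv (the sample builds the C++ examples, so you may need to enable the bindings in your own build); the parser also works on the configuration summary CMake prints during the build, which uses the same format. Function names are hypothetical.

```python
import re


def opencv_has_cuda(build_info: str) -> bool:
    """Return True if OpenCV build information reports 'NVIDIA CUDA: YES'."""
    return re.search(r"NVIDIA CUDA:\s+YES", build_info) is not None


def check_installed_opencv() -> bool:
    """Check the installed OpenCV via its Python bindings (assumes cv2 is importable)."""
    import cv2  # imported lazily so opencv_has_cuda works without OpenCV installed

    return opencv_has_cuda(cv2.getBuildInformation())
```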

Manage driver discrepancies with contracts

In our previous example, we saw the need to ensure that the Nvidia drivers in a container match the driver versions on the host. As new versions of the host OS are released with updated drivers, keeping track of any discrepancies can become cumbersome. The balena Supervisor (version 10.6.17 and later) includes a feature called “contracts” that can help in this situation.
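If you manage a mixed fleet, you may want to check whether a device’s supervisor is new enough before relying on contracts. A minimal sketch, assuming you already have the supervisor version string (for example, from the dashboard); the function name is hypothetical. Note that versions are compared as tuples of integers rather than as strings.

```python
from typing import Tuple

MIN_CONTRACTS_SUPERVISOR = "10.6.17"


def _version_tuple(version: str) -> Tuple[int, ...]:
    """'v10.6.17' or '10.6.17' -> (10, 6, 17); any suffix after '-' or '+' is ignored."""
    core = version.lstrip("v").split("-")[0].split("+")[0]
    return tuple(int(part) for part in core.split("."))


def supports_contracts(supervisor_version: str) -> bool:
    """True if this supervisor version is new enough for container contracts (>= 10.6.17)."""
    return _version_tuple(supervisor_version) >= _version_tuple(MIN_CONTRACTS_SUPERVISOR)
```

Plain string comparison would get this wrong: "9.15.7" > "10.6.17" is true for strings, because "9" sorts after "1".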
Container contracts are defined in a contract.yml file and must be placed in the root of the build context for a service. When deploying a release, if the contract requirements are not met, the release will fail, and all services on the device will remain on their current release. Let’s see what a container contract for our previous example might look like:
type: "sw.container"
slug: "enforce-l4t"
name: "Enforce L4T requirements"
requires:
    - type: "sw.l4t"
      version: "32.4.2"
Place a copy of this file named contract.yml in both the cuda and opencv folders, then re-push the application. Now, if the L4T version on the host does not match the version the contract requires (the one the containers were built for), the release will not be deployed. Since they currently match, you won’t notice any change. However, if you pushed to a device running an older OS, for example balenaOS 2.47.1+rev3, which is based on L4T 32.3.1, the release would not deploy.
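To make the behavior concrete, here is a simplified model of the check in Python. It is not the supervisor’s implementation: it assumes versions must match exactly as strings (the real supervisor may accept richer version expressions), and it represents the host as a hypothetical dictionary such as {"sw.l4t": "32.3.1"}.

```python
from typing import Dict, List

# The contract from above, written as a Python dict instead of YAML.
CONTRACT = {
    "type": "sw.container",
    "slug": "enforce-l4t",
    "name": "Enforce L4T requirements",
    "requires": [{"type": "sw.l4t", "version": "32.4.2"}],
}


def unmet_requirements(contract: Dict, host: Dict[str, str]) -> List[Dict]:
    """Return the requirements the host does not satisfy; an empty list means the contract is met.

    Simplification: each required version must equal the host's version exactly.
    """
    return [
        req for req in contract.get("requires", [])
        if host.get(req["type"]) != req["version"]
    ]
```

With a host on L4T 32.3.1, unmet_requirements(CONTRACT, {"sw.l4t": "32.3.1"}) returns the sw.l4t requirement, which corresponds to the release not being deployed.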
If you are using a multi-container application and you want to make sure that certain containers are deployed even if another container contract fails, you can use optional containers. Take a look at this example using optional containers which get deployed based on the L4T version that is running on each particular device.
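The deployment rule for optional containers can be modeled in a few lines. This is a sketch of the behavior described above, not the supervisor’s code: each service is a hypothetical dictionary with its requirements and an “optional” flag (in a real application you mark services as optional in docker-compose.yml; see balena’s documentation for the exact syntax), and versions are compared by exact match.

```python
from typing import Dict, List, Optional


def services_to_run(services: Dict[str, Dict], host: Dict[str, str]) -> Optional[List[str]]:
    """Model which services of a release would run on a host.

    `services` maps a service name to {"requires": {type: version}, "optional": bool}.
    Returns None if any non-optional service's requirements are unmet (the whole
    release is rejected and the device keeps its current release); otherwise the
    names of services whose requirements are met, skipping unmet optional ones.
    """
    selected = []
    for name, spec in services.items():
        met = all(host.get(t) == v for t, v in spec.get("requires", {}).items())
        if met:
            selected.append(name)
        elif not spec.get("optional", False):
            return None
    return selected
```

For example, an application could ship two optional CUDA services, one built for L4T 32.4.2 and one for 32.3.1, so that each device runs the variant matching its host OS while a service without requirements runs everywhere.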

Going further

If you’d like to learn more about using the Jetson Nano for AI projects on balenaOS, check out the examples below.
If you have any questions, you can always find us on our forums, on Twitter, Instagram, or on Facebook.
by Alan Boris, Hardware Hacker in Residence