Monitoring your balena devices with Datadog

Updated 28 July 2020: Double-checked that we're using the latest Datadog build steps.

Here at balena, we have developed a complete set of tools to deploy and manage your fleet of IoT devices, but what if you also want to monitor detailed metrics for a single device or for your entire fleet? That’s where Datadog comes into play as a great complement to your balenaCloud setup.

Datadog is software as a service (SaaS) that enables you to gather a wide range of metrics about your system and display them all in one place. You can create multiple dashboards to help you monitor your devices and create alarms for different events, such as when a device goes offline, or when a value (temperature, bandwidth, CPU usage, etc.) passes a defined threshold.

In this blog post, we will walk you through the steps of setting up and configuring an application with balenaCloud and Datadog to deploy a fleet of two Raspberry Pi 3 devices, although this project can be scaled to any number of devices.

To get started, you will need accounts for both balenaCloud and Datadog. If you are new to the balena ecosystem, we highly recommend checking the getting started guide.

Using Deploy with balena

To begin, click the big blue button:

This will open up the balenaCloud dashboard and launch a modal for creating a new app, prepopulated with a name and device type. If it’s not already selected, choose Raspberry Pi 3 from the Device Type drop-down menu, as that is what we are using in this project. Then click Create and Deploy.

A build will begin running on our cloud build servers, and you can simply proceed to the next step, which is clicking the Add Device button. Since we’re using a fleet of Raspberry Pi 3s for this project, let’s go ahead and download the image for that device.

With the image downloaded to your computer, we need to flash it to an SD card that will be inserted into the Raspberry Pi. For that, we will use balenaEtcher.

Once you are done flashing the SD card, insert it into the Raspberry Pi and power on your device.

When the device boots for the first time, it connects automatically to balenaCloud. Once you see it listed as online on the dashboard, you can move on to the next step.

Troubleshooting: It should only take a few minutes for the new device to appear in your dashboard. If it still hasn't shown up after that, something has gone wrong. There's an extensive troubleshooting guide in the documentation, with lots of information on why this could be, but if you still can't get your device online, come on over to the forums where we’ll be able to help out.

Create a Datadog account

The next step is to create an account on Datadog and install the datadog-agent, which is a small piece of software that gathers the data from the Raspberry Pi and pushes it up to the Datadog cloud. Getting started is easy: head to http://www.datadog.com and click “Get Started Free” at the top-right:

This will bring you to a page where you need to enter some information, so go ahead and provide your name and email address, create a password, and click “Sign Up”:

On the next step, you will be asked to choose your software stack. We don’t need this step, so you can scroll to the bottom and click “Next”:

Now on Step 3, Agent Setup, you are presented with a list of operating systems. This would normally be used if we were installing Datadog directly onto one of these devices or environments, but what we need from here instead is our DD_API_KEY. We can find it by choosing the first item in the list, which happens to be Mac OS X, and having a look at the info that is displayed:

In the screenshot above, the relevant piece of information is the DD_API_KEY=645c449d3aa20878448b9910a7f6de71. Your key will vary of course, but make note of it, as we are going to need that string of characters later.

Configure environment variables

While Datadog is waiting for the agent to report back, let's jump back to the balenaCloud dashboard to finish the configuration process.

In order to visualize all devices inside our datadogMonitor application in one single Datadog instance, we need to add the API key variable as an Application Environment Variable, which makes it available to all devices in that application. From your Application, click on Environment Variables and add a variable called DD_API_KEY with the value from the API key that we saved from the Datadog setup.
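If your own service code also needs the key (for example, to call the Datadog API), it can read the same variable at startup. Here's a minimal Python sketch that fails fast when the variable is missing or malformed. The function names load_datadog_api_key and mask are ours for illustration, and the 32-character hex check is an assumption based on the shape of the sample key shown above:

```python
import os
import re

# Assumption: API keys keep the 32-hex-character shape of the sample key above.
_API_KEY_PATTERN = re.compile(r"[0-9a-f]{32}")


def load_datadog_api_key(environ=os.environ):
    """Return DD_API_KEY from the environment, or raise if it is missing or malformed."""
    key = environ.get("DD_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DD_API_KEY is not set; add it as an Application Environment Variable"
        )
    if not _API_KEY_PATTERN.fullmatch(key):
        raise RuntimeError("DD_API_KEY does not look like a 32-character hex string")
    return key


def mask(key):
    """Hide all but the last four characters so the key can be logged safely."""
    return "*" * (len(key) - 4) + key[-4:]


if __name__ == "__main__":
    # Prints only the masked key, never the full secret.
    print("Using Datadog API key", mask(load_datadog_api_key()))
```

Passing the environment in as a parameter keeps the function easy to test without touching the real process environment.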

Meanwhile, that background cloud-build that we kicked off earlier should be complete by now, and we can check the progress of downloading and installing the containers inside your device, as displayed in the image below:

This sample project is a multi-container application: it combines the simple-server-python project, which creates an HTTP server, with the Datadog monitoring agent.

After the containers have finished downloading, they will automatically start, running the Flask server and the Datadog agent.

If you have done everything correctly you can now go back to the Datadog dashboard, and you will see that the devices are now showing up! 🎆🎆🎆

Now visit Infrastructure List and you will be able to view all devices connected to this Datadog instance. We have connected two Raspberry Pi 3 devices, so you will see two devices listed.

Note that the hostname displayed in Datadog matches the UUID on the balenaCloud dashboard, which makes it straightforward to match devices between the two services.
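To illustrate how that match can be used, here's a small Python sketch that pairs Datadog hostnames with balena device names. It assumes the agent reports the value that balena exposes inside each container as BALENA_DEVICE_UUID, and it tolerates a shortened UUID on either side. The helper names and the shape of the balena_devices mapping are hypothetical:

```python
import os


def datadog_hostname(environ=os.environ):
    """Return the hostname this device is expected to report to Datadog.

    Assumption: the agent uses the balena device UUID as its hostname,
    which is what the matching names in the two dashboards suggest.
    """
    return environ["BALENA_DEVICE_UUID"]


def match_hosts(datadog_hosts, balena_devices):
    """Pair each Datadog hostname with the balena device sharing its UUID.

    balena_devices maps device UUIDs to device names (hypothetical input).
    A host matches when one identifier is a prefix of the other, so short
    and full UUIDs both work. Hosts without a match map to None.
    """
    matches = {}
    for host in datadog_hosts:
        matches[host] = next(
            (
                name
                for uuid, name in balena_devices.items()
                if uuid.startswith(host) or host.startswith(uuid)
            ),
            None,
        )
    return matches
```

With two devices in the fleet, match_hosts returns a two-entry dictionary, and any None value points to a host that is reporting to Datadog but isn't in your list of balena devices.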

💥 Boom, everything is ready to go and now you can take advantage of Datadog to help you monitor your device fleet. On the infrastructure list page, you can click on each device and it will open a dashboard showing live charts so that you can track anything happening on the system.
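The agent collects system metrics on its own. For an application-level value, such as a temperature reading, one option is to send a custom gauge to the agent using the DogStatsD datagram format over UDP. The Python sketch below rests on assumptions that this sample project doesn't confirm: that the agent container is reachable under the service name datadog, and that DogStatsD is listening on its default port 8125 and accepts traffic from other containers. The metric name and tag are made up:

```python
import socket


def format_gauge(name, value, tags=None):
    """Format a gauge as a DogStatsD datagram: name:value|g|#tag1,tag2."""
    datagram = f"{name}:{value}|g"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_gauge(name, value, tags=None, host="datadog", port=8125):
    """Send one gauge to the agent over UDP (fire and forget).

    Assumptions: "datadog" is the agent's service name in the application,
    and DogStatsD accepts non-local traffic on UDP port 8125.
    """
    payload = format_gauge(name, value, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric name and tag, for illustration only.
    send_gauge("device.temperature", 41.5, tags=["room:lab"])
```

UDP is connectionless, so send_gauge doesn't tell you whether the agent received the value; check the Datadog dashboard to confirm that the metric arrives.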

Monitoring multiple devices and triggers

A helpful feature that many Datadog users rely on is the ability to verify that their devices are online and functional and, if a device goes offline, to be notified, for example by email. Let’s go ahead and create a new monitor and trigger by going to https://app.datadoghq.com/monitors#/create.

For this example, we want to monitor a “Host”, so click on that on the left. You will come to a form that looks like this:

Step One: Select All Monitored Hosts so that we can monitor the entire fleet of devices with that API key:

Step Two: Leave as is with Check Alert checked.

Step Three: Enter a customized message in this field, which can be written in markdown. This is what will be emailed to you in the event of downtime.

Step Four: Add your account name or email address in this field and you will receive an email every time this monitor changes state (device goes offline or online).

Click Save and you can now monitor all the devices in a timeline bar, get notifications if any devices go offline and also check the overall system uptime.
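If you'd rather keep the monitor definition in code, the same settings can be expressed as a payload for the Datadog monitors API. This is a sketch rather than an official recipe: the monitor name, the helper names, and the two-minute no-data window are our assumptions, and creating monitors through the API also needs an application key, which this post hasn't covered. The example only builds the request; it doesn't send it:

```python
import json
import urllib.request

MONITOR_URL = "https://api.datadoghq.com/api/v1/monitor"


def host_monitor_payload(notify, message="A balena device stopped reporting."):
    """Build a host monitor covering every reporting host, like the form above."""
    return {
        "name": "balena fleet: device offline",  # hypothetical name
        "type": "service check",
        # datadog.agent.up is the check the agent reports while it is running.
        "query": '"datadog.agent.up".over("*").by("host").last(2).count_by_status()',
        # Mentioning @<email or account> in the message triggers the notification.
        "message": f"{message} @{notify}",
        # Assumption: alert after two minutes without data from a host.
        "options": {"notify_no_data": True, "no_data_timeframe": 2},
    }


def build_request(payload, api_key, app_key):
    """Wrap the payload in a POST request without sending it."""
    return urllib.request.Request(
        MONITOR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )
```

Passing the result of build_request to urllib.request.urlopen would submit the monitor. Keeping the payload in a plain function makes it easy to review or version alongside your application code.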

At this point, you have the basics covered. As a quick recap, we have set up a sample project connecting balenaCloud with Datadog, to monitor a fleet of devices and track metrics such as device uptime. We’re only scratching the surface of what Datadog can do and how it can help you manage your IoT project in conjunction with balenaCloud, so let us know what else you come up with!

If you have any questions or feedback, or have built something based on this project, we'd love to see you share your work over in our forums! Happy developing!
