Our Dockerfile Tips & Tricks

Last week was Docker Global Mentor Week 2016, a great initiative by Docker to help users improve at all skill levels. Docker is one of the key technologies in the resin.io stack, and we’ve found that there are a lot of Docker-related best practices, tips, and tricks which can dramatically improve the resin.io developer experience. Docker already has a Best Practices collection, but not all of its recommendations apply to the resin.io use case. In the spirit of Global Mentor Week, I’ve collected our highest-impact Docker tips for resin.io applications and hardware devices in this blog post.

The notes below are divided into two main parts: Must Have practices that you should really use every time, and Nice to Have tips that can further improve your code and your experience, but are a bit less hard and fast.

Must Have

The following practices should save you a lot of pain during your development process.

Pin Software Versions

The clear winner of the best practices lineup is pinning the versions of all your dependencies. This includes the base images, the code you pull from GitHub, the libraries your code relies on, and so on. With pinned versions, it is much easier to tie your application to a known-working release. Without them, your components can easily change such that a previously working Dockerfile no longer builds.

You can find the latest available date-tagged base image versions at the resin.io Docker Hub listing: just choose your base image and look at the Tags tab. For example, here’s the tag listing for resin/raspberrypi3-debian. There you should use, for example, jessie-20161119 instead of the plain jessie tag, as the latter changes day-to-day:

Dockerfile
FROM resin/raspberrypi3-debian:jessie-20161119

The structure of our base images changes sometimes (rarely, but it does), while with the date tag you can rely on a known good version of the base image (and courtesy of Docker, they will always be available for download).

A trickier thing is pinning the version of the software installed from the operating system’s package manager. In Debian this means running apt-get with specific version information, such as:

Dockerfile
RUN apt-get update && \
apt-get install -yq --no-install-recommends \
i2c-tools=3.1.1-1 \
...

The same approach applies to Debian, Alpine, and Fedora packages with their respective package managers. It takes a bit more legwork to set up pinned versions if you install a decent number of packages, but it’s worth it in the long run.
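As a rough sketch of what pinning might look like on the other two distributions: apk accepts a package=version form, and dnf accepts a package-version form. The names below are placeholders rather than real packages, and which version strings actually exist depends on the repositories enabled in your base image, so check them first (on Debian, for instance, apt-cache madison followed by the package name lists the versions apt can see).

```Dockerfile
# Use whichever of these matches your base image.

# Alpine: pin with <package>=<version> (placeholders, not real names)
RUN apk add --no-cache <package>=<version>

# Fedora: pin with <package>-<version> (placeholders, not real names)
RUN dnf install -y <package>-<version> && \
    dnf clean all
```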

Quite often you’ll install software from version control (such as from git/GitHub), in which case there’s no excuse for not using specific commits, defined by a unique ID (such as hash/SHA for git), or a tag. Here’s an example of how you would check out a specific tagged version of the code with git:

Dockerfile
# Can use tag or commit hash to set MRAAVERSION
ENV MRAAVERSION v1.3.0
RUN git clone https://github.com/intel-iot-devkit/mraa.git && \
cd mraa && \
git checkout -b build ${MRAAVERSION}

Finally, pinning should be applied to every library that you install, whether it’s listed in requirements.txt (Python), package.json (Node.js), Cargo.toml (Rust), or the manifest of some other programming language’s package manager. Always pin (also called lock or freeze) the external libraries to a version number or unique commit!

Clean up After Yourself

It’s common wisdom that one of the best ways to speed up a computer program is to eliminate unnecessary calculations (“make it do less”). The same goes for software deployment: the best way to speed up deploys and updates is not to ship code that is not needed. In our case: clean up after yourself and remove the unneeded bits from your container.

What are unneeded bits? Most commonly they are temporary files left behind by the package manager, or the source code of software that is built and installed in your Dockerfile.

The way to clean up after the package manager depends on the distribution used in your base image. In the case of Debian and Raspbian, that’s apt-get, and Docker already has quite a bit of advice regarding using apt-get in a Dockerfile. It comes down to finishing the installation step by removing the temporary files, like this:

Dockerfile
RUN apt-get update && \
apt-get install -yq --no-install-recommends \
<packages> \
&& apt-get clean && rm -rf /var/lib/apt/lists/*

The last line above removes the temporary files left behind by apt-get that you won’t need on your device.

If you use Alpine Linux, the apk package management tool has a handy --no-cache option, which leaves behind nothing to clean up:

Dockerfile
RUN apk add --no-cache <package>

For Fedora, the dnf package manager can be handled similarly to apt-get:

Dockerfile
RUN dnf makecache && \
dnf install -y \
<packages> \
&& dnf clean all && rm -rf /var/cache/dnf/*

Cleaning up the source code of installed software is usually quite simple: just remove the directories created in earlier steps of the build process. To continue the MRAA example above, this would be one way to clean up after a git checkout:

Dockerfile
ENV MRAAVERSION v1.3.0
RUN git clone https://github.com/intel-iot-devkit/mraa.git && \
cd mraa && \
git checkout -b build ${MRAAVERSION} && \
<some build steps> && \
make install && \
cd .. && rm -rf mraa

Also make sure that you keep all the cleanup statements in the same RUN section as the steps that created the files. Otherwise the files will appear to be removed, but they will still be present in an earlier layer of the final Docker image as ballast.

Combine RUN Statements

The last note above leads me to the last Must Have practice, which is combining the RUN statements logically within your Dockerfile. The steps that logically belong together should be in the same statement, to avoid a couple of common problems, mostly related to caching and using disk space unnecessarily. First, you can have unexpected build outcomes due to caching. If your apt-get update step is in a separate RUN from your apt-get install <package> step, the former might be cached and not updated while you expect it to be. Similar things can happen if you separate your git clone and the actual build.

Second, files deleted in a separate, later RUN step are still retained in the final image: they are no longer accessible, but they still take up space (ballast).
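To make both problems concrete, here is a small sketch contrasting the two styles, reusing the apt-get commands and the packages placeholder from the cleanup example earlier. The split version is commented out so the fragment only contains the recommended form.

```Dockerfile
# Avoid: three separate layers. A cached "apt-get update" layer may be reused
# with stale package lists, and the lists written by the first step stay in
# the image even though the last step deletes them.
# RUN apt-get update
# RUN apt-get install -yq --no-install-recommends <packages>
# RUN rm -rf /var/lib/apt/lists/*

# Prefer: one layer that updates, installs, and cleans up together.
RUN apt-get update && \
    apt-get install -yq --no-install-recommends \
    <packages> \
    && apt-get clean && rm -rf /var/lib/apt/lists/*
```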

The Docker documentation has a few more notes and background on this advice.

Nice to Have

The following practices are highly recommended, usually taking your experience from good to great, but not necessarily being a bottleneck for getting things done.

Order Dockerfile Statements

Docker tries to cache all the steps in your Dockerfile that have not changed, but if you change any statement, all the steps following it will be redone. You can save quite a bit of time in the build process by arranging your Dockerfile in order from least likely to most likely to change, whenever possible. For example, general setup such as setting the working directory, enabling the initsystem, and setting maintainers should happen earlier.

Dockerfile
MAINTAINER Awesome Developer <email>
WORKDIR /usr/src/app
ENV INITSYSTEM on

These statements can be followed by installing packages using the operating system’s package manager, then compiling your dependencies, enabling system services, and other setup. For example, towards the end of this section of your Dockerfile you should be installing your Python dependencies:

Dockerfile
COPY requirements.txt ./
RUN pip install -r requirements.txt

or your Node.js dependencies:

Dockerfile
COPY package.json ./
RUN npm install

Copying your application source code should come near the end, as that is most likely to change most often. It could just be a “copy everything” command, such as:

Dockerfile
COPY . ./

This way you can speed up the build and deployment process, and your Dockerfile will be easier to read as well! The examples above are just for reference; the logical order can greatly depend on your particular application!
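Putting the snippets from this post together, a Dockerfile ordered from least to most likely to change might look roughly like the sketch below. It only reuses pieces shown elsewhere in this post. It assumes a Python application and that pip is available in the image (install it in the package step if it is not), and the packages placeholder stands in for whatever your application needs.

```Dockerfile
# Least likely to change: pinned base image and general setup
FROM resin/raspberrypi3-debian:jessie-20161119
WORKDIR /usr/src/app
ENV INITSYSTEM on

# System packages: changes occasionally
RUN apt-get update && \
    apt-get install -yq --no-install-recommends \
    <packages> \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

# Library dependencies: this layer is rebuilt only when requirements.txt changes
COPY requirements.txt ./
RUN pip install -r requirements.txt

# Most likely to change: the application source itself
COPY . ./

# Start script, as described in the "Use a Start Script" section
CMD ["bash", "start.sh"]
```

With this layout, editing your application code invalidates only the final COPY step, so the package and library installation layers come straight from the cache.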

Use .dockerignore

Following on from the previous step, always define a .dockerignore file to tell our builders which content from your source code does not need to go on the device itself, and so should not be copied in the COPY . ./ step. The ignored content can be the README.md or other documentation, images included with that documentation, or any other pieces that are not required for your application’s functionality but that you are keeping in the same repository for one reason or another.
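A minimal .dockerignore along those lines could look like the following. Apart from README.md, the entries are hypothetical paths chosen for illustration, so adjust them to match what is actually in your repository. Each line is a pattern relative to the root of your project, and lines starting with # are comments.

```text
# Documentation that the application does not need at runtime
README.md
# Hypothetical directory holding further docs and their images
docs
# Hypothetical directory of screenshots used in the README
screenshots
```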

~~Add MAINTAINER~~

~~Always add a MAINTAINER entry to your Dockerfile. At resin.io we definitely need to do this, as there are a large number of example applications on GitHub at resin-io-projects and the playground, and it is very useful to know at a glance who created the given project – and likely to know the most about it if anyone has questions. It’s a very lightweight approach to improve communication and transparency, thus I’d recommend it for every Docker project.~~

The MAINTAINER instruction was deprecated in Docker 1.13, the day after this blog post was published. While resin.io devices run an earlier Docker version at the time of writing, I cannot recommend this practice anymore. The replacement is the LABEL instruction, but for the use case outlined above, it’s better to just include the relevant contact info in a README if needed, and not make it part of the Dockerfile anymore.

Use a Start Script

Having created (and debugged) a large number of projects, this one would be personal advice: don’t call your application right from the CMD step, but call a start script there:

Dockerfile
CMD ["bash", "start.sh"]

and then inside your start.sh you can have for example python app.py or any other way to start your application. The advantage is that it’s much easier to expand or add debugging steps to the start script than to constantly rewrite the CMD step. You want to emit some debug info before your main code starts? Just add as many lines and as much testing logic to your start script as you like.

In addition, a start script helps when you speed up your development and testing with resin sync. Resin sync can copy your application source code onto one of your running devices and update it in place (without rebuilding the image), then restart the container with the updated code. However, it can only do that effectively if the file is not cached by Docker, for example due to being referenced in CMD directly.

Create a Non-Root User

By default, Docker runs the code in your application container as root. As a good preventive security practice, it’s recommended to create a non-root user, and grant it only as much privilege as needed. For example:

Dockerfile
RUN useradd --user-group --shell /bin/false resin
USER resin

This will create a user called resin, and run all subsequent steps as that user. See more on this in the Docker docs, or this blogpost.
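Because USER affects every later RUN step as well as the CMD, its position matters. A rough sketch of one possible arrangement, reusing the snippets from this post, is to finish everything that needs root (such as package installation) first and switch users last. Whether your application still has the access it needs as a non-root user depends on what it does, so treat this as a starting point.

```Dockerfile
# Steps that need root privileges come first
RUN apt-get update && \
    apt-get install -yq --no-install-recommends \
    <packages> \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

# Create the unprivileged user while still running as root
RUN useradd --user-group --shell /bin/false resin

# Everything from here on, including CMD at runtime, runs as "resin"
USER resin
CMD ["bash", "start.sh"]
```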



Kitty enjoying cuddling best practices

For further research, check our documentation on Build Optimisation or the Docker Best practices for writing Dockerfiles (those that apply). You might also want to take a look at the Dockerfile Linter for general improvements and advice.

Do you have any other Docker best practices on resin.io that you would like to share? Leave your advice here in the comments, chat with us on Gitter, or drop by the forums! We would love to hear from you!

