Docker Builder Pattern

Published on 2017-11-02

The Docker Builder Pattern is a highly useful pattern for using Docker containers to generate artifacts and then packaging those artifacts in a runtime-only image. For languages that produce binaries or single-file archives (e.g. Go, Java/JVM languages, Rust), this pattern minimizes production image sizes. That speeds up deployment, reduces incidents of broken dependencies and conflicting build libraries, and allows centralized control of build tools.

Note that this does not apply to interpreted languages such as Python, Ruby, or Node.js, though similar results can be achieved in Node.js with webpack and other preprocessing steps.

The commands below work on both Windows and Linux/OSX.

Example Problem

Check out the code from https://github.com/tydavis/hello-world-docker and make sure you have Docker installed and running.

Build the Binary

If you don't have the Go compiler installed, don't worry! You're not going to need it.

Make sure your shell is in the hello-world-docker directory and execute the following command:

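For both Linux/OSX and Windows (PowerShell). PowerShell does not treat a trailing backslash as a line continuation, so on Windows either enter the command on a single line or replace each trailing \ with a backtick (`).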

docker run -v ${PWD}:/go/src/github.com/tydavis/hello-world-docker \
-w /go/src/github.com/tydavis/hello-world-docker -it golang:alpine \
/bin/sh -c "CGO_ENABLED=0 go build "

Let me walk you through this command:

  1. "-v …" mounts the current directory (${PWD}) into the container at the path given after the colon (:)
  2. "-w …" sets the container's working directory to that mounted directory
  3. "-it" attaches an interactive terminal (-i keeps STDIN open, -t allocates a pseudo-TTY)
  4. "golang:alpine" is the official Go image built on top of Alpine Linux, and this tag tracks the latest Go compiler and tools. Since we don't need extras like the race detector or other glibc-only features, it's a safe choice.
  5. /bin/sh -c "CGO_ENABLED=0 go build" runs the build with cgo disabled, so the Go toolchain produces a statically-linked binary with no dependency on the system C library. This matters because an image built from scratch contains no C library at all.

What did we actually do? We just created a Go binary without installing the Go toolchain on our machine; Docker was the only requirement.

Build the Docker image

Now that the binary exists, it's just another file to Docker. We can build our "production" image with:

docker build -t hello-world-docker:1 .

If you dig into the Dockerfile, you'll see that it starts from the scratch image, which is completely empty. That means the only thing in this image is the binary. Let's look at the image size:

tydavis@utils:~/go/src/github.com/tydavis/hello-world-docker$ docker images
REPOSITORY           TAG                 IMAGE ID            CREATED         SIZE
hello-world-docker   1                   fc2081dda00b        10 seconds ago  2.03MB
golang               alpine              6e8378057093        7 days ago      269MB
tydavis@utils:~/go/src/github.com/tydavis/hello-world-docker$ du -h hello-world-docker
2.0M    hello-world-docker
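The repository's Dockerfile isn't reproduced in this post. As a minimal sketch, assuming the binary produced by go build is named hello-world-docker (as the du output above suggests) and sits at the root of the build context, a scratch-based Dockerfile of the kind described could be as short as this:

```dockerfile
# Start from the empty "scratch" base image: no shell, no libc, no package manager.
FROM scratch

# Copy the statically-linked binary built in the docker run step into the image root.
# Assumption: the binary is named hello-world-docker, matching the du output above.
COPY hello-world-docker /hello-world-docker

# Exec (JSON array) form: scratch has no /bin/sh, so the shell form could not start.
ENTRYPOINT ["/hello-world-docker"]
```

Because there is no shell in scratch, the exec form of ENTRYPOINT is required here; the shell form would try to launch /bin/sh and the container would fail to start.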

What about Multi-Stage Builds?

Multi-stage builds take the wrong approach.

Rather than using bind mounts, users have been ADDing or COPYing their entire codebase into an image during docker build, then using docker cp to extract the resulting artifacts from a container created from that image. For languages that emit artifacts (binaries, or single archives like JARs), copying code into a container is fundamentally flawed.

The multi-stage concept takes this further down the "wrong" path: it encourages the same copy-code-into-image mindset and adds an unnecessary mechanism for discarding the build image inline during the build. As demonstrated above, the tools/compiler image never has to be built or modified; it is simply run against the mounted source. In a multi-stage build, every source change invalidates the COPY layer and all the layers after it, so artifact builds with the builder pattern are significantly faster than the multi-stage process, even with build layer caching.
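For comparison, here is a sketch of what a multi-stage Dockerfile for the same program might look like, reusing the source path and build flags from the docker run command earlier in this post. The stage name builder is an arbitrary, hypothetical choice, and multi-stage builds require Docker 17.05 or later.

```dockerfile
# Stage 1: compile inside the Go toolchain image.
FROM golang:alpine AS builder
WORKDIR /go/src/github.com/tydavis/hello-world-docker

# The whole source tree is copied into the build stage; any change to it
# invalidates this layer and the layers that follow.
COPY . .
RUN CGO_ENABLED=0 go build

# Stage 2: copy only the binary into an empty runtime image.
FROM scratch
COPY --from=builder /go/src/github.com/tydavis/hello-world-docker/hello-world-docker /hello-world-docker
ENTRYPOINT ["/hello-world-docker"]
```

The final image ends up the same size as before, but each docker build sends the whole source tree to the Docker daemon as build context and re-runs the COPY and go build steps whenever any source file changes.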

Conclusion

If one is using a language that produces artifacts (Go, Java/JVM languages, C/C++, etc.), then copying code into the image will unnecessarily bloat the result. One should use the builder pattern instead.

Conversely, with Python, Ruby, or another interpreted language, copying code into the image may be the only solution, because the code needs its runtime environment present when it executes.