
Docker build caching in CI (for multi-stage builds)

Posted on: 1 July 2025

If you are building Docker images in CI, there is a good chance you will want to set up build caching, because images can take a long time to build. This is particularly frustrating because, while developing locally, you have quick builds with layer caching, which you lose in an ephemeral CI job. We can speed those builds up by setting up layer caching, but it won't work without some configuration.

If you are building Docker images with a self-hosted runner, you have probably enabled privileged execution to run Docker-in-Docker; the example in this post is for GitLab. You should spend some time understanding what that means, as it is very different, security-wise, from normal Docker execution of CI jobs. Rootless alternatives like buildah or rootless BuildKit mean an unfamiliar build process if you build with Docker locally, so this post uses Docker-in-Docker. As a starting point: in privileged mode, the CI job container (run by gitlab-runner) will effectively execute as root on the gitlab-runner host, so you need to be extremely careful about what runs in the CI job (i.e. what is in .gitlab-ci.yml). During the docker build inside the job, the build containers are still isolated as in a normal build, so there is less risk from malicious code inside the Dockerfile, although there are still risks.
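For reference, this is roughly what enabling privileged execution looks like in a gitlab-runner config.toml; the runner name and image are illustrative, but privileged is the setting being discussed:

```toml
# /etc/gitlab-runner/config.toml (sketch; names are illustrative)
[[runners]]
  name = "docker-dind-runner"
  executor = "docker"
  [runners.docker]
    image = "docker:28.3.0"
    # This is the setting that makes the job effectively root on the
    # runner host; audit what your CI jobs are allowed to run.
    privileged = true
```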

The first thing that is helpful to get your head around is that your Docker build cache is separate from your images. You can delete all your images whilst keeping your cache, or clear your cache (docker buildx prune -a) whilst keeping your images. The build cache metadata lets Docker link image layers to Dockerfile instructions, and decide whether or not to rebuild a layer.
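You can see this separation for yourself on any machine with a Docker daemon (output will vary):

```shell
# Show disk usage for images and the build cache as separate line items
docker system df

# Clear the entire build cache; your images are untouched
docker buildx prune -a

# Conversely, removing an image leaves the build cache alone
docker image rm <image>
```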

Docker’s legacy builder had a way to use an image for build caching. People used this in CI, so a lot of resources online still suggest it, which is quite confusing, because it doesn’t work with modern Docker using BuildKit. See the footnote.

By default BuildKit uses the local cache storage backend, which is what you are likely using when building locally; local caching isn’t any good in CI, where we have an ephemeral file system. Thankfully, BuildKit also supports other cache backends that cache to an external location, which is great for CI! There is a note in the docs that says something about other cache storage backends needing a ‘different driver’… what?
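For contrast, this is what using the local backend explicitly looks like; dest and src are directories on the build machine, which is exactly what an ephemeral CI job doesn’t keep around (the path and tag here are arbitrary):

```shell
# Cache to a directory on disk: fine locally, useless on an ephemeral runner
docker buildx build \
  --cache-to type=local,dest=/tmp/buildcache \
  --cache-from type=local,src=/tmp/buildcache \
  -t myapp:dev .
```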

Aside: jargon busting

I am not a massive fan of how Docker’s documentation uses conflicting bits of terminology carelessly. I had to get my head around some of the jargon to get this all working. Here is the terminology that matters for this post:

- BuildKit: the build engine behind modern docker build and docker buildx.
- Build driver: the environment BuildKit runs in. The default docker driver runs inside the Docker daemon; the docker-container driver runs BuildKit in a dedicated container, which unlocks extra features.
- Cache storage backend: where BuildKit reads and writes its build cache, e.g. local, inline, or registry.

Back to it: external build caches

The simplest external build cache is the inline cache, which you can use like this:

docker buildx build --push -t <registry>/<image> \
  --cache-to type=inline \
  --cache-from type=registry,ref=<registry>/<image> .

With the inline cache, metadata is stored in the image itself, which is nice and simple but has two downsides:

  1. It only supports the min cache mode, so in a multi-stage build only the layers of the final image are cached, not the intermediate stages.
  2. The cache metadata travels with the image, so you cannot manage or clear the cache separately from the image.

I think most applications will benefit from a multi-stage build; reducing the size and security risk of the image is easily worth any added complexity for me.

Externally caching a multi-stage build

Let’s start working with an actual example: a Go web server. We build the Go binary, baking a commit hash into the web server to help with debugging, and then copy the binary into a distroless image. As we aren’t using any CGO, we can use the static distroless image. It is only 2MB, and effectively just static assets, so you are about as safe from CVEs as you can get. This is a great tutorial about distroless images.

FROM golang:1.24 AS build-stage

ARG GIT_COMMIT

WORKDIR /app

COPY go.mod ./
RUN go mod download
COPY . ./

RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags "-X main.GitCommit=${GIT_COMMIT}" \
    -o /server main.go

# Deploy the application binary into a lean image
FROM gcr.io/distroless/static-debian12 AS app

ARG GIT_COMMIT
LABEL git_commit=$GIT_COMMIT

WORKDIR /

COPY --from=build-stage /server /server

EXPOSE 8080

USER nonroot:nonroot

ENTRYPOINT ["/server"]
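To try this build locally with the commit hash baked in (the tag name here is arbitrary):

```shell
docker buildx build \
  --build-arg GIT_COMMIT=$(git rev-parse --short HEAD) \
  -t myapp:dev .
```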

Now that we have our multi-stage Dockerfile, we need to swap cache storage backends. Our best option here is the registry backend, which puts the build cache into a separate image. Note that we can call our cache image whatever we like, and even push it to a separate registry:

docker buildx build --push -t <registry>/<image> \
  --cache-to type=registry,ref=<registry>/<cache-image>[,parameters...] \
  --cache-from type=registry,ref=<registry>/<cache-image> .

There are two pitfalls with the registry cache:

  1. The registry cache (as of docker 28.3.0) is not supported by the docker build driver. Instead we must use the docker-container build driver. See jargon busting above.
  2. It has two mode settings, min and max. The min mode only caches the layers of the final image, not previous stages, so for a multi-stage build we need to opt in to max. I lost quite a lot of time to this, partly due to trusting an LLM rather than the documentation.

To set up the docker-container driver:

docker context create my-builder
docker buildx create my-builder --driver docker-container --use

Example for GitLab CI

Here is a complete .gitlab-ci.yml, assuming you want to build on commits to main and tag the image as main:

build:
  stage: build
  image: docker:28.3.0
  services:
    - docker:28.3.0-dind
  variables:
    IMAGE_NAME: $CI_REGISTRY/myapp
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login $CI_REGISTRY -u $CI_REGISTRY_USER --password-stdin
  script:
    - docker context create builder-context
    - docker buildx create --use --driver docker-container --name mybuilder builder-context
    - |
      docker buildx build --push \
        -t $IMAGE_NAME:$CI_COMMIT_REF_SLUG \
        --cache-to type=registry,ref=$IMAGE_NAME:cache,mode=max \
        --cache-from type=registry,ref=$IMAGE_NAME:cache \
        .
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

This was mostly taken from GitLab’s docs.

$CI_COMMIT_REF_SLUG is the branch name on branches and tag on tags, slugified.

In that CI definition we are using a single cache image tag. This means builds on git tags would benefit from cached builds on dev, for example. Depending on your git strategy, it might make sense to have separate cache images for main and dev. You could set the cache-to/from to reference cache-main and cache-dev on branch builds. But for git tag builds, you could --cache-from cache-main assuming you tag releases from main.
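The branch logic above could be sketched in the job script along these lines; the cache tag names (cache-main, cache-<branch>) are assumptions, not GitLab conventions, and this assumes releases are tagged from main:

```shell
# Sketch: pick a cache image tag per ref. Tag pipelines reuse main's cache.
IMAGE_NAME="${IMAGE_NAME:-registry.example.com/myapp}"

if [ -n "${CI_COMMIT_TAG:-}" ]; then
  # Git tag builds: assume the tag was cut from main
  CACHE_REF="$IMAGE_NAME:cache-main"
else
  # Branch builds: one cache image per branch slug
  CACHE_REF="$IMAGE_NAME:cache-${CI_COMMIT_REF_SLUG:-local}"
fi

echo "cache ref: $CACHE_REF"
```

You would then pass $CACHE_REF to both --cache-to and --cache-from in the buildx command from the job above.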

Footnotes

  1. With the legacy docker builder, there was a --cache-from flag which allowed you to specify images that the builder could attempt to use for caching. It would try to match layers up with Dockerfile instructions, but that doesn’t seem to be an option with BuildKit.