Multi-stage Docker Builds for Dolt
Imagine if every change to your database was as traceable as every line of code in your Git repository. What if you could branch your production database for testing, merge schema changes with confidence, and roll back data migrations without breaking a sweat?
Welcome to Dolt, a MySQL-conformant database that brings Git-like superpowers to your data. Dolt transforms how we think about database evolution, making every table change auditable.
If you want a quick way to try out Dolt, we provide a basic Docker image that lets you experiment in seconds. At DoltHub, we are big fans of Docker. One of my colleagues, Dustin, even wrote a Docker trilogy! It starts with a Beginner's Guide, continues with a guide on Docker Compose, and finishes with an article on Docker Swarm.
If you're unfamiliar with Docker, it's a platform that packages applications and their dependencies into lightweight, portable containers that you can run anywhere Docker is installed. If you want to quickly set up users and permissions as well, we also provide a dolt-sql-server image, which handles all of the above and is our main driver for today's topic: multi-stage Docker builds.
While our previous Docker content from the Beginner's Guide to Getting Started tutorials taught you how to use Docker with Dolt, this blog will cover how to create Docker builds from the ground up.
What is a Dockerfile?
Before we walk the plank of multi-stage builds, let's quickly recap what a Dockerfile is. Similar to code, a Dockerfile is a text file that contains a series of directions that Docker uses to build a Docker image. You can think of a Docker image as a snapshot of a filesystem that includes everything needed to run an application. Docker images can be run as containers, which are isolated environments that share the host system's resources, but behave like independent machines.
Each instruction in the Dockerfile adds a new building block, a.k.a. a layer, to the image. Docker caches these layers to make later builds faster. When you change your Dockerfile, the next docker build command creates a new build; otherwise, if no changes are detected, Docker reuses the cached layers from previous builds to create the image faster.
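Layer ordering matters for the cache: once an instruction (or a file it copies) changes, that layer and everything after it are rebuilt. As a quick, generic sketch (not part of the Dolt image), you'd put rarely-changing instructions before frequently-changing ones:
# Sketch: keep stable, expensive layers early so they stay cached
FROM ubuntu:22.04
# Rarely changes, so it stays cached across most rebuilds
RUN apt update -y && apt install -y curl && apt clean
# Changes often (e.g., your app files), so only this layer and later ones rebuild
COPY ./app /opt/app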
A Dockerfile instruction is constructed in a way that makes it readable, even for those with little to no experience:
# FROM specifies the base image to build upon, a.k.a., the starting point (i.e., OS) for your image
FROM ubuntu:22.04
# ARG (i.e., argument) that can be passed at build time through the terminal `docker build --build-arg DOLT_VERSION=latest .`
ARG DOLT_VERSION
# RUN executes terminal commands inside the image; `\` splits a command across multiple lines and `&&` chains different commands
RUN apt update -y && \
apt install -y \
curl \
tini \
ca-certificates && \
apt clean && \
rm -rf /var/lib/apt/lists/*
# we install dolt with the install.sh script, which will determine the platform/arch of the container
# and install the proper dolt binary
RUN bash -c 'curl -L https://github.com/dolthub/dolt/releases/download/v${DOLT_VERSION}/install.sh | bash'
RUN /usr/local/bin/dolt version
# VOLUME is a mount point for external storage, here we use it to persist dolt databases which are stored in /var/lib/dolt
# so that they are not lost when the container is removed
VOLUME /var/lib/dolt
# WORKDIR sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, and ADD instructions that follow it in the Dockerfile
WORKDIR /var/lib/dolt
# ENTRYPOINT sets the command and parameters that will be executed when the container starts
ENTRYPOINT ["tini", "--", "/usr/local/bin/dolt"]
The instructions above run sequentially and create a Docker image with Dolt installed and ready to use.
Now all that's left is to run the following command in the terminal to build the image. We use the -t
option to name
and tag the image using the name:tag
format:
$ docker build --build-arg DOLT_VERSION=1.59.7 -t dolt:1.59.7 .
[+] Building 0.9s (11/11) FINISHED                                                  docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                0.0s
 => => transferring dockerfile: 617B                                                                0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:1.3-labs                    0.2s
 => CACHED docker-image://docker.io/docker/dockerfile:1.3-labs@sha256:250ce669e1aeeb5ffb892b18039c3f0801466536cb4210c8eb2638e628859bfd  0.0s
 => => resolve docker.io/docker/dockerfile:1.3-labs@sha256:250ce669e1aeeb5ffb892b18039c3f0801466536cb4210c8eb2638e628859bfd  0.0s
 => [internal] load .dockerignore                                                                   0.0s
 => => transferring context: 2B                                                                     0.0s
 => [internal] load metadata for docker.io/library/ubuntu:22.04                                     0.2s
 => [1/5] FROM docker.io/library/ubuntu:22.04@sha256:4e0171b9275e12d375863f2b3ae9ce00a4c53ddda176bd55868df97ac6f21a6e  0.0s
 => => resolve docker.io/library/ubuntu:22.04@sha256:4e0171b9275e12d375863f2b3ae9ce00a4c53ddda176bd55868df97ac6f21a6e  0.0s
 => CACHED [2/5] RUN apt update -y && apt install -y curl tini ca-certificates && apt clean && rm -rf /var/lib/apt/lists/*  0.0s
 => CACHED [3/5] RUN bash -c 'curl -L https://github.com/dolthub/dolt/releases/download/v${DOLT_VERSION}/install.sh | bash'  0.0s
 => CACHED [4/5] RUN /usr/local/bin/dolt version                                                    0.0s
 => CACHED [5/5] WORKDIR /var/lib/dolt                                                              0.0s
 => exporting to image                                                                              0.1s
 => => exporting layers                                                                             0.0s
 => => exporting manifest sha256:b2d9ae376ffdf63fa4eaee5bb4384a405fc14c5790a9b14856dfc053c78ca47b   0.0s
 => => exporting config sha256:76e40bc5f15dd4062dde14b82f1a083ccbf86d68b2ebf40ed7e927b421a8fda9     0.0s
 => => exporting attestation manifest sha256:bd78cbf03f155e2767840e0a1c1ae3289debf956060e978a2f217648460c9042  0.0s
 => => exporting manifest list sha256:93aa2e4fb8ecada8adee7059c099d709935aead5d706fdffedd8b509b5205bfd  0.0s
 => => naming to docker.io/library/dolt:1.59.7                                                      0.0s
 => => unpacking to docker.io/library/dolt:1.59.7
The . at the end specifies the build context, which contains the files and subdirectories for the build to use (e.g., if you're copying files into the image, or if the Dockerfile is not in the current directory). In this case, we don't have any files to copy, so we just use the current directory.
If any instruction fails, the build process will stop, and you'll see an error message similar to this:
[+] Building 2.6s (8/10) docker:desktop-linux
=> [internal] load build definition from Dockerfile 0.0s
=> ...
=> ERROR [3/5] RUN bash -c 'curl -L https://github.com/dolthub/dolt/releases/download/v${DOLTf_VERSION}/install.sh | bash'
There are extra options too, e.g., -f to specify the path to the Dockerfile independently of the build context. This may not seem useful now, but later you'll see we can leverage it to change build behavior conditionally.
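For example, if your Dockerfile lived in a docker/ subdirectory (a hypothetical layout), you could point -f at it while still using the current directory as the build context:
# Hypothetical layout: the Dockerfile lives in docker/, but the context is still .
docker build -f docker/Dockerfile --build-arg DOLT_VERSION=1.59.7 -t dolt:1.59.7 .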
What’s the Deal with Multi-Stage Builds?
Alright, so far, we've seen a Dockerfile that builds a Dolt image from a pre-compiled binary through a URL. We also understand that each instruction is predominantly creating a new layer, and that Docker caches these layers to make later builds faster. So, what's the big deal with multi-stage builds?
Well, layers aren't perfect. A single change to any instruction invalidates the cache for that layer and all subsequent layers. This is time-consuming and frustrating, especially for instructions that rarely change but take a long time to execute.
Furthermore, our current image has only one purpose. What if we wanted to build Dolt from source for development purposes? You don't want to rebuild the same base layers again and again because you changed the build process. You also don't want to ship a large image with all the build tools and dependencies if you're running multiple containers (instances) of your application in production.
Enter multi-stage builds! You may have concluded that the single-stage Dockerfile's FROM instruction is limited to being the first and only entry point that can exist. But this isn't the case. A Dockerfile is more like a dock (get it?), where you can have multiple entry points, a.k.a. stages, each bringing its own resources and supplies from its own manufacturing plant (i.e., base image) to serve an aggregated purpose:
# syntax=docker/dockerfile:1.3-labs
FROM debian:bookworm-slim AS base
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
curl tini ca-certificates && \
rm -rf /var/lib/apt/lists/*
# We use bookworm since the icu dependency ver. between the base and golang images is the same
FROM golang:1.25-bookworm AS build-from-source
ENV DEBIAN_FRONTEND=noninteractive
ARG DOLT_VERSION
# ...
FROM base AS download-binary
ARG DOLT_VERSION
# ...
FROM base AS runtime
# ...
I've taken out the details for brevity, but you can see we have four stages here: base, build-from-source, download-binary, and runtime. Each stage has its own FROM instruction and its own set of instructions, similar to the single-stage Dockerfile we saw earlier.
Since each stage is treated as an independent build, they can be built in parallel! Naturally, this also means that if you change something in one stage, it doesn't invalidate the cache for the other stages. No more subsequent layer rebuilds!
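A nice side effect is that you can build a single stage on its own with the --target flag, which is handy when debugging one stage in isolation (stage names here match the skeleton above):
# Build only the download-binary stage, ignoring the rest
docker build --target download-binary --build-arg DOLT_VERSION=1.59.7 -t dolt-download:debug .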
Of course, we need a method to combine the artifacts (i.e., files, binaries, etc.) from these stages into a final image.
This is where the COPY --from=<stage>
instruction comes in handy:
# syntax=docker/dockerfile:1.3-labs
FROM debian:bookworm-slim AS base
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
curl tini ca-certificates && \
rm -rf /var/lib/apt/lists/*
# We use bookworm since the icu dependency ver. between the base and golang images is the same
FROM golang:1.25-bookworm AS build-from-source
ENV DEBIAN_FRONTEND=noninteractive
ARG DOLT_VERSION
# ...
FROM base AS download-binary
ARG DOLT_VERSION
RUN if [ "$DOLT_VERSION" = "latest" ]; then \
# Fetch latest version number from GitHub API
DOLT_VERSION=$(curl -s https://api.github.com/repos/dolthub/dolt/releases/latest \
| grep '"tag_name"' \
| cut -d'"' -f4 \
| sed 's/^v//'); \
fi && \
if [ "$DOLT_VERSION" != "source" ]; then \
curl -L "https://github.com/dolthub/dolt/releases/download/v${DOLT_VERSION}/install.sh" | bash; \
fi
FROM base AS runtime
COPY --from=download-binary /usr/local/bin/dolt /usr/local/bin/
RUN /usr/local/bin/dolt version
RUN mkdir /docker-entrypoint-initdb.d && \
mkdir -p /var/lib/dolt && \
chmod 755 /var/lib/dolt
COPY docker/docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
VOLUME /var/lib/dolt
EXPOSE 3306 33060 7007
WORKDIR /var/lib/dolt
ENTRYPOINT ["tini", "--", "docker-entrypoint.sh"]
Here, a separate stage handles downloading the binary, and the final runtime stage copies the binary from the download-binary stage into the final image. We don't have any extra dependencies when downloading, but it's nice to separate concerns. This way, if we ever need to change the download process, we don't have to rebuild the base layers all over again.
The behavior of download-binary has changed a bit here too. We now support three build scenarios through the DOLT_VERSION build argument: latest, x.y.z (i.e., a specific version), and source. The first two scenarios download a pre-compiled binary, while source prevents the download from happening altogether.
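Concretely, the three scenarios are selected purely by the build argument (the exact Dockerfile path and context for our image are shown later in this post):
# Pin a specific release
docker build --build-arg DOLT_VERSION=1.59.7 -t dolt-sql-server:1.59.7 .
# Resolve the newest release from the GitHub API at build time
docker build --build-arg DOLT_VERSION=latest -t dolt-sql-server:latest .
# Skip the download and compile Dolt from source (requires the dolt/ source tree in the context, as covered below)
docker build --build-arg DOLT_VERSION=source -t dolt-sql-server:source .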
The docker-entrypoint.sh is a script we use to initialize users and permissions for the SQL server. It's not central to our discussion today, but if you're curious about creating a similar entrypoint script, check it out!
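If you just want the gist of what an entrypoint like that does, here's a simplified, hypothetical sketch (the real script shipped with the image does more, e.g., handling the /docker-entrypoint-initdb.d directory created in the Dockerfile above):
#!/bin/bash
# Hypothetical sketch of an init-style entrypoint, not the actual script
set -e
# Create a root user based on env vars such as DOLT_ROOT_PASSWORD and DOLT_ROOT_HOST
if [ -n "$DOLT_ROOT_PASSWORD" ]; then
    dolt sql -q "CREATE USER IF NOT EXISTS 'root'@'${DOLT_ROOT_HOST:-localhost}' IDENTIFIED BY '${DOLT_ROOT_PASSWORD}'"
    dolt sql -q "GRANT ALL ON *.* TO 'root'@'${DOLT_ROOT_HOST:-localhost}' WITH GRANT OPTION"
fi
# Hand control over to the SQL server (tini stays as PID 1 and forwards signals)
exec dolt sql-server --host=0.0.0.0 "$@"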
Using Glob Patterns for Conditional Builds
You may have noticed I've been avoiding touching the build-from-source stage. This is because I don't want to include both my source-built binary and the downloaded binary in the final image. Unfortunately, Docker's COPY --from= instruction requires the source path (first argument) to exist; otherwise, the build will fail. We could have both binaries exist in the final image, but that would bloat the image size unnecessarily. We do have those conditional statements above with DOLT_VERSION, but the COPY instruction unfortunately doesn't support them.
This all may seem inconsequential so far, but I was recently tasked with creating a battery of tests for our sql-server image. I'd like to use our existing bats testing framework, but a separate Dockerfile that builds from source just to run those tests would decouple production and development concerns. These tests are vital for catching regressions before customers encounter them when using the image in production.
Luckily, with a bit of Googling and some implicit conditional logic, we can have our cake and eat it too! It turns out Docker's COPY instruction supports glob patterns. Glob patterns specify sets of filenames with wildcard characters like * and ?. This solves our issue of failing on missing files, because if the pattern doesn't match any files, Docker will simply skip the copy operation without throwing an error.
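Here's the trick in isolation with a made-up directory name, before we fold it into the real stages:
# If maybe-missing/ exists in the build context, its contents get copied;
# if it doesn't, the glob matches nothing and the instruction becomes a no-op instead of an error
COPY maybe-missing*/ /tmp/maybe-missing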
When we combine this behavior with build arguments, we get our desired conditional build behavior. There are a number of mechanics at play here, so let's take it one stage at a time:
# syntax=docker/dockerfile:1.3-labs
FROM debian:bookworm-slim AS base
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
curl tini ca-certificates && \
rm -rf /var/lib/apt/lists/*
# We use bookworm since the icu dependency ver. between the base and golang images is the same
FROM golang:1.25-bookworm AS build-from-source
ENV DEBIAN_FRONTEND=noninteractive
ARG DOLT_VERSION
# COPY doesn't support conditionals, so we rely on the path context to maybe have a dolt/ directory
# to distinguish between source and binary builds using DOLT_VERSION=source too.
COPY dolt*/ /tmp/dolt
# Check for source to avoid unnecessary installation of build dependencies
RUN if [ "$DOLT_VERSION" = "source" ]; then \
cd /tmp/dolt/go || { echo "Make sure the dolt/ directory exists in your workspace to build from source."; exit 1; }; \
apt-get update -y && \
apt-get install -y libicu-dev && \
rm -rf /var/lib/apt/lists/* && \
go build -o /usr/local/bin/dolt ./cmd/dolt && \
chmod +x /usr/local/bin/dolt; \
fi
# ...
There are existing comments in this code snippet that explain the behavior, but for further context: we instruct users to build their Docker images from the ~/dolt/docker directory. We take advantage of this knowledge to conditionally copy the dolt/ directory when it exists in the build context. Later, we check for the DOLT_VERSION=source argument and attempt to switch into the /tmp/dolt/go directory that should exist as a result; if it doesn't, we exit with an informative error message.
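To visualize the two build contexts (paths mirror the build commands shown later; the exact workspace name is up to you):
# Source build: the context is the workspace root, which contains the dolt/ clone
~/dolt_workspace/
└── dolt/
    ├── docker/   # serverDockerfile, docker-entrypoint.sh
    └── go/       # Go module used by the build-from-source stage
# Binary build: the context is the dolt/ repo itself, so no dolt/ directory is visible to COPY
~/dolt_workspace/dolt/
└── docker/       # serverDockerfile, docker-entrypoint.sh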
I've skipped the download-binary
stage since it's unchanged, so let's look at the final runtime
stage:
# syntax=docker/dockerfile:1.3-labs
FROM debian:bookworm-slim AS base
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
curl tini ca-certificates && \
rm -rf /var/lib/apt/lists/*
# ...
FROM base AS runtime
ARG DOLT_VERSION
# icu dependency for source builds
RUN if [ "$DOLT_VERSION" = "source" ]; then \
apt-get update -y && \
apt-get install -y --no-install-recommends libicu-dev && \
rm -rf /var/lib/apt/lists/*; \
fi
# Only one binary is possible due to DOLT_VERSION, so we optionally copy from either stage
COPY --from=build-from-source /usr/local/bin/dolt* /usr/local/bin/
COPY --from=download-binary /usr/local/bin/dolt* /usr/local/bin/
RUN /usr/local/bin/dolt version
RUN mkdir /docker-entrypoint-initdb.d && \
mkdir -p /var/lib/dolt && \
chmod 755 /var/lib/dolt
COPY docker*/docker-entrypoint*.sh /usr/local/bin/
COPY dolt*/docker*/docker-entrypoint*.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
VOLUME /var/lib/dolt
EXPOSE 3306 33060 7007
WORKDIR /var/lib/dolt
ENTRYPOINT ["tini", "--", "docker-entrypoint.sh"]
The first thing to note is that we conditionally install the libicu-dev package only when building from source. You may have also noticed that this is why our image is based on debian:bookworm-slim: the golang:1.25-bookworm image we use for building from source ships the same version of the libicu dependency. We avoid building our own golang image so we can make use of the existing caching systems Docker provides.
Next, when it comes to the copy instructions, we benefit from the DOLT_VERSION argument forcing only one binary to exist. This means that only one of the COPY --from= instructions will actually copy a file, while the other will be skipped because its wildcard pattern doesn't match any files.
Finally, we also conditionally copy the docker-entrypoint.sh script from both the docker/ and dolt/docker/ directories to support both build contexts. Source builds require the full dolt/ directory to exist, while binary builds can be done from just the docker/ directory, so this works out nicely and maintains the existing behavior.
This also works because only one of the paths will exist, depending on the current working directory or build context. You can purposely break this by creating both directories, but I'll be blaming that one on you 🫵.
The result? A single Dockerfile that can build both source and binary images, with minimal size and the benefit of those parallel builds! Let's see how to build each scenario.
# Source build from ~/dolt_workspace
docker build -f dolt/docker/serverDockerfile --build-arg DOLT_VERSION=source -t dolt-sql-server:source .
# Binary build from ~/dolt_workspace/dolt
docker build -f docker/serverDockerfile --build-arg DOLT_VERSION=1.59.7 -t dolt-sql-server:1.59.7 .
Notice how we change the -f argument to point to the same Dockerfile, just from different contexts. After all, source builds require the full dolt/ directory to exist. As a bonus, this also opens up local dependencies to be copied into the image if needed. Simply add another COPY instruction inside the build-from-source stage:
COPY dolt*/ /tmp/dolt
COPY go-mysql-server*/ /tmp/go-mysql-server
We can now try our new image by running the Docker container as we normally would. Here's an example utilizing that entrypoint script to set up a root user with a password and remote access:
docker run -e DOLT_ROOT_PASSWORD=secret2 -e DOLT_ROOT_HOST=% -p 3307:3306 dolt-sql-server:source
# or for the binary image
docker run -e DOLT_ROOT_PASSWORD=secret -e DOLT_ROOT_HOST=% -p 3307:3306 dolt-sql-server:1.59.7
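Since Dolt speaks the MySQL protocol, you can then connect from the host with any MySQL-compatible client, assuming you have one installed:
# Connect to the containerized SQL server through the mapped port
mysql -h 127.0.0.1 -P 3307 -u root -p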
Conclusion
Multi-stage Docker builds give you the best of both worlds: lean production images and flexible development builds, all from a single Dockerfile. By separating concerns into stages and leveraging glob patterns for conditional copies, you can speed up builds, reduce image size, and keep dev/test workflows close to production without duplication.
Hopefully, you've found some benefit or ideas in incorporating these features into your Dockerfiles in the future. If you have any questions or concerns, join us on Discord to chat with the team!
Once more, pull our Docker Hub images and spin up Dolt in seconds (unless you build from source, of course, 🙂↕️). Otherwise, if you want Dolt on your machine natively, check out our installation guide.
We have more blogs like this one in our blog page, so if you learned something and want to learn more, check them out!