You need to create a Docker image containing an artifact. However, building this artifact requires tools that you don’t need in the final image. How can you get the smallest possible Docker image, without the tools that are only used to build the artifact? The solution is Docker multi-stage builds.

I want to create a Docker image with a file countries.csv in its root directory, listing all the countries in the world, which I download from the internet. To download this file, I need curl. So the countries.csv file is my artifact and curl is my build tool. How can I create an image that contains my countries.csv artifact but not curl?

Simple build

Starting from a small Docker image, I can install curl, download the file, and then remove curl, as shown in the Dockerfile below:

FROM alpine:3.12
RUN apk add --no-cache --upgrade curl && \
    curl https://raw.githubusercontent.com/stefangabos/world_countries/master/data/en/countries.csv -o /countries.csv && \
    apk del curl

I build the Docker image with the docker build command:

% docker build --tag simple:latest /path/to/dockerfile/directory

I get a Docker image that contains my artifact countries.csv but not curl. However, this build has two issues. First, every tool I install to build the artifact has to be uninstalled afterwards, which makes the commands less readable. Second, even if I uninstall the build tools, I may forget to clean up other leftovers of the build process, such as caches. To avoid worrying about cleanup after the build, I can use a multistage build.

Multistage build

In a multistage build, I build two Docker images. The first one, which will not be released, exists only to build the artifact. Once the artifact is built, I start building a second image and copy the artifact from the first image into it. In the end, I keep only the second image, which contains my artifact and none of the build tools. Here is my Dockerfile:

FROM alpine:3.12 AS builder
RUN apk add --no-cache --upgrade curl && \
    curl https://raw.githubusercontent.com/stefangabos/world_countries/master/data/en/countries.csv -o /countries.csv

FROM alpine:3.12
COPY --from=builder /countries.csv /countries.csv

Each stage starts with a FROM instruction, here FROM alpine:3.12. The first stage is similar to the simple build from the previous section, except that I don’t bother removing curl after using it to download countries.csv. I can name a stage with the AS keyword; here I named the first stage builder. The second stage contains only a COPY instruction, which copies the file /countries.csv from the builder stage into the second image.
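
Naming a stage is optional. As a minimal sketch, the same Dockerfile can refer to the first stage by its index instead, since Docker numbers stages from 0 in the order of their FROM instructions:

```dockerfile
# Stage 0: unnamed build stage, same commands as the builder stage above
FROM alpine:3.12
RUN apk add --no-cache --upgrade curl && \
    curl https://raw.githubusercontent.com/stefangabos/world_countries/master/data/en/countries.csv -o /countries.csv

# Final stage: copy the artifact from stage 0, referenced by index
FROM alpine:3.12
COPY --from=0 /countries.csv /countries.csv
```

A name is usually easier to maintain than an index, because the index changes as soon as a stage is inserted or reordered.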

I build the Docker image with the docker build command:

% docker build --tag multistage:latest /path/to/dockerfile/directory

I get a Docker image with my artifact countries.csv and nothing left over from the build. If I compare the two images using docker images, I can see that the multistage image is smaller than the simple one:

% docker images
REPOSITORY                        TAG               IMAGE ID       CREATED              SIZE
simple                            latest            7cd21bc1ecc6   3 seconds ago        5.82MB
multistage                        latest            273812201f1f   About a minute ago   5.58MB

Of course, since this post uses a toy example, the size difference is small. However, it grows for more complex builds. You can find a more complete build example on Alex Ellis’s blog.
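
As a rough illustration of such a build (not taken from that blog), here is a sketch for a compiled program. The source file main.go, the output path /app, and the golang:1.15-alpine image tag are assumptions chosen for illustration only:

```dockerfile
# Build stage: carries the whole Go toolchain, which the final image does not need
FROM golang:1.15-alpine AS builder
WORKDIR /src
# main.go is a hypothetical source file sitting next to this Dockerfile
COPY main.go .
# Build a static binary at the hypothetical path /app
RUN CGO_ENABLED=0 go build -o /app main.go

# Final stage: only the compiled binary is copied over
FROM alpine:3.12
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
```

Here the build stage is based on an image far larger than the Alpine base, so leaving the toolchain behind should save much more than in the curl example.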