Gitlab-ci CICD pipelines can run in docker containers. Although you can start from a common docker image and use before_script to install the utilities your gitlab stages need, it is better to create your own docker image. Indeed, it will make your CICD:

  • clearer: no need to scroll past a hundred lines of installation logs to reach the actual build logs

  • faster: no need to spend time installing packages at every build

  • more resilient: no need to worry about outages of the repository servers hosting the packages to be installed

In this post, I will show you how to easily create a docker image with all the needed utilities and push it to dockerhub in order to use it in gitlab-ci. As an example, I will create a docker image with the following utilities:

  • cloud utilities: awscli and databricks-cli

  • other utilities: git, jq, curl, envsubst

First, I need a base docker image as a starting point for my custom docker image.

Base docker image

As base docker image, I use the python image based on alpine, because it is small. If you’re not familiar with this docker image, it is a good idea to explore it and see which utilities it already contains. You can do so by retrieving the docker image, launching it, and connecting to it with a shell:

docker pull python:3.9-alpine
docker run -d -t --rm --name test python:3.9-alpine
docker exec -i -t test sh

Let’s look at those commands. First, docker pull retrieves the docker image from dockerhub.

Next, docker run launches a container from the retrieved docker image. The options of the docker run command are:

  • -d for detach: the container runs in the background

  • -t to allocate a pseudo-TTY; here it prevents the container from stopping immediately after it starts

  • --rm to remove the container once it is stopped. This is useful when you don’t expect to start the container again, which is typically the case when testing

  • --name test to name the container, as a human-readable name is easier to use than a container hash identifier

Finally, the docker exec command connects you to your container. It takes two arguments: the name or id of the container, and the command to execute in it. To get a shell, use sh, as bash is not installed in alpine docker images. The options of docker exec are:

  • -i for interactive, to keep stdin open so you can type commands in your terminal

  • -t to allocate a pseudo-TTY

These two options combined with the sh command create an interactive shell.
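For instance, once inside the container, you can check which common utilities are already available with a small loop (a sketch; adapt the list of commands to your needs):

```shell
# Check which common utilities ship with the image.
# Note: bash is expected to be missing on alpine, which ships sh instead.
for cmd in git jq curl bash; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd: present"
    else
        echo "$cmd: missing"
    fi
done
```

On this image, bash should come out as missing, which is why sh is used above to get a shell.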

Once you are finished, you can quit your container with exit and then stop it:

docker stop test

Because the --rm option was set when the container was started, the container is destroyed and everything you did inside it is lost.

I now have my base docker image; time to customize it.

Customize docker image

To transform my base docker image into my desired docker image, with all the utilities needed for my CICD, I have to install them. To do so, I could launch a container from the base docker image, connect to it as explained above, install everything I need, and create a new docker image from the running container using docker commit. However, to create a lasting docker image, it is better to use a Dockerfile. First, I create a file named Dockerfile in an empty directory:
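For reference, the docker commit route mentioned above would look roughly like this sketch (the container name scratch and the manual tag are examples; a running docker daemon is required):

```
docker run -d -t --name scratch python:3.9-alpine                  # start a container from the base image
docker exec scratch apk add --no-cache git jq curl                 # install utilities inside it
docker commit scratch mydockerusername/cicd-aws-databricks:manual  # snapshot it as a new image
docker stop scratch && docker rm scratch                           # clean up the container
```

The drawback is that nothing records how the image was produced, which is why the Dockerfile approach below is preferred.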

mkdir cicd-aws-databricks
touch cicd-aws-databricks/Dockerfile

It is very important to put the Dockerfile in a dedicated directory, as building a docker image from a Dockerfile requires passing a directory containing the Dockerfile as argument. I put the following lines in my Dockerfile:

FROM python:3.9-alpine

RUN apk add --no-cache --update git jq curl groff && \
    pip3 install --no-cache-dir awscli envsubst databricks-cli

I set the base docker image using the FROM instruction, then execute commands using the RUN instruction. You can notice two things. First, instead of having two RUN instructions, one for apk add and one for pip3 install, I execute both commands in a single RUN instruction. This is actually a docker best practice: every RUN instruction adds a layer to the image, so fewer instructions mean fewer layers. Second, I set the "no cache" options for both the apk add and pip3 install commands. Indeed, as I want to create the smallest possible docker image, I don’t keep useless cached packages.
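For comparison, here is what the discouraged variant would look like, with one RUN instruction per command (this Dockerfile is only illustrative, it is not the one used in this post):

```
FROM python:3.9-alpine

# Discouraged: each RUN instruction creates its own layer
RUN apk add --no-cache --update git jq curl groff
RUN pip3 install --no-cache-dir awscli envsubst databricks-cli
```

Both variants install the same utilities, but the single-RUN version keeps the layer count down and the image history easier to read.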

Next, I can build my docker image using my Dockerfile with docker build. I pass the directory containing my Dockerfile as argument:

docker build --tag mydockerusername/cicd-aws-databricks:latest cicd-aws-databricks

I use the --tag option to set my docker image’s name. As I will push this docker image to the repository mydockerusername/cicd-aws-databricks with the version latest, I tag it mydockerusername/cicd-aws-databricks:latest.
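To check the result, docker provides a couple of inspection commands (a quick sketch; it requires the image built above, and the output depends on your machine):

```
docker images mydockerusername/cicd-aws-databricks           # lists the image and its total size
docker history mydockerusername/cicd-aws-databricks:latest   # shows one line per layer, with its size
```

Comparing with docker images python:3.9-alpine shows how much the added utilities weigh on top of the base image.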

Now that I have my docker image on my local machine, it is time to share it with the world.

Push docker image to dockerhub

I’ve already created an account on dockerhub. To push my docker image, I need to add a repository to this account. To do so, I go to the repositories page and click on "Create Repository". Then I name my repository cicd-aws-databricks. The complete name of the repository will be mydockerusername/cicd-aws-databricks.

Then I push my docker image to my repository:

docker login --username=mydockerusername
docker push mydockerusername/cicd-aws-databricks:latest

And it’s done, my docker image is uploaded to dockerhub: https://hub.docker.com/r/vincentdoba/cicd-aws-databricks. Now I can use this docker image in gitlab-ci, by adding the following line at the top of my .gitlab-ci.yml:

image: mydockerusername/cicd-aws-databricks:latest
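A minimal .gitlab-ci.yml using this image could look like the following sketch (the deploy job and its commands are only illustrative):

```
image: mydockerusername/cicd-aws-databricks:latest

deploy:
  script:
    - aws --version          # available without any before_script
    - databricks --version
```

All the utilities baked into the image are available immediately, so the jobs contain only the actual build or deployment commands.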