Understanding the nuances of using a Docker image for your CI/CD job

March 26, 2023

Many CI/CD providers, like GitHub Actions and CircleCI, offer the options to run your CI/CD job using Docker images today.

This is a useful feature since you can ensure your job is always running with the same pre-installed dependencies.
Teams may choose to bring their own Docker image, or use readily-available community images on Docker Hub for instance.

One challenge is that the Docker image in use may not have been intended for use in CI/CD automation.
Your team may thus find yourselves trying to debug puzzles like:

I'm pretty sure XYZ is installed. Why does the CI/CD job fail to find XYZ?
Why is the FOOBAR environment variable different from what we have defined in the Docker image?

Docker image 101

When you execute the docker container run ... command, Docker will, by default, run a Docker container as a process based on the ENTRYPOINT and CMD definitions of your image. Docker will also load the environment variables declared in the ENV definitions.

Your ENTRYPOINT and CMD may be defined to run a long-running process (e.g., web application), or a short process (e.g., running Speccy to validate your OpenAPI spec).
This will depend on the intended use of your image.

In addition, Docker images will be designed to come with just-enough tools to run its intended purpose.
For example, the wework/speccy image understandably does not come installed with git or curl (see Dockerfile).

Docker images may also be published for specific OS architectures only (e.g., linux/amd64). You will want to confirm which OS and architecture the image can be run on.

These are important contexts, when designing CI/CD jobs using Docker images.

Understanding Docker images for CI/CD

Generally, for CI/CD automation, your job will run a series of shell commands in the build environment.

CI/CD providers like GitLab CI and CircleCI achieve this by override your Docker image's entrypoint with/bin/sh or /bin/bash when executing them as containers.

This is why you would want to use the -debug tag variant for Kaniko's Docker image when using in CircleCI for instance.

Additionally, your Docker image may not come with the required tools for your CI/CD automation.
For example, you would require git in order to clone the repository as part of the CI/CD job steps.

Debug Cheatsheet

With this information in mind, here is a list of commands you can run locally to debug your chosen Docker image.

# inspect the "built-in" environment variables of an image
$ docker image inspect docker.io/amazon/aws-glue-libs:glue_libs_2.0.0_image_01 | jq ".[0].Config.Env"
[
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "LANG=en_US.UTF-8",
  "PYSPARK_PYTHON=python3",
  "SPARK_HOME=/home/glue_user/spark",
  "SPARK_CONF_DIR=/home/glue_user/spark/conf",
  "PYTHONPATH=/home/glue_user/aws-glue-libs/PyGlue.zip:/home/glue_user/spark/python/lib/py4j-0.10.7-src.zip:/home/glue_user/spark/python/",
  "PYSPARK_PYTHON_DRIVER=python3",
  "HADOOP_CONF_DIR=/home/glue_user/spark/conf"
]

# check the default entrypoint
$ docker image inspect docker.io/amazon/aws-glue-libs:glue_libs_2.0.0_image_01 | jq ".[0].Config.Entrypoint"
[
  "bash",
  "-lc"
]

# check the default cmd
$ docker image inspect docker.io/amazon/aws-glue-libs:glue_libs_2.0.0_image_01 | jq ".[0].Config.Cmd" 
[
  "pyspark"
]

# check tools installed
$ docker container run --rm docker.io/amazon/aws-glue-libs:glue_libs_2.0.0_image_01 "git --version"
...
git version 2.37.1

$ docker container run --rm docker.io/amazon/aws-glue-libs:glue_libs_2.0.0_image_01 "python --version"
...
Python 2.7.18

You can also find an example of this debugging for a CircleCI use-case here:
https://github.com/kelvintaywl-cci/docker-executor-explore/blob/main/.circleci/config.yml

#docker #cicd #debug #cheatsheet

buy Kelvin a cup of coffee ☕