How to Optimize Docker-based CI Runners with Shared Package Caches


At Unleashed Technologies we use GitLab CI with Docker runners for our continuous integration testing. We’ve put significant effort into speeding up build execution. One of the optimizations we made was to share a cache volume across all the CI jobs, allowing them to share files like package download caches.

Configuring the Docker runner was straightforward: we added volumes = ["/srv/cache:/cache:rw"] to our config.toml file:

concurrent = 6
check_interval = 0

[[runners]]
  name = "ut-ci01"
  url = "https://gitlab.example.com/"
  token = "xxxxxxxxxxxxx"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "unleashed/php:7.1"
    privileged = false
    disable_cache = false
    volumes = ["/srv/cache:/cache:rw"]
  [runners.cache]

As a result, all CI jobs will have a /cache directory available (which is mapped to /srv/cache on the Docker host).

The next step was making the package managers utilize this cache directory whenever jobs run commands like composer install or yarn install. Luckily, these package managers allow us to configure their cache directories using environment variables:

  • Composer: COMPOSER_CACHE_DIR
  • Yarn: YARN_CACHE_FOLDER
  • npm: NPM_CONFIG_CACHE
  • bower: bower_storage__packages
  • RubyGems: GEM_SPEC_CACHE
  • pip: PIP_DOWNLOAD_CACHE

We then added the corresponding ENV instructions to the Dockerfiles for our base images:

ENV COMPOSER_CACHE_DIR /cache/composer
ENV YARN_CACHE_FOLDER /cache/yarn
ENV NPM_CONFIG_CACHE /cache/npm
ENV bower_storage__packages /cache/bower
ENV GEM_SPEC_CACHE /cache/gem
ENV PIP_DOWNLOAD_CACHE /cache/pip

Now, whenever a job needs a package installed, it’ll pull from our local cache instead of downloading from a remote server! This provides a noticeable speed improvement for our builds.
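For reference, a job only needs to run its usual install commands to benefit, since the cache locations come from the image’s environment. A minimal .gitlab-ci.yml sketch (the job name and script below are illustrative, not our actual pipeline):

test:
  image: unleashed/php:7.1
  script:
    # Composer reads COMPOSER_CACHE_DIR from the image environment,
    # so packages are served from /cache/composer when already present
    - composer install --prefer-dist
    - vendor/bin/phpunit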


This quick tip was originally published on Colin’s blog, and republished here with the author’s permission.

Frequently Asked Questions on Optimizing Docker-Based CI Runners with Shared Package Caches

What are the benefits of optimizing Docker-based CI runners with shared package caches?

Optimizing Docker-based CI runners with shared package caches can significantly improve the efficiency and speed of your CI/CD pipelines. It allows for faster build times by reusing previously downloaded packages and dependencies, reducing the need to download the same packages multiple times. This saves time and reduces network bandwidth usage. It can also help maintain consistency across builds and environments, as the same versions of the packages are used.

How can I set up shared package caches for Docker-based CI runners?

Setting up shared package caches involves creating a Docker volume that will be used as a cache storage. This volume can be attached to your CI runners, allowing them to share and reuse the same package cache. You can define this volume in your Docker Compose file or Docker run command. Once the volume is set up, you need to configure your package manager to use this cache volume.
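Outside of a GitLab Runner’s config.toml, the same idea can be sketched with plain Docker commands; the volume name below is illustrative:

# create a named volume to hold the shared package cache
docker volume create ci-package-cache

# run a build container with the cache mounted at /cache
docker run --rm \
  -v ci-package-cache:/cache \
  -e COMPOSER_CACHE_DIR=/cache/composer \
  unleashed/php:7.1 \
  composer install --prefer-dist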

What are some common issues when optimizing Docker-based CI runners with shared package caches?

Some common issues include cache invalidation, where the cache becomes outdated and needs to be refreshed, and cache pollution, where unwanted or unnecessary files take up space in the cache. These issues can be mitigated by setting up proper cache management strategies, such as using cache eviction policies and regularly cleaning up the cache.
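A simple mitigation is to prune stale files from the cache volume on the host from time to time, for example via cron. A rough sketch, assuming the /srv/cache layout used above and a 30-day retention window:

# delete cached files that haven't been modified in 30 days
find /srv/cache -type f -mtime +30 -delete

# remove any directories left empty afterwards
find /srv/cache -type d -empty -delete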

How does Docker’s build cache work?

Docker’s build cache works by storing intermediate images created during the build process. When building an image, Docker checks if there is an existing intermediate image that can be reused, which can significantly speed up the build process. However, it’s important to note that Docker’s build cache is not shared across different hosts by default.
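In practice this means ordering Dockerfile instructions from least to most frequently changing, so expensive steps stay cached. A generic sketch (base image and file names are assumptions):

FROM node:8

# copy only the dependency manifests first, so the install layer
# stays cached until they change
COPY package.json package-lock.json ./
RUN npm install

# copy the rest of the source; changes here don't invalidate the install layer above
COPY . .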

How can I optimize my Docker image size?

There are several strategies to optimize Docker image size. One common method is to use multi-stage builds, where you use one stage to build your application and a second, lighter stage to run it. Another method is to remove unnecessary files and packages after they are no longer needed. Additionally, you can use smaller base images and avoid installing unnecessary packages.
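A minimal multi-stage sketch (image tags, paths, and build commands are illustrative):

# build stage: full toolchain available
FROM node:8 AS build
WORKDIR /app
COPY . .
RUN npm install && npm run build

# runtime stage: only the built assets, on a much smaller base image
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html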

How can I speed up my GitLab CI pipelines?

There are several ways to speed up GitLab CI pipelines. One method is to use parallel execution to run multiple jobs at the same time. Another method is to use caching to avoid redundant work, such as downloading dependencies. Additionally, you can optimize your pipeline configuration to reduce the number of stages and jobs.
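As a sketch, two jobs in the same stage run in parallel on available runners, and GitLab’s cache keyword can persist dependency directories between pipelines (job names and paths are illustrative):

stages:
  - test

phpunit:
  stage: test
  script:
    - composer install --prefer-dist
    - vendor/bin/phpunit

lint:
  stage: test
  script:
    - yarn install
    - yarn run lint

cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - vendor/
    - node_modules/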

What is YAML optimization in the context of GitLab CI?

YAML optimization in GitLab CI involves structuring your .gitlab-ci.yml file in a way that makes your pipeline more efficient. This can include using parallel execution, caching, and only running jobs when necessary. It can also involve using GitLab CI’s features, such as only/except and rules, to control when jobs are run.
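For example, a job can be limited so it only runs when the files it cares about have changed; the rules sketch below assumes an assets build job with illustrative paths:

build_assets:
  script:
    - yarn install
    - yarn run build
  rules:
    - changes:
        - package.json
        - yarn.lock
        - assets/**/*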

How can pipeline efficiency be improved in GitLab CI?

Pipeline efficiency in GitLab CI can be improved by using parallel execution, caching, and optimizing your .gitlab-ci.yml file. Additionally, you can use GitLab CI’s Auto DevOps feature, which automatically configures your pipeline with best practices for efficiency.

What are the best practices for managing Docker volumes?

Best practices for managing Docker volumes include regularly cleaning up unused volumes, using named volumes for important data, and avoiding the use of host volumes for portable applications. Additionally, it’s recommended to use volume plugins for managing volumes in a multi-host environment.
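A few of the housekeeping commands this implies, as a sketch:

# list volumes and remove any that no container references
docker volume ls
docker volume prune

# create a named volume for data you want to manage explicitly
docker volume create ci-package-cache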

How can I troubleshoot issues with Docker-based CI runners?

Troubleshooting Docker-based CI runners can involve checking the runner’s logs, verifying the runner’s configuration, and testing the runner with a simple job. Additionally, you can use Docker’s built-in debugging tools, such as docker inspect and docker logs, to investigate issues.
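For example, on a host running the runner and its job containers (container IDs below are placeholders):

# run the runner in the foreground with verbose output, or follow its service logs
sudo gitlab-runner --debug run
sudo journalctl -u gitlab-runner -f

# inspect and read the logs of a job container while it is running
docker ps
docker inspect <container-id>
docker logs <container-id>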

Colin O'Dell

Colin O'Dell is the Lead Web Developer at Unleashed Technologies, a web and hosting firm based in Maryland. He began programming at age 8, co-founded a local web shop at 15, and has over 10 years of professional experience with PHP. In addition to being an active member of the PHP League and maintainer of the league/commonmark project, Colin is also a Symfony Certified Developer (Expert) and Magento Certified Developer.
