GitLab CI/CD – Cache

This article provides a detailed exploration of the cache keyword in GitLab CI/CD, explaining its role in optimizing pipeline performance by storing and reusing dependencies between jobs and pipeline runs. We will cover the mechanics of caching, including keying strategies, path definitions, and cache policies, along with practical examples and best practices for efficient dependency management to accelerate your continuous integration and continuous delivery processes.

Understanding the cache Keyword in GitLab CI/CD

The cache keyword in GitLab CI/CD is a powerful feature designed to speed up your pipeline jobs by reusing a defined set of files and directories between subsequent runs. Its primary purpose is to store project dependencies (like downloaded libraries, compiled code, or package manager modules) so that they do not need to be re-downloaded or re-generated in every job or pipeline run. This significantly reduces job execution times, conserves network bandwidth, and leads to faster feedback cycles.

Why Use the cache Keyword?

The main benefits of using cache include:

  • Faster Pipeline Execution: Reduces the time spent downloading or generating dependencies, directly impacting overall pipeline duration.
  • Reduced Network Usage: Minimizes redundant downloads, which is especially beneficial for large projects with many dependencies.
  • Improved Efficiency: Optimizes resource usage on your CI/CD runners.
  • Consistent Environments: Helps ensure that jobs start with a consistent set of dependencies.

How cache Works

When a job with a cache definition runs:

  1. Before the job’s scripts run: The runner attempts to restore the cache for the defined key. If a cache with that key exists, its archive is extracted into the job’s working directory.
  2. After the job’s scripts finish: If the job succeeds and the policy allows it, the files and directories listed in paths are compressed and uploaded as a new cache under the same key. This new cache overwrites any existing cache with that key.

Configuring cache in .gitlab-ci.yml

The cache keyword is typically defined globally (for all jobs) or at the individual job level.

Basic cache Configuration

To specify which files or directories should be cached, use the paths keyword:

# .gitlab-ci.yml

cache:
  paths:
    - node_modules/ # Cache Node.js dependencies
    - .cache/       # Cache any other general build cache directory

stages:
  - build
  - test

install_dependencies:
  stage: build
  script:
    - echo "Installing dependencies..."
    - mkdir -p node_modules # Ensure directory exists for caching
    - touch node_modules/dummy_package.txt # Simulate package installation
    - echo "Dependencies installed."

run_tests:
  stage: test
  script:
    - echo "Running tests using cached dependencies..."
    - ls node_modules/ # Verify cached content is available

  • paths: A list of files or directories to cache, relative to the project’s root. Only paths inside the project directory can be cached, so these are usually project-local dependency directories (e.g., node_modules/, vendor/, .m2/repository/). A location such as ~/.m2/repository/ lies outside the project and is not picked up.

Cache Key (key)

The key determines how caches are stored and retrieved. Jobs with the same cache key will share the same cache. This is crucial for managing different caches for different branches, operating systems, or dependency sets.

Static Key

A simple, fixed string. All jobs using this key will share the same cache.

cache:
  key: my-static-cache
  paths:
    - .bundle/

Dynamic Keys using Predefined Variables

Using predefined CI/CD variables in the key is the most common and powerful approach.

$CI_COMMIT_REF_SLUG: Caches per branch or tag. Ideal for maintaining separate caches for feature branches.

cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - node_modules/

$CI_COMMIT_REF_SLUG-$CI_JOB_NAME: Caches per branch/tag and per job. Useful if different jobs generate distinct caches that should not conflict.

cache:
  key: "$CI_COMMIT_REF_SLUG-$CI_JOB_NAME"
  paths:
    - build-artifacts/

files keyword for key: Generates the key from the state of the specified files. GitLab computes it from the most recent commits that changed each listed file, so a new key, and therefore a new cache, appears only when one of those files changes. This is well suited to package manager lock files (e.g., package-lock.json, Gemfile.lock), as the cache updates only when dependencies change.

cache:
  key:
    files:
      - package.json
      - yarn.lock
    prefix: "$CI_COMMIT_REF_SLUG" # Optional prefix for the key
  paths:
    - node_modules/

  • If package.json or yarn.lock changes, a new cache key is generated, and a fresh node_modules cache is created. Otherwise, the existing cache for that key is used.

Cache Policy (policy)

The policy determines when the cache is uploaded and downloaded.

  • pull-push (default): Cache is downloaded before the job and uploaded after the job succeeds.
  • pull: Cache is only downloaded, not uploaded. Useful for jobs that consume dependencies but do not modify them (e.g., test jobs).
  • push: Cache is only uploaded, not downloaded. Useful for jobs specifically dedicated to preparing and uploading a cache (e.g., an install_deps job).

install_dependencies_job:
  stage: build
  script:
    - npm install
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/
    policy: push # Only upload the cache from this job

test_job:
  stage: test
  script:
    - npm test
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/
    policy: pull # Only download the cache for this job, do not re-upload
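
The upload step described above runs only when the job succeeds by default. As a minimal sketch, the cache when keyword (not covered elsewhere in this article) can change that condition; the job name and the .npm/ cache directory here are illustrative assumptions.

```yaml
# Hypothetical job: keep the npm download cache even when tests fail
test_with_cache_on_failure:
  stage: test
  script:
    # --cache points npm's download cache at a project-local directory
    - npm ci --cache .npm --prefer-offline
    - npm test
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - .npm/
    when: always # Accepted values: on_success (default), on_failure, always
```

Saving on failure is mainly useful when a failing test run has still downloaded dependencies that the retry would otherwise fetch again.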

Cache Expiration

Unlike artifacts, cache has no expire_in keyword. expire_in belongs to artifacts and is not part of the cache syntax. How long a cache is kept depends on where the runner stores it:

  • Local runner storage: The cache archive stays on the runner host until it is overwritten, cleared, or removed by the administrator.
  • Object storage (distributed cache): Retention is usually governed by a lifecycle rule on the bucket, which the administrator configures.

To start a fresh cache from the pipeline side, change the key:

cache:
  key: "$CI_COMMIT_REF_SLUG-v2" # Bump the suffix to start a new cache
  paths:
    - node_modules/

Global vs. Job-level Cache

You can define cache globally at the top level of your .gitlab-ci.yml file, which applies to all jobs. A job that declares its own cache block replaces the global definition as a whole. The two are not merged, so the job must repeat any key or paths it still needs.

# Global cache definition
cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - .npm/
    - .gradle/

stages:
  - build
  - test

build_frontend:
  stage: build
  script:
    - npm install
    - npm run build
  cache:
    # A job-level cache replaces the global definition entirely,
    # so the key is repeated here.
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/
      - dist/ # Add project-specific build output to cache if needed

test_backend:
  stage: test
  script:
    - ./gradlew test
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - .gradle/
    policy: pull # Download only; this job never uploads the cache
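
Because a job-level cache block replaces the global one as a whole, repeating the shared settings in every job gets tedious. One way to avoid that, sketched here under the assumption that the configuration lives in a single file, is a YAML anchor; the anchor name global_cache is arbitrary.

```yaml
# Shared cache settings, named with a YAML anchor
cache: &global_cache
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - .gradle/
  policy: pull-push

test_backend:
  stage: test
  script:
    - ./gradlew test
  cache:
    <<: *global_cache # Copy key, paths, and policy from the shared definition
    policy: pull      # Then override only the policy for this job
```

Anchors are resolved within one YAML file, so this approach does not carry over to files pulled in with include.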

Important Considerations and Best Practices

  • Artifacts vs. Cache:
    • Cache is for dependencies that are downloaded or installed from external sources, kept to speed up future job runs (input optimization). Caches are volatile and might be evicted, so a job must still work when the cache is missing.
    • Artifacts are for outputs generated by a job that need to be preserved and passed to subsequent jobs in the same pipeline or downloaded by users (output preservation).
    • Do not use cache for build outputs that are critical for subsequent jobs in the same pipeline. Use artifacts for that purpose.
  • Choose the Right Cache Key:
    • Branch-specific caches ($CI_COMMIT_REF_SLUG) are good for development, as different branches often have different dependency versions.
    • File-based keys (key: files: [...]) are ideal for projects with lock files, as they ensure the cache is only invalidated when dependencies truly change, not on every commit.
    • Be mindful of cache size when using file-based keys, as they can create many distinct caches.
  • Be Specific with paths: Only cache what is necessary. Caching too many files or large directories can negate the benefits by increasing upload/download times and storage usage. Exclude build artifacts that are specific to a single job and should not be reused.
  • Use policy: push and policy: pull Strategically:
    • A dedicated “dependency installation” job with policy: push can prepare the cache.
    • Subsequent jobs can then use policy: pull to consume that cache efficiently.
  • Clear or Invalidate Caches: If you encounter unexpected behavior or corrupted caches, you can clear them from the GitLab UI: on the project’s Pipelines page, select Clear runner caches. The next pipeline run then starts a fresh cache. Changing the key also effectively invalidates the old cache.
  • Runners and Cache Storage: Caches are stored on the GitLab Runner (often on local disk or in object storage configured for the runner). The performance of caching depends on the runner’s I/O speed and network proximity to the cache storage.
  • Cache Size Limits: Be aware of potential cache size limits imposed by your GitLab instance administrator. Large caches can take longer to transfer.
  • Debugging Caching Issues: If caching does not seem to work, check:
    • Are the key definitions consistent across jobs that should share a cache?
    • Are the paths correctly specified and do the directories actually exist when the job runs?
    • Are the policy settings appropriate?
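    • Has the cache been cleared manually, or removed by a cleanup or lifecycle rule on the runner’s storage?
    • Do the jobs run on the same runner, or is a distributed cache configured so that different runners can reach the same archive?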
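
The split between cache and artifacts described in the first item of this list can be sketched in one pipeline. This is an illustration under assumptions: the job names are made up, the project is assumed to have a package-lock.json, and npm run build is assumed to write its output to dist/.

```yaml
build:
  stage: build
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run build # Assumed to write its output to dist/
  cache: # Dependencies: reused across pipelines, safe to lose
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
  artifacts: # Build output: guaranteed to reach later jobs in this pipeline
    paths:
      - dist/
    expire_in: 1 week

verify_output:
  stage: test
  script:
    - ls dist/ # dist/ arrives as an artifact from the build job
```

If the cache is evicted, the build job only runs slower. If dist/ were cached instead of stored as an artifact, the later job could fail whenever the cache was missing.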

FAQs – Cache


What is the cache keyword in GitLab CI/CD?
The cache keyword defines files or directories that GitLab Runner should save and reuse between different job runs or pipeline executions. Caching improves performance by preventing repeated downloads or rebuilds of dependencies like node_modules, .m2, or vendor/.


How do I define a cache in a GitLab CI/CD job?
You define cache inside a job or globally, and specify the paths to be cached:

cache:
  paths:
    - node_modules/

build:
  script:
    - npm install

This stores node_modules/ so that future jobs or pipelines can reuse it, speeding up builds.


What is the difference between artifacts and cache in GitLab?

  • Artifacts are saved after a job finishes and are typically used for sharing files across jobs/stages or for download.
  • Cache is used to speed up jobs by storing and reusing files between different runs or pipelines.

Artifacts are tied to a pipeline; cache is shared and persistent (within configured keys).


Can I use cache across different jobs and stages?
Yes. As long as the cache key is the same and the jobs run on the same runner (or the runners use a distributed cache), the cache is shared across different jobs or stages.

cache:
  key: shared-cache
  paths:
    - node_modules/

build:
  script: npm install

test:
  script: npm test

Both jobs use the same node_modules/ cache.


What is the key in the cache keyword, and why is it important?
The key uniquely identifies a cache. If two jobs share the same key, they share the same cache.

cache:
  key: my-cache
  paths:
    - node_modules/

You can use static or dynamic keys (e.g., branch-specific):

key: "$CI_COMMIT_REF_SLUG"

This makes the cache unique per branch.


Can I use different caches for different branches or environments?
Yes. Use dynamic keys based on GitLab predefined variables:

cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - venv/

This creates separate caches for each branch.
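
A side effect of per-branch keys is that a new branch starts with an empty cache. As a hedged sketch, recent GitLab versions support fallback_keys, which lets a job try other keys when its own key has no cache yet. The cache- prefix and the requirements.txt file are illustrative assumptions.

```yaml
test:
  script:
    - python -m venv venv
    - venv/bin/pip install -r requirements.txt
  cache:
    key: "cache-$CI_COMMIT_REF_SLUG"
    fallback_keys:
      # Tried in order when no cache exists for the key above
      - "cache-$CI_DEFAULT_BRANCH"
    paths:
      - venv/
```

The fallback matches only if the default branch's cache was saved under the same naming scheme, so the prefix has to be identical in both keys.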


How can I share a cache between different pipelines?
By default, a cache is stored on the runner that executed the job. To share caches between pipelines:

  • Use a consistent key.
  • Use the same path structure.
  • Ensure pipelines run on the same GitLab Runner or executor with shared storage (e.g., Docker volumes or mounted directories).

How do I cache multiple paths in GitLab CI/CD?
You can list multiple directories or files under the paths key:

cache:
  paths:
    - node_modules/
    - .cache/pip
    - vendor/

All listed paths will be stored and restored when the job runs.
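
When the paths belong to different dependency sets, a single key forces them to be invalidated together. A possible alternative, sketched here with assumed file names and a hypothetical job, is to give cache a list of entries so that each has its own key. GitLab limits how many cache entries one job may define.

```yaml
test:
  variables:
    # Keep pip's download cache inside the project so it can be cached
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  script:
    - npm ci --cache .npm --prefer-offline
    - pip install -r requirements.txt
  cache:
    - key:
        files:
          - package-lock.json # npm cache changes only with this file
      paths:
        - .npm/
    - key:
        files:
          - requirements.txt # pip cache changes only with this file
      paths:
        - .cache/pip
```

This sketch assumes the job image provides both Node.js and Python.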


What does policy mean in cache configuration?
The policy setting controls when caching happens. It supports:

  • pull-push (default): download the cache before the job and upload afterward.
  • pull: only download the cache.
  • push: only upload the cache after the job finishes.

Example:

cache:
  key: packages
  policy: pull
  paths:
    - .npm/

This job only pulls the cache and does not update it after execution.


Can I disable cache for a specific job?
Yes. You can override or clear the global cache by setting an empty cache block:

job:
  cache: {}
  script:
    - echo "No cache used here"

This ensures the job does not use or create any cache.


Where is GitLab cache stored and how long does it persist?
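By default, cache files are stored on the machine where GitLab Runner is installed. With the shell executor this is typically a directory like /home/gitlab-runner/cache/, and with the Docker executor it is a Docker volume on the host. If the runner is configured with a distributed cache, the archives go to object storage instead.
There is no cache keyword for expiry, so how long a cache persists depends on the runner’s storage: local caches stay until they are overwritten or cleared, and object storage may apply its own lifecycle rules. You can also rotate caches yourself by changing the key, for example with dynamic keys.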


How do I cache dependencies from package managers like npm, pip, or Maven?

npm:

cache:
  paths:
    - node_modules/

pip:

cache:
  paths:
    - .cache/pip

Maven:

cache:
  paths:
    - .m2/repository

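These paths help only if the package manager actually writes to them. Cache paths must be inside the project directory, so pip and Maven, which default to locations under the home directory, have to be pointed at these project-local directories (for example through PIP_CACHE_DIR and the maven.repo.local property). Once that is done, downloaded packages are reused in future runs.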
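
As a minimal sketch of that redirection, the variables below move pip's and Maven's download directories into the project. The job names, key names, and requirements.txt are illustrative assumptions, and each job is assumed to run on an image that provides the relevant tool.

```yaml
variables:
  # pip reads PIP_CACHE_DIR to locate its download cache
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  # Maven reads MAVEN_OPTS; maven.repo.local sets the local repository path
  MAVEN_OPTS: "-Dmaven.repo.local=$CI_PROJECT_DIR/.m2/repository"

python_tests:
  script:
    - pip install -r requirements.txt
  cache:
    key: pip-cache
    paths:
      - .cache/pip

java_build:
  script:
    - mvn verify
  cache:
    key: maven-cache
    paths:
      - .m2/repository
```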


How do I debug or check if the cache is working in GitLab?
The job log shows cache activity without any extra configuration. Open the job in the GitLab UI and watch for messages like:

  • Checking cache for...
  • Successfully extracted cache
  • Creating cache...

Example:

job:
  script:
    - echo "Debugging cache"
  cache:
    key: cache-debug
    paths:
      - temp/

If the restore message is missing, check that the key and paths match those of the job that created the cache.


Can I use cache with GitLab’s Docker executor?
Yes. Caches work with the Docker executor but are stored outside the Docker container (on the host runner). GitLab mounts them into the container during execution.


Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, Shell, and theoretical knowledge on Azure, Kubernetes & Jenkins. In my free time, I write blogs on ckdbtech.com
