This article focuses on the retry keyword in GitLab CI/CD, explaining its utility in enhancing pipeline robustness by automatically re-executing failed jobs. We’ll cover how to configure retry attempts, restrict retries to specific failure types, and apply the keyword to individual jobs, providing practical examples and best practices for creating more resilient and reliable continuous integration and continuous delivery workflows.
Understanding the retry Keyword in GitLab CI/CD
The retry keyword in GitLab CI/CD allows you to automatically re-execute a job a specified number of times if it fails. This is an incredibly useful feature for dealing with transient failures – issues that are not caused by a fundamental problem in your code or configuration but rather by temporary external factors, such as network glitches, flaky test environments, or brief service outages. By configuring retries, you can make your CI/CD pipelines more resilient and reduce the need for manual intervention.
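As a quick illustration, here is a minimal sketch (the job name and script are placeholders) of a job that GitLab reruns automatically on failure:

integration_tests:
  script:
    - ./run-integration-tests.sh
  retry: 2 # Up to 2 automatic retries, i.e. at most 3 attempts in total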
Why Use the retry Keyword?
The primary reasons to implement the retry keyword include:
- Handling Transient Failures: Automatically recover from intermittent issues without human intervention.
- Improving Pipeline Reliability: Increase the success rate of your pipelines by giving jobs multiple chances to pass.
- Reducing Manual Reruns: Save time and effort by eliminating the need to manually retry jobs that failed due to temporary problems.
- Stabilizing Flaky Tests: While not a substitute for fixing truly flaky tests, retries can help mitigate their impact on pipeline stability during the diagnosis and resolution phase.
Configuring retry in .gitlab-ci.yml
The retry keyword is defined at the job level within your .gitlab-ci.yml file. It accepts either a simple integer or a dictionary of configuration options.
Basic retry Configuration
The simplest way to use retry is to specify an integer, which defines the number of times the job should be retried. The maximum number of retries is 2.
stages:
  - build
  - test

build_job:
  stage: build
  script:
    - echo "Building..."
    - exit 0 # This job will always succeed, no retry needed

flaky_test_job:
  stage: test
  script:
    - echo "Running flaky test..."
    - >
      if (( $RANDOM % 2 == 0 )); then
        echo "Test failed temporarily."
        exit 1 # Simulate a temporary failure
      else
        echo "Test passed."
        exit 0
      fi
  retry: 1 # If this job fails, retry it 1 time (total of 2 attempts)
In the example above, flaky_test_job will attempt to run. If it fails, GitLab CI/CD will automatically rerun it once more. If the second attempt also fails, the job will be marked as failed.
Advanced retry Configuration with max and when
For more granular control, you can use a dictionary with max and when to define the retry behavior.
retry:
  max: <integer> # Maximum number of retry attempts (0 to 2)
  when:          # Condition(s) under which to retry (optional); one or more of:
    - always
    - unknown_failure
    - script_failure
    - api_failure
    - stuck_or_timeout_failure
    - runner_system_failure
    - runner_unsupported
    - stale_schedule
    - job_execution_timeout
    - archived_failure
    - unmet_prerequisites
    - scheduler_failure
    - data_integrity_failure
- max: The maximum number of retry attempts. The value must be between 0 and 2 (inclusive). If set to 0, no retries will occur.
- when: (Optional) A list of failure conditions under which the job should be retried. If not specified, the job is retried on any failure (equivalent to always). This is a powerful feature for targeting specific types of transient issues. The documented conditions are:
  - always: Retry on any failure. (Default if when is omitted.)
  - unknown_failure: The job failed for an unknown reason.
  - script_failure: The job’s script exited with a non-zero status (on Docker-based executors this also covers failures to pull the image).
  - api_failure: A GitLab API call failed (e.g., during artifact upload/download).
  - stuck_or_timeout_failure: The job got stuck or timed out.
  - runner_system_failure: The runner itself experienced a system error.
  - runner_unsupported: The runner is outdated or unsupported.
  - stale_schedule: A delayed job could not be executed in time.
  - job_execution_timeout: The script exceeded the maximum execution time set for the job.
  - archived_failure: The job is archived and can no longer be run.
  - unmet_prerequisites: The job failed to complete prerequisite tasks.
  - scheduler_failure: The scheduler failed to assign the job to a runner.
  - data_integrity_failure: A data integrity problem was detected with the job.
Note that retry does not accept a delay option: GitLab CI/CD has no built-in way to wait between automatic retry attempts, and an unrecognized key under retry fails pipeline validation. For transient issues that might resolve themselves after a short period (e.g., network congestion), implement the wait inside the job’s script instead.
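As a rough sketch of that script-level approach (the URL, attempt count, and sleep duration are placeholder assumptions), the following job pauses between its own attempts before letting the job-level retry take over:

health_check:
  stage: test
  script:
    - |
      # Try the check up to 3 times, sleeping 10 seconds between attempts
      for attempt in 1 2 3; do
        if curl --fail --silent https://example.com/health; then
          echo "Health check passed on attempt $attempt."
          exit 0
        fi
        echo "Attempt $attempt failed; waiting 10 seconds before the next try..."
        sleep 10
      done
      exit 1 # Give up; a job-level retry (if configured) can still rerun the job
  retry: 1 # Rerun the whole job once more if all script-level attempts fail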
Example with Advanced retry Configuration
stages:
  - test
  - deploy

deploy_to_dev:
  stage: deploy
  script:
    - echo "Attempting to deploy to development environment..."
    - >
      if (( $RANDOM % 2 == 0 )); then
        echo "Simulating a temporary network issue."
        exit 1
      else
        echo "Deployment successful."
        exit 0
      fi
  retry:
    max: 2 # Total 3 attempts (initial + 2 retries)
    when:
      - script_failure # Only retry if the script itself fails
      - runner_system_failure # Or if there's a runner system issue

network_check_job:
  stage: test
  script:
    - ping -c 1 example.com # Simulate a network check
  retry:
    max: 1
    when:
      - script_failure # Only retry if ping exits with a non-zero code
In the deploy_to_dev job:

- It will run up to 3 times (initial run + 2 retries).
- Retries will only happen if the job fails due to a script_failure or a runner_system_failure.
Note that GitLab does not provide a predefined CI/CD variable that exposes the current attempt number, so a script cannot easily detect whether it is running as a retry. Design your job scripts so that they are safe to re-run from the beginning.
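As a sketch of what “safe to re-run” can look like (the bucket name, artifact path, and use of the AWS CLI are assumptions for illustration; CI_COMMIT_SHORT_SHA is a predefined GitLab variable):

upload_artifact:
  stage: deploy
  script:
    - |
      # Hypothetical idempotent step: skip the upload if an earlier attempt already completed it
      if aws s3 ls "s3://my-bucket/app-${CI_COMMIT_SHORT_SHA}.tar.gz"; then
        echo "Artifact already uploaded; skipping."
      else
        aws s3 cp app.tar.gz "s3://my-bucket/app-${CI_COMMIT_SHORT_SHA}.tar.gz"
      fi
  retry: 1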
Important Considerations and Best Practices
- Do Not Retry Persistent Failures: The retry keyword is not a substitute for fixing fundamental issues in your code, tests, or environment. If a job consistently fails for the same reason, retries will only waste resources. Identify and fix the root cause.
- Add Waits for Transient Issues: For issues like network problems or external service outages, giving the underlying problem a chance to resolve itself can be very effective. Since retry has no built-in delay, implement the pause inside the script, as in the sketch shown earlier.
- Choose when Conditions Carefully: Be specific about when a job should be retried. Retrying on script_failure is common, but you might want to exclude it if your scripts are expected to be robust and failures indicate a deeper problem.
- Limit max Attempts: Keep the max retry attempts low; GitLab caps them at 2 in any case. Too many retries can significantly slow down your pipeline without providing much benefit if the failure is not transient.
- Monitor Retried Jobs: Even with retries, keep an eye on jobs that frequently get retried. Frequent retries might indicate an underlying flakiness or instability that needs to be addressed.
- Consider Impact on Downstream Jobs: If a job eventually succeeds after several retries, ensure that downstream jobs are robust enough to handle any slight delays or inconsistent states that might arise from the retries.
- Visibility in UI: GitLab’s UI clearly indicates when a job has been retried, showing multiple attempts for a single job and which attempt ultimately passed or failed. This helps in debugging.
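If several jobs should share the same retry policy, you can also define it once under the default keyword instead of repeating it per job. A minimal sketch (job names and scripts are placeholders):

default:
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

unit_tests:
  script:
    - ./run-unit-tests.sh # Inherits the default retry policy

lint:
  script:
    - ./run-lint.sh
  retry: 0 # Overrides the default; this job is never retried automatically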
FAQs – GitLab CI/CD retry
What is the retry keyword in GitLab CI/CD?
The retry keyword in GitLab CI/CD allows you to automatically rerun a job if it fails. This is helpful for handling intermittent errors, such as network timeouts, flaky tests, or resource constraints during job execution.
How do I use the retry keyword in a job?
You can add retry to any job definition by specifying the number of retry attempts:
test-job:
  script:
    - run_tests.sh
  retry: 2 # Retry the job up to 2 times if it fails
This means GitLab will attempt to run the job a total of up to 3 times (initial + 2 retries) before marking it as failed.
What is the maximum number of retries allowed in GitLab?
The retry keyword can be set to an integer from 0 to 2. That means a job can be retried a maximum of 2 times automatically.
If you specify a number greater than 2, GitLab will throw a validation error during pipeline compilation.
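For example, a configuration like the following sketch (the job name is a placeholder) is rejected when the pipeline is created:

invalid-job:
  script:
    - echo "This never runs; validation fails first."
  retry: 5 # Invalid: retry must be an integer from 0 to 2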
Can I retry a job only for specific types of failures?
Yes, you can specify conditions for retrying by using the when option along with retry. The available values are the same ones listed earlier:

- always (default): Retry on any failure.
- unknown_failure
- script_failure
- api_failure
- stuck_or_timeout_failure
- runner_system_failure
- runner_unsupported
- stale_schedule
- job_execution_timeout
- archived_failure
- unmet_prerequisites
- scheduler_failure
- data_integrity_failure
Example:
deploy-job:
  script:
    - ./deploy.sh
  retry:
    max: 1
    when:
      - script_failure
      - api_failure
This will retry the job only if it fails due to a script or API-related error.
How does retry interact with manual job retries in the UI?
The retry keyword defines automatic retries. Even after GitLab has exhausted the automatic retries, you can still manually retry the job from the GitLab UI by clicking the Retry button.
Does retry apply to failed manual or allow_failure jobs?
No. The retry keyword does not apply to:

- Jobs with when: manual
- Jobs with allow_failure: true

These jobs must be manually triggered or will not fail the pipeline, respectively.
Can I use retry for jobs in any stage?
Yes, you can use retry for jobs in any stage of the pipeline: build, test, deploy, etc. However, use it thoughtfully, especially for deployment jobs, to avoid unintended side effects like multiple deploys.
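One way to hedge that risk for deployments is to retry only on infrastructure failures, never on script failures, so a deploy script that partially ran is not blindly re-executed. A sketch under that assumption (the script name is a placeholder, and the script should ideally be idempotent):

deploy-prod:
  stage: deploy
  script:
    - ./deploy.sh
  retry:
    max: 1
    when:
      - runner_system_failure # Retry only when the runner failed, not the deploy logic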
Can I see how many times a job was retried in the GitLab UI?
Yes, each retry attempt is logged in the Job details view in GitLab. You’ll see an indication like “Retry #1” or “Retry #2”, along with the status of each attempt.
Is retry useful for flaky tests or unstable runners?
Absolutely. If your CI runners are occasionally unstable or your test suite has non-deterministic failures (i.e., flaky tests), using retry can reduce false negatives in your pipelines.
However, it’s better to investigate root causes rather than relying too heavily on retries.
Does retrying a job affect pipeline duration?
Yes. Retries can increase total pipeline time, especially if failures happen frequently. GitLab waits for each retry to complete before moving to the next stage, which can delay overall execution. For example, a 10-minute job with retry: 2 can occupy its stage for up to roughly 30 minutes in the worst case.
Use retry only where it’s necessary to ensure pipeline efficiency.
How do I disable retry for a job?
To explicitly prevent a job from retrying, set retry: 0, although this is redundant because 0 is the default value.
build-job:
  script:
    - make build
  retry: 0
Author
Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, and Shell, and theoretical knowledge of Azure, Kubernetes & Jenkins. In my free time, I write blogs on ckdbtech.com