Introduction
When I set out to optimize CI/CD pipelines across GitHub Actions and GitLab CI for our team, I was facing the same problems I see in many projects: builds were slow, flaky, and increasingly expensive as the codebase grew. Engineers were waiting minutes (sometimes tens of minutes) just to learn a test had failed because of a misconfigured job or unnecessary workflow step.
In this case study, I walk through how I approached these issues in real projects: cutting redundant jobs, tightening caching strategies, and standardizing configuration so that both GitHub Actions and GitLab CI behaved predictably. My goal is to show, step-by-step, what actually moved the needle—how we reduced average pipeline time, stabilized test runs, and made it easier for developers to ship confidently, without rewriting everything from scratch.
Background & Context: A Growing Monorepo and Slowing CI/CD Pipelines
When I first joined this project, we were building a B2B web application with a fairly standard stack: a TypeScript frontend, a Python backend, and a handful of shared libraries for utilities and design components. Over time, we consolidated everything into a single monorepo to simplify dependency management and cross-service changes. That decision paid off for development velocity at first—but it quickly exposed weaknesses in how our CI/CD was structured.
The team had grown to around a dozen active contributors: a mix of backend, frontend, and platform engineers. We also had a split hosting story. Some services and internal tools lived in GitHub, using GitHub Actions, while core, compliance-sensitive components remained in GitLab, wired up to GitLab CI. In practice, this meant two sets of pipelines, two styles of configuration, and plenty of duplication that nobody really wanted to touch.
Initially, the CI/CD setup was straightforward: run the full test suite on every push, build Docker images on every merge to main, and deploy to staging on successful pipelines. As the monorepo grew, that simplicity became a liability. Pipelines stretched from 5–7 minutes to 20+ minutes, sometimes more during peak hours. In my experience, once feedback loops get that slow, developers naturally start bypassing checks, batching risky changes, or running fewer tests locally—all of which we started to see.
On top of that, our cloud bill for runners was creeping up, and product managers were noticing that hotfixes took too long to ship. That combination of developer frustration, operational cost, and business pressure made the need to optimize CI/CD pipelines impossible to ignore. This case study captures how we tackled that turning point and what actually worked in both GitHub Actions and GitLab CI.
The Problem: Slow, Flaky, and Expensive CI/CD Pipelines
By the time we decided to optimize CI/CD pipelines, the pain was hard to ignore. On GitHub Actions, our average pull request pipeline hovered around 18–22 minutes; GitLab CI was even slower, often crossing 25 minutes for backend-heavy changes. During peak hours, queues added another 5–10 minutes of idle waiting. In my day-to-day work, it meant I’d context-switch two or three times before a single PR went green.
Flakiness made things worse. Roughly 10–15% of pipelines failed for reasons unrelated to the change: transient network issues when pulling images, race conditions in integration tests, and brittle end-to-end tests that timed out unpredictably. I saw teammates rerun the same pipeline two or three times just to get a pass, which quietly normalized “red doesn’t always mean broken.”
Cost was the third pressure point. Our monthly spend on hosted and self-hosted runners had nearly doubled in six months. Most of that came from running the full test matrix and Docker builds on every push, even for tiny docs updates or comments-only changes. Meanwhile, developers started avoiding small, incremental PRs because each one felt expensive in both time and resources.
When I pulled the baseline metrics together—median pipeline duration, failure rate, and monthly runner hours—it was clear we were paying too much in latency, reliability, and money for the value we were getting. Those numbers became the starting point and success criteria for the optimization work that follows in this case study.
Constraints & Goals for Optimizing CI/CD Pipelines
Before touching any YAML, I had to be clear about what we could and couldn’t change. Budget-wise, we had room to reshuffle spend but not to introduce a whole new CI platform. That meant I needed to optimize CI/CD pipelines within the existing GitHub Actions and GitLab CI ecosystems, reusing current runners and secrets management. Security and compliance requirements also ruled out sending builds to third-party services or relaxing checks on protected branches.
Time was another constraint: I had a few weeks, not months, and any refactors had to avoid long-lived “CI freeze” periods. So I planned changes in small, reversible steps with feature-flagged workflows where possible. Tooling-wise, we committed to staying in YAML, using only first-party or well-established marketplace actions on GitHub and official images and templates on GitLab.
Within those constraints, I set explicit goals: cut median pipeline time in half, reduce flaky failures by at least 50%, and lower runner hours by 25–30%, while keeping the same (or better) test coverage and security scanning. Having those targets made trade-offs much clearer when deciding where to invest effort.
Approach & Strategy: Profiling and Redesigning the Pipelines
To really optimize CI/CD pipelines, I started by treating them like any other performance problem: measure first, then change. Instead of guessing, I pulled timing data from both GitHub Actions and GitLab CI, looking at where jobs spent most of their time—checkout, dependency installation, tests, builds, and image pushes. I also tracked failure reasons for a couple of weeks to separate true test failures from infrastructure or configuration issues.
From there, I grouped work into three streams. First, I focused on structural changes: splitting monolithic pipelines into smaller, parallel jobs and introducing conditional execution so we didn’t run the world on every tiny change. Second, I tuned the hot paths—dependency caching, base images, and test sharding—where I knew, from past experience, we could win back minutes with relatively little effort. Third, I aligned GitHub Actions and GitLab CI configurations so they followed the same logical stages and naming, which made it much easier to reason about behavior across platforms.
Here’s a simplified example of how I started experimenting with a more modular GitHub Actions workflow:
name: CI
on:
  pull_request:
    paths-ignore:
      - 'docs/**'
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      - run: npm test -- --runInBand
  build_docker:
    needs: tests
    if: github.event.pull_request.base.ref == 'main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/build_docker.sh
On the GitLab side, I mirrored this structure with explicit stages and rules to skip jobs for docs-only or non-impacting changes. Throughout the process, I made one change at a time, watched the metrics for a few days, and only then moved on. That discipline kept risk low and made it obvious which tweaks actually contributed to faster, more reliable pipelines.
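As a rough sketch of what such a mirrored GitLab configuration could look like: the job names follow the GitHub workflow above, while the path list, the hidden .code_paths key, and the exact rule conditions are assumptions for illustration, not the project's actual file. Because rules:changes works as an allow-list, "skip docs-only changes" is expressed by listing the paths that should trigger the jobs.

```yaml
# .gitlab-ci.yml (illustrative sketch; paths and rule conditions are assumptions)
stages:
  - test
  - build

# Hypothetical allow-list of paths that count as "code changes".
.code_paths: &code_paths
  - "backend/**/*"
  - "frontend/**/*"
  - package-lock.json

tests:
  stage: test
  image: node:20
  rules:
    # Run in merge request pipelines only when a listed path changed,
    # so docs-only merge requests do not create this job.
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      changes: *code_paths
  script:
    - npm ci
    - npm test -- --runInBand

build_docker:
  stage: build   # the build stage starts only after the test stage succeeds
  rules:
    # Roughly mirrors the GitHub condition on the pull request base branch.
    - if: $CI_PIPELINE_SOURCE == "merge_request_event" && $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == "main"
      changes: *code_paths
  script:
    - ./scripts/build_docker.sh
```

Relying on stage ordering rather than needs keeps the sketch safe when the tests job is filtered out, since a needs entry pointing at a job that is absent from the pipeline would otherwise have to be marked optional.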
Implementation: Concrete Changes in GitHub Actions and GitLab CI
Once I had a clear profile of where time and money were going, I focused on a few high-impact levers that would reliably optimize CI/CD pipelines across both platforms: better caching, smarter job fan-out, conditional execution, and shared templates so we stopped solving the same problems twice.
Faster Feedback with Smarter Caching
On GitHub Actions, dependency installation was eating several minutes per run. I switched to key-based caching tied to lockfiles and OS, which in my experience gives a big win with minimal complexity.
# .github/workflows/ci.yml
jobs:
  node_tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            npm-${{ runner.os }}-
      - run: npm ci
      - run: npm test
In GitLab CI, I mirrored the idea using per-branch caches keyed by lockfiles. That alone shaved 3–5 minutes off many pipelines:
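# .gitlab-ci.yml
cache:
  key:
    # GitLab does not run shell substitutions such as $(md5sum ...) in a cache key;
    # cache:key:files derives the key from the lockfile contents instead.
    files:
      - package-lock.json
    prefix: $CI_COMMIT_REF_SLUG   # keeps the cache per branch
  paths:
    # npm ci deletes node_modules/ before installing, so cache npm's download cache.
    - .npm/
node_tests:
  stage: test
  image: node:20
  script:
    - npm ci --cache .npm --prefer-offline
    - npm test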
Matrix Builds and Targeted Parallelization
Our test suites covered multiple Node and Python versions. Instead of serial jobs, I moved to matrix builds in GitHub Actions to parallelize them while keeping configuration DRY:
jobs:
  node_tests_matrix:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        node: [18, 20]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: 'npm'
      - run: npm ci
      - run: npm test
On GitLab, I used the parallel:matrix feature for equivalent coverage. One thing I learned the hard way was to cap concurrency so we didn’t overwhelm shared databases or external services during integration tests.
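For reference, a GitLab job using parallel:matrix for the same Node versions might look roughly like the sketch below. The job name and the NODE_VERSION variable are assumptions chosen to mirror the GitHub example, not the project's actual configuration.

```yaml
# .gitlab-ci.yml (illustrative sketch; names are assumptions)
node_tests_matrix:
  stage: test
  # Each matrix entry becomes its own job with NODE_VERSION set accordingly.
  image: node:${NODE_VERSION}
  parallel:
    matrix:
      - NODE_VERSION: ["18", "20"]
  script:
    - npm ci
    - npm test
```

On the GitHub side, one way to cap concurrency is strategy.max-parallel, which limits how many matrix jobs run at the same time. On GitLab, the effective limit usually comes from runner capacity, and a resource_group can serialize jobs that share an external resource.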
Conditional Jobs to Avoid Useless Work
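Next, I cut out obviously wasted runs. Docs-only or frontend-only changes no longer triggered backend-heavy jobs. In GitHub Actions, I leaned on paths and paths-ignore to filter whole workflows, and on job-level if: conditions to filter individual jobs. One caveat: github.event.pull_request.changed_files is only a count of changed files, not a list of paths, so an if: condition cannot test it with contains(). The version below uses a change-detection job instead; dorny/paths-filter is one well-established marketplace option, shown here as an illustration:
on:
  pull_request:
# No workflow-level paths-ignore here: it would also skip docs_check on docs-only PRs.
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      backend: ${{ steps.filter.outputs.backend }}
      docs: ${{ steps.filter.outputs.docs }}
    steps:
      - uses: dorny/paths-filter@v3   # sets each output to 'true' or 'false'
        id: filter
        with:
          filters: |
            backend:
              - 'backend/**'
            docs:
              - 'docs/**'
  backend_tests:
    needs: changes
    if: needs.changes.outputs.backend == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "backend test steps go here"
  docs_check:
    needs: changes
    if: needs.changes.outputs.docs == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "docs lint steps go here"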
In GitLab CI, I shifted from simple only/except to rules, so we could express the same intent more precisely:
backend_tests:
  stage: test
  rules:
    # backend/**/* matches files at any depth under backend/
    - changes:
        - backend/**/*
    - when: never
  script:
    - pytest
After this change, we saw fewer than half as many full pipelines for trivial changes, which contributed significantly to cutting runner hours.
Shared Templates to Avoid Duplicated Configuration
To keep GitHub Actions and GitLab CI aligned, I introduced shared templates. On GitHub, I pulled common steps into a reusable workflow:
# .github/workflows/reusable-tests.yml
name: Reusable Tests
on:
  workflow_call:
    inputs:
      path:
        required: true
        type: string
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci --workspace ${{ inputs.path }}
      - run: npm test --workspace ${{ inputs.path }}
Then each service-specific workflow just called this shared definition. On GitLab, I used YAML anchors and !reference to avoid copy-paste. In my experience, this was a big maintainability win—fixing a flaky step in one place instead of four.
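A minimal sketch of the GitLab side, combining a YAML anchor for shared job settings with !reference for shared script lines. The hidden job names and the workspace paths (packages/frontend, packages/shared) are hypothetical and only illustrate the pattern.

```yaml
# .gitlab-ci.yml (illustrative sketch; job names and workspace paths are assumptions)

# Shared job settings, reused through a YAML anchor and merge key.
.node_defaults: &node_defaults
  stage: test
  image: node:20

# Shared script lines, reused through !reference.
.install_deps:
  script:
    - npm ci --cache .npm --prefer-offline

frontend_tests:
  <<: *node_defaults
  script:
    - !reference [.install_deps, script]
    - npm test --workspace packages/frontend

shared_lib_tests:
  <<: *node_defaults
  script:
    - !reference [.install_deps, script]
    - npm test --workspace packages/shared
```

Anchors only work within a single file, whereas !reference can also pull from files brought in with include, which matters once templates move into a shared location. On GitHub, a caller job reaches the reusable workflow with uses: ./.github/workflows/reusable-tests.yml and passes the workspace through with: path.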
Stabilizing Flaky Tests and Infrastructure
Finally, to make pipelines more trustworthy, I hardened the most failure-prone steps: added retry logic on network-heavy operations, isolated noisy integration tests into their own job, and increased timeouts only where justified. I also added lightweight health checks around databases and services before running tests so failures were more clearly about infra vs. code.
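To make those ideas concrete, here is a hedged sketch of an isolated integration-test job on GitHub Actions. It assumes a PostgreSQL service, a requirements.txt file, a tests/integration directory, and a DATABASE_URL variable, none of which are confirmed by the setup described above. The service health check makes the job wait until the database accepts connections, and the shell loop retries a network-heavy step a few times before giving up.

```yaml
# .github/workflows/integration.yml (illustrative sketch; service, paths, and names are assumptions)
name: Integration Tests
on:
  pull_request:
jobs:
  integration_tests:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
        # Steps start only after the container reports healthy.
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies (retry transient network failures)
        run: |
          for attempt in 1 2 3; do
            pip install -r requirements.txt && exit 0
            echo "install failed on attempt ${attempt}; retrying in 10s"
            sleep 10
          done
          exit 1
      - run: pytest tests/integration
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/postgres
```

If the service never becomes healthy, the job fails before any test runs, which points at infrastructure rather than code. On GitLab, the retry keyword with a when list such as runner_system_failure offers a comparable way to retry only infrastructure-type failures.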
None of these changes were individually exotic, but together they dramatically optimized CI/CD pipelines while keeping the configuration understandable for the rest of the team.
Results: Faster Feedback, Lower Costs, and More Reliable CI/CD
Within a few weeks of rolling out the changes, the impact was obvious in both day-to-day work and the metrics. On GitHub Actions, median pull request pipeline time dropped from ~20 minutes to about 9 minutes; GitLab CI fell from roughly 25 minutes to around 11 minutes. The longest pipelines still existed for full integration and release flows, but even those shrank by 30–40%. In practice, this was the difference between “go make coffee” and “wait a moment and keep your current context.”
Reliability improved just as much. Flaky failures fell from the 10–15% range down to about 3–4%, largely thanks to better caching, test isolation, and small infrastructure tweaks. That change alone made the CI “red vs green” signal something we could trust again, which encouraged more frequent, smaller merges.
On the cost side, total runner hours across GitHub and GitLab dropped by roughly 35%. Most of that came from conditional jobs and avoiding unnecessary builds on docs or non-impacting changes. The finance team noticed the reduced cloud spend; the engineers noticed that optimizing CI/CD pipelines meant they could ship more often without feeling like each PR had an invisible tax attached.
The final, less tangible result was cultural: people stopped dreading CI. When I asked the team a month later, the common theme was that CI/CD had gone from a bottleneck to something boring and predictable, which, in my experience, is exactly what you want from your pipelines.
What Didn’t Work: Missteps While Trying to Optimize CI/CD Pipelines
Not every attempt to optimize CI/CD pipelines paid off. A few ideas looked good on paper but caused more trouble than they were worth once I tried them with real workloads and a real team.
The biggest misstep was over-optimizing with aggressive timeouts and retries. Early on, I tightened timeouts across several jobs in both GitHub Actions and GitLab CI to force faster failures. In practice, this just turned slow-but-healthy runs into flaky ones, especially when shared services were under load. I eventually rolled those changes back and focused on reducing the work each job did instead of just making it fail sooner.
I also experimented with a very granular job breakdown—splitting tests into many tiny jobs to chase maximum parallelism. That made the YAML harder to read, inflated orchestration overhead, and didn’t yield the speedup I expected. One thing I learned the hard way was that there’s a sweet spot: enough parallelism to utilize runners well, but not so much that debugging pipelines becomes an ordeal.
Finally, I briefly tried a homegrown script to dynamically reorder tests by historical failure duration. It seemed clever, but the maintenance cost and debugging complexity quickly outweighed the gains. In the end, simple, predictable structures beat “smart” but opaque optimizations.
Lessons Learned & Recommendations for GitHub Actions and GitLab CI Users
Looking back at this project, a few practical lessons stand out for anyone trying to optimize CI/CD pipelines on GitHub Actions or GitLab CI.
First, always start with measurement. In my experience, it’s easy to lose days tweaking YAML in the wrong places. Pull basic stats on job durations, queue times, and failure rates, then target the slowest and most frequently run jobs first. Both platforms expose enough logs and metrics to make this a quick win.
Second, keep your workflows simple and readable. Parallelization, matrices, and reusable templates are powerful, but I’ve learned they only pay off if another engineer can understand what’s happening without a guided tour. Prefer a small set of clearly named stages and jobs over a tangle of clever conditions.
Third, align your optimizations with team behavior. For us, that meant optimizing the PR path first, because that’s where developers feel friction most. Release pipelines came later. I’d recommend you do the same: make the common path fast and reliable, then polish the edges.
Finally, treat CI config as versioned, reviewed code. Require code reviews for CI changes, introduce small experiments behind conditions, and roll back quickly if something increases flakiness. On both GitHub Actions and GitLab CI, this mindset made it much easier for me to improve performance without eroding trust in the pipelines.
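One way to run a small experiment behind a condition on GitHub Actions is to gate a job on a repository configuration variable. The sketch below assumes a hypothetical variable named CI_EXPERIMENT_ENABLED and a hypothetical job name; it illustrates the idea and is not a record of the project's configuration.

```yaml
# .github/workflows/ci.yml (fragment; variable and job names are hypothetical)
jobs:
  tests_experimental:
    # Opt-in: the job runs only while the repository variable is set to 'true'.
    if: vars.CI_EXPERIMENT_ENABLED == 'true'
    # A failure is reported for this job but does not fail the whole workflow run.
    continue-on-error: true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      - run: npm test
```

Rolling back then means flipping the variable rather than reverting a commit. On GitLab, a rules entry with an if: clause on a project-level CI/CD variable can serve the same purpose.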
If you keep those principles in mind (measure first, favor clarity, optimize the main developer loop, and treat YAML as real code), you’ll be in a strong position to continuously improve your setups instead of doing one big “CI rewrite” every year.
Conclusion / Key Takeaways
For this project, the biggest win came from treating GitHub Actions and GitLab CI like performance-critical systems, not just background tooling. By profiling first, then iteratively redesigning workflows, I was able to cut pipeline times roughly in half, reduce flaky failures to a small minority, and trim runner costs without sacrificing coverage.
If you want to optimize CI/CD pipelines on either platform, my distilled checklist is simple: measure where time and failures actually occur; fix the hot paths with caching and sensible parallelism; avoid running work that doesn’t matter through conditional jobs; and centralize common logic in shared templates so improvements apply everywhere. Above all, keep the YAML readable and make small, reversible changes. Done consistently, those habits turn CI/CD from a bottleneck into quiet, reliable infrastructure that supports faster, safer releases.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.





