Pipeline Patterns

Why Pipeline Patterns Matter

A CI/CD pipeline is not just a sequence of shell commands; it is a production system that every developer interacts with dozens of times per day. A poorly designed pipeline creates a bottleneck that multiplies across every engineer, every PR, and every deployment. For a team of 50 engineers opening 100 PRs per week, the difference between a 5-minute and a 30-minute pipeline is 25 minutes × 100 PRs = 2,500 engineer-minutes per week (about 42 hours), or roughly one full-time engineer doing nothing but waiting for CI. That figure assumes only one pipeline run per PR.

Pipeline patterns are the architectural blueprints that solve specific organizational and technical challenges. Just as software design patterns provide reusable solutions to recurring problems, pipeline patterns provide proven approaches to common CI/CD challenges.

The Problem Space

| Challenge | Naive Approach | Cost | Pattern Solution |
| --- | --- | --- | --- |
| Slow feedback | Run everything sequentially | 30+ min per PR | DAG pipelines, parallel testing |
| Monorepo blast radius | Run all tests on every change | Wasted compute | Change detection, affected project analysis |
| Environment consistency | Manual configuration | Environment drift | Infrastructure as code, immutable artifacts |
| Release coordination | Manual release process | Human error, slow releases | Trunk-based development, feature flags |
| Merge conflicts | Long-lived feature branches | Integration pain | Trunk-based development, short-lived branches |

First Principles

The Feedback Loop Axiom

The value of CI/CD feedback decreases exponentially with time:

$$V(t) = V_0 e^{-\lambda t}$$

Where:

  • $V_0$ is the maximum value of the feedback (catching a bug before merge)
  • $t$ is the time between commit and feedback, in minutes
  • $\lambda$ is the decay constant (typically ~0.1 per minute for developer context)

At $t = 5$ minutes, feedback retains $e^{-0.5} \approx 60\%$ of its value: the developer is still in context. At $t = 30$ minutes, feedback retains $e^{-3} \approx 5\%$: the developer has context-switched and must reload mental state.

This exponential decay is why pipeline speed is not a nice-to-have but a force multiplier for engineering productivity.
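
As a quick numeric check of the decay model, the sketch below evaluates $V(t)$ for a few pipeline durations. It assumes $V_0 = 1$ and the $\lambda = 0.1$ per minute figure quoted above; the function name `feedbackValue` is illustrative and not part of any library.

```typescript
// Relative value of CI feedback after `minutes`, assuming exponential decay.
// The defaults (lambda = 0.1 per minute, v0 = 1) follow the figures in the
// text and are assumptions, not measured constants.
function feedbackValue(minutes: number, lambda = 0.1, v0 = 1): number {
  return v0 * Math.exp(-lambda * minutes);
}

for (const t of [2, 5, 15, 30]) {
  const percent = (feedbackValue(t) * 100).toFixed(1);
  console.log(`${t} min pipeline: feedback retains ${percent}% of its value`);
}
```

Under these assumptions a 5-minute pipeline keeps roughly 0.61 of the original value and a 30-minute pipeline roughly 0.05, which matches the figures above.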

The Testing Pyramid in CI

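Pipeline design should reflect the testing pyramid (many fast unit tests at the base, fewer integration tests in the middle, a small number of slow end-to-end tests at the top): run the fastest, cheapest tests first and gate progressively more expensive tests behind earlier successes.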

Core Mechanics

Pattern 1: Trunk-Based Development Pipeline

Trunk-based development (TBD) is the practice of all developers committing to a single branch (main/trunk) with short-lived feature branches (< 1 day).

Pipeline for trunk-based development:

yaml
# .github/workflows/trunk-based.yml
name: Trunk-Based CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

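concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  # Cancel superseded PR runs only; never cancel an in-flight deployment on main
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}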

jobs:
  # Gate 1: Fast feedback (< 2 min)
  fast-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck
      - run: npm run test:unit -- --bail

  # Gate 2: Full validation (< 10 min)
  full-tests:
    needs: fast-checks
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: test
        ports: ['5432:5432']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run test:all -- --shard=${​{ matrix.shard }}/4

  # Gate 3: Build & push (< 5 min)
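  build:
    needs: full-tests
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # required for GITHUB_TOKEN to push to ghcr.io
    outputs:
      # build-push-action exposes `digest`, not `image`; deploy the immutable digest
      image: ghcr.io/${{ github.repository }}@${{ steps.build.outputs.digest }}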
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${​{ github.actor }}
          password: ${​{ secrets.GITHUB_TOKEN }}
      - id: build
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${​{ github.repository }}:${​{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # Gate 4: Progressive deployment
  deploy-canary:
    needs: build
    runs-on: ubuntu-latest
    environment: production-canary
    steps:
      - uses: actions/checkout@v4
      - name: Deploy canary (5% traffic)
        run: |
          kubectl set image deployment/app-canary \
            app=${​{ needs.build.outputs.image }} -n production

  verify-canary:
    needs: deploy-canary
    runs-on: ubuntu-latest
    steps:
      - name: Monitor canary for 10 minutes
        run: |
          for i in $(seq 1 20); do
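            # -G with --data-urlencode encodes the PromQL query and avoids curl URL globbing on {} and []
            QUERY='sum(rate(http_requests_total{status=~"5..",deployment="canary"}[1m])) / sum(rate(http_requests_total{deployment="canary"}[1m]))'
            ERROR_RATE=$(curl -sG "http://prometheus:9090/api/v1/query" --data-urlencode "query=${QUERY}" | jq -r '.data.result[0].value[1] // "0"')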
            if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
              echo "Canary error rate too high: $ERROR_RATE"
              kubectl rollout undo deployment/app-canary -n production
              exit 1
            fi
            echo "Canary healthy. Error rate: $ERROR_RATE"
            sleep 30
          done

  deploy-full:
    needs: [build, verify-canary]  # build must be a direct dependency to read needs.build.outputs
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Full production deployment
        run: |
          kubectl set image deployment/app \
            app=${​{ needs.build.outputs.image }} -n production
          kubectl rollout status deployment/app -n production --timeout=600s

Pattern 2: Monorepo Pipeline

Monorepos require intelligent change detection to avoid rebuilding the entire codebase on every commit.

Change detection with dependency awareness:

typescript
// scripts/detect-affected.ts
import { execSync } from 'child_process';
import * as path from 'path';
import * as fs from 'fs';

interface PackageInfo {
  name: string;
  path: string;
  dependencies: string[];
}

interface AffectedResult {
  packages: string[];
  matrix: { package: string }[];
}

function getChangedFiles(baseBranch: string = 'origin/main'): string[] {
  const output = execSync(
    `git diff --name-only ${baseBranch}...HEAD`,
    { encoding: 'utf-8' }
  );
  return output.trim().split('\n').filter(Boolean);
}

function getPackageMap(): Map<string, PackageInfo> {
  const packagesDir = path.join(process.cwd(), 'packages');
  const packages = new Map<string, PackageInfo>();

  for (const dir of fs.readdirSync(packagesDir)) {
    const pkgJsonPath = path.join(packagesDir, dir, 'package.json');
    if (!fs.existsSync(pkgJsonPath)) continue;

    const pkgJson = JSON.parse(fs.readFileSync(pkgJsonPath, 'utf-8'));
    const deps = Object.keys({
      ...pkgJson.dependencies,
      ...pkgJson.devDependencies,
    }).filter(d => d.startsWith('@myorg/'));

    packages.set(dir, {
      name: pkgJson.name,
      path: `packages/${dir}`,
      dependencies: deps.map(d => d.replace('@myorg/', '')),
    });
  }

  return packages;
}

function getAffectedPackages(
  changedFiles: string[],
  packageMap: Map<string, PackageInfo>
): Set<string> {
  const affected = new Set<string>();

  // Direct changes
  for (const file of changedFiles) {
    for (const [name, pkg] of packageMap) {
      if (file.startsWith(pkg.path + '/')) {
        affected.add(name);
      }
    }

    // Root config changes affect everything
    if (
      file === 'package.json' ||
      file === 'tsconfig.json' ||
      file.startsWith('.github/')
    ) {
      for (const name of packageMap.keys()) {
        affected.add(name);
      }
      return affected; // Everything affected
    }
  }

  // Transitive dependencies (BFS)
  const queue = [...affected];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const [name, pkg] of packageMap) {
      if (!affected.has(name) && pkg.dependencies.includes(current)) {
        affected.add(name);
        queue.push(name);
      }
    }
  }

  return affected;
}

// Main
const changedFiles = getChangedFiles();
const packageMap = getPackageMap();
const affected = getAffectedPackages(changedFiles, packageMap);

const result: AffectedResult = {
  packages: [...affected],
  matrix: [...affected].map(p => ({ package: p })),
};

console.log(JSON.stringify(result));

Monorepo pipeline with Turborepo:

yaml
# .github/workflows/monorepo-ci.yml
name: Monorepo CI

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

env:
  TURBO_TOKEN: ${​{ secrets.TURBO_TOKEN }}
  TURBO_TEAM: ${​{ vars.TURBO_TEAM }}

jobs:
  detect:
    runs-on: ubuntu-latest
    outputs:
      affected: ${​{ steps.affected.outputs.packages }}
      matrix: ${​{ steps.affected.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - id: affected
        run: |
          AFFECTED=$(npx turbo run build --dry-run=json --filter='...[origin/main...HEAD]' | jq -c '.packages')
          MATRIX=$(echo "$AFFECTED" | jq -c '[.[] | select(. != "//") | {package: .}]')
          echo "packages=$AFFECTED" >> "$GITHUB_OUTPUT"
          echo "matrix=$MATRIX" >> "$GITHUB_OUTPUT"
          echo "Affected packages: $AFFECTED"

  lint:
    needs: detect
    if: needs.detect.outputs.affected != '[]'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # origin/main must be present for the turbo --filter comparison
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npx turbo run lint --filter='...[origin/main...HEAD]'

  test:
    needs: detect
    if: needs.detect.outputs.matrix != '[]'
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        include: ${​{ fromJson(needs.detect.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npx turbo run test --filter=${​{ matrix.package }}

  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # origin/main must be present for the turbo --filter comparison
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npx turbo run build --filter='...[origin/main...HEAD]'
      - uses: actions/upload-artifact@v4
        with:
          name: build-artifacts
          path: |
            packages/*/dist/

Pattern 3: Parallel Testing

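Test parallelization is often the most impactful optimization for pipeline speed: with balanced shards, wall-clock test time approaches the total test time divided by the shard count.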

Intelligent test sharding by duration:

typescript
// scripts/shard-tests.ts
import * as fs from 'fs';

interface TestTiming {
  file: string;
  duration: number; // milliseconds
}

interface ShardAssignment {
  shard: number;
  files: string[];
  estimatedDuration: number;
}

function loadTimings(timingsFile: string): TestTiming[] {
  if (!fs.existsSync(timingsFile)) {
    return []; // First run, no timing data
  }

  return JSON.parse(fs.readFileSync(timingsFile, 'utf-8'));
}

function shardByDuration(
  testFiles: string[],
  timings: TestTiming[],
  shardCount: number
): ShardAssignment[] {
  const timingMap = new Map(timings.map(t => [t.file, t.duration]));
  const defaultDuration = 5000; // 5s default for unknown tests

  // Sort tests by duration (descending) for better bin packing
  const sortedTests = [...testFiles].sort((a, b) => {
    const dA = timingMap.get(a) ?? defaultDuration;
    const dB = timingMap.get(b) ?? defaultDuration;
    return dB - dA;
  });

  // Greedy bin-packing: assign each test to the shard with least total duration
  const shards: ShardAssignment[] = Array.from({ length: shardCount }, (_, i) => ({
    shard: i + 1,
    files: [],
    estimatedDuration: 0,
  }));

  for (const test of sortedTests) {
    // Find shard with minimum current duration
    const minShard = shards.reduce((min, s) =>
      s.estimatedDuration < min.estimatedDuration ? s : min
    );

    minShard.files.push(test);
    minShard.estimatedDuration += timingMap.get(test) ?? defaultDuration;
  }

  return shards;
}

// Calculate theoretical speedup
function calculateSpeedup(shards: ShardAssignment[]): number {
  const totalDuration = shards.reduce((sum, s) => sum + s.estimatedDuration, 0);
  const maxShardDuration = Math.max(...shards.map(s => s.estimatedDuration));
  return totalDuration / maxShardDuration;
}

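// Usage: ts-node scripts/shard-tests.ts [--json] [--count=N]
const args = process.argv.slice(2);
const countArg = args.find(a => a.startsWith('--count='))?.split('=')[1];

const testFiles = fs.readdirSync('src')
  .filter(f => f.endsWith('.test.ts'))
  .map(f => `src/${f}`);

const timings = loadTimings('.test-timings.json');
const shardCount = parseInt(countArg ?? process.env.CI_SHARD_COUNT ?? '4', 10);
const shards = shardByDuration(testFiles, timings, shardCount);

if (args.includes('--json')) {
  // Single-line JSON for a GitHub Actions matrix: [{ index, files }].
  // files is comma-joined because matrix values are interpolated as strings.
  console.log(JSON.stringify(
    shards.map(s => ({ index: s.shard, files: s.files.join(',') }))
  ));
} else {
  // Human-readable summary goes to stderr so stdout carries only the file list
  console.error(`Speedup factor: ${calculateSpeedup(shards).toFixed(2)}x`);
  for (const shard of shards) {
    console.error(
      `Shard ${shard.shard}: ${shard.files.length} tests, ` +
      `~${(shard.estimatedDuration / 1000).toFixed(1)}s`
    );
  }

  // Files for the current shard (1-based CI_SHARD_INDEX), one per line
  const currentShard = parseInt(process.env.CI_SHARD_INDEX ?? '1', 10);
  const myFiles = shards[currentShard - 1]?.files ?? [];
  console.log(myFiles.join('\n'));
}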

Pipeline with duration-based sharding:

yaml
jobs:
  prepare-shards:
    runs-on: ubuntu-latest
    outputs:
      shards: ${​{ steps.shard.outputs.shards }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache/restore@v4
        with:
          path: .test-timings.json
          key: test-timings-${​{ github.ref }}
          restore-keys: test-timings-
      - id: shard
        run: |
          SHARDS=$(npx ts-node scripts/shard-tests.ts --json --count=4)
          echo "shards=$SHARDS" >> "$GITHUB_OUTPUT"

  test:
    needs: prepare-shards
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: ${​{ fromJson(needs.prepare-shards.outputs.shards) }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: |
          echo '${​{ matrix.shard.files }}' > /tmp/test-files.txt
          npx vitest run --reporter=json --outputFile=test-results.json $(cat /tmp/test-files.txt | tr ',' ' ')
      - uses: actions/upload-artifact@v4
        with:
          name: test-results-${​{ matrix.shard.index }}
          path: test-results.json

  collect-timings:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          pattern: test-results-*
      - run: |
          # Merge timing data from all shards
          npx ts-node scripts/merge-timings.ts test-results-*/*.json > .test-timings.json
      - uses: actions/cache/save@v4
        with:
          path: .test-timings.json
          # Caches are immutable, so each save needs a unique key; restore-keys picks up the newest match
          key: test-timings-${{ github.sha }}
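
The workflow above calls `scripts/merge-timings.ts`, which is not shown. A minimal sketch of its core logic might look like the following. It assumes each JSON report is Jest-style, with a `testResults` array whose entries carry `name`, `startTime`, and `endTime`; that shape, and the `mergeTimings` function itself, are assumptions to check against the reporter you actually use.

```typescript
// Fields read from each per-shard JSON report (assumed Jest-style shape).
interface ReportFile {
  name: string;      // test file path
  startTime: number; // epoch milliseconds
  endTime: number;   // epoch milliseconds
}

interface JsonReport {
  testResults: ReportFile[];
}

interface TestTiming {
  file: string;
  duration: number; // milliseconds
}

// Merge fresh per-shard reports over previously stored timings.
// Files missing from this run keep their old duration.
function mergeTimings(reports: JsonReport[], previous: TestTiming[] = []): TestTiming[] {
  const merged = new Map<string, number>();
  for (const t of previous) {
    merged.set(t.file, t.duration);
  }
  for (const report of reports) {
    for (const result of report.testResults) {
      merged.set(result.name, Math.max(0, result.endTime - result.startTime));
    }
  }
  return Array.from(merged, ([file, duration]) => ({ file, duration }))
    .sort((a, b) => a.file.localeCompare(b.file));
}

// Example with made-up reports from two shards
const exampleTimings = mergeTimings([
  { testResults: [{ name: 'src/cart.test.ts', startTime: 1000, endTime: 4200 }] },
  { testResults: [{ name: 'src/auth.test.ts', startTime: 500, endTime: 900 }] },
]);
console.log(JSON.stringify(exampleTimings, null, 2));
```

If the reporter emits absolute paths, they would need to be normalized to the same relative form that `shard-tests.ts` uses, otherwise the lookups in `timingMap` will miss.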

Pattern 4: Feature Flag Pipeline

Feature flags decouple deployment from release, enabling trunk-based development with safe rollouts:

typescript
// src/feature-flags.ts
interface FeatureFlagConfig {
  name: string;
  defaultValue: boolean;
  rolloutPercentage: number;
  targetGroups: string[];
  killSwitch: boolean;
}

class FeatureFlagService {
  private flags: Map<string, FeatureFlagConfig>;
  private evaluationCache: Map<string, boolean> = new Map();

  constructor(private provider: FeatureFlagProvider) {
    this.flags = new Map();
  }

  async initialize(): Promise<void> {
    const configs = await this.provider.fetchAll();
    for (const config of configs) {
      this.flags.set(config.name, config);
    }

    // Subscribe to real-time updates
    this.provider.onUpdate((config) => {
      this.flags.set(config.name, config);
      this.evaluationCache.clear();
    });
  }

  isEnabled(flagName: string, context: EvaluationContext): boolean {
    const cacheKey = `${flagName}:${context.userId}`;
    const cached = this.evaluationCache.get(cacheKey);
    if (cached !== undefined) return cached;

    const flag = this.flags.get(flagName);
    if (!flag) return false;

    // Kill switch overrides everything
    if (flag.killSwitch) return false;

    // Check target groups
    if (flag.targetGroups.length > 0) {
      const inTargetGroup = flag.targetGroups.some(
        group => context.groups?.includes(group)
      );
      if (inTargetGroup) {
        this.evaluationCache.set(cacheKey, true);
        return true;
      }
    }

    // Percentage rollout (deterministic based on user ID)
    const hash = this.hashString(`${flagName}:${context.userId}`);
    const bucket = hash % 100;
    const enabled = bucket < flag.rolloutPercentage;

    this.evaluationCache.set(cacheKey, enabled);
    return enabled;
  }

  private hashString(str: string): number {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
      const char = str.charCodeAt(i);
      hash = ((hash << 5) - hash) + char;
      hash = hash & hash; // Convert to 32-bit integer
    }
    return Math.abs(hash);
  }
}

interface EvaluationContext {
  userId: string;
  groups?: string[];
  environment?: string;
  region?: string;
}

interface FeatureFlagProvider {
  fetchAll(): Promise<FeatureFlagConfig[]>;
  onUpdate(callback: (config: FeatureFlagConfig) => void): void;
}
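
To illustrate why the percentage rollout above is deterministic, here is a standalone sketch that reuses the same string hash and bucket calculation. The helper names `rolloutBucket` and `isInRollout`, the flag name, and the user IDs are hypothetical and exist only for this example.

```typescript
// Same 32-bit string hash as FeatureFlagService.hashString above.
function hashString(str: string): number {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash) + str.charCodeAt(i);
    hash |= 0; // truncate to a signed 32-bit integer
  }
  return Math.abs(hash);
}

// Bucket in [0, 100) for a flag/user pair; the same inputs always give the same bucket.
function rolloutBucket(flagName: string, userId: string): number {
  return hashString(`${flagName}:${userId}`) % 100;
}

function isInRollout(flagName: string, userId: string, rolloutPercentage: number): boolean {
  return rolloutBucket(flagName, userId) < rolloutPercentage;
}

// Hypothetical flag name and user IDs, for illustration only.
const users = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
const enabledAt10 = users.filter(u => isInRollout('new-checkout', u, 10));
const enabledAt50 = users.filter(u => isInRollout('new-checkout', u, 50));

console.log(`10% rollout: ${enabledAt10.length} of ${users.length} users`);
console.log(`50% rollout: ${enabledAt50.length} of ${users.length} users`);
// Raising the percentage only adds users: everyone enabled at 10% stays enabled at 50%.
console.log(enabledAt10.every(u => enabledAt50.includes(u)));
```

Because the flag name is part of the hashed key, a given user lands in different buckets for different flags, so the same users are not always the first to receive every new feature.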

Pattern 5: Blue-Green Deployment Pipeline

yaml
# Blue-green deployment with AWS ECS
jobs:
  deploy-blue-green:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${​{ vars.DEPLOY_ROLE_ARN }}
          aws-region: us-east-1

      - name: Determine active environment
        id: active
        run: |
          ACTIVE=$(aws elbv2 describe-rules \
            --listener-arn ${​{ vars.LISTENER_ARN }} \
            --query 'Rules[?IsDefault].Actions[0].ForwardConfig.TargetGroups[?Weight>`0`].TargetGroupArn' \
            --output text)
          if echo "$ACTIVE" | grep -q "blue"; then
            echo "active=blue" >> "$GITHUB_OUTPUT"
            echo "inactive=green" >> "$GITHUB_OUTPUT"
          else
            echo "active=green" >> "$GITHUB_OUTPUT"
            echo "inactive=blue" >> "$GITHUB_OUTPUT"
          fi

      - name: Deploy to inactive environment
        run: |
          ENV=${​{ steps.active.outputs.inactive }}
          aws ecs update-service \
            --cluster production \
            --service app-${ENV} \
            --task-definition app:${​{ github.sha }} \
            --force-new-deployment
          aws ecs wait services-stable \
            --cluster production \
            --services app-${ENV}

      - name: Health check inactive environment
        run: |
          ENV=${​{ steps.active.outputs.inactive }}
          ENDPOINT=$(aws elbv2 describe-target-groups \
            --names app-${ENV} \
            --query 'TargetGroups[0].HealthCheckPath' \
            --output text)
          curl --fail --retry 10 --retry-delay 5 \
            "http://app-${ENV}.internal${ENDPOINT}"

      - name: Switch traffic
        run: |
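          ACTIVE_TG=$(aws elbv2 describe-target-groups \
            --names app-${{ steps.active.outputs.active }} \
            --query 'TargetGroups[0].TargetGroupArn' --output text)
          INACTIVE_TG=$(aws elbv2 describe-target-groups \
            --names app-${{ steps.active.outputs.inactive }} \
            --query 'TargetGroups[0].TargetGroupArn' --output text)

          # Each step runs in its own shell; persist the ARNs for the rollback in the next step
          echo "ACTIVE_TG=${ACTIVE_TG}" >> "$GITHUB_ENV"
          echo "INACTIVE_TG=${INACTIVE_TG}" >> "$GITHUB_ENV"

          # A listener's default rule is changed with modify-listener, not modify-rule
          aws elbv2 modify-listener \
            --listener-arn ${{ vars.LISTENER_ARN }} \
            --default-actions "[{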
              \"Type\": \"forward\",
              \"ForwardConfig\": {
                \"TargetGroups\": [
                  {\"TargetGroupArn\": \"${INACTIVE_TG}\", \"Weight\": 100},
                  {\"TargetGroupArn\": \"${ACTIVE_TG}\", \"Weight\": 0}
                ]
              }
            }]"

      - name: Monitor for 5 minutes
        run: |
          for i in $(seq 1 10); do
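            # The value is a 5XX count over the last minute, not a rate. In a real setup,
            # scope it with --dimensions Name=LoadBalancer,Value=<alb-id>.
            ERROR_RATE=$(aws cloudwatch get-metric-statistics \
              --namespace AWS/ApplicationELB \
              --metric-name HTTPCode_Target_5XX_Count \
              --period 60 --statistics Sum \
              --start-time $(date -u -d '1 minute ago' +%Y-%m-%dT%H:%M:%S) \
              --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
              --query 'Datapoints[0].Sum || `0`' --output text)
            ERROR_RATE=${ERROR_RATE%.*}  # CloudWatch returns floats such as 12.0; keep the integer part
            echo "5XX count: $ERROR_RATE"
            if [ "$ERROR_RATE" -gt "10" ]; then
              echo "Too many errors, rolling back"
              # Rollback: switch traffic back (default rule, so modify-listener)
              aws elbv2 modify-listener \
                --listener-arn ${{ vars.LISTENER_ARN }} \
                --default-actions "[{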
                  \"Type\": \"forward\",
                  \"ForwardConfig\": {
                    \"TargetGroups\": [
                      {\"TargetGroupArn\": \"${ACTIVE_TG}\", \"Weight\": 100},
                      {\"TargetGroupArn\": \"${INACTIVE_TG}\", \"Weight\": 0}
                    ]
                  }
                }]"
              exit 1
            fi
            sleep 30
          done

Pattern 6: Preview Environment Per PR

yaml
# .github/workflows/preview.yml
name: Preview Environment

on:
  pull_request:
    types: [opened, synchronize, reopened, closed]

jobs:
  deploy-preview:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    environment:
      name: preview-${​{ github.event.pull_request.number }}
      url: https://pr-${​{ github.event.pull_request.number }}.preview.example.com
    steps:
      - uses: actions/checkout@v4

      - name: Build preview image
        run: |
          docker build -t preview:pr-${​{ github.event.pull_request.number }} .

      - name: Deploy preview
        run: |
          # Create namespace for this PR
          kubectl create namespace preview-${​{ github.event.pull_request.number }} \
            --dry-run=client -o yaml | kubectl apply -f -

          # Deploy application
          helm upgrade --install \
            pr-${​{ github.event.pull_request.number }} \
            ./chart \
            --namespace preview-${​{ github.event.pull_request.number }} \
            --set image.tag=pr-${​{ github.event.pull_request.number }} \
            --set ingress.host=pr-${​{ github.event.pull_request.number }}.preview.example.com \
            --wait --timeout 300s

      - name: Comment PR with preview URL
        uses: actions/github-script@v7
        with:
          script: |
            const url = `https://pr-${​{ github.event.pull_request.number }}.preview.example.com`;
            const body = `## Preview Environment\n\nDeployed to: ${url}\n\nThis environment will be destroyed when the PR is closed.`;

            // Find existing comment
            const comments = await github.rest.issues.listComments({
              ...context.repo,
              issue_number: context.issue.number,
            });
            const existing = comments.data.find(c => c.body.includes('Preview Environment'));

            if (existing) {
              await github.rest.issues.updateComment({
                ...context.repo,
                comment_id: existing.id,
                body,
              });
            } else {
              await github.rest.issues.createComment({
                ...context.repo,
                issue_number: context.issue.number,
                body,
              });
            }

  cleanup-preview:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
      - name: Destroy preview environment
        run: |
          helm uninstall pr-${​{ github.event.pull_request.number }} \
            --namespace preview-${​{ github.event.pull_request.number }} || true
          kubectl delete namespace preview-${​{ github.event.pull_request.number }} || true

Edge Cases & Failure Modes

Pattern Anti-Patterns

| Anti-Pattern | Problem | Better Approach |
| --- | --- | --- |
| Test everything on every PR | Wasted compute, slow feedback | Change detection + affected analysis |
| Sequential stages only | Artificial dependencies slow pipeline | DAG dependencies with needs |
| Single massive E2E suite | Flaky, slow, hard to debug | Parallel shards with timing data |
| Manual deployment gates | Bottleneck on approvers | Automated canary with auto-rollback |
| Shared mutable test data | Test interference, flaky failures | Per-test database, test isolation |
| Long-lived feature branches | Merge conflicts, integration risk | Trunk-based + feature flags |
| Monolithic pipeline | Can't scale, hard to maintain | Reusable workflows, pipeline composition |

Dealing with Flaky Tests in Parallel

typescript
// scripts/quarantine-manager.ts
interface TestResult {
  name: string;
  passed: boolean;
  duration: number;
  retried: boolean;
}

interface QuarantineDecision {
  action: 'quarantine' | 'keep' | 'remove';
  reason: string;
}

class QuarantineManager {
  private readonly FLAKY_THRESHOLD = 0.15; // 15% failure rate
  private readonly WINDOW_SIZE = 20;       // Last 20 runs
  private readonly HEAL_THRESHOLD = 0.02;  // 2% failure rate to un-quarantine

  analyzeTest(
    testName: string,
    history: TestResult[]
  ): QuarantineDecision {
    if (history.length < this.WINDOW_SIZE) {
      return { action: 'keep', reason: 'Insufficient data' };
    }

    const recent = history.slice(-this.WINDOW_SIZE);
    const failures = recent.filter(r => !r.passed).length;
    const failureRate = failures / recent.length;

    if (failureRate >= this.FLAKY_THRESHOLD) {
      return {
        action: 'quarantine',
        reason: `Failure rate ${(failureRate * 100).toFixed(1)}% exceeds threshold`,
      };
    }

    if (failureRate <= this.HEAL_THRESHOLD) {
      return {
        action: 'remove',
        reason: `Failure rate ${(failureRate * 100).toFixed(1)}% below heal threshold`,
      };
    }

    return { action: 'keep', reason: 'Within acceptable range' };
  }
}

Performance Characteristics

Pipeline Optimization Impact

| Optimization | Before | After | Effort |
| --- | --- | --- | --- |
| Add dependency caching | 15 min | 8 min | Low |
| Parallel test shards (4x) | 8 min | 3 min | Low |
| DAG pipeline (remove stage waits) | 12 min | 7 min | Medium |
| Change detection (monorepo) | 20 min | 5 min | Medium |
| Remote build cache (Turbo) | 10 min | 2 min | Medium |
| Predictive test selection | 8 min | 2 min | High |
| Self-hosted runners (warm cache) | 5 min | 2 min | High |

Cost vs. Speed Tradeoff

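$$\text{Total Cost} = C_{\text{compute}} + C_{\text{developer-wait}}$$

$$C_{\text{compute}} = P \times T_{\text{pipeline}} \times C_{\text{per-min}}$$

$$C_{\text{developer-wait}} = N_{\text{devs}} \times D_{\text{daily-PRs}} \times T_{\text{wait}} \times C_{\text{dev-hourly}} / 60$$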

For a team of 30 developers at $80/hr, each submitting 3 PRs/day:

| Pipeline Duration | Daily Compute Cost | Daily Wait Cost | Total Daily |
| --- | --- | --- | --- |
| 30 min | $72 | $3,600 | $3,672 |
| 15 min | $72 | $1,800 | $1,872 |
| 5 min | $96 | $600 | $696 |
| 2 min | $120 | $240 | $360 |

Spending more on compute (parallel runners, larger machines) dramatically reduces total cost because developer wait time dominates.
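
The wait-cost column can be reproduced with a few lines of arithmetic. The sketch below implements the developer-wait formula for the 30-developer, $80/hr, 3-PRs-per-day team; the daily compute costs are copied from the table rather than derived, and the type and function names are illustrative.

```typescript
interface TeamProfile {
  developers: number;      // N_devs
  prsPerDevPerDay: number; // D_daily-PRs
  hourlyRate: number;      // C_dev-hourly, in dollars
}

// C_developer-wait = N_devs × D_daily-PRs × T_wait × C_dev-hourly / 60
function dailyWaitCost(team: TeamProfile, waitMinutes: number): number {
  return team.developers * team.prsPerDevPerDay * waitMinutes * team.hourlyRate / 60;
}

const team: TeamProfile = { developers: 30, prsPerDevPerDay: 3, hourlyRate: 80 };

// Daily compute costs are taken from the table above, not computed here.
const scenarios = [
  { minutes: 30, computeCost: 72 },
  { minutes: 15, computeCost: 72 },
  { minutes: 5, computeCost: 96 },
  { minutes: 2, computeCost: 120 },
];

for (const s of scenarios) {
  const wait = dailyWaitCost(team, s.minutes);
  console.log(`${s.minutes} min: wait $${wait}, compute $${s.computeCost}, total $${wait + s.computeCost}`);
}
```

Going from 30 to 2 minutes raises compute spend by $48 per day in this table while cutting wait cost by $3,360 per day, which is the asymmetry the text describes.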

Mathematical Foundations

Optimal Shard Count

Given $n$ tests with durations $d_1, d_2, \ldots, d_n$ and $k$ shards, the makespan produced by greedy longest-first bin-packing satisfies:

$$T_{\text{makespan}} \le \frac{4}{3}\, T_{\text{optimal}}$$

This is the LPT (Longest Processing Time first) approximation ratio. Finding the optimal assignment is NP-hard (it is the multiprocessor scheduling problem), but LPT is guaranteed to come within a factor of 4/3 of it.

The theoretical optimal makespan is bounded below by:

$$T_{\text{optimal}} \ge \max\left(\max_i d_i,\ \frac{1}{k}\sum_{i=1}^{n} d_i\right)$$
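
A small worked instance makes the two bounds concrete. The sketch below computes the LPT makespan and the lower bound for a made-up set of seven test durations on three shards; the function names are illustrative.

```typescript
// Makespan of greedy LPT: sort descending, place each test on the least-loaded shard.
function lptMakespan(durations: number[], shardCount: number): number {
  const loads = new Array<number>(shardCount).fill(0);
  for (const d of [...durations].sort((a, b) => b - a)) {
    let min = 0;
    for (let i = 1; i < loads.length; i++) {
      if (loads[i] < loads[min]) min = i; // ties go to the lowest shard index
    }
    loads[min] += d;
  }
  return Math.max(...loads);
}

// Lower bound on any schedule: the longest single test, or the average load per shard.
function makespanLowerBound(durations: number[], shardCount: number): number {
  const total = durations.reduce((sum, d) => sum + d, 0);
  return Math.max(Math.max(...durations), total / shardCount);
}

const sampleDurations = [5, 5, 4, 4, 3, 3, 3]; // made-up durations in seconds
const k = 3;
const greedy = lptMakespan(sampleDurations, k);
const bound = makespanLowerBound(sampleDurations, k);

console.log(`LPT makespan: ${greedy}s`);
console.log(`Lower bound:  ${bound}s`);
console.log(`Ratio:        ${(greedy / bound).toFixed(2)}`);
```

For this input LPT produces a makespan of 11 against a lower bound of 9. The split 5+4, 5+4, 3+3+3 actually achieves 9, so here LPT is 11/9 of optimal, inside the 4/3 guarantee.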

Queuing Theory for PR Reviews

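Wait time in a PR review queue can be approximated with Kingman's formula for a G/G/1 queue (general arrival and service distributions, one review pool):

$$W_q \approx \frac{C_s^2 + C_a^2}{2} \cdot \frac{\rho}{1 - \rho} \cdot \frac{1}{\mu}$$

Where $C_s$ is the coefficient of variation of service time (review duration), $C_a$ is the coefficient of variation of interarrival time, $\rho$ is reviewer utilization, and $1/\mu$ is the mean review duration. High variance in review times ($C_s > 1$) dramatically increases wait times.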
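
To see how strongly review-time variance drives the wait, the sketch below evaluates the approximation for two values of $C_s$. The input numbers (30-minute mean review, 80% utilization, $C_a = 1$) are hypothetical, and the names are illustrative.

```typescript
interface QueueParams {
  utilization: number;     // ρ, must be in [0, 1)
  meanServiceTime: number; // 1/μ, in any time unit
  cvService: number;       // C_s
  cvArrival: number;       // C_a
}

// Approximate mean time in queue: (C_s² + C_a²)/2 × ρ/(1 − ρ) × 1/μ
function expectedQueueWait(p: QueueParams): number {
  if (p.utilization < 0 || p.utilization >= 1) {
    throw new RangeError('utilization must be in [0, 1)');
  }
  const variability = (p.cvService * p.cvService + p.cvArrival * p.cvArrival) / 2;
  return variability * (p.utilization / (1 - p.utilization)) * p.meanServiceTime;
}

// Hypothetical numbers: 30-minute average review, reviewers 80% utilized.
const baseline = { utilization: 0.8, meanServiceTime: 30, cvArrival: 1 };

const steadyReviews = expectedQueueWait({ ...baseline, cvService: 0.5 });
const erraticReviews = expectedQueueWait({ ...baseline, cvService: 2 });

console.log(`C_s = 0.5: about ${steadyReviews.toFixed(0)} minutes in queue`);
console.log(`C_s = 2.0: about ${erraticReviews.toFixed(0)} minutes in queue`);
```

With these inputs the estimate is about 75 minutes for consistent reviews and about 300 minutes for erratic ones, a fourfold difference with no change in average review time or load.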

Real-World War Stories

War Story — The Monorepo That Brought Down CI

A fintech startup migrated 30 microservices into a monorepo for better code sharing. Day one: every push triggered builds for all 30 services. With 50 developers pushing 200 commits/day, this meant 6,000 service builds per day. Their 20-runner pool was overwhelmed within hours, and the queue grew to 8+ hours.

Emergency fix: Added path-based filtering so each service only built when its own files changed. But this missed shared library changes.

Proper fix: Built a dependency graph analysis tool (similar to the TypeScript code above). When packages/shared/ changed, it computed the transitive closure of dependent services and only built those. CI minutes dropped 85%.

Further optimization: Added Turborepo with remote caching. If packages/api hadn't changed since the last main build, it reused the cached artifact. This cut another 60% off remaining builds.

Final numbers: 6,000 builds/day -> 400 builds/day. Pipeline time: 25 min -> 4 min.

War Story — Feature Flag Cleanup Debt

A product team adopted feature flags enthusiastically: 200 flags in 6 months. But they never cleaned up old flags. The codebase became a maze of conditional paths. Tests had to cover flag permutations ($2^{200}$ theoretical combinations). A production incident occurred when two flags interacted unexpectedly, creating a code path that no test covered.

Fix: Implemented a "flag lifecycle" policy:

  1. Every flag has an owner and an expiration date (max 30 days)
  2. CI pipeline fails if a flag exceeds its expiration
  3. A weekly report shows flag age and usage
  4. Dead flags are auto-detected by feature flag service metrics

Flag count dropped from 200 to 25 active flags within 3 months. Test reliability improved measurably.
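
The expiration rule in step 2 of that policy is simple to automate. Below is a minimal sketch of such a check, assuming flags are kept in a registry with an owner and an ISO 8601 expiry date; the `FlagRecord` shape, the function names, and the sample entries are hypothetical, not taken from the team's actual tooling.

```typescript
interface FlagRecord {
  name: string;
  owner: string;
  expiresAt: string; // ISO 8601 date, e.g. '2024-03-01'
}

// Flags whose expiry date is strictly before `now`.
function findExpiredFlags(flags: FlagRecord[], now: Date): FlagRecord[] {
  return flags.filter(f => new Date(f.expiresAt).getTime() < now.getTime());
}

// Exit code a CI script could pass to process.exit: 1 if any flag is expired.
function flagCheckExitCode(flags: FlagRecord[], now: Date): number {
  return findExpiredFlags(flags, now).length > 0 ? 1 : 0;
}

// Hypothetical registry entries and check date
const registry: FlagRecord[] = [
  { name: 'new-checkout', owner: 'payments', expiresAt: '2024-03-01' },
  { name: 'dark-mode', owner: 'web', expiresAt: '2024-06-15' },
];
const checkDate = new Date('2024-04-01');

for (const flag of findExpiredFlags(registry, checkDate)) {
  console.log(`Flag "${flag.name}" (owner: ${flag.owner}) expired on ${flag.expiresAt}`);
}
console.log(`exit code: ${flagCheckExitCode(registry, checkDate)}`);
```

Reporting the owner alongside each expired flag matters as much as the failure itself: it tells the pipeline's reader who is expected to remove the flag or extend its date.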

Decision Framework

Choosing a Branching Strategy

| Strategy | Branch Lifetime | Team Size | Release Cadence | Risk Tolerance |
| --- | --- | --- | --- | --- |
| Trunk-Based | < 1 day | Any | Continuous | High (needs feature flags) |
| GitHub Flow | 1-5 days | Small-Medium | Weekly | Medium |
| GitFlow | Weeks-Months | Large | Scheduled | Low |
| Release Branches | Per release | Large | Scheduled | Low |

When to Use Each Pattern

| Pattern | Use When | Avoid When |
| --- | --- | --- |
| Trunk-based | Continuous deployment, experienced team | Regulatory holds, immature testing |
| Monorepo pipeline | Shared code, coordinated releases | Independent teams, massive codebase |
| Parallel testing | > 5 min test suite, flaky tests | Simple projects, < 50 tests |
| Feature flags | Trunk-based development, gradual rollout | Simple features, short development cycles |
| Blue-green | Zero-downtime requirements, easy rollback | Database migrations, stateful services |
| Preview environments | UI-heavy features, stakeholder review | Backend-only changes, cost constraints |
| Canary deployment | High-traffic services, risk-averse | Low-traffic services, simple applications |

Advanced Topics

Predictive Test Selection

Instead of running all tests, use historical failure data (in production systems, often an ML model) to predict which tests are likely to fail based on the changed files:

typescript
// Simplified predictive test selection
interface TestPrediction {
  testFile: string;
  probability: number;
  reason: string;
}

class PredictiveTestSelector {
  private cooccurrenceMatrix: Map<string, Map<string, number>> = new Map();
  // Number of historical runs in which each file changed; the denominator
  // for the conditional failure probability
  private fileChangeCounts: Map<string, number> = new Map();

  // Build model from historical data
  train(history: { changedFiles: string[]; failedTests: string[] }[]): void {
    for (const run of history) {
      for (const file of run.changedFiles) {
        this.fileChangeCounts.set(file, (this.fileChangeCounts.get(file) ?? 0) + 1);

        if (!this.cooccurrenceMatrix.has(file)) {
          this.cooccurrenceMatrix.set(file, new Map());
        }
        const row = this.cooccurrenceMatrix.get(file)!;

        for (const test of run.failedTests) {
          row.set(test, (row.get(test) ?? 0) + 1);
        }
      }
    }
  }

  // Predict which tests might fail
  predict(changedFiles: string[], threshold: number = 0.1): TestPrediction[] {
    const scores = new Map<string, number>();

    for (const file of changedFiles) {
      const row = this.cooccurrenceMatrix.get(file);
      const changes = this.fileChangeCounts.get(file);
      if (!row || !changes) continue;

      for (const [test, count] of row) {
        // P(test fails | file changed): dividing by all runs would shrink
        // every score toward zero as history grows
        const probability = count / changes;
        const current = scores.get(test) ?? 0;
        scores.set(test, Math.max(current, probability));
      }
    }

    return [...scores.entries()]
      .filter(([, prob]) => prob >= threshold)
      .map(([testFile, probability]) => ({
        testFile,
        probability,
        reason: `${(probability * 100).toFixed(1)}% historical correlation`,
      }))
      .sort((a, b) => b.probability - a.probability);
  }
}

Pipeline Observability

typescript
// Pipeline metrics collector
import { metrics, trace, SpanStatusCode } from '@opentelemetry/api';

const meter = metrics.getMeter('ci-pipeline');
const tracer = trace.getTracer('ci-pipeline');

const pipelineDuration = meter.createHistogram('pipeline.duration', {
  description: 'Pipeline execution duration in seconds',
  unit: 's',
});

const queueTime = meter.createHistogram('pipeline.queue_time', {
  description: 'Time spent waiting for a runner',
  unit: 's',
});

const cacheHitRate = meter.createObservableGauge('pipeline.cache_hit_rate', {
  description: 'Cache hit rate percentage',
});

const testFlakyRate = meter.createObservableGauge('pipeline.test_flaky_rate', {
  description: 'Percentage of tests that are flaky',
});

// Minimal config shape used by executePipeline
interface PipelineConfig {
  id: string;
  trigger: string;
  branch: string;
}

// Track pipeline execution
async function executePipeline(config: PipelineConfig): Promise<void> {
  const span = tracer.startSpan('pipeline.execute', {
    attributes: {
      'pipeline.id': config.id,
      'pipeline.trigger': config.trigger,
      'pipeline.branch': config.branch,
    },
  });

  const startTime = Date.now();
  let succeeded = false;

  try {
    // ... execute pipeline
    span.setStatus({ code: SpanStatusCode.OK }); // numeric 0 is UNSET, not OK
    succeeded = true;
  } catch (error) {
    span.setStatus({ code: SpanStatusCode.ERROR, message: String(error) });
    throw error;
  } finally {
    const duration = (Date.now() - startTime) / 1000;
    pipelineDuration.record(duration, {
      // The API-level Span does not expose its status, so track the outcome locally
      status: succeeded ? 'success' : 'failure',
      branch: config.branch,
    });
    span.end();
  }
}

These patterns form the building blocks of production CI/CD systems. Most real-world pipelines combine multiple patterns — trunk-based development with monorepo change detection, parallel testing, and canary deployments. The key is matching patterns to your team's specific constraints: team size, release cadence, risk tolerance, and regulatory requirements.

For implementation details on specific CI platforms, see GitHub Actions Deep Dive and GitLab CI. For artifact management strategies referenced in these patterns, see Artifact Management.

"What I cannot create, I do not understand." — Richard Feynman