Pipeline Patterns
Why Pipeline Patterns Matter
A CI/CD pipeline is not just a sequence of shell commands — it is a production system that every developer interacts with dozens of times per day. A poorly designed pipeline creates a bottleneck that multiplies across every engineer, every PR, every deployment. The difference between a 5-minute and a 30-minute pipeline, across a team of 50 engineers making 100 PRs per week, is 2,000 engineer-minutes per week — or roughly one full-time engineer doing nothing but waiting for CI.
Pipeline patterns are the architectural blueprints that solve specific organizational and technical challenges. Just as software design patterns provide reusable solutions to recurring problems, pipeline patterns provide proven approaches to common CI/CD challenges.
The Problem Space
| Challenge | Naive Approach | Cost | Pattern Solution |
|---|---|---|---|
| Slow feedback | Run everything sequentially | 30+ min per PR | DAG pipelines, parallel testing |
| Monorepo blast radius | Run all tests on every change | Wasted compute | Change detection, affected project analysis |
| Environment consistency | Manual configuration | Environment drift | Infrastructure as code, immutable artifacts |
| Release coordination | Manual release process | Human error, slow releases | Trunk-based development, feature flags |
| Merge conflicts | Long-lived feature branches | Integration pain | Trunk-based development, short-lived branches |
First Principles
The Feedback Loop Axiom
The value of CI/CD feedback decreases exponentially with time:
Where:
is the maximum value of the feedback (catching a bug before merge) is the time between commit and feedback is the decay constant (typically ~0.1 per minute for developer context)
At
This exponential decay is why pipeline speed is not a nice-to-have but a force multiplier for engineering productivity.
The Testing Pyramid in CI
Pipeline design should reflect this pyramid: run the fastest, cheapest tests first and gate progressively more expensive tests behind earlier successes.
Core Mechanics
Pattern 1: Trunk-Based Development Pipeline
Trunk-based development (TBD) is the practice of all developers committing to a single branch (main/trunk) with short-lived feature branches (< 1 day).
Pipeline for trunk-based development:
# .github/workflows/trunk-based.yml
name: Trunk-Based CI/CD
on:
push:
branches: [main]
pull_request:
branches: [main]
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
# Gate 1: Fast feedback (< 2 min)
fast-checks:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci
- run: npm run lint
- run: npm run typecheck
- run: npm run test:unit -- --bail
# Gate 2: Full validation (< 10 min)
full-tests:
needs: fast-checks
runs-on: ubuntu-latest
strategy:
matrix:
shard: [1, 2, 3, 4]
services:
postgres:
image: postgres:16
env:
POSTGRES_PASSWORD: test
ports: ['5432:5432']
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci
- run: npm run test:all -- --shard=${{ matrix.shard }}/4
# Gate 3: Build & push (< 5 min)
build:
needs: full-tests
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
outputs:
image: ${{ steps.build.outputs.image }}
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- id: build
uses: docker/build-push-action@v5
with:
push: true
tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
# Gate 4: Progressive deployment
deploy-canary:
needs: build
runs-on: ubuntu-latest
environment: production-canary
steps:
- uses: actions/checkout@v4
- name: Deploy canary (5% traffic)
run: |
kubectl set image deployment/app-canary \
app=${{ needs.build.outputs.image }} -n production
verify-canary:
needs: deploy-canary
runs-on: ubuntu-latest
steps:
- name: Monitor canary for 10 minutes
run: |
for i in $(seq 1 20); do
ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query?query=rate(http_requests_total{status=~\"5..\",deployment=\"canary\"}[1m])/rate(http_requests_total{deployment=\"canary\"}[1m])" | jq '.data.result[0].value[1] // "0"' -r)
if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
echo "Canary error rate too high: $ERROR_RATE"
kubectl rollout undo deployment/app-canary -n production
exit 1
fi
echo "Canary healthy. Error rate: $ERROR_RATE"
sleep 30
done
deploy-full:
needs: verify-canary
runs-on: ubuntu-latest
environment: production
steps:
- name: Full production deployment
run: |
kubectl set image deployment/app \
app=${{ needs.build.outputs.image }} -n production
kubectl rollout status deployment/app -n production --timeout=600sPattern 2: Monorepo Pipeline
Monorepos require intelligent change detection to avoid rebuilding the entire codebase on every commit.
Change detection with dependency awareness:
// scripts/detect-affected.ts
import { execSync } from 'child_process';
import * as path from 'path';
import * as fs from 'fs';
interface PackageInfo {
name: string;
path: string;
dependencies: string[];
}
interface AffectedResult {
packages: string[];
matrix: { package: string }[];
}
function getChangedFiles(baseBranch: string = 'origin/main'): string[] {
const output = execSync(
`git diff --name-only ${baseBranch}...HEAD`,
{ encoding: 'utf-8' }
);
return output.trim().split('\n').filter(Boolean);
}
function getPackageMap(): Map<string, PackageInfo> {
const packagesDir = path.join(process.cwd(), 'packages');
const packages = new Map<string, PackageInfo>();
for (const dir of fs.readdirSync(packagesDir)) {
const pkgJsonPath = path.join(packagesDir, dir, 'package.json');
if (!fs.existsSync(pkgJsonPath)) continue;
const pkgJson = JSON.parse(fs.readFileSync(pkgJsonPath, 'utf-8'));
const deps = Object.keys({
...pkgJson.dependencies,
...pkgJson.devDependencies,
}).filter(d => d.startsWith('@myorg/'));
packages.set(dir, {
name: pkgJson.name,
path: `packages/${dir}`,
dependencies: deps.map(d => d.replace('@myorg/', '')),
});
}
return packages;
}
function getAffectedPackages(
changedFiles: string[],
packageMap: Map<string, PackageInfo>
): Set<string> {
const affected = new Set<string>();
// Direct changes
for (const file of changedFiles) {
for (const [name, pkg] of packageMap) {
if (file.startsWith(pkg.path + '/')) {
affected.add(name);
}
}
// Root config changes affect everything
if (
file === 'package.json' ||
file === 'tsconfig.json' ||
file.startsWith('.github/')
) {
for (const name of packageMap.keys()) {
affected.add(name);
}
return affected; // Everything affected
}
}
// Transitive dependencies (BFS)
const queue = [...affected];
while (queue.length > 0) {
const current = queue.shift()!;
for (const [name, pkg] of packageMap) {
if (!affected.has(name) && pkg.dependencies.includes(current)) {
affected.add(name);
queue.push(name);
}
}
}
return affected;
}
// Main
const changedFiles = getChangedFiles();
const packageMap = getPackageMap();
const affected = getAffectedPackages(changedFiles, packageMap);
const result: AffectedResult = {
packages: [...affected],
matrix: [...affected].map(p => ({ package: p })),
};
console.log(JSON.stringify(result));Monorepo pipeline with Turborepo:
# .github/workflows/monorepo-ci.yml
name: Monorepo CI
on:
pull_request:
branches: [main]
push:
branches: [main]
env:
TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
TURBO_TEAM: ${{ vars.TURBO_TEAM }}
jobs:
detect:
runs-on: ubuntu-latest
outputs:
affected: ${{ steps.affected.outputs.packages }}
matrix: ${{ steps.affected.outputs.matrix }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-node@v4
with:
node-version: '20'
- id: affected
run: |
AFFECTED=$(npx turbo run build --dry-run=json --filter='...[origin/main...HEAD]' | jq -c '.packages')
MATRIX=$(echo "$AFFECTED" | jq -c '[.[] | select(. != "//") | {package: .}]')
echo "packages=$AFFECTED" >> "$GITHUB_OUTPUT"
echo "matrix=$MATRIX" >> "$GITHUB_OUTPUT"
echo "Affected packages: $AFFECTED"
lint:
needs: detect
if: needs.detect.outputs.affected != '[]'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci
- run: npx turbo run lint --filter='...[origin/main...HEAD]'
test:
needs: detect
if: needs.detect.outputs.matrix != '[]'
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
include: ${{ fromJson(needs.detect.outputs.matrix) }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci
- run: npx turbo run test --filter=${{ matrix.package }}
build:
needs: [lint, test]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci
- run: npx turbo run build --filter='...[origin/main...HEAD]'
- uses: actions/upload-artifact@v4
with:
name: build-artifacts
path: |
packages/*/dist/Pattern 3: Parallel Testing
Test parallelization is the single most impactful optimization for pipeline speed:
Intelligent test sharding by duration:
// scripts/shard-tests.ts
import * as fs from 'fs';
interface TestTiming {
file: string;
duration: number; // milliseconds
}
interface ShardAssignment {
shard: number;
files: string[];
estimatedDuration: number;
}
function loadTimings(timingsFile: string): TestTiming[] {
if (!fs.existsSync(timingsFile)) {
return []; // First run, no timing data
}
return JSON.parse(fs.readFileSync(timingsFile, 'utf-8'));
}
function shardByDuration(
testFiles: string[],
timings: TestTiming[],
shardCount: number
): ShardAssignment[] {
const timingMap = new Map(timings.map(t => [t.file, t.duration]));
const defaultDuration = 5000; // 5s default for unknown tests
// Sort tests by duration (descending) for better bin packing
const sortedTests = [...testFiles].sort((a, b) => {
const dA = timingMap.get(a) ?? defaultDuration;
const dB = timingMap.get(b) ?? defaultDuration;
return dB - dA;
});
// Greedy bin-packing: assign each test to the shard with least total duration
const shards: ShardAssignment[] = Array.from({ length: shardCount }, (_, i) => ({
shard: i + 1,
files: [],
estimatedDuration: 0,
}));
for (const test of sortedTests) {
// Find shard with minimum current duration
const minShard = shards.reduce((min, s) =>
s.estimatedDuration < min.estimatedDuration ? s : min
);
minShard.files.push(test);
minShard.estimatedDuration += timingMap.get(test) ?? defaultDuration;
}
return shards;
}
// Calculate theoretical speedup
function calculateSpeedup(shards: ShardAssignment[]): number {
const totalDuration = shards.reduce((sum, s) => sum + s.estimatedDuration, 0);
const maxShardDuration = Math.max(...shards.map(s => s.estimatedDuration));
return totalDuration / maxShardDuration;
}
// Usage
const testFiles = fs.readdirSync('src')
.filter(f => f.endsWith('.test.ts'))
.map(f => `src/${f}`);
const timings = loadTimings('.test-timings.json');
const shardCount = parseInt(process.env.CI_SHARD_COUNT ?? '4', 10);
const shards = shardByDuration(testFiles, timings, shardCount);
console.log(`Speedup factor: ${calculateSpeedup(shards).toFixed(2)}x`);
for (const shard of shards) {
console.log(
`Shard ${shard.shard}: ${shard.files.length} tests, ` +
`~${(shard.estimatedDuration / 1000).toFixed(1)}s`
);
}
// Output for CI consumption
const currentShard = parseInt(process.env.CI_SHARD_INDEX ?? '1', 10);
const myFiles = shards[currentShard - 1].files;
console.log(myFiles.join('\n'));Pipeline with duration-based sharding:
jobs:
prepare-shards:
runs-on: ubuntu-latest
outputs:
shards: ${{ steps.shard.outputs.shards }}
steps:
- uses: actions/checkout@v4
- uses: actions/cache/restore@v4
with:
path: .test-timings.json
key: test-timings-${{ github.ref }}
restore-keys: test-timings-
- id: shard
run: |
SHARDS=$(npx ts-node scripts/shard-tests.ts --json --count=4)
echo "shards=$SHARDS" >> "$GITHUB_OUTPUT"
test:
needs: prepare-shards
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
shard: ${{ fromJson(needs.prepare-shards.outputs.shards) }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci
- run: |
echo '${{ matrix.shard.files }}' > /tmp/test-files.txt
npx vitest run --reporter=json --outputFile=test-results.json $(cat /tmp/test-files.txt | tr ',' ' ')
- uses: actions/upload-artifact@v4
with:
name: test-results-${{ matrix.shard.index }}
path: test-results.json
collect-timings:
needs: test
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/download-artifact@v4
with:
pattern: test-results-*
- run: |
# Merge timing data from all shards
npx ts-node scripts/merge-timings.ts test-results-*/*.json > .test-timings.json
- uses: actions/cache/save@v4
with:
path: .test-timings.json
key: test-timings-${{ github.ref }}Pattern 4: Feature Flag Pipeline
Feature flags decouple deployment from release, enabling trunk-based development with safe rollouts:
// src/feature-flags.ts
interface FeatureFlagConfig {
name: string;
defaultValue: boolean;
rolloutPercentage: number;
targetGroups: string[];
killSwitch: boolean;
}
class FeatureFlagService {
private flags: Map<string, FeatureFlagConfig>;
private evaluationCache: Map<string, boolean> = new Map();
constructor(private provider: FeatureFlagProvider) {
this.flags = new Map();
}
async initialize(): Promise<void> {
const configs = await this.provider.fetchAll();
for (const config of configs) {
this.flags.set(config.name, config);
}
// Subscribe to real-time updates
this.provider.onUpdate((config) => {
this.flags.set(config.name, config);
this.evaluationCache.clear();
});
}
isEnabled(flagName: string, context: EvaluationContext): boolean {
const cacheKey = `${flagName}:${context.userId}`;
const cached = this.evaluationCache.get(cacheKey);
if (cached !== undefined) return cached;
const flag = this.flags.get(flagName);
if (!flag) return false;
// Kill switch overrides everything
if (flag.killSwitch) return false;
// Check target groups
if (flag.targetGroups.length > 0) {
const inTargetGroup = flag.targetGroups.some(
group => context.groups?.includes(group)
);
if (inTargetGroup) {
this.evaluationCache.set(cacheKey, true);
return true;
}
}
// Percentage rollout (deterministic based on user ID)
const hash = this.hashString(`${flagName}:${context.userId}`);
const bucket = hash % 100;
const enabled = bucket < flag.rolloutPercentage;
this.evaluationCache.set(cacheKey, enabled);
return enabled;
}
private hashString(str: string): number {
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash; // Convert to 32-bit integer
}
return Math.abs(hash);
}
}
interface EvaluationContext {
userId: string;
groups?: string[];
environment?: string;
region?: string;
}
interface FeatureFlagProvider {
fetchAll(): Promise<FeatureFlagConfig[]>;
onUpdate(callback: (config: FeatureFlagConfig) => void): void;
}Pattern 5: Blue-Green Deployment Pipeline
# Blue-green deployment with AWS ECS
jobs:
deploy-blue-green:
runs-on: ubuntu-latest
environment: production
steps:
- uses: actions/checkout@v4
- uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ vars.DEPLOY_ROLE_ARN }}
aws-region: us-east-1
- name: Determine active environment
id: active
run: |
ACTIVE=$(aws elbv2 describe-rules \
--listener-arn ${{ vars.LISTENER_ARN }} \
--query 'Rules[?IsDefault].Actions[0].ForwardConfig.TargetGroups[?Weight>`0`].TargetGroupArn' \
--output text)
if echo "$ACTIVE" | grep -q "blue"; then
echo "active=blue" >> "$GITHUB_OUTPUT"
echo "inactive=green" >> "$GITHUB_OUTPUT"
else
echo "active=green" >> "$GITHUB_OUTPUT"
echo "inactive=blue" >> "$GITHUB_OUTPUT"
fi
- name: Deploy to inactive environment
run: |
ENV=${{ steps.active.outputs.inactive }}
aws ecs update-service \
--cluster production \
--service app-${ENV} \
--task-definition app:${{ github.sha }} \
--force-new-deployment
aws ecs wait services-stable \
--cluster production \
--services app-${ENV}
- name: Health check inactive environment
run: |
ENV=${{ steps.active.outputs.inactive }}
ENDPOINT=$(aws elbv2 describe-target-groups \
--names app-${ENV} \
--query 'TargetGroups[0].HealthCheckPath' \
--output text)
curl --fail --retry 10 --retry-delay 5 \
"http://app-${ENV}.internal${ENDPOINT}"
- name: Switch traffic
run: |
ACTIVE_TG=$(aws elbv2 describe-target-groups \
--names app-${{ steps.active.outputs.active }} \
--query 'TargetGroups[0].TargetGroupArn' --output text)
INACTIVE_TG=$(aws elbv2 describe-target-groups \
--names app-${{ steps.active.outputs.inactive }} \
--query 'TargetGroups[0].TargetGroupArn' --output text)
aws elbv2 modify-rule \
--rule-arn ${{ vars.DEFAULT_RULE_ARN }} \
--actions "[{
\"Type\": \"forward\",
\"ForwardConfig\": {
\"TargetGroups\": [
{\"TargetGroupArn\": \"${INACTIVE_TG}\", \"Weight\": 100},
{\"TargetGroupArn\": \"${ACTIVE_TG}\", \"Weight\": 0}
]
}
}]"
- name: Monitor for 5 minutes
run: |
for i in $(seq 1 10); do
ERROR_RATE=$(aws cloudwatch get-metric-statistics \
--namespace AWS/ApplicationELB \
--metric-name HTTPCode_Target_5XX_Count \
--period 30 --statistics Sum \
--start-time $(date -u -d '1 minute ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--query 'Datapoints[0].Sum // `0`' --output text)
echo "5XX count: $ERROR_RATE"
if [ "$ERROR_RATE" -gt "10" ]; then
echo "Too many errors, rolling back"
# Rollback: switch traffic back
aws elbv2 modify-rule \
--rule-arn ${{ vars.DEFAULT_RULE_ARN }} \
--actions "[{
\"Type\": \"forward\",
\"ForwardConfig\": {
\"TargetGroups\": [
{\"TargetGroupArn\": \"${ACTIVE_TG}\", \"Weight\": 100},
{\"TargetGroupArn\": \"${INACTIVE_TG}\", \"Weight\": 0}
]
}
}]"
exit 1
fi
sleep 30
donePattern 6: Preview Environment Per PR
# .github/workflows/preview.yml
name: Preview Environment
on:
pull_request:
types: [opened, synchronize, reopened, closed]
jobs:
deploy-preview:
if: github.event.action != 'closed'
runs-on: ubuntu-latest
environment:
name: preview-${{ github.event.pull_request.number }}
url: https://pr-${{ github.event.pull_request.number }}.preview.example.com
steps:
- uses: actions/checkout@v4
- name: Build preview image
run: |
docker build -t preview:pr-${{ github.event.pull_request.number }} .
- name: Deploy preview
run: |
# Create namespace for this PR
kubectl create namespace preview-${{ github.event.pull_request.number }} \
--dry-run=client -o yaml | kubectl apply -f -
# Deploy application
helm upgrade --install \
pr-${{ github.event.pull_request.number }} \
./chart \
--namespace preview-${{ github.event.pull_request.number }} \
--set image.tag=pr-${{ github.event.pull_request.number }} \
--set ingress.host=pr-${{ github.event.pull_request.number }}.preview.example.com \
--wait --timeout 300s
- name: Comment PR with preview URL
uses: actions/github-script@v7
with:
script: |
const url = `https://pr-${{ github.event.pull_request.number }}.preview.example.com`;
const body = `## Preview Environment\n\nDeployed to: ${url}\n\nThis environment will be destroyed when the PR is closed.`;
// Find existing comment
const comments = await github.rest.issues.listComments({
...context.repo,
issue_number: context.issue.number,
});
const existing = comments.data.find(c => c.body.includes('Preview Environment'));
if (existing) {
await github.rest.issues.updateComment({
...context.repo,
comment_id: existing.id,
body,
});
} else {
await github.rest.issues.createComment({
...context.repo,
issue_number: context.issue.number,
body,
});
}
cleanup-preview:
if: github.event.action == 'closed'
runs-on: ubuntu-latest
steps:
- name: Destroy preview environment
run: |
helm uninstall pr-${{ github.event.pull_request.number }} \
--namespace preview-${{ github.event.pull_request.number }} || true
kubectl delete namespace preview-${{ github.event.pull_request.number }} || trueEdge Cases & Failure Modes
Pattern Anti-Patterns
| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| Test everything on every PR | Wasted compute, slow feedback | Change detection + affected analysis |
| Sequential stages only | Artificial dependencies slow pipeline | DAG dependencies with needs |
| Single massive E2E suite | Flaky, slow, hard to debug | Parallel shards with timing data |
| Manual deployment gates | Bottleneck on approvers | Automated canary with auto-rollback |
| Shared mutable test data | Test interference, flaky failures | Per-test database, test isolation |
| Long-lived feature branches | Merge conflicts, integration risk | Trunk-based + feature flags |
| Monolithic pipeline | Can't scale, hard to maintain | Reusable workflows, pipeline composition |
Dealing with Flaky Tests in Parallel
// scripts/quarantine-manager.ts
interface TestResult {
name: string;
passed: boolean;
duration: number;
retried: boolean;
}
interface QuarantineDecision {
action: 'quarantine' | 'keep' | 'remove';
reason: string;
}
class QuarantineManager {
private readonly FLAKY_THRESHOLD = 0.15; // 15% failure rate
private readonly WINDOW_SIZE = 20; // Last 20 runs
private readonly HEAL_THRESHOLD = 0.02; // 2% failure rate to un-quarantine
analyzeTest(
testName: string,
history: TestResult[]
): QuarantineDecision {
if (history.length < this.WINDOW_SIZE) {
return { action: 'keep', reason: 'Insufficient data' };
}
const recent = history.slice(-this.WINDOW_SIZE);
const failures = recent.filter(r => !r.passed).length;
const failureRate = failures / recent.length;
if (failureRate >= this.FLAKY_THRESHOLD) {
return {
action: 'quarantine',
reason: `Failure rate ${(failureRate * 100).toFixed(1)}% exceeds threshold`,
};
}
if (failureRate <= this.HEAL_THRESHOLD) {
return {
action: 'remove',
reason: `Failure rate ${(failureRate * 100).toFixed(1)}% below heal threshold`,
};
}
return { action: 'keep', reason: 'Within acceptable range' };
}
}Performance Characteristics
Pipeline Optimization Impact
| Optimization | Before | After | Effort |
|---|---|---|---|
| Add dependency caching | 15 min | 8 min | Low |
| Parallel test shards (4x) | 8 min | 3 min | Low |
| DAG pipeline (remove stage waits) | 12 min | 7 min | Medium |
| Change detection (monorepo) | 20 min | 5 min | Medium |
| Remote build cache (Turbo) | 10 min | 2 min | Medium |
| Predictive test selection | 8 min | 2 min | High |
| Self-hosted runners (warm cache) | 5 min | 2 min | High |
Cost vs. Speed Tradeoff
For a team of 30 developers at $80/hr, each submitting 3 PRs/day:
| Pipeline Duration | Daily Compute Cost | Daily Wait Cost | Total Daily |
|---|---|---|---|
| 30 min | $72 | $3,600 | $3,672 |
| 15 min | $72 | $1,800 | $1,872 |
| 5 min | $96 | $600 | $696 |
| 2 min | $120 | $240 | $360 |
Spending more on compute (parallel runners, larger machines) dramatically reduces total cost because developer wait time dominates.
Mathematical Foundations
Optimal Shard Count
Given
This is the LPT (Longest Processing Time first) approximation ratio. The optimal solution is NP-hard (it's the multiprocessor scheduling problem), but LPT gives a 4/3 approximation.
The theoretical optimal makespan is bounded by:
Queuing Theory for PR Reviews
PRs in a review queue follow an M/G/c model:
Where
Real-World War Stories
War Story — The Monorepo That Brought Down CI
A fintech startup migrated 30 microservices into a monorepo for better code sharing. Day one: every push triggered builds for all 30 services. With 50 developers pushing 200 commits/day, this meant 6,000 service builds per day. Their 20-runner pool was overwhelmed within hours, and the queue grew to 8+ hours.
Emergency fix: Added path-based filtering so each service only built when its own files changed. But this missed shared library changes.
Proper fix: Built a dependency graph analysis tool (similar to the TypeScript code above). When packages/shared/ changed, it computed the transitive closure of dependent services and only built those. CI minutes dropped 85%.
Further optimization: Added Turborepo with remote caching. If packages/api hadn't changed since the last main build, it reused the cached artifact. This cut another 60% off remaining builds.
Final numbers: 6,000 builds/day -> 400 builds/day. Pipeline time: 25 min -> 4 min.
War Story — Feature Flag Cleanup Debt
A product team adopted feature flags enthusiastically — 200 flags in 6 months. But they never cleaned up old flags. The codebase became a maze of conditional paths. Tests had to cover flag permutations (
Fix: Implemented a "flag lifecycle" policy:
- Every flag has an owner and an expiration date (max 30 days)
- CI pipeline fails if a flag exceeds its expiration
- A weekly report shows flag age and usage
- Dead flags are auto-detected by feature flag service metrics
Flag count dropped from 200 to 25 active flags within 3 months. Test reliability improved measurably.
Decision Framework
Choosing a Branching Strategy
| Strategy | Branch Lifetime | Team Size | Release Cadence | Risk Tolerance |
|---|---|---|---|---|
| Trunk-Based | < 1 day | Any | Continuous | High (needs feature flags) |
| GitHub Flow | 1-5 days | Small-Medium | Weekly | Medium |
| GitFlow | Weeks-Months | Large | Scheduled | Low |
| Release Branches | Per release | Large | Scheduled | Low |
When to Use Each Pattern
| Pattern | Use When | Avoid When |
|---|---|---|
| Trunk-based | Continuous deployment, experienced team | Regulatory holds, immature testing |
| Monorepo pipeline | Shared code, coordinated releases | Independent teams, massive codebase |
| Parallel testing | > 5 min test suite, flaky tests | Simple projects, < 50 tests |
| Feature flags | Trunk-based development, gradual rollout | Simple features, short development cycles |
| Blue-green | Zero-downtime requirements, easy rollback | Database migrations, stateful services |
| Preview environments | UI-heavy features, stakeholder review | Backend-only changes, cost constraints |
| Canary deployment | High-traffic services, risk-averse | Low-traffic services, simple applications |
Advanced Topics
Predictive Test Selection
Instead of running all tests, use ML to predict which tests are likely to fail based on the changed files:
// Simplified predictive test selection
interface TestPrediction {
testFile: string;
probability: number;
reason: string;
}
class PredictiveTestSelector {
private cooccurrenceMatrix: Map<string, Map<string, number>> = new Map();
private totalRuns: number = 0;
// Build model from historical data
train(history: { changedFiles: string[]; failedTests: string[] }[]): void {
this.totalRuns = history.length;
for (const run of history) {
for (const file of run.changedFiles) {
if (!this.cooccurrenceMatrix.has(file)) {
this.cooccurrenceMatrix.set(file, new Map());
}
const row = this.cooccurrenceMatrix.get(file)!;
for (const test of run.failedTests) {
row.set(test, (row.get(test) ?? 0) + 1);
}
}
}
}
// Predict which tests might fail
predict(changedFiles: string[], threshold: number = 0.1): TestPrediction[] {
const scores = new Map<string, number>();
for (const file of changedFiles) {
const row = this.cooccurrenceMatrix.get(file);
if (!row) continue;
for (const [test, count] of row) {
const probability = count / this.totalRuns;
const current = scores.get(test) ?? 0;
scores.set(test, Math.max(current, probability));
}
}
return [...scores.entries()]
.filter(([, prob]) => prob >= threshold)
.map(([testFile, probability]) => ({
testFile,
probability,
reason: `${(probability * 100).toFixed(1)}% historical correlation`,
}))
.sort((a, b) => b.probability - a.probability);
}
}Pipeline Observability
// Pipeline metrics collector
import { metrics, trace } from '@opentelemetry/api';
const meter = metrics.getMeter('ci-pipeline');
const tracer = trace.getTracer('ci-pipeline');
const pipelineDuration = meter.createHistogram('pipeline.duration', {
description: 'Pipeline execution duration in seconds',
unit: 's',
});
const queueTime = meter.createHistogram('pipeline.queue_time', {
description: 'Time spent waiting for a runner',
unit: 's',
});
const cacheHitRate = meter.createObservableGauge('pipeline.cache_hit_rate', {
description: 'Cache hit rate percentage',
});
const testFlakyRate = meter.createObservableGauge('pipeline.test_flaky_rate', {
description: 'Percentage of tests that are flaky',
});
// Track pipeline execution
async function executePipeline(config: PipelineConfig): Promise<void> {
const span = tracer.startSpan('pipeline.execute', {
attributes: {
'pipeline.id': config.id,
'pipeline.trigger': config.trigger,
'pipeline.branch': config.branch,
},
});
const startTime = Date.now();
try {
// ... execute pipeline
span.setStatus({ code: 0 }); // OK
} catch (error) {
span.setStatus({ code: 2, message: String(error) }); // ERROR
throw error;
} finally {
const duration = (Date.now() - startTime) / 1000;
pipelineDuration.record(duration, {
status: span.status?.code === 0 ? 'success' : 'failure',
branch: config.branch,
});
span.end();
}
}These patterns form the building blocks of production CI/CD systems. Most real-world pipelines combine multiple patterns — trunk-based development with monorepo change detection, parallel testing, and canary deployments. The key is matching patterns to your team's specific constraints: team size, release cadence, risk tolerance, and regulatory requirements.
For implementation details on specific CI platforms, see GitHub Actions Deep Dive and GitLab CI. For artifact management strategies referenced in these patterns, see Artifact Management.