GCP Cloud Run Deep Dive

Cloud Run is GCP's serverless container platform. Give it a container image, and it handles autoscaling (including to zero), TLS termination, load balancing, and high availability. Unlike AWS Lambda, which expects code packaged for its own runtimes, Cloud Run runs any container that listens for requests on a port, so most existing Docker images work with little or no modification.

Cloud Run sits at the sweet spot between full container orchestration (GKE) and functions-as-a-service (Cloud Functions). It gives you the deployment simplicity of serverless with the flexibility of containers.


1. Why Cloud Run Exists: The Problem It Solves

The Container Deployment Gap

Before Cloud Run, deploying a containerized application on GCP required:

  1. GKE: Set up a Kubernetes cluster, configure nodes, deployments, services, ingress, autoscaling, TLS certificates, etc. Minimum cost ~$70/month (control plane) even with zero traffic.
  2. Compute Engine: Manage VMs, install Docker, configure health checks, load balancing, auto-scaling groups, TLS — all manually.
  3. App Engine: Constrained to specific runtimes and deployment models.

Cloud Run launched in April 2019 (GA November 2019) as a way to deploy containers with zero infrastructure management. Built on Knative (an open-source serverless platform for Kubernetes), Cloud Run runs on Google's own Borg/Kubernetes infrastructure.

Cloud Run vs. Competitors

| Feature | Cloud Run | AWS Lambda | AWS Fargate | Azure Container Apps |
|---|---|---|---|---|
| Input | Any container | Zip/container (specific runtimes) | Any container | Any container |
| Scale to zero | Yes | Yes | No (min 1 task) | Yes |
| Max timeout | 60 minutes | 15 minutes | Unlimited | Unlimited |
| Max memory | 32 GB | 10 GB | 120 GB | 4 GB per replica |
| Concurrency | Up to 1000 req/instance | 1 req/instance | Unlimited | Configurable |
| Cold start | 0.5-5s (depends on image) | 0.1-10s (depends on runtime) | 30-60s | 1-10s |
| Pricing | Per-request + CPU/memory | Per-invocation + duration | Always-on | Per-request + CPU/memory |

2. First Principles: How Cloud Run Works

Architecture

Key Concepts

| Concept | Definition |
|---|---|
| Service | A named, auto-scaling group of container instances serving requests |
| Revision | An immutable snapshot of a service configuration (code + config) |
| Instance | A running container that handles requests |
| Concurrency | Number of simultaneous requests a single instance handles |

The Request Lifecycle

The Concurrency Model

This is Cloud Run's most important and most misunderstood feature. Unlike Lambda (1 request per instance), Cloud Run can handle multiple concurrent requests per instance.

Default concurrency: 80 requests per instance (configurable 1-1000).

$$\text{Instances Needed} = \frac{\text{Concurrent Requests}}{\text{Concurrency Setting}}$$

For 400 concurrent requests with concurrency of 80:

$$\text{Instances} = \frac{400}{80} = 5$$
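
The arithmetic above can be sketched as a small helper. This is an illustrative estimate only: the name `instancesNeeded` is made up for this example, and rounding up with `Math.ceil` is an assumption added here because a fractional instance cannot be provisioned.

```typescript
// Hypothetical helper: estimate how many instances a given load needs.
// The 1-1000 range check mirrors the configurable concurrency range quoted above.
function instancesNeeded(concurrentRequests: number, concurrency: number): number {
  if (concurrency < 1 || concurrency > 1000) {
    throw new RangeError('concurrency must be between 1 and 1000');
  }
  if (concurrentRequests <= 0) return 0;
  // Round up: a partially used instance still has to exist.
  return Math.ceil(concurrentRequests / concurrency);
}

console.log(instancesNeeded(400, 80)); // 5
console.log(instancesNeeded(401, 80)); // 6
```

One extra request past a multiple of the concurrency setting is enough to require another instance, which is why bursty traffic tends to need more headroom than the average suggests.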

TIP

Higher concurrency means fewer instances, lower cost, and better resource utilization. Set concurrency as high as your application can handle without degrading response times. For I/O-bound services (most APIs), 80-200 is typical. For CPU-bound services, lower it to 1-10.


3. Container Requirements

What Cloud Run Expects

  1. Your container must listen on the port specified by the PORT environment variable (default: 8080)
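  2. The container must be stateless (the local filesystem is in-memory, and nothing written to it outlives the instance)
  3. The container must start listening for requests within 4 minutes of being started (the 300s default and 3600s maximum apply to the request timeout, not to startup)
  4. x86_64 architecture (executables must be built for 64-bit Linux x86_64; ARM64 images are not supported)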
  5. Linux containers only

Production Dockerfile

dockerfile
# Multi-stage build for minimal image size
FROM node:20-slim AS builder
WORKDIR /app

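# Install all dependencies, including the devDependencies the build needs (cached layer)
COPY package.json package-lock.json ./
RUN npm ci

# Copy source and build
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build

# Drop devDependencies so only production modules reach the final image
RUN npm prune --omit=dev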

# Production stage
FROM node:20-slim AS production
WORKDIR /app

# Security: run as non-root
RUN groupadd -r app && useradd -r -g app app

# Copy only production artifacts
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package.json ./

# Cloud Run sets PORT env var
ENV NODE_ENV=production
EXPOSE 8080

USER app

# Use exec form for proper signal handling
CMD ["node", "dist/server.js"]

Application Setup

typescript
// src/server.ts — Cloud Run-optimized server
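import Fastify from 'fastify';
import { Logger } from './logger';
// Assumed local helpers: checkDatabase() resolves to a boolean, closeDatabasePool() closes the pool
import { checkDatabase, closeDatabasePool } from './db';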

const logger = new Logger({ service: 'api' });

async function start() {
  const app = Fastify({
    logger: false, // Use structured logging instead
    trustProxy: true, // Cloud Run is behind a proxy
    // Increase keep-alive for connection reuse
    keepAliveTimeout: 620 * 1000, // > Cloud Run's 600s timeout
    connectionTimeout: 0,
  });

  // Health check endpoint
  app.get('/', async () => ({ status: 'healthy' }));

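  // Readiness endpoint (Cloud Run only calls it if a health probe is configured for this path)
  app.get('/ready', async (_request, reply) => {
    // Check dependencies (DB, cache, etc.)
    const dbHealthy = await checkDatabase();
    if (!dbHealthy) {
      // Reply with 503 explicitly instead of throwing a plain object
      return reply.code(503).send({ status: 'unavailable', message: 'Database unavailable' });
    }
    return { status: 'ready' };
  });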

  // Application routes
  app.register(import('./routes/orders'));
  app.register(import('./routes/users'));

  // Listen on the PORT Cloud Run provides
  const port = parseInt(process.env.PORT ?? '8080', 10);
  const host = '0.0.0.0'; // Must bind to all interfaces

  await app.listen({ port, host });
  logger.info(`Server listening on ${host}:${port}`);

  // Graceful shutdown
  const shutdown = async (signal: string) => {
    logger.info(`Received ${signal}, shutting down gracefully`);

    // Stop accepting new requests
    await app.close();

    // Close database connections
    await closeDatabasePool();

    logger.info('Shutdown complete');
    process.exit(0);
  };

  process.on('SIGTERM', () => shutdown('SIGTERM'));
  process.on('SIGINT', () => shutdown('SIGINT'));
}

start().catch((err) => {
  logger.error('Failed to start server', { error: err });
  process.exit(1);
});

4. Scaling Behavior

Scale to Zero

When no requests arrive for a period (typically ~15 minutes, but variable), Cloud Run scales down to zero instances. You pay nothing. The next request triggers a cold start.

Scaling Up

The autoscaler targets a concurrency utilization of ~70% by default:

$$\text{Target Instances} = \frac{\text{Current Concurrent Requests}}{\text{Concurrency} \times 0.7}$$

Scaling is not instantaneous — there is a ramp-up period. For sudden spikes, Cloud Run can overshoot and start more instances than needed, then scale back down.
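
To see what the utilization target does to instance counts, here is a minimal sketch of the formula above. The function name `targetInstances` and the round-up step are assumptions for illustration. The real autoscaler also weighs other signals and respects the configured minimum and maximum, so treat the result as a rough estimate.

```typescript
// Hypothetical helper mirroring the target-instance formula.
function targetInstances(
  concurrentRequests: number,
  concurrency: number,
  targetUtilization = 0.7, // the ~70% figure quoted above
): number {
  if (concurrentRequests <= 0) return 0;
  // Each instance is planned to run at only targetUtilization of its concurrency.
  const effectiveConcurrency = concurrency * targetUtilization;
  return Math.ceil(concurrentRequests / effectiveConcurrency);
}

// 400 concurrent requests at concurrency 80: 400 / 56 ≈ 7.14, rounded up.
console.log(targetInstances(400, 80)); // 8
```

Compared with dividing by the raw concurrency setting (which gives 5 for the same load), the utilization target leaves headroom so that a burst does not immediately saturate every instance.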

Cold Start Optimization

| Factor | Impact | Optimization |
|---|---|---|
| Image size | Proportional | Multi-stage builds, distroless base images |
| Language runtime | Varies | Go/Rust: ~100ms; Java: 1-5s; Node.js: 200-500ms |
| Dependency loading | Proportional | Lazy loading, reduce dependencies |
| Database connections | 1-5s for new connections | Use connection pooling, defer until needed |
| TLS handshakes | 50-100ms per external service | Connection reuse, keep-alive |
| Startup CPU | 2x boost during startup | Take advantage of it: do heavy init early |

typescript
// Optimization: Defer heavy initialization
import { Pool } from 'pg'; // assumes the node-postgres ("pg") client

let dbPool: Pool | null = null;

// DON'T do this at startup — it blocks cold start
// const dbPool = new Pool({ connectionString: process.env.DATABASE_URL });

// DO: Lazy initialize on first request
async function getDbPool(): Promise<Pool> {
  if (!dbPool) {
    dbPool = new Pool({
      connectionString: process.env.DATABASE_URL,
      max: 5, // Cloud Run instances handle multiple requests
      idleTimeoutMillis: 30000,
      connectionTimeoutMillis: 5000,
    });
  }
  return dbPool;
}

Minimum Instances

To avoid cold starts entirely, set a minimum instance count:

hcl
resource "google_cloud_run_v2_service" "api" {
  name     = "api-service"
  location = "us-central1"

  template {
    scaling {
      min_instance_count = 1   # Always warm
      max_instance_count = 100
    }

    containers {
      image = "gcr.io/my-project/api:latest"

      resources {
        limits = {
          cpu    = "2"
          memory = "1Gi"
        }
        cpu_idle = true # Only charge for CPU during requests (with min instances)
        startup_cpu_boost = true # 2x CPU during startup
      }

      ports {
        container_port = 8080
      }
    }
  }
}

WARNING

With min_instance_count > 0 and cpu_idle = false, you pay for CPU and memory continuously (like a VM). With cpu_idle = true, CPU is throttled outside of requests and the instance stays warm (no cold start), but idle minimum instances are still billed, at a reduced idle rate rather than the full active rate. Either way, minimum instances are never free.


5. Traffic Management

Revision-Based Traffic Splitting

Cloud Run's traffic splitting is built on Knative's traffic management:

hcl
resource "google_cloud_run_v2_service" "api" {
  name     = "api-service"
  location = "us-central1"

  template {
    revision = "api-service-v3"
    containers {
      image = "gcr.io/my-project/api:v3.0"
    }
  }

  traffic {
    type     = "TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION"
    revision = "api-service-v2"
    percent  = 90
  }

  traffic {
    type     = "TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION"
    revision = "api-service-v3"
    percent  = 10
  }
}
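
A split like the one above is easy to get wrong by hand. The sketch below is a hypothetical pre-deploy check, not part of any Google library: the `TrafficTarget` shape and `validateTrafficSplit` name are invented for illustration, and it assumes percentages are whole numbers that must total 100.

```typescript
// Illustrative shape for one traffic target (not the provider's schema).
interface TrafficTarget {
  revision: string;
  percent: number;
}

// Returns a list of problems; an empty list means the split looks valid.
function validateTrafficSplit(targets: TrafficTarget[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  let total = 0;

  for (const { revision, percent } of targets) {
    if (!Number.isInteger(percent) || percent < 0 || percent > 100) {
      problems.push(`${revision}: percent must be an integer from 0 to 100`);
    }
    if (seen.has(revision)) {
      problems.push(`${revision}: listed more than once`);
    }
    seen.add(revision);
    total += percent;
  }

  if (total !== 100) {
    problems.push(`percentages sum to ${total}, expected 100`);
  }
  return problems;
}

const split: TrafficTarget[] = [
  { revision: 'api-service-v2', percent: 90 },
  { revision: 'api-service-v3', percent: 10 },
];
console.log(validateTrafficSplit(split)); // []
```

Running a check like this in CI before applying the configuration catches a mistyped percentage earlier than a failed deploy would.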

Blue-Green Deployments

bash
# Deploy new revision with 0% traffic, a predictable revision name, and a tag for direct access
gcloud run deploy api-service \
  --image gcr.io/my-project/api:v3.0 \
  --revision-suffix v3 \
  --tag v3 \
  --no-traffic

# Test the new revision directly through its tag URL
curl https://v3---api-service-xxxxx.run.app/health

# If tests pass, shift all traffic
gcloud run services update-traffic api-service \
  --to-revisions=api-service-v3=100

# If something goes wrong, instant rollback
gcloud run services update-traffic api-service \
  --to-revisions=api-service-v2=100

Canary with Automatic Rollback

typescript
// deploy/canary.ts — Automated canary deployment
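import { CloudRunClient } from './cloud-run-client';
import { MonitoringClient } from './monitoring-client';

// Small promise-based delay helper used between canary steps
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));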

interface CanaryConfig {
  service: string;
  region: string;
  newRevision: string;
  steps: number[];          // [5, 25, 50, 100]
  stepDurationMs: number;   // Time at each step
  errorThreshold: number;   // Max error rate (0.01 = 1%)
  latencyThreshold: number; // Max P99 latency (ms)
}

async function canaryDeploy(config: CanaryConfig): Promise<boolean> {
  const cloudRun = new CloudRunClient(config.service, config.region);
  const monitoring = new MonitoringClient(config.service);

  for (const percent of config.steps) {
    console.log(`Setting traffic to ${percent}% for ${config.newRevision}`);
    await cloudRun.setTrafficSplit(config.newRevision, percent);

    // Wait and monitor
    await sleep(config.stepDurationMs);

    const metrics = await monitoring.getRevisionMetrics(config.newRevision, config.stepDurationMs);

    if (metrics.errorRate > config.errorThreshold) {
      console.error(`Error rate ${metrics.errorRate} exceeds threshold ${config.errorThreshold}`);
      await cloudRun.rollback();
      return false;
    }

    if (metrics.latencyP99 > config.latencyThreshold) {
      console.error(`P99 latency ${metrics.latencyP99}ms exceeds threshold ${config.latencyThreshold}ms`);
      await cloudRun.rollback();
      return false;
    }

    console.log(`Step ${percent}% healthy: error=${metrics.errorRate}, p99=${metrics.latencyP99}ms`);
  }

  return true;
}

6. Cloud Run Jobs

Cloud Run Jobs (GA 2023) run containers to completion — they do not serve HTTP requests. Use for:

  • Database migrations
  • Batch processing
  • Scheduled tasks (with Cloud Scheduler)
  • Data imports/exports
hcl
resource "google_cloud_run_v2_job" "migration" {
  name     = "db-migration"
  location = "us-central1"

  template {
    task_count  = 1
    parallelism = 1

    template {
      containers {
        image = "gcr.io/my-project/migration:latest"

        env {
          name  = "DATABASE_URL"
          value_source {
            secret_key_ref {
              secret  = google_secret_manager_secret.db_url.secret_id
              version = "latest"
            }
          }
        }

        resources {
          limits = {
            cpu    = "2"
            memory = "2Gi"
          }
        }
      }

      timeout     = "3600s"
      max_retries = 1
    }
  }
}

7. Security

Authentication and Authorization

Service-to-Service Authentication

typescript
// Calling another Cloud Run service with authentication
import { GoogleAuth } from 'google-auth-library';

const auth = new GoogleAuth();

async function callInternalService(
  serviceUrl: string,
  path: string,
  body: unknown,
): Promise<unknown> {
  // Get an ID token for the target service
  const client = await auth.getIdTokenClient(serviceUrl);

  const response = await client.request({
    url: `${serviceUrl}${path}`,
    method: 'POST',
    data: body,
    headers: { 'Content-Type': 'application/json' },
  });

  return response.data;
}

// Usage
const result = await callInternalService(
  'https://order-service-xxxxx-uc.a.run.app',
  '/api/orders',
  { customerId: '123', items: [{ productId: 'abc', quantity: 1 }] },
);

Secrets Management

hcl
# Use Secret Manager for sensitive configuration
resource "google_secret_manager_secret" "api_key" {
  secret_id = "api-key"

  replication {
    auto {}
  }
}

resource "google_secret_manager_secret_version" "api_key" {
  secret      = google_secret_manager_secret.api_key.id
  secret_data = var.api_key_value
}

# Mount as environment variable in Cloud Run
resource "google_cloud_run_v2_service" "api" {
  # ...
  template {
    containers {
      image = "gcr.io/my-project/api:latest"

      env {
        name = "API_KEY"
        value_source {
          secret_key_ref {
            secret  = google_secret_manager_secret.api_key.secret_id
            version = "latest"
          }
        }
      }
    }

    service_account = google_service_account.api.email
  }
}

# Grant access to secret
resource "google_secret_manager_secret_iam_member" "api_key_access" {
  secret_id = google_secret_manager_secret.api_key.id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.api.email}"
}

8. Networking

VPC Connectivity

Cloud Run services can connect to VPC resources (Cloud SQL, Memorystore, internal services) via:

  1. Direct VPC egress (recommended) — Direct connection to VPC, no connector needed
  2. Serverless VPC Access Connector — Legacy approach, uses a managed connector
hcl
resource "google_cloud_run_v2_service" "api" {
  name     = "api-service"
  location = "us-central1"

  template {
    vpc_access {
      # Direct VPC egress (preferred)
      network_interfaces {
        network    = google_compute_network.main.id
        subnetwork = google_compute_subnetwork.cloud_run.id
      }
      egress = "PRIVATE_RANGES_ONLY" # Only route private IPs through VPC
    }

    containers {
      image = "gcr.io/my-project/api:latest"
    }
  }
}

Custom Domains

hcl
# Map custom domain to Cloud Run service
resource "google_cloud_run_domain_mapping" "api" {
  location = "us-central1"
  name     = "api.example.com"

  metadata {
    namespace = var.project_id
  }

  spec {
    route_name = google_cloud_run_v2_service.api.name
  }
}

# For global load balancing with Cloud CDN
resource "google_compute_backend_service" "api" {
  name = "api-backend"

  backend {
    group = google_compute_region_network_endpoint_group.api.id
  }

  log_config {
    enable = true
  }
}

resource "google_compute_region_network_endpoint_group" "api" {
  name                  = "api-neg"
  network_endpoint_type = "SERVERLESS"
  region                = "us-central1"

  cloud_run {
    service = google_cloud_run_v2_service.api.name
  }
}

9. Observability

Structured Logging

Cloud Run automatically sends stdout/stderr to Cloud Logging. Use structured JSON for rich querying:

typescript
// logger.ts — Cloud Run structured logging
interface LogEntry {
  severity: 'DEBUG' | 'INFO' | 'WARNING' | 'ERROR' | 'CRITICAL';
  message: string;
  httpRequest?: {
    requestMethod: string;
    requestUrl: string;
    status: number;
    latency: string;
    userAgent: string;
    remoteIp: string;
  };
  'logging.googleapis.com/trace'?: string;
  'logging.googleapis.com/spanId'?: string;
  [key: string]: unknown;
}

export function log(entry: LogEntry): void {
  // Cloud Run picks up JSON from stdout
  console.log(JSON.stringify(entry));
}

// Middleware to extract trace context
export function traceMiddleware(req: any, res: any, next: any): void {
  const traceHeader = req.headers['x-cloud-trace-context'];
  if (traceHeader) {
    const [traceId, spanId] = traceHeader.split('/');
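    // GOOGLE_CLOUD_PROJECT is not set automatically on Cloud Run; set it on the service or read the project ID from the metadata server
    req.traceId = `projects/${process.env.GOOGLE_CLOUD_PROJECT}/traces/${traceId}`;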
    req.spanId = spanId?.split(';')[0];
  }
  next();
}
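
The header parsing in the middleware above can also be written as a pure function, which is easier to unit test and works with any framework. This is a sketch under assumptions: `parseTraceContext` and `TraceFields` are hypothetical names, the project ID is passed in explicitly, and the header is assumed to follow the `TRACE_ID/SPAN_ID;o=OPTIONS` layout that the middleware splits on.

```typescript
interface TraceFields {
  'logging.googleapis.com/trace': string;
  'logging.googleapis.com/spanId'?: string;
}

// Hypothetical helper: turn an X-Cloud-Trace-Context value into log fields.
function parseTraceContext(
  header: string | undefined,
  projectId: string,
): TraceFields | undefined {
  if (!header) return undefined;

  // "TRACE_ID/SPAN_ID;o=1" -> ["TRACE_ID", "SPAN_ID;o=1"]
  const [traceId, rest] = header.split('/');
  if (!traceId) return undefined;

  const fields: TraceFields = {
    'logging.googleapis.com/trace': `projects/${projectId}/traces/${traceId}`,
  };
  // Drop the ";o=..." options suffix if present.
  const spanId = rest?.split(';')[0];
  if (spanId) fields['logging.googleapis.com/spanId'] = spanId;
  return fields;
}

// Usage: merge the trace fields into a structured log entry.
const traceFields = parseTraceContext('105445aa7843bc8bf206b12000100000/1;o=1', 'my-project');
console.log(JSON.stringify({ severity: 'INFO', message: 'order created', ...traceFields }));
```

Entries that carry the trace field are grouped with their request in Cloud Logging, which makes it much easier to follow a single request across log lines.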

10. Cost Model

Pricing Components

| Component | Cost | Billing |
|---|---|---|
| CPU | $0.00002400/vCPU-second | Per-second, during request processing |
| Memory | $0.00000250/GiB-second | Per-second, during request processing |
| Requests | $0.40 per million | Per request |
| Networking | Standard GCP egress rates | Per GB |

CPU Allocation Modes

| Mode | cpu_idle | When CPU Is Available | When Billed | Use Case |
|---|---|---|---|---|
| CPU during requests only | true | Only during request processing | Per-request | APIs, web services |
| CPU always allocated | false | Always (even between requests) | Always-on | Background work, WebSockets |

Cost Example: API Service

For a service handling 10M requests/month, 100ms average latency, 1 vCPU, 512MB memory:

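$$
\begin{aligned}
\text{CPU} &= 10{,}000{,}000 \times 0.1\,\text{s} \times 1\,\text{vCPU} \times \$0.000024 = \$24.00 \\
\text{Memory} &= 10{,}000{,}000 \times 0.1\,\text{s} \times 0.5\,\text{GiB} \times \$0.0000025 = \$1.25 \\
\text{Requests} &= 10 \times \$0.40 = \$4.00 \\
\text{Total} &= \$29.25/\text{month}
\end{aligned}
$$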

Compare to a Compute Engine e2-medium running 24/7: ~$24.27/month — but it handles far fewer concurrent requests and requires manual scaling.
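
The same arithmetic can be wrapped in a small estimator for trying other traffic levels. This is a simplified sketch: `estimateMonthlyCost` and the `Workload` shape are hypothetical, the rates are copied from the pricing table above, and it bills each request's duration separately, ignoring the free tier, concurrency sharing, idle instance time, and networking.

```typescript
// Rates copied from the pricing table above (USD).
const CPU_PER_VCPU_SECOND = 0.000024;
const MEMORY_PER_GIB_SECOND = 0.0000025;
const PER_MILLION_REQUESTS = 0.4;

interface Workload {
  requestsPerMonth: number;
  avgLatencySeconds: number;
  vcpu: number;
  memoryGiB: number;
}

// Hypothetical estimator reproducing the worked example's three components.
function estimateMonthlyCost(w: Workload) {
  const billedSeconds = w.requestsPerMonth * w.avgLatencySeconds;
  const cpu = billedSeconds * w.vcpu * CPU_PER_VCPU_SECOND;
  const memory = billedSeconds * w.memoryGiB * MEMORY_PER_GIB_SECOND;
  const requests = (w.requestsPerMonth / 1_000_000) * PER_MILLION_REQUESTS;
  return { cpu, memory, requests, total: cpu + memory + requests };
}

const estimate = estimateMonthlyCost({
  requestsPerMonth: 10_000_000,
  avgLatencySeconds: 0.1,
  vcpu: 1,
  memoryGiB: 0.5,
});
console.log(estimate.total.toFixed(2)); // 29.25
```

Because instances handle several requests at once, real bills for I/O-bound services are often lower than this per-request estimate; treat it as an upper-bound style approximation under the stated assumptions.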

War Story

A startup migrated 12 microservices from GKE (3-node cluster, ~$250/month) to Cloud Run. Most services had sporadic traffic. Monthly cost dropped to $45 because services scaled to zero when not in use. The exception was their WebSocket service: Cloud Run supports WebSockets, but each connection is subject to the request timeout (at most 60 minutes), so long-lived connections are cut and clients must reconnect. That service stayed on GKE.


11. Edge Cases and Failure Modes

| Issue | Cause | Mitigation |
|---|---|---|
| Request timeout (60 min max) | Long-running process | Use Cloud Run Jobs instead |
| Cold start spikes | Traffic bursts to idle service | Set min instances > 0 |
| 503 errors during deployment | Old revision draining | Use gradual traffic migration |
| Memory exceeded (killed) | Memory leak or large payload | Monitor memory, set appropriate limits |
| Container fails to start | Port misconfiguration | Ensure PORT env var is used |
| Intermittent 429 errors | Per-service scaling limits | Request quota increase |

12. Decision Framework

| Choose Cloud Run When | Choose GKE When | Choose Cloud Functions When |
|---|---|---|
| HTTP services, APIs | Complex service mesh needed | Simple event handlers |
| Containers with any language | Stateful workloads | No container expertise |
| Scale-to-zero needed | GPU/TPU workloads | Tight GCP event integration |
| Simple deployment model | Custom scheduling | Very small functions |
| Cost-sensitive, variable traffic | Need DaemonSets, sidecars | Sub-second cold starts critical |

See Also

  • GCP Overview — GCP fundamentals and resource hierarchy
  • GKE — When you need full Kubernetes
  • Cloud SQL — Connecting Cloud Run to databases
  • Pub/Sub — Event-driven Cloud Run services
  • IAM — Service accounts and invoker permissions
  • Cost Optimization — Cloud Run cost strategies

"What I cannot create, I do not understand." — Richard Feynman