Deploy Node.js to Production
Deploying a Node.js application to production is not node server.js on a VPS. A production deployment must handle process management, graceful shutdowns, health checks, logging, secret management, horizontal scaling, and zero-downtime updates. This page covers the complete deployment lifecycle — from building a Docker image to running it on every major platform — with production-tested configurations and operational checklists.
Docker Containerization
Docker is the universal deployment artifact for Node.js. Regardless of which platform you deploy to, you build a Docker image first.
Production Dockerfile
# syntax=docker/dockerfile:1
# ---- Base ----
FROM node:20-alpine AS base
WORKDIR /app
# Security: run as non-root user
RUN addgroup --system --gid 1001 appgroup && \
adduser --system --uid 1001 appuser
# ---- Dependencies ----
FROM base AS deps
# Copy only package files for better layer caching
COPY package.json pnpm-lock.yaml ./
RUN corepack enable pnpm && \
pnpm install --frozen-lockfile --prod
# ---- Build ----
FROM base AS build
COPY package.json pnpm-lock.yaml ./
RUN corepack enable pnpm && \
pnpm install --frozen-lockfile
COPY . .
RUN pnpm run build
# ---- Production ----
FROM base AS production
ENV NODE_ENV=production
# Copy production dependencies
COPY --from=deps /app/node_modules ./node_modules
# Copy built application
COPY --from=build /app/dist ./dist
COPY package.json ./
# Switch to non-root user
USER appuser
# Expose port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
# Start
CMD ["node", "dist/server.js"]Multi-Stage Build Breakdown
.dockerignore
node_modules
npm-debug.log
.git
.gitignore
.env
.env.*
Dockerfile
docker-compose*.yml
.dockerignore
README.md
docs/
tests/
coverage/
.nyc_output/
.vscode/
.idea/Never Include node_modules
Never copy node_modules from the host into the container. Dependencies must be installed inside the container to match the container's OS and architecture (Alpine Linux, not macOS/Windows).
Docker Compose for Local Development
# docker-compose.yml
services:
app:
build:
context: .
target: production
ports:
- "3000:3000"
environment:
- DATABASE_URL=postgres://postgres:password@db:5432/myapp
- REDIS_URL=redis://cache:6379
depends_on:
db:
condition: service_healthy
cache:
condition: service_healthy
db:
image: postgres:16-alpine
environment:
POSTGRES_DB: myapp
POSTGRES_PASSWORD: password
volumes:
- pgdata:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 3s
retries: 5
cache:
image: redis:7-alpine
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 3s
retries: 5
volumes:
pgdata:Health Checks
Every production Node.js application needs health check endpoints. Load balancers, container orchestrators, and platform health systems all rely on them.
// src/health.ts
import type { FastifyInstance } from 'fastify';
interface HealthStatus {
status: 'healthy' | 'degraded' | 'unhealthy';
timestamp: string;
uptime: number;
checks: Record<string, {
status: 'pass' | 'fail';
latency_ms: number;
message?: string;
}>;
}
export async function healthRoutes(app: FastifyInstance) {
// Liveness probe — is the process running?
app.get('/health/live', async () => {
return { status: 'ok' };
});
// Readiness probe — can the process handle requests?
app.get('/health/ready', async (request, reply) => {
const checks: HealthStatus['checks'] = {};
// Check database
const dbStart = Date.now();
try {
await app.db.query('SELECT 1');
checks.database = { status: 'pass', latency_ms: Date.now() - dbStart };
} catch (err) {
checks.database = {
status: 'fail',
latency_ms: Date.now() - dbStart,
message: err.message,
};
}
// Check Redis
const redisStart = Date.now();
try {
await app.redis.ping();
checks.redis = { status: 'pass', latency_ms: Date.now() - redisStart };
} catch (err) {
checks.redis = {
status: 'fail',
latency_ms: Date.now() - redisStart,
message: err.message,
};
}
const allPassing = Object.values(checks).every(c => c.status === 'pass');
const health: HealthStatus = {
status: allPassing ? 'healthy' : 'unhealthy',
timestamp: new Date().toISOString(),
uptime: process.uptime(),
checks,
};
reply.code(allPassing ? 200 : 503).send(health);
});
}Health Check Configuration by Platform
| Platform | Liveness | Readiness | Configuration |
|---|---|---|---|
| Docker | HEALTHCHECK in Dockerfile | Same | Dockerfile instruction |
| Kubernetes | livenessProbe | readinessProbe | Pod spec |
| Cloud Run | Automatic (TCP) | Startup probe | Service config |
| ECS | ALB health check | Container health check | Task definition |
| Railway | Automatic | /health endpoint | Service settings |
Graceful Shutdown
When a container is stopped, it receives SIGTERM. The process must stop accepting new connections, finish in-flight requests, close database connections, and exit cleanly.
// src/server.ts
import Fastify from 'fastify';
const app = Fastify({ logger: true });
// Register routes, plugins, etc.
await app.register(healthRoutes);
await app.register(databasePlugin);
await app.register(apiRoutes);
// Start server
await app.listen({ port: 3000, host: '0.0.0.0' });
// Graceful shutdown
async function shutdown(signal: string) {
app.log.info(`Received ${signal}. Starting graceful shutdown...`);
// Stop accepting new connections and finish in-flight requests
// Fastify's close() handles this automatically
try {
await app.close();
app.log.info('Server closed. Exiting.');
process.exit(0);
} catch (err) {
app.log.error(err, 'Error during shutdown');
process.exit(1);
}
}
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
// Prevent unhandled rejections from crashing silently
process.on('unhandledRejection', (err) => {
app.log.error(err, 'Unhandled rejection');
process.exit(1);
});Always Bind to 0.0.0.0
app.listen({ port: 3000 }) binds to 127.0.0.1 by default — this works on your machine but is unreachable inside a Docker container. Always specify host: '0.0.0.0' to bind to all interfaces.
Deploy to GCP Cloud Run
Cloud Run is a serverless container platform. You push a Docker image, Cloud Run runs it, and scales from zero to thousands of instances based on traffic.
# Authenticate and set project
gcloud auth login
gcloud config set project my-project-id
# Build and push image to Google Container Registry
gcloud builds submit --tag gcr.io/my-project-id/myapp
# Deploy to Cloud Run
gcloud run deploy myapp \
--image gcr.io/my-project-id/myapp \
--platform managed \
--region us-central1 \
--port 3000 \
--memory 512Mi \
--cpu 1 \
--min-instances 0 \
--max-instances 10 \
--concurrency 80 \
--timeout 30s \
--set-env-vars "NODE_ENV=production" \
--set-secrets "DATABASE_URL=db-url:latest,JWT_SECRET=jwt-secret:latest" \
--allow-unauthenticatedCloud Run Architecture
Cloud Run Configuration
# service.yaml — declarative Cloud Run configuration
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: myapp
spec:
template:
metadata:
annotations:
autoscaling.knative.dev/minScale: "0"
autoscaling.knative.dev/maxScale: "10"
run.googleapis.com/cpu-throttling: "false"
spec:
containerConcurrency: 80
timeoutSeconds: 30
containers:
- image: gcr.io/my-project-id/myapp
ports:
- containerPort: 3000
resources:
limits:
memory: 512Mi
cpu: "1"
env:
- name: NODE_ENV
value: production
startupProbe:
httpGet:
path: /health/live
port: 3000
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3Deploy to AWS ECS Fargate
ECS Fargate runs containers without managing servers. You define a task (container spec) and a service (how many tasks to run).
{
"family": "myapp",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "256",
"memory": "512",
"executionRoleArn": "arn:aws:iam::123456789:role/ecsTaskExecutionRole",
"containerDefinitions": [
{
"name": "myapp",
"image": "123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest",
"portMappings": [
{ "containerPort": 3000, "protocol": "tcp" }
],
"healthCheck": {
"command": ["CMD-SHELL", "wget -q --spider http://localhost:3000/health/live || exit 1"],
"interval": 30,
"timeout": 5,
"retries": 3,
"startPeriod": 15
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/myapp",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
},
"secrets": [
{
"name": "DATABASE_URL",
"valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:myapp/DATABASE_URL"
}
],
"environment": [
{ "name": "NODE_ENV", "value": "production" },
{ "name": "PORT", "value": "3000" }
]
}
]
}Deploy to Railway
Railway is the simplest platform for fullstack deployments. It auto-detects Node.js, builds from your Dockerfile (or uses Nixpacks), and provides managed databases.
# Install Railway CLI
npm install -g @railway/cli
# Login and link
railway login
railway link
# Deploy
railway up
# Add a PostgreSQL database
railway add --plugin postgresql
# Set environment variables
railway variables set JWT_SECRET=your-secret-keyRailway detects your Dockerfile automatically. If no Dockerfile exists, it uses Nixpacks to build from source.
Railway Configuration
# railway.toml
[build]
builder = "DOCKERFILE"
dockerfilePath = "Dockerfile"
[deploy]
healthcheckPath = "/health/live"
healthcheckTimeout = 30
restartPolicyType = "ON_FAILURE"
restartPolicyMaxRetries = 5Deploy to Fly.io
Fly.io runs containers on edge servers worldwide. Deploy once, run in multiple regions.
# Install and authenticate
curl -L https://fly.io/install.sh | sh
fly auth login
# Initialize (creates fly.toml)
fly launch
# Deploy
fly deploy
# Scale to multiple regions
fly regions add lax ord ams
fly scale count 3# fly.toml
app = "myapp"
primary_region = "iad"
[build]
dockerfile = "Dockerfile"
[http_service]
internal_port = 3000
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
[[services]]
protocol = "tcp"
internal_port = 3000
[[services.ports]]
port = 443
handlers = ["tls", "http"]
[[services.http_checks]]
interval = 15000
grace_period = "10s"
method = "get"
path = "/health/live"
protocol = "http"
timeout = 5000Deploy to Render
Render provides managed infrastructure with a Git-push deployment workflow.
# render.yaml (Infrastructure as Code)
services:
- type: web
name: myapp
runtime: docker
dockerfilePath: ./Dockerfile
envVars:
- key: NODE_ENV
value: production
- key: DATABASE_URL
fromDatabase:
name: mydb
property: connectionString
healthCheckPath: /health/live
autoDeploy: true
databases:
- name: mydb
databaseName: myapp
plan: starterPM2 vs Docker vs Kubernetes
| Feature | PM2 | Docker (single host) | Kubernetes |
|---|---|---|---|
| Process management | Built-in | Docker restart policy | Pod restart policy |
| Clustering | pm2 start -i max | Docker Compose replicas | Deployment replicas |
| Zero-downtime deploy | pm2 reload | Rolling update | Rolling update |
| Resource limits | OS-level (ulimit) | Container-level | Pod-level (requests/limits) |
| Monitoring | pm2 monit | Docker stats | Prometheus + Grafana |
| Complexity | Low | Medium | High |
| Best for | Single VPS | Small to medium | Large scale |
PM2 Inside Docker?
Do not run PM2 inside Docker. Docker is already your process manager — if the process crashes, Docker restarts the container. PM2 adds unnecessary complexity. Use PM2 only when deploying directly to a VPS without Docker.
Environment Variables and Secrets
Never Do This
# NEVER hardcode secrets
const JWT_SECRET = 'my-super-secret-key'; // hardcoded
# NEVER commit .env files
git add .env # this exposes secrets in git history foreverSecret Management by Platform
| Platform | Secret Storage | How to Use |
|---|---|---|
| Cloud Run | GCP Secret Manager | --set-secrets flag |
| ECS | AWS Secrets Manager / Parameter Store | Task definition secrets |
| Railway | Built-in variable management | Dashboard or CLI |
| Fly.io | fly secrets set | Encrypted, injected as env vars |
| Render | Dashboard environment groups | Per-service or shared |
| Kubernetes | Kubernetes Secrets (+ external-secrets-operator) | Volume mount or env |
# Example: GCP Secret Manager
echo -n "postgres://user:pass@host/db" | \
gcloud secrets create db-url --data-file=-
# Reference in Cloud Run deployment
gcloud run deploy myapp --set-secrets "DATABASE_URL=db-url:latest"Logging Setup
Production Node.js applications must output structured JSON logs to stdout. Every platform collects stdout and routes it to a log aggregation service.
import pino from 'pino';
// Production logger configuration
const logger = pino({
level: process.env.LOG_LEVEL || 'info',
// In production, output JSON. In dev, use pino-pretty.
...(process.env.NODE_ENV === 'production'
? {}
: { transport: { target: 'pino-pretty' } }),
// Add standard fields
base: {
service: 'myapp',
version: process.env.APP_VERSION || 'unknown',
environment: process.env.NODE_ENV,
},
});
// Structured logging
logger.info({ userId: '123', action: 'login' }, 'User logged in');
// {"level":30,"time":1616000000,"service":"myapp","userId":"123","action":"login","msg":"User logged in"}
logger.error({ err, requestId: req.id }, 'Failed to process payment');
// {"level":50,"time":1616000000,"service":"myapp","err":{"type":"PaymentError","message":"...","stack":"..."},"requestId":"abc","msg":"Failed to process payment"}Log Routing by Platform
Monitoring Setup
Key Metrics to Track
| Metric | Why It Matters | Alert Threshold |
|---|---|---|
| Response time (p95) | User experience | > 500ms |
| Error rate (5xx) | Application health | > 1% |
| CPU utilization | Capacity planning | > 80% sustained |
| Memory utilization | Memory leaks | > 85% |
| Event loop lag | Node.js health | > 100ms |
| Active connections | Load tracking | > 80% of max |
Prometheus Metrics Example
import { collectDefaultMetrics, Counter, Histogram, register } from 'prom-client';
// Collect default Node.js metrics (memory, CPU, event loop)
collectDefaultMetrics();
// Custom metrics
const httpRequestDuration = new Histogram({
name: 'http_request_duration_seconds',
help: 'Duration of HTTP requests in seconds',
labelNames: ['method', 'route', 'status_code'],
buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});
const httpRequestTotal = new Counter({
name: 'http_requests_total',
help: 'Total number of HTTP requests',
labelNames: ['method', 'route', 'status_code'],
});
// Middleware to track metrics
app.addHook('onResponse', (request, reply, done) => {
const duration = reply.elapsedTime / 1000; // convert to seconds
const labels = {
method: request.method,
route: request.routeOptions?.url || 'unknown',
status_code: reply.statusCode.toString(),
};
httpRequestDuration.observe(labels, duration);
httpRequestTotal.inc(labels);
done();
});
// Expose metrics endpoint
app.get('/metrics', async (request, reply) => {
reply.header('Content-Type', register.contentType);
return register.metrics();
});Production Checklist
Before deploying to production, verify every item:
- [ ] Docker image builds successfully —
docker build -t myapp . - [ ] Image runs locally —
docker run -p 3000:3000 myapp - [ ] Health check endpoint responds —
curl http://localhost:3000/health/live - [ ] Graceful shutdown works —
docker stopexits cleanly within 10s - [ ] No hardcoded secrets — all secrets via environment variables
- [ ] Structured JSON logging —
pinoor equivalent writing to stdout - [ ] Non-root user — container runs as non-root (UID 1001+)
- [ ] Resource limits set — memory and CPU limits configured
- [ ] CI/CD pipeline configured — automated test + deploy on merge to main
- [ ] Rollback plan documented — know how to revert to previous version
Cross-References
- Node.js Internals — event loop, V8, and cluster module
- Fastify Deep Dive — production Fastify configuration
- Deploy Next.js — Next.js-specific deployment
- Deployment Strategies — blue-green, canary, rolling updates
- Monitoring — observability setup
- Disaster Recovery — backup and restore procedures
Summary
| Step | Action |
|---|---|
| 1. Containerize | Multi-stage Dockerfile with non-root user |
| 2. Health checks | /health/live (liveness) + /health/ready (readiness) |
| 3. Graceful shutdown | Handle SIGTERM, close connections, finish in-flight requests |
| 4. Choose platform | Cloud Run (serverless), Railway (simple), ECS (enterprise) |
| 5. Secrets management | Platform secret manager, never in code or git |
| 6. Logging | Structured JSON to stdout, platform collects automatically |
| 7. Monitoring | Prometheus metrics + dashboards + alerts |
| 8. CI/CD | Automated test and deploy pipeline |
| 9. Rollback plan | Know how to revert to previous version in < 5 minutes |