Skip to content

Audit Log Service Blueprint

An audit log answers one question: who did what, to what, and when? It is the security camera of your application — a tamper-evident, immutable record of every significant action. Without it, investigating security incidents is guesswork, compliance audits are painful, and debugging customer issues ("I never changed that setting") is impossible.

This blueprint covers a production-grade audit log service that captures structured events, provides tamper detection through hash chaining, supports compliance requirements (SOC 2, GDPR, HIPAA), and scales to billions of events.

Overview & Requirements

Functional Requirements

RequirementDescription
Event captureRecord every significant action with actor, target, action, and metadata
ImmutabilityEvents cannot be modified or deleted (except by retention policy)
Tamper detectionCryptographic hash chain to detect unauthorized modifications
QueryingSearch by actor, action, target, time range, and metadata
RetentionConfigurable retention periods per event category
ExportExport events in CSV, JSON, or SIEM-compatible formats
Real-time streamingStream events to external systems (Splunk, Datadog, S3)
ComplianceSOC 2 Type II, GDPR Article 30, HIPAA audit trail

Non-Functional Requirements

RequirementTarget
Write latency< 50ms (async accepted)
Query latency (indexed)< 500ms
Event throughput10,000 events/second
Retention7 years (configurable)
Availability99.99% for writes
DurabilityZero event loss
StoragePetabyte-scale archival

What Gets Audited

Not everything belongs in an audit log. Focus on security-relevant and compliance-relevant actions:

CategoryExamples
AuthenticationLogin, logout, failed login, MFA enabled/disabled, password changed
AuthorizationPermission granted, role changed, API key created/revoked
Data accessRecord viewed, exported, searched (for sensitive data)
Data mutationRecord created, updated, deleted
ConfigurationSetting changed, feature flag toggled, webhook modified
AdministrativeUser invited, suspended, billing plan changed
SystemDeployment, migration, maintenance mode enabled

Architecture Diagram

Core Components Deep Dive

Event Schema

Every audit event follows a standardized schema. Consistency is critical — if different services log events in different formats, the audit log becomes unsearchable.

typescript
// audit-event.ts
interface AuditEvent {
  // Identity
  id: string;                    // UUIDv7 (time-ordered)
  timestamp: string;             // ISO 8601 with timezone

  // Who
  actor: {
    type: 'user' | 'service' | 'system' | 'api_key';
    id: string;                  // User ID, service name, or API key ID
    email?: string;              // For user actors
    ip?: string;                 // Client IP address
    userAgent?: string;          // Client user agent
  };

  // What
  action: string;                // Dot-notation: "user.create", "settings.update"
  category: string;              // "authentication", "data_access", "configuration"
  outcome: 'success' | 'failure' | 'error';

  // On what
  target: {
    type: string;                // "user", "project", "api_key", "setting"
    id: string;                  // Target entity ID
    name?: string;               // Human-readable name
  };

  // Context
  metadata: Record<string, any>; // Action-specific data
  changes?: {                    // For update actions
    before: Record<string, any>;
    after: Record<string, any>;
  };

  // Location
  source: {
    service: string;             // "auth-service", "billing-service"
    version: string;             // Service version
    environment: string;         // "production", "staging"
  };

  // Tamper detection
  previousHash?: string;         // Hash of the previous event
  hash?: string;                 // SHA-256 of this event
}

Ingestion Service

The ingestion service receives events, validates them, and pushes them to Kafka for ordered processing. It must be fire-and-forget from the caller's perspective — an audit log write must never block or fail the primary operation.

typescript
// ingestion-service.ts
class AuditIngestionService {
  private kafka: KafkaProducer;

  /**
   * Record an audit event. This method is async and non-blocking.
   * The caller should fire and forget — never await in the hot path.
   */
  async record(event: Omit<AuditEvent, 'id' | 'timestamp' | 'hash' | 'previousHash'>): Promise<void> {
    const auditEvent: AuditEvent = {
      id: uuidv7(),
      timestamp: new Date().toISOString(),
      ...event,
    };

    // Validate schema
    const validation = auditEventSchema.safeParse(auditEvent);
    if (!validation.success) {
      this.logger.error('Invalid audit event', {
        errors: validation.error.issues,
        event: auditEvent,
      });
      this.metrics.increment('audit.ingestion.validation_errors');
      return; // Never throw — audit failures must not break the app
    }

    // Publish to Kafka — partition by actor ID for ordering per actor
    await this.kafka.produce({
      topic: 'audit-events',
      key: auditEvent.actor.id,
      value: JSON.stringify(auditEvent),
      headers: {
        'event-type': auditEvent.action,
        'source-service': auditEvent.source.service,
      },
    });

    this.metrics.increment('audit.ingestion.events_recorded');
  }
}

// Usage in application code
class UserService {
  async updateUserRole(adminId: string, targetUserId: string, newRole: string): Promise<void> {
    const user = await this.db.getUser(targetUserId);
    const oldRole = user.role;

    // Perform the action
    await this.db.updateUser(targetUserId, { role: newRole });

    // Fire-and-forget audit event
    this.audit.record({
      actor: { type: 'user', id: adminId },
      action: 'user.role_changed',
      category: 'authorization',
      outcome: 'success',
      target: { type: 'user', id: targetUserId, name: user.email },
      changes: {
        before: { role: oldRole },
        after: { role: newRole },
      },
      metadata: { reason: 'Promoted to team lead' },
      source: { service: 'user-service', version: '2.3.1', environment: 'production' },
    });
  }
}

Never Block on Audit Writes

The audit log must never fail the operation it is recording. If the audit system is down, the operation should still succeed — the event will be retried or captured through a dead-letter queue. Use async, fire-and-forget patterns. See Message Queues for reliability patterns.

Hash Chain for Tamper Detection

Each event includes a hash of its contents and a reference to the previous event's hash, forming a chain. If anyone modifies or deletes an event in the database, the chain breaks and the tampering is detectable.

typescript
// hash-chain.ts
import { createHash } from 'crypto';

class HashChainGenerator {
  private previousHash: string = '0'.repeat(64); // Genesis hash

  constructor(private redis: Redis) {}

  async initialize(): Promise<void> {
    const stored = await this.redis.get('audit:chain:latest_hash');
    if (stored) this.previousHash = stored;
  }

  async processEvent(event: AuditEvent): Promise<AuditEvent> {
    // Reference the previous hash
    event.previousHash = this.previousHash;

    // Hash the event contents (excluding the hash field itself)
    const hashInput = JSON.stringify({
      id: event.id,
      timestamp: event.timestamp,
      actor: event.actor,
      action: event.action,
      target: event.target,
      metadata: event.metadata,
      changes: event.changes,
      previousHash: event.previousHash,
    });

    event.hash = createHash('sha256').update(hashInput).digest('hex');

    // Update chain state
    this.previousHash = event.hash;
    await this.redis.set('audit:chain:latest_hash', event.hash);

    return event;
  }

  /**
   * Verify the integrity of a sequence of events.
   * Returns the first event where the chain breaks, or null if valid.
   */
  async verify(events: AuditEvent[]): Promise<VerificationResult> {
    let expectedPreviousHash = events[0]?.previousHash;

    for (let i = 0; i < events.length; i++) {
      const event = events[i];

      // Check previous hash reference
      if (i > 0 && event.previousHash !== events[i - 1].hash) {
        return {
          valid: false,
          brokenAt: event.id,
          reason: 'Previous hash mismatch — event may have been deleted or reordered',
        };
      }

      // Recompute hash and compare
      const recomputed = this.computeHash(event);
      if (recomputed !== event.hash) {
        return {
          valid: false,
          brokenAt: event.id,
          reason: 'Hash mismatch — event content may have been modified',
        };
      }
    }

    return { valid: true };
  }
}

Hash Chain Limitations

A hash chain detects tampering but does not prevent it. For higher assurance, periodically publish chain checkpoints to an external, immutable store (like a blockchain or a third-party timestamping service). For SOC 2, having a tamper-detection mechanism is sufficient — you do not need cryptographic proof of non-tampering.

Data Model / Schema

sql
-- Audit events (partitioned by month)
CREATE TABLE audit_events (
    id              UUID NOT NULL,
    timestamp       TIMESTAMPTZ NOT NULL,

    -- Actor
    actor_type      TEXT NOT NULL,
    actor_id        TEXT NOT NULL,
    actor_email     TEXT,
    actor_ip        INET,
    actor_user_agent TEXT,

    -- Action
    action          TEXT NOT NULL,
    category        TEXT NOT NULL,
    outcome         TEXT NOT NULL CHECK (outcome IN ('success', 'failure', 'error')),

    -- Target
    target_type     TEXT NOT NULL,
    target_id       TEXT NOT NULL,
    target_name     TEXT,

    -- Details
    metadata        JSONB DEFAULT '{}',
    changes         JSONB,

    -- Source
    source_service  TEXT NOT NULL,
    source_version  TEXT,
    source_env      TEXT NOT NULL,

    -- Tamper detection
    previous_hash   TEXT,
    hash            TEXT NOT NULL,

    PRIMARY KEY (id, timestamp)
) PARTITION BY RANGE (timestamp);

-- Create monthly partitions
CREATE TABLE audit_events_2026_01 PARTITION OF audit_events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE audit_events_2026_02 PARTITION OF audit_events
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE audit_events_2026_03 PARTITION OF audit_events
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

-- Indexes for common query patterns
CREATE INDEX idx_audit_actor ON audit_events(actor_id, timestamp DESC);
CREATE INDEX idx_audit_target ON audit_events(target_type, target_id, timestamp DESC);
CREATE INDEX idx_audit_action ON audit_events(action, timestamp DESC);
CREATE INDEX idx_audit_category ON audit_events(category, timestamp DESC);

-- Prevent any updates or deletes via a trigger
CREATE OR REPLACE FUNCTION prevent_audit_modification()
RETURNS TRIGGER AS $$
BEGIN
    RAISE EXCEPTION 'Audit log events cannot be modified or deleted';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_immutability_update
    BEFORE UPDATE ON audit_events
    FOR EACH ROW EXECUTE FUNCTION prevent_audit_modification();

CREATE TRIGGER audit_immutability_delete
    BEFORE DELETE ON audit_events
    FOR EACH ROW EXECUTE FUNCTION prevent_audit_modification();

Immutability at the Database Level

The triggers above prevent application-level modifications, but a DBA with superuser access can still bypass them. For true immutability: (1) restrict superuser access to break-glass procedures, (2) use the hash chain for tamper detection, (3) replicate events to an external write-only store (S3 with Object Lock, or a dedicated audit SaaS). Defense in depth.

API Design

Record an Event

POST /api/v1/audit/events
Authorization: Bearer <service-token>

{
  "actor": { "type": "user", "id": "user_123", "ip": "203.0.113.42" },
  "action": "project.settings_updated",
  "category": "configuration",
  "outcome": "success",
  "target": { "type": "project", "id": "proj_456", "name": "Acme Dashboard" },
  "changes": {
    "before": { "visibility": "private" },
    "after": { "visibility": "public" }
  },
  "metadata": { "reason": "Making project public for demo" }
}

Response (202 Accepted):
{
  "eventId": "01914a77-8a1a-7000-b000-000000000001",
  "status": "accepted"
}

Query Events

GET /api/v1/audit/events?actor_id=user_123&action=project.*&from=2026-03-01&to=2026-03-20&limit=50
Authorization: Bearer <admin-token>

Response:
{
  "events": [
    {
      "id": "01914a77-8a1a-7000-b000-000000000001",
      "timestamp": "2026-03-20T14:30:00Z",
      "actor": { "type": "user", "id": "user_123", "email": "admin@example.com" },
      "action": "project.settings_updated",
      "category": "configuration",
      "outcome": "success",
      "target": { "type": "project", "id": "proj_456", "name": "Acme Dashboard" },
      "changes": {
        "before": { "visibility": "private" },
        "after": { "visibility": "public" }
      }
    }
  ],
  "total": 1,
  "hasMore": false,
  "cursor": "eyJsYXN0SWQiOiAiMDE5MTRhNzcifQ=="
}

Export Events

POST /api/v1/audit/export
Authorization: Bearer <admin-token>

{
  "format": "csv",
  "from": "2026-01-01",
  "to": "2026-03-31",
  "filters": { "category": "authentication" },
  "deliveryMethod": "email",
  "email": "compliance@example.com"
}

Response (202 Accepted):
{
  "exportId": "export_789",
  "status": "processing",
  "estimatedRows": 150000
}

Verify Chain Integrity

POST /api/v1/audit/verify
Authorization: Bearer <admin-token>

{
  "from": "2026-03-01",
  "to": "2026-03-20"
}

Response:
{
  "valid": true,
  "eventsChecked": 45230,
  "timeRange": { "from": "2026-03-01", "to": "2026-03-20" },
  "verifiedAt": "2026-03-20T15:00:00Z"
}

Compliance Integration

SOC 2 Type II

SOC 2 requires evidence of:

ControlHow the Audit Log Addresses It
Access loggingEvery login, logout, and permission change is recorded
Change managementConfiguration changes include before/after diffs
Data access monitoringSensitive data access is logged with actor context
Incident investigationEvents are searchable by time range, actor, and target
RetentionEvents retained for 7 years with immutability guarantees

GDPR Article 30

GDPR requires a record of processing activities:

RequirementImplementation
What data was processedtarget.type + metadata fields
Why it was processedaction + metadata.reason fields
Who processed itactor fields (user ID, service name)
When it was processedtimestamp field
Right to accessExport API with per-user filtering
Right to erasureAudit events themselves are exempt under legitimate interest, but log the erasure action

GDPR and Audit Logs

GDPR Article 17 (right to erasure) does not require you to delete audit log entries about a user. Audit logs serve a legitimate interest (security, fraud detection, compliance). However, you must document this in your privacy policy and Data Protection Impact Assessment. When you delete a user's data, log the deletion itself as an audit event.

Scaling Considerations

Storage Growth

Events/DayStorage/DayStorage/YearArchive Strategy
100K~200 MB~73 GBPostgreSQL only
1M~2 GB~730 GBPostgreSQL + S3 archive after 90 days
10M~20 GB~7.3 TBPostgreSQL hot (30 days) + S3 Parquet
100M~200 GB~73 TBClickHouse + S3 with Athena

For high-volume audit logs, use ClickHouse for the query layer and S3 with Parquet files for long-term storage. Query archived events with AWS Athena or Presto.

Retention and Archival

Deployment

Infrastructure

ComponentServiceCountNotes
Audit APIECS Fargate3Stateless, behind ALB
KafkaMSK3-node clusterRetention: 7 days, 10 partitions
Chain ProcessorECS Fargate1Single consumer for ordering
PostgreSQLRDS (r6g.xlarge)1 + replicaMonthly partitions
RedisElastiCache1Chain state only
S3Standard + GlacierObject Lock enabled

Monitoring

MetricWarningCritical
Event ingestion lag> 30s> 120s
Chain verification failures> 0
Write failures> 0.01%> 0.1%
Query latency p99> 2s> 5s
Partition size (90-day)> 500 GB> 1 TB

Monitor with Prometheus. Set up daily automated chain verification and alert on any integrity failures.


"An audit log you cannot search is a compliance checkbox. An audit log you can query, verify, and stream is a security tool. Build the tool."

"What I cannot create, I do not understand." — Richard Feynman