Skip to content
Unverified — AI-generated content. Help verify this page

Schema Evolution

Why Schema Evolution Exists

Data schemas change. Business requirements evolve, new features need new fields, old fields become irrelevant, and data types need correction. In a monolithic batch system, you can stop everything, migrate the schema, and restart. In distributed streaming systems with multiple producers and consumers at different versions, this luxury doesn't exist.

Schema evolution is the discipline of changing data structures while maintaining compatibility between producers and consumers at different versions.

The Core Problem

Producer v2 writes data with a new field. Consumer v1 doesn't know about this field. Consumer v3 expects a field that was removed in v2. Can everyone coexist?

Historical Context

  • 2009: Apache Avro designed with schema evolution as a first-class feature
  • 2011: Google's Protocol Buffers v2 formalized field numbering for evolution
  • 2014: Confluent Schema Registry introduced for Kafka
  • 2020s: Schema evolution is now considered essential infrastructure for data platforms
  • 2025: Event-driven architectures and data contracts make schema evolution a critical concern

First Principles

Compatibility Types

There are four types of schema compatibility:

TypeProducerConsumerRule
BackwardOld schemaNew schemaNew consumers can read old data
ForwardNew schemaOld schemaOld consumers can read new data
FullBoth directionsBoth directionsBackward AND Forward
NoneNo guaranteeNo guaranteeAnything goes (dangerous)

Additionally, transitive variants check compatibility across ALL previous versions, not just the immediately preceding one.

The Compatibility Contract

Formally, schema S2 is backward compatible with S1 if:

dvalid(S1):decode(S2,encode(S1,d)) succeeds

Schema S2 is forward compatible with S1 if:

dvalid(S2):decode(S1,encode(S2,d)) succeeds

Schema Evolution Rules by Format

Avro Evolution Rules

Avro uses a writer schema (used to encode) and a reader schema (used to decode). The reader must resolve differences.

ChangeBackwardForwardFull
Add field with defaultYesYesYes
Add field without defaultNoYesNo
Remove field with defaultYesNoNo
Remove field without defaultNoNoNo
Rename field (with alias)YesYesYes
Change type (promotable)DependsDependsDepends
typescript
// Avro schema evolution example
const schemaV1 = {
  type: 'record',
  name: 'User',
  fields: [
    { name: 'id', type: 'string' },
    { name: 'name', type: 'string' },
    { name: 'email', type: 'string' },
  ],
};

// V2: Added 'age' with default (backward + forward compatible)
const schemaV2 = {
  type: 'record',
  name: 'User',
  fields: [
    { name: 'id', type: 'string' },
    { name: 'name', type: 'string' },
    { name: 'email', type: 'string' },
    { name: 'age', type: 'int', default: 0 }, // New field WITH default
  ],
};

// V3: Removed 'email', added 'phone' (ONLY backward compatible)
const schemaV3 = {
  type: 'record',
  name: 'User',
  fields: [
    { name: 'id', type: 'string' },
    { name: 'name', type: 'string' },
    { name: 'age', type: 'int', default: 0 },
    { name: 'phone', type: ['null', 'string'], default: null }, // Nullable with default
    // 'email' removed — old data still has it, reader ignores it
  ],
};

Avro Schema Resolution

typescript
interface AvroField {
  name: string;
  type: string | string[] | Record<string, unknown>;
  default?: unknown;
  aliases?: string[];
}

interface AvroSchema {
  type: 'record';
  name: string;
  fields: AvroField[];
}

class AvroSchemaResolver {
  /**
   * Resolve differences between writer and reader schemas.
   * Returns a resolution plan for each field.
   */
  resolve(
    writerSchema: AvroSchema,
    readerSchema: AvroSchema,
  ): FieldResolution[] {
    const resolutions: FieldResolution[] = [];

    for (const readerField of readerSchema.fields) {
      const writerField = this.findWriterField(readerField, writerSchema);

      if (writerField) {
        resolutions.push({
          fieldName: readerField.name,
          action: 'read_from_writer',
          writerFieldName: writerField.name,
        });
      } else if (readerField.default !== undefined) {
        resolutions.push({
          fieldName: readerField.name,
          action: 'use_default',
          defaultValue: readerField.default,
        });
      } else {
        resolutions.push({
          fieldName: readerField.name,
          action: 'error',
          errorMessage: `No matching field in writer and no default for '${readerField.name}'`,
        });
      }
    }

    // Writer fields not in reader are simply ignored
    return resolutions;
  }

  private findWriterField(
    readerField: AvroField,
    writerSchema: AvroSchema,
  ): AvroField | undefined {
    // Match by name
    const byName = writerSchema.fields.find(
      (f) => f.name === readerField.name,
    );
    if (byName) return byName;

    // Match by alias
    if (readerField.aliases) {
      for (const alias of readerField.aliases) {
        const byAlias = writerSchema.fields.find((f) => f.name === alias);
        if (byAlias) return byAlias;
      }
    }

    return undefined;
  }
}

interface FieldResolution {
  fieldName: string;
  action: 'read_from_writer' | 'use_default' | 'error';
  writerFieldName?: string;
  defaultValue?: unknown;
  errorMessage?: string;
}

Protocol Buffers Evolution Rules

Protobuf uses field numbers for identification, making field names irrelevant for wire format compatibility.

ChangeSafeUnsafe
Add new fieldYes (new number)If number reused
Remove fieldYes (mark reserved)If number reused
Rename fieldYes (numbers matter, not names)N/A
Change field typeSome (int32 <-> int64)Most type changes
Change field numberNeverAlways breaks
protobuf
// v1
message User {
  string id = 1;
  string name = 2;
  string email = 3;
}

// v2 (compatible evolution)
message User {
  string id = 1;
  string name = 2;
  // Field 3 removed — mark as reserved
  reserved 3;
  reserved "email";
  int32 age = 4;      // New field
  string phone = 5;   // New field
}

// v3 (DANGEROUS — reuses field number 3!)
message User {
  string id = 1;
  string name = 2;
  string address = 3; // WRONG: reuses field number 3 (was email)
  int32 age = 4;
}

DANGER

Never reuse field numbers in Protobuf. Old data encoded with email at field 3 will be decoded as address at field 3, producing corrupted data. Always use reserved to retire field numbers permanently.

JSON Schema Evolution

JSON is the most permissive format — no built-in evolution rules. You must enforce them yourself:

typescript
interface JSONSchemaEvolutionChecker {
  checkBackwardCompatibility(
    oldSchema: JSONSchema,
    newSchema: JSONSchema,
  ): CompatibilityResult;
}

interface JSONSchema {
  type: string;
  properties: Record<string, { type: string; required?: boolean; default?: unknown }>;
  required: string[];
}

interface CompatibilityResult {
  compatible: boolean;
  issues: string[];
}

class JSONSchemaEvolution implements JSONSchemaEvolutionChecker {
  checkBackwardCompatibility(
    oldSchema: JSONSchema,
    newSchema: JSONSchema,
  ): CompatibilityResult {
    const issues: string[] = [];

    // Check 1: No required field removed
    for (const field of oldSchema.required) {
      if (!newSchema.properties[field]) {
        issues.push(
          `Required field '${field}' was removed (breaks backward compat)`,
        );
      }
    }

    // Check 2: No new required fields without defaults
    for (const field of newSchema.required) {
      if (!oldSchema.properties[field] && !newSchema.properties[field]?.default) {
        issues.push(
          `New required field '${field}' has no default (breaks backward compat)`,
        );
      }
    }

    // Check 3: No type changes
    for (const [field, spec] of Object.entries(newSchema.properties)) {
      const oldSpec = oldSchema.properties[field];
      if (oldSpec && oldSpec.type !== spec.type) {
        issues.push(
          `Field '${field}' type changed from '${oldSpec.type}' to '${spec.type}'`,
        );
      }
    }

    return {
      compatible: issues.length === 0,
      issues,
    };
  }
}

Schema Registry

Architecture

Implementation

typescript
interface SchemaRegistryClient {
  register(subject: string, schema: AvroSchema): Promise<number>;
  getById(id: number): Promise<AvroSchema>;
  getLatest(subject: string): Promise<{ id: number; schema: AvroSchema }>;
  checkCompatibility(
    subject: string,
    schema: AvroSchema,
  ): Promise<{ compatible: boolean; messages: string[] }>;
  setCompatibility(
    subject: string,
    level: CompatibilityLevel,
  ): Promise<void>;
}

type CompatibilityLevel =
  | 'NONE'
  | 'BACKWARD'
  | 'BACKWARD_TRANSITIVE'
  | 'FORWARD'
  | 'FORWARD_TRANSITIVE'
  | 'FULL'
  | 'FULL_TRANSITIVE';

class SchemaRegistry implements SchemaRegistryClient {
  private schemas: Map<number, AvroSchema> = new Map();
  private subjects: Map<string, Array<{ id: number; version: number; schema: AvroSchema }>> =
    new Map();
  private compatibility: Map<string, CompatibilityLevel> = new Map();
  private nextId = 1;

  async register(subject: string, schema: AvroSchema): Promise<number> {
    // Check compatibility before registering
    const existing = this.subjects.get(subject) ?? [];
    const level = this.compatibility.get(subject) ?? 'BACKWARD';

    if (existing.length > 0 && level !== 'NONE') {
      const compatible = await this.checkCompatibility(subject, schema);
      if (!compatible.compatible) {
        throw new Error(
          `Schema incompatible: ${compatible.messages.join(', ')}`,
        );
      }
    }

    // Check if this exact schema already exists
    const existingEntry = existing.find(
      (e) => JSON.stringify(e.schema) === JSON.stringify(schema),
    );
    if (existingEntry) return existingEntry.id;

    const id = this.nextId++;
    this.schemas.set(id, schema);

    const versions = this.subjects.get(subject) ?? [];
    versions.push({ id, version: versions.length + 1, schema });
    this.subjects.set(subject, versions);

    return id;
  }

  async getById(id: number): Promise<AvroSchema> {
    const schema = this.schemas.get(id);
    if (!schema) throw new Error(`Schema not found: ${id}`);
    return schema;
  }

  async getLatest(
    subject: string,
  ): Promise<{ id: number; schema: AvroSchema }> {
    const versions = this.subjects.get(subject);
    if (!versions || versions.length === 0) {
      throw new Error(`Subject not found: ${subject}`);
    }
    const latest = versions[versions.length - 1];
    return { id: latest.id, schema: latest.schema };
  }

  async checkCompatibility(
    subject: string,
    schema: AvroSchema,
  ): Promise<{ compatible: boolean; messages: string[] }> {
    const level = this.compatibility.get(subject) ?? 'BACKWARD';
    const versions = this.subjects.get(subject) ?? [];

    if (versions.length === 0) {
      return { compatible: true, messages: [] };
    }

    const messages: string[] = [];
    const toCheck =
      level.includes('TRANSITIVE') ? versions : [versions[versions.length - 1]];

    for (const existing of toCheck) {
      const result = this.checkPairCompatibility(
        existing.schema,
        schema,
        level,
      );
      messages.push(...result.messages);
    }

    return { compatible: messages.length === 0, messages };
  }

  private checkPairCompatibility(
    oldSchema: AvroSchema,
    newSchema: AvroSchema,
    level: CompatibilityLevel,
  ): { messages: string[] } {
    const messages: string[] = [];

    if (level.startsWith('BACKWARD') || level === 'FULL' || level === 'FULL_TRANSITIVE') {
      // New reader must be able to read old data
      for (const newField of newSchema.fields) {
        const oldField = oldSchema.fields.find((f) => f.name === newField.name);
        if (!oldField && newField.default === undefined) {
          messages.push(
            `New field '${newField.name}' has no default (backward incompatible)`,
          );
        }
      }
    }

    if (level.startsWith('FORWARD') || level === 'FULL' || level === 'FULL_TRANSITIVE') {
      // Old reader must be able to read new data
      for (const oldField of oldSchema.fields) {
        const newField = newSchema.fields.find((f) => f.name === oldField.name);
        if (!newField && oldField.default === undefined) {
          messages.push(
            `Removed field '${oldField.name}' has no default in old schema (forward incompatible)`,
          );
        }
      }
    }

    return { messages };
  }

  async setCompatibility(
    subject: string,
    level: CompatibilityLevel,
  ): Promise<void> {
    this.compatibility.set(subject, level);
  }
}

Subject Naming Strategies

StrategyPatternUse Case
TopicNameorders-valueOne schema per topic
RecordNamecom.company.OrderMultiple event types per topic
TopicRecordNameorders-com.company.OrderFull isolation

Migration Strategies

Online Schema Migration (Zero Downtime)

Dual-Write Migration

For database schema changes that can't be done in a single step:

typescript
interface DualWriteMigrator<T, U> {
  // Phase 1: Write to both old and new schema
  writeToOld(data: T): Promise<void>;
  writeToNew(data: U): Promise<void>;

  // Phase 2: Backfill old data into new schema
  backfill(batchSize: number): Promise<number>;

  // Phase 3: Verify consistency
  verify(sampleSize: number): Promise<VerificationResult>;

  // Phase 4: Switch reads to new schema
  switchReads(): Promise<void>;

  // Phase 5: Stop writes to old schema
  stopOldWrites(): Promise<void>;
}

interface VerificationResult {
  totalChecked: number;
  mismatches: number;
  mismatchExamples: Array<{ id: string; oldValue: unknown; newValue: unknown }>;
}

class ColumnAdditionMigrator {
  constructor(
    private readonly db: Database,
    private readonly table: string,
    private readonly newColumn: string,
    private readonly defaultValue: unknown,
  ) {}

  async migrate(): Promise<void> {
    // Phase 1: Add nullable column
    await this.db.query(
      `ALTER TABLE ${this.table} ADD COLUMN ${this.newColumn} TEXT`,
      [],
    );

    // Phase 2: Backfill default values
    let updated = 0;
    do {
      const result = await this.db.query(
        `UPDATE ${this.table}
         SET ${this.newColumn} = $1
         WHERE ${this.newColumn} IS NULL
         LIMIT 10000`,
        [this.defaultValue],
      );
      updated = result.rowCount;
      // Yield to avoid blocking other operations
      await new Promise((resolve) => setTimeout(resolve, 100));
    } while (updated > 0);

    // Phase 3: Add NOT NULL constraint (if needed)
    await this.db.query(
      `ALTER TABLE ${this.table}
       ALTER COLUMN ${this.newColumn} SET NOT NULL`,
      [],
    );
  }
}

interface Database {
  query(sql: string, params: unknown[]): Promise<{ rowCount: number }>;
}

Expand-Contract Pattern

The safest approach for breaking changes:

Performance Characteristics

Schema Registry Overhead

OperationLatencyFrequency
Register schema10-50msOnce per deployment
Get schema by ID1-5ms (cached)Once per consumer startup
Compatibility check5-20msOnce per registration

Schemas are cached client-side, so the per-message overhead is zero after the initial fetch.

Serialization Format Comparison

FormatSchema EvolutionEncoding SizeEncode SpeedDecode Speed
JSONManualLarge (100%)FastFast
AvroExcellentSmall (30-50%)MediumMedium
ProtobufGoodSmallest (20-40%)FastestFastest
ThriftGoodSmall (30-50%)FastFast
MessagePackNoneMedium (60-80%)FastFast

Schema Evolution Impact on Storage

Storage with evolution=v=1V|Dv|×row_size(v)

Where V is the number of schema versions and |Dv| is the amount of data written with version v.

Avro handles this efficiently because the schema is stored once in the Schema Registry, not with every record. The per-record overhead is just the schema ID (4 bytes).

Edge Cases & Failure Modes

Breaking Change Detection

typescript
class BreakingChangeDetector {
  detectBreakingChanges(
    oldSchema: AvroSchema,
    newSchema: AvroSchema,
  ): BreakingChange[] {
    const changes: BreakingChange[] = [];

    // Removed fields without defaults
    for (const oldField of oldSchema.fields) {
      const newField = newSchema.fields.find((f) => f.name === oldField.name);
      if (!newField) {
        changes.push({
          type: 'field_removed',
          field: oldField.name,
          severity: oldField.default !== undefined ? 'warning' : 'breaking',
          message: `Field '${oldField.name}' was removed`,
        });
      }
    }

    // Type changes
    for (const newField of newSchema.fields) {
      const oldField = oldSchema.fields.find((f) => f.name === newField.name);
      if (oldField && JSON.stringify(oldField.type) !== JSON.stringify(newField.type)) {
        changes.push({
          type: 'type_changed',
          field: newField.name,
          severity: 'breaking',
          message: `Field '${newField.name}' type changed from ${JSON.stringify(oldField.type)} to ${JSON.stringify(newField.type)}`,
        });
      }
    }

    // Required field added without default
    for (const newField of newSchema.fields) {
      const oldField = oldSchema.fields.find((f) => f.name === newField.name);
      if (!oldField && newField.default === undefined) {
        changes.push({
          type: 'required_field_added',
          field: newField.name,
          severity: 'breaking',
          message: `New field '${newField.name}' has no default value`,
        });
      }
    }

    return changes;
  }
}

interface BreakingChange {
  type: string;
  field: string;
  severity: 'info' | 'warning' | 'breaking';
  message: string;
}

Schema Registry Unavailability

If the Schema Registry is down, producers cannot register new schemas and consumers cannot fetch unknown schemas:

typescript
class ResilientSchemaRegistryClient {
  private cache: Map<number, AvroSchema> = new Map();
  private subjectCache: Map<string, { id: number; schema: AvroSchema }> = new Map();

  constructor(
    private readonly client: SchemaRegistryClient,
    private readonly fallbackSchemas: Map<number, AvroSchema>,
  ) {}

  async getById(id: number): Promise<AvroSchema> {
    // Check local cache first
    const cached = this.cache.get(id);
    if (cached) return cached;

    try {
      const schema = await this.client.getById(id);
      this.cache.set(id, schema);
      return schema;
    } catch (error) {
      // Fall back to pre-loaded schemas
      const fallback = this.fallbackSchemas.get(id);
      if (fallback) {
        console.warn(
          `Schema Registry unavailable, using fallback for schema ${id}`,
        );
        return fallback;
      }
      throw new Error(
        `Schema Registry unavailable and no fallback for schema ${id}`,
      );
    }
  }
}

Transitive Compatibility Violations

Schema v1 is compatible with v2, and v2 is compatible with v3, but v1 may NOT be compatible with v3:

v1: { name: string, email: string }
v2: { name: string, email: string, age: int (default=0) }  -- backward compat with v1
v3: { name: string, age: int }                               -- backward compat with v2

But v3 is NOT backward compatible with v1!
(v1 data has 'email' which v3 ignores — OK
 v1 data lacks 'age' which v3 expects — v3 has default for age — OK in Avro)

Actually this specific example IS compatible. The real problem is:
v1: { name: string, email: string }
v2: { name: string, email: string, age: int (default=0) }
v3: { name: string, age: int (NO DEFAULT) }  -- compatible with v2 but not v1

This is why BACKWARD_TRANSITIVE exists — it checks compatibility against ALL versions, not just the latest.

Mathematical Foundations

Schema Lattice

Schemas under the subtyping relation form a lattice:

S1S2every valid instance of S1 is also valid under S2

The lattice has:

  • Top element: the "any" type (accepts everything)
  • Bottom element: the "empty" type (accepts nothing)

Schema evolution is movement within this lattice:

  • Backward compatible: moving up (becoming more permissive)
  • Forward compatible: moving down (becoming more restrictive)
  • Full compatible: moving to a comparable element

Information Theory Perspective

Schema information content:I(S)=ttypesp(t)logp(t)

More restrictive schemas have higher information content (they tell you more about the data). Schema evolution that adds optional fields increases the type space, reducing information density per field.

Real-World War Stories

War Story

The Protobuf Field Number Reuse Disaster

A team removed a user_email field (field number 5) from a Protobuf message and later added a user_phone field at field number 5. Old messages stored in Kafka still had email data at field 5.

When consumers read these old messages, they decoded the email bytes as a phone number. Some "phone numbers" were valid enough to not cause errors but contained email addresses, which were then sent via SMS as part of an identity verification flow.

Over 10,000 customers received SMS messages to email addresses (which obviously failed). The incident required a full topic rewrite.

Fix:

  1. Used reserved 5; to permanently retire the field number
  2. Added a CI check that compares current Protobuf files against the Schema Registry to detect field number reuse
  3. All field removals now go through a review that checks the field number is reserved

War Story

The Avro Default Value Trap

A team added a new field risk_score to their user event schema with "default": 0. All existing consumers were backward compatible — they'd read 0 for old events.

The problem: the risk model that populated risk_score treated 0 as "lowest risk." Old events (pre-upgrade) were treated as lowest risk, skewing the risk distribution and causing the fraud detection system to miss high-risk historical patterns.

Lesson: The default value must be semantically meaningful. Use null (with a union type ["null", "int"]) when the value is truly unknown, not 0.

Decision Framework

Compatibility Level Selection

ScenarioRecommended Level
Internal microservices, single teamBACKWARD
Shared platform, multiple teamsFULL_TRANSITIVE
Public API eventsFULL_TRANSITIVE
Log/metrics data (read by many)FORWARD
Single consumer, controlled upgradeBACKWARD
Long-lived topics (years of data)FULL_TRANSITIVE

Schema Format Selection

FactorAvroProtobufJSON
Schema evolution supportBestGoodManual
Human readabilityMediumLowBest
Encoding efficiencyGoodBestWorst
Language supportGoodBestUniversal
Schema Registry integrationNativeGoodLimited
Learning curveMediumMediumLow

Advanced Topics

Data Contracts

Schema evolution is evolving into data contracts — formal agreements between producers and consumers:

typescript
interface DataContract {
  apiVersion: string;
  kind: 'DataContract';
  metadata: {
    name: string;
    version: string;
    owner: string;
    team: string;
  };
  schema: {
    type: 'avro' | 'protobuf' | 'json';
    specification: unknown; // The actual schema
  };
  quality: {
    checks: Array<{
      name: string;
      type: 'not_null' | 'unique' | 'range' | 'regex' | 'freshness';
      field?: string;
      parameters?: Record<string, unknown>;
    }>;
  };
  sla: {
    freshness: string;       // e.g., "PT5M" (5 minutes)
    availability: number;    // e.g., 0.999
    completeness: number;    // e.g., 0.99
  };
  compatibility: CompatibilityLevel;
}

Schema Evolution in Data Lakes

Parquet/Delta Lake/Iceberg handle schema evolution at the file level:

typescript
interface TableSchemaEvolution {
  // Adding columns
  addColumn(name: string, type: string, position?: 'first' | 'after'): void;

  // Renaming columns
  renameColumn(oldName: string, newName: string): void;

  // Changing types (limited: widening only)
  widenType(column: string, newType: string): void;

  // Partition evolution (Iceberg-specific)
  evolvePartition(oldSpec: PartitionSpec, newSpec: PartitionSpec): void;
}

interface PartitionSpec {
  fields: Array<{
    sourceColumn: string;
    transform: 'identity' | 'year' | 'month' | 'day' | 'hour' | 'bucket' | 'truncate';
    parameters?: Record<string, unknown>;
  }>;
}

Apache Iceberg's schema evolution is the most advanced — it supports adding, dropping, renaming, reordering, and type-widening columns without rewriting existing data files.

CI/CD Schema Validation

typescript
class SchemaCICheck {
  async validateSchemaChange(
    oldSchemaPath: string,
    newSchemaPath: string,
    compatibilityLevel: CompatibilityLevel,
  ): Promise<{ pass: boolean; report: string }> {
    const oldSchema = await this.loadSchema(oldSchemaPath);
    const newSchema = await this.loadSchema(newSchemaPath);

    const detector = new BreakingChangeDetector();
    const changes = detector.detectBreakingChanges(oldSchema, newSchema);

    const breakingChanges = changes.filter((c) => c.severity === 'breaking');

    if (breakingChanges.length > 0) {
      return {
        pass: false,
        report: [
          'SCHEMA COMPATIBILITY CHECK FAILED',
          `Compatibility level: ${compatibilityLevel}`,
          '',
          'Breaking changes detected:',
          ...breakingChanges.map(
            (c) => `  - [${c.type}] ${c.message}`,
          ),
          '',
          'To fix:',
          '  - Add default values to new required fields',
          '  - Use reserved field numbers instead of removing fields',
          '  - Use type widening instead of type changes',
        ].join('\n'),
      };
    }

    return {
      pass: true,
      report: `Schema change compatible (${changes.length} non-breaking changes)`,
    };
  }

  private async loadSchema(_path: string): Promise<AvroSchema> {
    // Load and parse schema from file
    return { type: 'record', name: '', fields: [] };
  }
}

Cross-References


Key Takeaway

  • Schema evolution requires compatibility guarantees: backward-compatible changes (adding optional fields) are safe; breaking changes (removing fields, changing types) require migration strategies.
  • Schema registries (Confluent, AWS Glue) enforce compatibility rules at write time, preventing producers from breaking consumers.
  • Avro excels at schema evolution (fields matched by name, defaults for missing fields); Protobuf is strong with field numbers; JSON has no built-in evolution support.
Exercise

Plan a Schema Evolution for a Production Event Stream

Your team produces a UserSignup event to Kafka, consumed by 5 different services. The current Avro schema is:

json
{"type": "record", "name": "UserSignup", "fields": [
  {"name": "user_id", "type": "string"},
  {"name": "email", "type": "string"},
  {"name": "country", "type": "string"},
  {"name": "signup_timestamp", "type": "long"}
]}

You need to make these changes:

  1. Add an optional referral_code field
  2. Rename country to country_code
  3. Change signup_timestamp from long to string (ISO 8601)
  4. Remove the email field (GDPR requirement)

Classify each change as backward-compatible, forward-compatible, or breaking. Propose a migration strategy.

Solution
  1. Add referral_code (optional with default): BACKWARD COMPATIBLE. Old consumers ignore it, new consumers read it. Safe to deploy immediately.

    json
    {"name": "referral_code", "type": ["null", "string"], "default": null}
  2. Rename country to country_code: BREAKING. Avro matches fields by name. Renaming is seen as deleting country and adding country_code. Strategy: Use Avro aliases: {"name": "country_code", "aliases": ["country"]}. This maintains backward compatibility.

  3. Change signup_timestamp type: BREAKING. Changing long to string is not compatible. Strategy: Add a NEW field signup_timestamp_iso (string) while keeping the old long field. Deprecate the old field, and remove it after all consumers migrate (may take months).

  4. Remove email: BREAKING for consumers that read email. Strategy: Phase out over 3 months: (a) Add email_hash field, (b) Notify all 5 consumer teams, (c) Set email default to empty string, (d) Stop populating email, (e) Remove field after all consumers confirm migration.

Registry config: Set compatibility level to BACKWARD_TRANSITIVE to prevent accidental breaking changes.

Common Misconceptions

  • "Adding a field is always safe." Only if the field has a default value. Adding a required field without a default breaks all existing consumers.
  • "JSON schemas evolve naturally." JSON has no built-in schema evolution. Adding/removing fields silently succeeds, but consumers crash on unexpected nulls or missing fields at runtime.
  • "Schema registries are optional overhead." Without a registry, producers can push any schema change, breaking downstream consumers at runtime. Registries catch incompatible changes at deploy time.
  • "Renaming a field is a simple change." In Avro and Protobuf, field identity is by name or number. Renaming is a delete + add, which is a breaking change unless aliases are used.
  • "You can change a field's type freely." Type changes (int to string, string to array) are almost always breaking. The safe pattern is to add a new field with the new type and deprecate the old one.

In Production

  • LinkedIn uses a centralized schema registry (the origin of Confluent Schema Registry) to manage schema evolution across 10,000+ Kafka topics, with BACKWARD_TRANSITIVE compatibility enforced by default.
  • Uber uses Protobuf for all internal event schemas with strict field number management -- field numbers are never reused, and deprecated fields are reserved to prevent accidental conflicts.
  • Netflix maintains a schema governance process where any schema change to a shared event requires review by both the producing and consuming teams before deployment.
  • Airbnb uses Avro with their internal schema evolution tool that auto-generates migration code when schemas change, testing backward and forward compatibility as part of CI/CD.
Quiz

1. What makes a schema change "backward compatible"?

A) Old producers can write data that new consumers can read B) New consumers can read data written by old producers (old data is still valid under the new schema) C) Both producers and consumers can be updated simultaneously D) The schema change does not require a database migration

Answer

B) Backward compatibility means consumers using the new schema can read data produced with the old schema. Adding optional fields with defaults is backward compatible. Removing required fields is not.

2. How does Avro handle missing fields during deserialization?

A) It throws an error B) It uses the default value specified in the reader's schema for fields missing from the writer's data C) It fills them with null regardless of type D) It skips the entire record

Answer

B) Avro resolves schemas by matching field names between the writer's and reader's schemas. If the reader's schema has a field not in the writer's data, it uses the field's default value. If no default is defined, deserialization fails.

3. Why are field numbers important in Protobuf?

A) They determine sort order B) They are the field's identity -- fields are matched by number, not name, so numbers must be stable and never reused C) They set the maximum number of fields D) They are used for encryption

Answer

B) Protobuf serializes fields by number, not name. Renaming a field is safe (the number stays the same), but reusing a deleted field's number would cause old data to be misinterpreted. Use reserved to prevent number reuse.

4. What compatibility level should a production schema registry enforce?

A) NONE (allow any change) B) BACKWARD_TRANSITIVE (new schema can read all historical data) C) FULL (any version can read any other version) D) It depends on the use case

Answer

D) BACKWARD (or BACKWARD_TRANSITIVE) is the most common choice for Kafka consumers. FORWARD compatibility is needed when producers upgrade before consumers. FULL is the safest but most restrictive. The choice depends on your deployment model.

5. What is the safe pattern for changing a field's data type?

A) Modify the field's type in place B) Add a new field with the new type, populate both during a transition period, then deprecate the old field C) Delete the old field and add a new one with the same name D) Type changes are always backward compatible

Answer

B) The dual-field pattern: add field_v2 with the new type, produce both field and field_v2 during transition, migrate all consumers to field_v2, then deprecate field. This avoids breaking any consumer.

:::


One-Liner Summary: Schema evolution is the art of changing data structures without breaking producers or consumers -- add fields with defaults, never reuse identifiers, and enforce compatibility with a schema registry.

"What I cannot create, I do not understand." — Richard Feynman