Schema Evolution
Why Schema Evolution Exists
Data schemas change. Business requirements evolve, new features need new fields, old fields become irrelevant, and data types need correction. In a monolithic batch system, you can stop everything, migrate the schema, and restart. In distributed streaming systems with multiple producers and consumers at different versions, this luxury doesn't exist.
Schema evolution is the discipline of changing data structures while maintaining compatibility between producers and consumers at different versions.
The Core Problem
Producer v2 writes data with a new field. Consumer v1 doesn't know about this field. Consumer v3 expects a field that was removed in v2. Can everyone coexist?
Historical Context
- 2009: Apache Avro designed with schema evolution as a first-class feature
- 2011: Google's Protocol Buffers v2 formalized field numbering for evolution
- 2014: Confluent Schema Registry introduced for Kafka
- 2020s: Schema evolution is now considered essential infrastructure for data platforms
- 2025: Event-driven architectures and data contracts make schema evolution a critical concern
First Principles
Compatibility Types
There are four types of schema compatibility:
| Type | Producer | Consumer | Rule |
|---|---|---|---|
| Backward | Old schema | New schema | New consumers can read old data |
| Forward | New schema | Old schema | Old consumers can read new data |
| Full | Both directions | Both directions | Backward AND Forward |
| None | No guarantee | No guarantee | Anything goes (dangerous) |
Additionally, transitive variants check compatibility across ALL previous versions, not just the immediately preceding one.
The Compatibility Contract
Formally, schema
Schema
Schema Evolution Rules by Format
Avro Evolution Rules
Avro uses a writer schema (used to encode) and a reader schema (used to decode). The reader must resolve differences.
| Change | Backward | Forward | Full |
|---|---|---|---|
| Add field with default | Yes | Yes | Yes |
| Add field without default | No | Yes | No |
| Remove field with default | Yes | No | No |
| Remove field without default | No | No | No |
| Rename field (with alias) | Yes | Yes | Yes |
| Change type (promotable) | Depends | Depends | Depends |
// Avro schema evolution example
const schemaV1 = {
type: 'record',
name: 'User',
fields: [
{ name: 'id', type: 'string' },
{ name: 'name', type: 'string' },
{ name: 'email', type: 'string' },
],
};
// V2: Added 'age' with default (backward + forward compatible)
const schemaV2 = {
type: 'record',
name: 'User',
fields: [
{ name: 'id', type: 'string' },
{ name: 'name', type: 'string' },
{ name: 'email', type: 'string' },
{ name: 'age', type: 'int', default: 0 }, // New field WITH default
],
};
// V3: Removed 'email', added 'phone' (ONLY backward compatible)
const schemaV3 = {
type: 'record',
name: 'User',
fields: [
{ name: 'id', type: 'string' },
{ name: 'name', type: 'string' },
{ name: 'age', type: 'int', default: 0 },
{ name: 'phone', type: ['null', 'string'], default: null }, // Nullable with default
// 'email' removed — old data still has it, reader ignores it
],
};Avro Schema Resolution
interface AvroField {
name: string;
type: string | string[] | Record<string, unknown>;
default?: unknown;
aliases?: string[];
}
interface AvroSchema {
type: 'record';
name: string;
fields: AvroField[];
}
class AvroSchemaResolver {
/**
* Resolve differences between writer and reader schemas.
* Returns a resolution plan for each field.
*/
resolve(
writerSchema: AvroSchema,
readerSchema: AvroSchema,
): FieldResolution[] {
const resolutions: FieldResolution[] = [];
for (const readerField of readerSchema.fields) {
const writerField = this.findWriterField(readerField, writerSchema);
if (writerField) {
resolutions.push({
fieldName: readerField.name,
action: 'read_from_writer',
writerFieldName: writerField.name,
});
} else if (readerField.default !== undefined) {
resolutions.push({
fieldName: readerField.name,
action: 'use_default',
defaultValue: readerField.default,
});
} else {
resolutions.push({
fieldName: readerField.name,
action: 'error',
errorMessage: `No matching field in writer and no default for '${readerField.name}'`,
});
}
}
// Writer fields not in reader are simply ignored
return resolutions;
}
private findWriterField(
readerField: AvroField,
writerSchema: AvroSchema,
): AvroField | undefined {
// Match by name
const byName = writerSchema.fields.find(
(f) => f.name === readerField.name,
);
if (byName) return byName;
// Match by alias
if (readerField.aliases) {
for (const alias of readerField.aliases) {
const byAlias = writerSchema.fields.find((f) => f.name === alias);
if (byAlias) return byAlias;
}
}
return undefined;
}
}
interface FieldResolution {
fieldName: string;
action: 'read_from_writer' | 'use_default' | 'error';
writerFieldName?: string;
defaultValue?: unknown;
errorMessage?: string;
}Protocol Buffers Evolution Rules
Protobuf uses field numbers for identification, making field names irrelevant for wire format compatibility.
| Change | Safe | Unsafe |
|---|---|---|
| Add new field | Yes (new number) | If number reused |
| Remove field | Yes (mark reserved) | If number reused |
| Rename field | Yes (numbers matter, not names) | N/A |
| Change field type | Some (int32 <-> int64) | Most type changes |
| Change field number | Never | Always breaks |
// v1
message User {
string id = 1;
string name = 2;
string email = 3;
}
// v2 (compatible evolution)
message User {
string id = 1;
string name = 2;
// Field 3 removed — mark as reserved
reserved 3;
reserved "email";
int32 age = 4; // New field
string phone = 5; // New field
}
// v3 (DANGEROUS — reuses field number 3!)
message User {
string id = 1;
string name = 2;
string address = 3; // WRONG: reuses field number 3 (was email)
int32 age = 4;
}DANGER
Never reuse field numbers in Protobuf. Old data encoded with email at field 3 will be decoded as address at field 3, producing corrupted data. Always use reserved to retire field numbers permanently.
JSON Schema Evolution
JSON is the most permissive format — no built-in evolution rules. You must enforce them yourself:
interface JSONSchemaEvolutionChecker {
checkBackwardCompatibility(
oldSchema: JSONSchema,
newSchema: JSONSchema,
): CompatibilityResult;
}
interface JSONSchema {
type: string;
properties: Record<string, { type: string; required?: boolean; default?: unknown }>;
required: string[];
}
interface CompatibilityResult {
compatible: boolean;
issues: string[];
}
class JSONSchemaEvolution implements JSONSchemaEvolutionChecker {
checkBackwardCompatibility(
oldSchema: JSONSchema,
newSchema: JSONSchema,
): CompatibilityResult {
const issues: string[] = [];
// Check 1: No required field removed
for (const field of oldSchema.required) {
if (!newSchema.properties[field]) {
issues.push(
`Required field '${field}' was removed (breaks backward compat)`,
);
}
}
// Check 2: No new required fields without defaults
for (const field of newSchema.required) {
if (!oldSchema.properties[field] && !newSchema.properties[field]?.default) {
issues.push(
`New required field '${field}' has no default (breaks backward compat)`,
);
}
}
// Check 3: No type changes
for (const [field, spec] of Object.entries(newSchema.properties)) {
const oldSpec = oldSchema.properties[field];
if (oldSpec && oldSpec.type !== spec.type) {
issues.push(
`Field '${field}' type changed from '${oldSpec.type}' to '${spec.type}'`,
);
}
}
return {
compatible: issues.length === 0,
issues,
};
}
}Schema Registry
Architecture
Implementation
interface SchemaRegistryClient {
register(subject: string, schema: AvroSchema): Promise<number>;
getById(id: number): Promise<AvroSchema>;
getLatest(subject: string): Promise<{ id: number; schema: AvroSchema }>;
checkCompatibility(
subject: string,
schema: AvroSchema,
): Promise<{ compatible: boolean; messages: string[] }>;
setCompatibility(
subject: string,
level: CompatibilityLevel,
): Promise<void>;
}
type CompatibilityLevel =
| 'NONE'
| 'BACKWARD'
| 'BACKWARD_TRANSITIVE'
| 'FORWARD'
| 'FORWARD_TRANSITIVE'
| 'FULL'
| 'FULL_TRANSITIVE';
class SchemaRegistry implements SchemaRegistryClient {
private schemas: Map<number, AvroSchema> = new Map();
private subjects: Map<string, Array<{ id: number; version: number; schema: AvroSchema }>> =
new Map();
private compatibility: Map<string, CompatibilityLevel> = new Map();
private nextId = 1;
async register(subject: string, schema: AvroSchema): Promise<number> {
// Check compatibility before registering
const existing = this.subjects.get(subject) ?? [];
const level = this.compatibility.get(subject) ?? 'BACKWARD';
if (existing.length > 0 && level !== 'NONE') {
const compatible = await this.checkCompatibility(subject, schema);
if (!compatible.compatible) {
throw new Error(
`Schema incompatible: ${compatible.messages.join(', ')}`,
);
}
}
// Check if this exact schema already exists
const existingEntry = existing.find(
(e) => JSON.stringify(e.schema) === JSON.stringify(schema),
);
if (existingEntry) return existingEntry.id;
const id = this.nextId++;
this.schemas.set(id, schema);
const versions = this.subjects.get(subject) ?? [];
versions.push({ id, version: versions.length + 1, schema });
this.subjects.set(subject, versions);
return id;
}
async getById(id: number): Promise<AvroSchema> {
const schema = this.schemas.get(id);
if (!schema) throw new Error(`Schema not found: ${id}`);
return schema;
}
async getLatest(
subject: string,
): Promise<{ id: number; schema: AvroSchema }> {
const versions = this.subjects.get(subject);
if (!versions || versions.length === 0) {
throw new Error(`Subject not found: ${subject}`);
}
const latest = versions[versions.length - 1];
return { id: latest.id, schema: latest.schema };
}
async checkCompatibility(
subject: string,
schema: AvroSchema,
): Promise<{ compatible: boolean; messages: string[] }> {
const level = this.compatibility.get(subject) ?? 'BACKWARD';
const versions = this.subjects.get(subject) ?? [];
if (versions.length === 0) {
return { compatible: true, messages: [] };
}
const messages: string[] = [];
const toCheck =
level.includes('TRANSITIVE') ? versions : [versions[versions.length - 1]];
for (const existing of toCheck) {
const result = this.checkPairCompatibility(
existing.schema,
schema,
level,
);
messages.push(...result.messages);
}
return { compatible: messages.length === 0, messages };
}
private checkPairCompatibility(
oldSchema: AvroSchema,
newSchema: AvroSchema,
level: CompatibilityLevel,
): { messages: string[] } {
const messages: string[] = [];
if (level.startsWith('BACKWARD') || level === 'FULL' || level === 'FULL_TRANSITIVE') {
// New reader must be able to read old data
for (const newField of newSchema.fields) {
const oldField = oldSchema.fields.find((f) => f.name === newField.name);
if (!oldField && newField.default === undefined) {
messages.push(
`New field '${newField.name}' has no default (backward incompatible)`,
);
}
}
}
if (level.startsWith('FORWARD') || level === 'FULL' || level === 'FULL_TRANSITIVE') {
// Old reader must be able to read new data
for (const oldField of oldSchema.fields) {
const newField = newSchema.fields.find((f) => f.name === oldField.name);
if (!newField && oldField.default === undefined) {
messages.push(
`Removed field '${oldField.name}' has no default in old schema (forward incompatible)`,
);
}
}
}
return { messages };
}
async setCompatibility(
subject: string,
level: CompatibilityLevel,
): Promise<void> {
this.compatibility.set(subject, level);
}
}Subject Naming Strategies
| Strategy | Pattern | Use Case |
|---|---|---|
| TopicName | orders-value | One schema per topic |
| RecordName | com.company.Order | Multiple event types per topic |
| TopicRecordName | orders-com.company.Order | Full isolation |
Migration Strategies
Online Schema Migration (Zero Downtime)
Dual-Write Migration
For database schema changes that can't be done in a single step:
interface DualWriteMigrator<T, U> {
// Phase 1: Write to both old and new schema
writeToOld(data: T): Promise<void>;
writeToNew(data: U): Promise<void>;
// Phase 2: Backfill old data into new schema
backfill(batchSize: number): Promise<number>;
// Phase 3: Verify consistency
verify(sampleSize: number): Promise<VerificationResult>;
// Phase 4: Switch reads to new schema
switchReads(): Promise<void>;
// Phase 5: Stop writes to old schema
stopOldWrites(): Promise<void>;
}
interface VerificationResult {
totalChecked: number;
mismatches: number;
mismatchExamples: Array<{ id: string; oldValue: unknown; newValue: unknown }>;
}
class ColumnAdditionMigrator {
constructor(
private readonly db: Database,
private readonly table: string,
private readonly newColumn: string,
private readonly defaultValue: unknown,
) {}
async migrate(): Promise<void> {
// Phase 1: Add nullable column
await this.db.query(
`ALTER TABLE ${this.table} ADD COLUMN ${this.newColumn} TEXT`,
[],
);
// Phase 2: Backfill default values
let updated = 0;
do {
const result = await this.db.query(
`UPDATE ${this.table}
SET ${this.newColumn} = $1
WHERE ${this.newColumn} IS NULL
LIMIT 10000`,
[this.defaultValue],
);
updated = result.rowCount;
// Yield to avoid blocking other operations
await new Promise((resolve) => setTimeout(resolve, 100));
} while (updated > 0);
// Phase 3: Add NOT NULL constraint (if needed)
await this.db.query(
`ALTER TABLE ${this.table}
ALTER COLUMN ${this.newColumn} SET NOT NULL`,
[],
);
}
}
interface Database {
query(sql: string, params: unknown[]): Promise<{ rowCount: number }>;
}Expand-Contract Pattern
The safest approach for breaking changes:
Performance Characteristics
Schema Registry Overhead
| Operation | Latency | Frequency |
|---|---|---|
| Register schema | 10-50ms | Once per deployment |
| Get schema by ID | 1-5ms (cached) | Once per consumer startup |
| Compatibility check | 5-20ms | Once per registration |
Schemas are cached client-side, so the per-message overhead is zero after the initial fetch.
Serialization Format Comparison
| Format | Schema Evolution | Encoding Size | Encode Speed | Decode Speed |
|---|---|---|---|---|
| JSON | Manual | Large (100%) | Fast | Fast |
| Avro | Excellent | Small (30-50%) | Medium | Medium |
| Protobuf | Good | Smallest (20-40%) | Fastest | Fastest |
| Thrift | Good | Small (30-50%) | Fast | Fast |
| MessagePack | None | Medium (60-80%) | Fast | Fast |
Schema Evolution Impact on Storage
Where
Avro handles this efficiently because the schema is stored once in the Schema Registry, not with every record. The per-record overhead is just the schema ID (4 bytes).
Edge Cases & Failure Modes
Breaking Change Detection
class BreakingChangeDetector {
detectBreakingChanges(
oldSchema: AvroSchema,
newSchema: AvroSchema,
): BreakingChange[] {
const changes: BreakingChange[] = [];
// Removed fields without defaults
for (const oldField of oldSchema.fields) {
const newField = newSchema.fields.find((f) => f.name === oldField.name);
if (!newField) {
changes.push({
type: 'field_removed',
field: oldField.name,
severity: oldField.default !== undefined ? 'warning' : 'breaking',
message: `Field '${oldField.name}' was removed`,
});
}
}
// Type changes
for (const newField of newSchema.fields) {
const oldField = oldSchema.fields.find((f) => f.name === newField.name);
if (oldField && JSON.stringify(oldField.type) !== JSON.stringify(newField.type)) {
changes.push({
type: 'type_changed',
field: newField.name,
severity: 'breaking',
message: `Field '${newField.name}' type changed from ${JSON.stringify(oldField.type)} to ${JSON.stringify(newField.type)}`,
});
}
}
// Required field added without default
for (const newField of newSchema.fields) {
const oldField = oldSchema.fields.find((f) => f.name === newField.name);
if (!oldField && newField.default === undefined) {
changes.push({
type: 'required_field_added',
field: newField.name,
severity: 'breaking',
message: `New field '${newField.name}' has no default value`,
});
}
}
return changes;
}
}
interface BreakingChange {
type: string;
field: string;
severity: 'info' | 'warning' | 'breaking';
message: string;
}Schema Registry Unavailability
If the Schema Registry is down, producers cannot register new schemas and consumers cannot fetch unknown schemas:
class ResilientSchemaRegistryClient {
private cache: Map<number, AvroSchema> = new Map();
private subjectCache: Map<string, { id: number; schema: AvroSchema }> = new Map();
constructor(
private readonly client: SchemaRegistryClient,
private readonly fallbackSchemas: Map<number, AvroSchema>,
) {}
async getById(id: number): Promise<AvroSchema> {
// Check local cache first
const cached = this.cache.get(id);
if (cached) return cached;
try {
const schema = await this.client.getById(id);
this.cache.set(id, schema);
return schema;
} catch (error) {
// Fall back to pre-loaded schemas
const fallback = this.fallbackSchemas.get(id);
if (fallback) {
console.warn(
`Schema Registry unavailable, using fallback for schema ${id}`,
);
return fallback;
}
throw new Error(
`Schema Registry unavailable and no fallback for schema ${id}`,
);
}
}
}Transitive Compatibility Violations
Schema v1 is compatible with v2, and v2 is compatible with v3, but v1 may NOT be compatible with v3:
v1: { name: string, email: string }
v2: { name: string, email: string, age: int (default=0) } -- backward compat with v1
v3: { name: string, age: int } -- backward compat with v2
But v3 is NOT backward compatible with v1!
(v1 data has 'email' which v3 ignores — OK
v1 data lacks 'age' which v3 expects — v3 has default for age — OK in Avro)
Actually this specific example IS compatible. The real problem is:
v1: { name: string, email: string }
v2: { name: string, email: string, age: int (default=0) }
v3: { name: string, age: int (NO DEFAULT) } -- compatible with v2 but not v1This is why BACKWARD_TRANSITIVE exists — it checks compatibility against ALL versions, not just the latest.
Mathematical Foundations
Schema Lattice
Schemas under the subtyping relation form a lattice:
The lattice has:
- Top element: the "any" type (accepts everything)
- Bottom element: the "empty" type (accepts nothing)
Schema evolution is movement within this lattice:
- Backward compatible: moving up (becoming more permissive)
- Forward compatible: moving down (becoming more restrictive)
- Full compatible: moving to a comparable element
Information Theory Perspective
More restrictive schemas have higher information content (they tell you more about the data). Schema evolution that adds optional fields increases the type space, reducing information density per field.
Real-World War Stories
War Story
The Protobuf Field Number Reuse Disaster
A team removed a user_email field (field number 5) from a Protobuf message and later added a user_phone field at field number 5. Old messages stored in Kafka still had email data at field 5.
When consumers read these old messages, they decoded the email bytes as a phone number. Some "phone numbers" were valid enough to not cause errors but contained email addresses, which were then sent via SMS as part of an identity verification flow.
Over 10,000 customers received SMS messages to email addresses (which obviously failed). The incident required a full topic rewrite.
Fix:
- Used
reserved 5;to permanently retire the field number - Added a CI check that compares current Protobuf files against the Schema Registry to detect field number reuse
- All field removals now go through a review that checks the field number is reserved
War Story
The Avro Default Value Trap
A team added a new field risk_score to their user event schema with "default": 0. All existing consumers were backward compatible — they'd read 0 for old events.
The problem: the risk model that populated risk_score treated 0 as "lowest risk." Old events (pre-upgrade) were treated as lowest risk, skewing the risk distribution and causing the fraud detection system to miss high-risk historical patterns.
Lesson: The default value must be semantically meaningful. Use null (with a union type ["null", "int"]) when the value is truly unknown, not 0.
Decision Framework
Compatibility Level Selection
| Scenario | Recommended Level |
|---|---|
| Internal microservices, single team | BACKWARD |
| Shared platform, multiple teams | FULL_TRANSITIVE |
| Public API events | FULL_TRANSITIVE |
| Log/metrics data (read by many) | FORWARD |
| Single consumer, controlled upgrade | BACKWARD |
| Long-lived topics (years of data) | FULL_TRANSITIVE |
Schema Format Selection
| Factor | Avro | Protobuf | JSON |
|---|---|---|---|
| Schema evolution support | Best | Good | Manual |
| Human readability | Medium | Low | Best |
| Encoding efficiency | Good | Best | Worst |
| Language support | Good | Best | Universal |
| Schema Registry integration | Native | Good | Limited |
| Learning curve | Medium | Medium | Low |
Advanced Topics
Data Contracts
Schema evolution is evolving into data contracts — formal agreements between producers and consumers:
interface DataContract {
apiVersion: string;
kind: 'DataContract';
metadata: {
name: string;
version: string;
owner: string;
team: string;
};
schema: {
type: 'avro' | 'protobuf' | 'json';
specification: unknown; // The actual schema
};
quality: {
checks: Array<{
name: string;
type: 'not_null' | 'unique' | 'range' | 'regex' | 'freshness';
field?: string;
parameters?: Record<string, unknown>;
}>;
};
sla: {
freshness: string; // e.g., "PT5M" (5 minutes)
availability: number; // e.g., 0.999
completeness: number; // e.g., 0.99
};
compatibility: CompatibilityLevel;
}Schema Evolution in Data Lakes
Parquet/Delta Lake/Iceberg handle schema evolution at the file level:
interface TableSchemaEvolution {
// Adding columns
addColumn(name: string, type: string, position?: 'first' | 'after'): void;
// Renaming columns
renameColumn(oldName: string, newName: string): void;
// Changing types (limited: widening only)
widenType(column: string, newType: string): void;
// Partition evolution (Iceberg-specific)
evolvePartition(oldSpec: PartitionSpec, newSpec: PartitionSpec): void;
}
interface PartitionSpec {
fields: Array<{
sourceColumn: string;
transform: 'identity' | 'year' | 'month' | 'day' | 'hour' | 'bucket' | 'truncate';
parameters?: Record<string, unknown>;
}>;
}Apache Iceberg's schema evolution is the most advanced — it supports adding, dropping, renaming, reordering, and type-widening columns without rewriting existing data files.
CI/CD Schema Validation
class SchemaCICheck {
async validateSchemaChange(
oldSchemaPath: string,
newSchemaPath: string,
compatibilityLevel: CompatibilityLevel,
): Promise<{ pass: boolean; report: string }> {
const oldSchema = await this.loadSchema(oldSchemaPath);
const newSchema = await this.loadSchema(newSchemaPath);
const detector = new BreakingChangeDetector();
const changes = detector.detectBreakingChanges(oldSchema, newSchema);
const breakingChanges = changes.filter((c) => c.severity === 'breaking');
if (breakingChanges.length > 0) {
return {
pass: false,
report: [
'SCHEMA COMPATIBILITY CHECK FAILED',
`Compatibility level: ${compatibilityLevel}`,
'',
'Breaking changes detected:',
...breakingChanges.map(
(c) => ` - [${c.type}] ${c.message}`,
),
'',
'To fix:',
' - Add default values to new required fields',
' - Use reserved field numbers instead of removing fields',
' - Use type widening instead of type changes',
].join('\n'),
};
}
return {
pass: true,
report: `Schema change compatible (${changes.length} non-breaking changes)`,
};
}
private async loadSchema(_path: string): Promise<AvroSchema> {
// Load and parse schema from file
return { type: 'record', name: '', fields: [] };
}
}Cross-References
- Data Modeling Overview — Schema design context
- Normalization & Denormalization — Schema structure decisions
- CDC Patterns — Schema evolution in CDC pipelines
- Data Quality Checks — Schema validation as quality check
- Testing Data Pipelines — Schema evolution testing
Key Takeaway
- Schema evolution requires compatibility guarantees: backward-compatible changes (adding optional fields) are safe; breaking changes (removing fields, changing types) require migration strategies.
- Schema registries (Confluent, AWS Glue) enforce compatibility rules at write time, preventing producers from breaking consumers.
- Avro excels at schema evolution (fields matched by name, defaults for missing fields); Protobuf is strong with field numbers; JSON has no built-in evolution support.
Exercise
Plan a Schema Evolution for a Production Event Stream
Your team produces a UserSignup event to Kafka, consumed by 5 different services. The current Avro schema is:
{"type": "record", "name": "UserSignup", "fields": [
{"name": "user_id", "type": "string"},
{"name": "email", "type": "string"},
{"name": "country", "type": "string"},
{"name": "signup_timestamp", "type": "long"}
]}You need to make these changes:
- Add an optional
referral_codefield - Rename
countrytocountry_code - Change
signup_timestampfromlongtostring(ISO 8601) - Remove the
emailfield (GDPR requirement)
Classify each change as backward-compatible, forward-compatible, or breaking. Propose a migration strategy.
Solution
Add
referral_code(optional with default): BACKWARD COMPATIBLE. Old consumers ignore it, new consumers read it. Safe to deploy immediately.json{"name": "referral_code", "type": ["null", "string"], "default": null}Rename
countrytocountry_code: BREAKING. Avro matches fields by name. Renaming is seen as deletingcountryand addingcountry_code. Strategy: Use Avro aliases:{"name": "country_code", "aliases": ["country"]}. This maintains backward compatibility.Change
signup_timestamptype: BREAKING. Changinglongtostringis not compatible. Strategy: Add a NEW fieldsignup_timestamp_iso(string) while keeping the oldlongfield. Deprecate the old field, and remove it after all consumers migrate (may take months).Remove
email: BREAKING for consumers that reademail. Strategy: Phase out over 3 months: (a) Addemail_hashfield, (b) Notify all 5 consumer teams, (c) Setemaildefault to empty string, (d) Stop populatingemail, (e) Remove field after all consumers confirm migration.
Registry config: Set compatibility level to BACKWARD_TRANSITIVE to prevent accidental breaking changes.
Common Misconceptions
- "Adding a field is always safe." Only if the field has a default value. Adding a required field without a default breaks all existing consumers.
- "JSON schemas evolve naturally." JSON has no built-in schema evolution. Adding/removing fields silently succeeds, but consumers crash on unexpected nulls or missing fields at runtime.
- "Schema registries are optional overhead." Without a registry, producers can push any schema change, breaking downstream consumers at runtime. Registries catch incompatible changes at deploy time.
- "Renaming a field is a simple change." In Avro and Protobuf, field identity is by name or number. Renaming is a delete + add, which is a breaking change unless aliases are used.
- "You can change a field's type freely." Type changes (int to string, string to array) are almost always breaking. The safe pattern is to add a new field with the new type and deprecate the old one.
In Production
- LinkedIn uses a centralized schema registry (the origin of Confluent Schema Registry) to manage schema evolution across 10,000+ Kafka topics, with BACKWARD_TRANSITIVE compatibility enforced by default.
- Uber uses Protobuf for all internal event schemas with strict field number management -- field numbers are never reused, and deprecated fields are reserved to prevent accidental conflicts.
- Netflix maintains a schema governance process where any schema change to a shared event requires review by both the producing and consuming teams before deployment.
- Airbnb uses Avro with their internal schema evolution tool that auto-generates migration code when schemas change, testing backward and forward compatibility as part of CI/CD.
Quiz
1. What makes a schema change "backward compatible"?
A) Old producers can write data that new consumers can read B) New consumers can read data written by old producers (old data is still valid under the new schema) C) Both producers and consumers can be updated simultaneously D) The schema change does not require a database migration
Answer
B) Backward compatibility means consumers using the new schema can read data produced with the old schema. Adding optional fields with defaults is backward compatible. Removing required fields is not.
2. How does Avro handle missing fields during deserialization?
A) It throws an error B) It uses the default value specified in the reader's schema for fields missing from the writer's data C) It fills them with null regardless of type D) It skips the entire record
Answer
B) Avro resolves schemas by matching field names between the writer's and reader's schemas. If the reader's schema has a field not in the writer's data, it uses the field's default value. If no default is defined, deserialization fails.
3. Why are field numbers important in Protobuf?
A) They determine sort order B) They are the field's identity -- fields are matched by number, not name, so numbers must be stable and never reused C) They set the maximum number of fields D) They are used for encryption
Answer
B) Protobuf serializes fields by number, not name. Renaming a field is safe (the number stays the same), but reusing a deleted field's number would cause old data to be misinterpreted. Use reserved to prevent number reuse.
4. What compatibility level should a production schema registry enforce?
A) NONE (allow any change) B) BACKWARD_TRANSITIVE (new schema can read all historical data) C) FULL (any version can read any other version) D) It depends on the use case
Answer
D) BACKWARD (or BACKWARD_TRANSITIVE) is the most common choice for Kafka consumers. FORWARD compatibility is needed when producers upgrade before consumers. FULL is the safest but most restrictive. The choice depends on your deployment model.
5. What is the safe pattern for changing a field's data type?
A) Modify the field's type in place B) Add a new field with the new type, populate both during a transition period, then deprecate the old field C) Delete the old field and add a new one with the same name D) Type changes are always backward compatible
Answer
B) The dual-field pattern: add field_v2 with the new type, produce both field and field_v2 during transition, migrate all consumers to field_v2, then deprecate field. This avoids breaking any consumer.
:::
One-Liner Summary: Schema evolution is the art of changing data structures without breaking producers or consumers -- add fields with defaults, never reuse identifiers, and enforce compatibility with a schema registry.