Skip to content
Unverified — AI-generated content. Help verify this page

Normalization & Denormalization

Why Normalization Exists

Redundant data causes three types of anomalies:

  1. Update anomaly: Changing a customer's address requires updating every row where that customer appears. Miss one, and the data is inconsistent.
  2. Insert anomaly: You can't record a new department until at least one employee is assigned to it.
  3. Delete anomaly: Deleting the last employee in a department also deletes the department's information.

Normalization eliminates these anomalies by decomposing tables so that every fact is stored exactly once.

Historical Context

  • 1970: Edgar Codd defines the relational model and First Normal Form
  • 1971: Codd defines Second and Third Normal Form
  • 1974: Codd and Boyce define Boyce-Codd Normal Form (BCNF)
  • 1977: Fagin defines Fourth Normal Form (4NF)
  • 1979: Fagin defines Fifth Normal Form (5NF / Project-Join Normal Form)
  • 2002: Date, Darwen, and Lorentzos define Sixth Normal Form (6NF)

In practice, most production databases aim for 3NF or BCNF. Higher normal forms are rarely needed.

First Principles

Functional Dependencies

A functional dependency (FD) is the fundamental concept behind normalization:

XY

Read: "X functionally determines Y" — knowing the value of X uniquely determines the value of Y.

Examples:

  • SSN -> Name, BirthDate (Social Security Number determines name and birth date)
  • StudentID, CourseID -> Grade (student-course combination determines grade)
  • ZipCode -> City, State (zip code determines city and state)

Formal definition:

XYt1,t2R:t1[X]=t2[X]t1[Y]=t2[Y]

Armstrong's Axioms

The complete set of rules for reasoning about functional dependencies:

  1. Reflexivity: If YX, then XY
  2. Augmentation: If XY, then XZYZ
  3. Transitivity: If XY and YZ, then XZ

Derived rules: 4. Union: If XY and XZ, then XYZ 5. Decomposition: If XYZ, then XY and XZ 6. Pseudotransitivity: If XY and WYZ, then WXZ

Closure of Functional Dependencies

The closure of attribute set X under a set of FDs F, written XF+, is all attributes functionally determined by X:

typescript
function computeClosure(
  attributes: Set<string>,
  functionalDependencies: Array<{ lhs: Set<string>; rhs: Set<string> }>,
): Set<string> {
  const closure = new Set(attributes);
  let changed = true;

  while (changed) {
    changed = false;
    for (const fd of functionalDependencies) {
      // If lhs is a subset of closure, add rhs to closure
      if (isSubset(fd.lhs, closure)) {
        const sizeBefore = closure.size;
        for (const attr of fd.rhs) {
          closure.add(attr);
        }
        if (closure.size > sizeBefore) {
          changed = true;
        }
      }
    }
  }

  return closure;
}

function isSubset(subset: Set<string>, superset: Set<string>): boolean {
  for (const item of subset) {
    if (!superset.has(item)) return false;
  }
  return true;
}

// Example: Given FDs {A -> B, B -> C, A,D -> E}
// Closure of {A} = {A, B, C}
// Closure of {A, D} = {A, B, C, D, E}
const fds = [
  { lhs: new Set(['A']), rhs: new Set(['B']) },
  { lhs: new Set(['B']), rhs: new Set(['C']) },
  { lhs: new Set(['A', 'D']), rhs: new Set(['E']) },
];

const closureA = computeClosure(new Set(['A']), fds);
// closureA = { A, B, C }

const closureAD = computeClosure(new Set(['A', 'D']), fds);
// closureAD = { A, B, C, D, E }

Candidate Keys

A candidate key is a minimal set of attributes whose closure is all attributes:

K is a candidate keyKF+=R and KK:KF+R
typescript
function findCandidateKeys(
  allAttributes: Set<string>,
  functionalDependencies: Array<{ lhs: Set<string>; rhs: Set<string> }>,
): Set<string>[] {
  const candidates: Set<string>[] = [];

  // Start with single attributes, grow if needed
  const queue: Set<string>[] = [...[...allAttributes].map((a) => new Set([a]))];
  const checked = new Set<string>();

  while (queue.length > 0) {
    const candidate = queue.shift()!;
    const key = [...candidate].sort().join(',');
    if (checked.has(key)) continue;
    checked.add(key);

    const closure = computeClosure(candidate, functionalDependencies);

    if (setsEqual(closure, allAttributes)) {
      // Check minimality — no proper subset is also a key
      const isMinimal = candidates.every(
        (existing) => !isSubset(existing, candidate),
      );
      if (isMinimal) {
        // Remove any existing candidates that are supersets
        const filtered = candidates.filter(
          (existing) => !isSubset(candidate, existing),
        );
        filtered.push(candidate);
        candidates.length = 0;
        candidates.push(...filtered);
      }
    } else {
      // Not a key — try adding one more attribute
      for (const attr of allAttributes) {
        if (!candidate.has(attr)) {
          const expanded = new Set(candidate);
          expanded.add(attr);
          queue.push(expanded);
        }
      }
    }
  }

  return candidates;
}

function setsEqual(a: Set<string>, b: Set<string>): boolean {
  if (a.size !== b.size) return false;
  for (const item of a) {
    if (!b.has(item)) return false;
  }
  return true;
}

Normal Forms

First Normal Form (1NF)

Requirement: All attributes contain only atomic (indivisible) values. No repeating groups.

Violation:

OrderIDProduct1Product2Product3
1WidgetGadgetnull
2Sprocketnullnull

1NF (fixed):

OrderIDProduct
1Widget
1Gadget
2Sprocket

WARNING

JSON columns and array types in modern databases technically violate 1NF. Whether this matters depends on whether you need to query individual array elements. For document-style storage (Postgres JSONB), pragmatic 1NF violations are often acceptable.

Second Normal Form (2NF)

Requirement: 1NF + no partial dependencies on any candidate key.

A partial dependency exists when a non-key attribute depends on only part of a composite key.

2NF violation: ABC where AC (partial dependency on key AB)

Violation:

StudentIDCourseIDGradeCourseName
S1C1AMath
S1C2BScience
S2C1AMath

Key: (StudentID, CourseID) FD: CourseID -> CourseName (partial dependency — CourseName depends only on CourseID)

2NF (decomposed):

Enrollments:

StudentIDCourseIDGrade
S1C1A

Courses:

CourseIDCourseName
C1Math

Third Normal Form (3NF)

Requirement: 2NF + no transitive dependencies.

A transitive dependency exists when a non-key attribute determines another non-key attribute.

3NF violation: ABC where B is not a key

Formal definition: For every non-trivial FD XA where A is a single attribute:

  • X is a superkey, OR
  • A is a member of some candidate key (prime attribute)

Violation:

EmployeeIDDepartmentIDDepartmentNameDepartmentHead
E1D1EngineeringAlice
E2D1EngineeringAlice
E3D2MarketingBob

FDs: EmployeeID -> DepartmentID, DepartmentID -> DepartmentName, DepartmentHead

Transitive: EmployeeID -> DepartmentID -> DepartmentName

3NF (decomposed):

Employees:

EmployeeIDDepartmentID
E1D1

Departments:

DepartmentIDDepartmentNameDepartmentHead
D1EngineeringAlice

Boyce-Codd Normal Form (BCNF)

Requirement: For every non-trivial FD XY: X must be a superkey.

BCNF is stricter than 3NF — it doesn't allow the "prime attribute" exception.

BCNF: (XY)F:X is a superkey

Where 3NF and BCNF differ:

StudentIDSubjectProfessor
S1MathProf. A
S2MathProf. B
S1ScienceProf. C

Candidate keys: (StudentID, Subject) and (StudentID, Professor) FD: Professor -> Subject (each professor teaches one subject)

This is in 3NF (Subject is a prime attribute) but NOT in BCNF (Professor is not a superkey).

BCNF decomposition:

StudentProfessor:

StudentIDProfessor
S1Prof. A

ProfessorSubject:

ProfessorSubject
Prof. AMath

WARNING

BCNF decomposition may not preserve all functional dependencies. In the example above, the FD (StudentID, Subject) -> Professor is lost. In practice, this means you lose the ability to enforce that constraint through table structure alone. You need application-level or trigger-based enforcement.

BCNF Decomposition Algorithm

typescript
interface Relation {
  name: string;
  attributes: Set<string>;
  fds: Array<{ lhs: Set<string>; rhs: Set<string> }>;
}

function decomposeToBCNF(relation: Relation): Relation[] {
  // Find a violating FD
  const violation = findBCNFViolation(relation);

  if (!violation) {
    return [relation]; // Already in BCNF
  }

  // Decompose: R into R1(X, Y) and R2(X, R-Y)
  const r1Attrs = new Set([...violation.lhs, ...violation.rhs]);
  const r2Attrs = new Set([...relation.attributes].filter(
    (a) => !violation.rhs.has(a) || violation.lhs.has(a),
  ));

  const r1: Relation = {
    name: `${relation.name}_1`,
    attributes: r1Attrs,
    fds: projectFDs(relation.fds, r1Attrs),
  };

  const r2: Relation = {
    name: `${relation.name}_2`,
    attributes: r2Attrs,
    fds: projectFDs(relation.fds, r2Attrs),
  };

  // Recursively decompose each part
  return [...decomposeToBCNF(r1), ...decomposeToBCNF(r2)];
}

function findBCNFViolation(
  relation: Relation,
): { lhs: Set<string>; rhs: Set<string> } | null {
  for (const fd of relation.fds) {
    // Check if lhs is a superkey
    const closure = computeClosure(fd.lhs, relation.fds);
    if (!isSubset(relation.attributes, closure)) {
      // lhs is not a superkey -> BCNF violation
      return fd;
    }
  }
  return null;
}

function projectFDs(
  fds: Array<{ lhs: Set<string>; rhs: Set<string> }>,
  attributes: Set<string>,
): Array<{ lhs: Set<string>; rhs: Set<string> }> {
  // Keep only FDs where all attributes are in the projection
  return fds.filter(
    (fd) =>
      isSubset(fd.lhs, attributes) && isSubset(fd.rhs, attributes),
  );
}

Higher Normal Forms (Brief)

Normal FormEliminatesPractical Use
4NFMulti-valued dependenciesRare — specific patterns
5NF (PJNF)Join dependenciesAcademic
6NFAll non-trivial dependenciesTemporal databases, Data Vault

Denormalization

Why Denormalize?

Normalization optimizes for write operations and data integrity. But analytical queries often need to read across many normalized tables, requiring expensive joins:

Query costnormalized=i=1n|Ti|+join_cost(T1T2Tn)Query costdenormalized=|Tflat|

Denormalization trades write complexity for read performance.

Strategic Denormalization Techniques

Pre-Joined Tables

sql
-- Normalized: 3 tables
SELECT o.order_id, c.name, p.product_name, oi.quantity
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id;

-- Denormalized: 1 table
SELECT order_id, customer_name, product_name, quantity
FROM order_details_flat;

Computed Columns

sql
-- Instead of computing age from birth_date every query:
ALTER TABLE customers ADD COLUMN age INT;

-- Updated by trigger or batch job
UPDATE customers
SET age = EXTRACT(YEAR FROM AGE(CURRENT_DATE, birth_date));

Summary Tables

typescript
interface SummaryTableDefinition {
  name: string;
  sourceTable: string;
  grain: string;
  dimensions: string[];
  measures: Array<{
    name: string;
    expression: string;
    aggregation: 'SUM' | 'COUNT' | 'AVG' | 'MIN' | 'MAX';
  }>;
  refreshSchedule: string;
}

const dailySalesSummary: SummaryTableDefinition = {
  name: 'summary_daily_sales',
  sourceTable: 'fact_sales JOIN dim_product JOIN dim_store',
  grain: 'product_category by store by day',
  dimensions: ['date_key', 'product_category', 'store_id'],
  measures: [
    { name: 'total_revenue', expression: 'SUM(total_amount)', aggregation: 'SUM' },
    { name: 'total_units', expression: 'SUM(quantity)', aggregation: 'SUM' },
    { name: 'transaction_count', expression: 'COUNT(*)', aggregation: 'SUM' },
    { name: 'avg_transaction_value', expression: 'AVG(total_amount)', aggregation: 'AVG' },
  ],
  refreshSchedule: 'DAILY 02:00 UTC',
};

Denormalization Decision Matrix

FactorNormalizeDenormalize
Write-heavy workloadYesNo
Read-heavy workloadNoYes
Data integrity criticalYesWith constraints
Complex ad-hoc queriesDependsYes
Storage constrainedYesNo
Query latency criticalNoYes
Multiple update sourcesYesCarefully
Reporting workloadNoYes

Performance Characteristics

Join Cost Analysis

For hash joins (most common in analytical databases):

Cost(RS)=O(|R|+|S|) (build + probe)

For n tables:

Cost(T1T2Tn)=O(i=1n|Ti|+|T1T2Tn|)

The result size depends on join selectivity:

|RAS|=|R|×|S|V(A) (under uniform distribution)

Where V(A) is the number of distinct values of the join attribute.

Storage vs. Query Tradeoff

Redundancy factor=|Tdenormalized|×avg_row_sizedenormsum(|Ti|×avg_row_sizei)

Typical values:

  • Mild denormalization: 1.2-2x storage
  • Star schema: 2-5x storage
  • Full flattening: 5-20x storage

Index Efficiency

Normalized tables with proper indexes:

Lookup cost=O(logBN) per index (B-tree)

Denormalized tables can use composite indexes more effectively:

Covering index cost=O(logBN) with no table access

Edge Cases & Failure Modes

Denormalization Update Anomalies

The very anomalies normalization was designed to prevent return with denormalization:

typescript
// DANGER: Denormalized table with customer name in every order
interface OrderFlat {
  orderId: number;
  customerName: string;  // Redundant — also in customers table
  productName: string;   // Redundant — also in products table
  amount: number;
}

// Customer renames: must update EVERY order row
async function updateCustomerName(
  customerId: string,
  newName: string,
  db: Database,
): Promise<number> {
  // This could update millions of rows
  const result = await db.query(
    'UPDATE order_details_flat SET customer_name = $1 WHERE customer_id = $2',
    [newName, customerId],
  );
  return result.rowCount; // Could be 100,000+ rows
}

Mitigation strategies:

  1. Materialized views: Database manages consistency
  2. Event-driven updates: CDC triggers denormalized table updates
  3. Accept staleness: For analytical workloads, slightly stale denormalized data is often acceptable
  4. Rebuild pattern: Drop and rebuild the denormalized table daily

Over-Normalization

Excessive normalization creates its own problems:

sql
-- Over-normalized: separate table for each attribute
CREATE TABLE person_names (person_id INT, name VARCHAR);
CREATE TABLE person_emails (person_id INT, email VARCHAR);
CREATE TABLE person_phones (person_id INT, phone VARCHAR);
CREATE TABLE person_addresses (person_id INT, address VARCHAR);
-- ... 20 more tables

-- Simple query requires 20+ joins
SELECT n.name, e.email, p.phone, a.address
FROM person_names n
JOIN person_emails e ON n.person_id = e.person_id
JOIN person_phones p ON n.person_id = p.person_id
JOIN person_addresses a ON n.person_id = a.person_id;
-- This is 6NF — almost never necessary

DANGER

6NF is almost never the right choice for production systems. It's useful in specialized temporal databases and Data Vault satellites, but for general-purpose use, 3NF or BCNF is sufficient.

Mathematical Foundations

Lossless Decomposition

A decomposition of R into R1 and R2 is lossless (no spurious tuples) iff:

R1R2R1 or R1R2R2

The common attributes must functionally determine at least one of the decomposed relations.

Proof: If neither FD holds, the natural join R1R2 may contain tuples not in the original R (spurious tuples).

Dependency Preservation

A decomposition preserves dependencies if every FD in F can be enforced using only the FDs within individual decomposed relations:

(iFi)+=F+

Where Fi is the projection of F onto Ri.

WARNING

BCNF decomposition is always lossless but may NOT preserve dependencies. 3NF decomposition is always both lossless AND dependency-preserving (Synthesis Algorithm).

Minimal Cover

A minimal (canonical) cover Fc of FD set F has:

  1. Every FD has a single attribute on the right side
  2. No FD can be removed without changing F+
  3. No attribute can be removed from the left side of any FD without changing F+
typescript
function computeMinimalCover(
  fds: Array<{ lhs: Set<string>; rhs: Set<string> }>,
): Array<{ lhs: Set<string>; rhs: string }> {
  // Step 1: Split right-hand sides
  let split: Array<{ lhs: Set<string>; rhs: string }> = [];
  for (const fd of fds) {
    for (const attr of fd.rhs) {
      split.push({ lhs: new Set(fd.lhs), rhs: attr });
    }
  }

  // Step 2: Remove redundant attributes from left sides
  for (const fd of split) {
    for (const attr of [...fd.lhs]) {
      const reduced = new Set(fd.lhs);
      reduced.delete(attr);
      if (reduced.size === 0) continue;

      const closure = computeClosure(
        reduced,
        split.map((f) => ({ lhs: f.lhs, rhs: new Set([f.rhs]) })),
      );
      if (closure.has(fd.rhs)) {
        fd.lhs = reduced;
      }
    }
  }

  // Step 3: Remove redundant FDs
  split = split.filter((fd, index) => {
    const remaining = split.filter((_, i) => i !== index);
    const closure = computeClosure(
      fd.lhs,
      remaining.map((f) => ({ lhs: f.lhs, rhs: new Set([f.rhs]) })),
    );
    return !closure.has(fd.rhs);
  });

  return split;
}

Real-World War Stories

War Story

The Join of Death

A SaaS company's analytics database had a perfectly normalized schema with 47 tables. Their most important dashboard required a 23-table join. During peak hours, this query took 45 seconds and consumed 100% of the database CPU.

Analysis:

  • 23 tables joined produced 14 intermediate result sets
  • Query optimizer chose suboptimal join ordering
  • Several intermediate results were larger than any source table (Cartesian-like expansion)

Fix:

  1. Created a denormalized analytics_events table with the 12 most-used attributes
  2. Materialized view refreshed every 5 minutes
  3. Dashboard query dropped from 23 joins to 0 joins
  4. Query time: 45 seconds → 200 milliseconds
  5. Storage increase: 3x for the denormalized table, but trivial compared to total database size

War Story

The Normalization Religion

A team of database purists insisted on 5NF for an e-commerce application. The product catalog had 8 levels of normalization with separate tables for each attribute domain.

Adding a new product required INSERT statements to 12 tables. The product page required JOINs across all 12 tables. During Black Friday, the product page load time exceeded 5 seconds, and the database connection pool was exhausted.

Lesson: Normal form is a tool, not a religion. OLTP systems typically need 3NF. OLAP systems need denormalization. Using 5NF for a web application was the wrong tool for the job.

Decision Framework

Normalization Level Selection

Application TypeRecommended Normal FormRationale
OLTP (operational)3NF or BCNFBalance integrity and performance
OLAP (analytical)Denormalized (star)Read performance
Mixed workload3NF + materialized viewsBest of both
Microservice DB3NF within serviceBounded context
Data Vault~6NF (satellites)Maximum flexibility
Document store1NF or noneSchema-on-read

The Normalization/Denormalization Spectrum

Most production systems sit between 3NF and Star Schema. The extremes (6NF and OBT) are for specialized use cases.

Advanced Topics

Temporal Normalization

Traditional normalization doesn't account for time-varying data. Temporal databases extend the theory:

Temporal FD: XtYt:X(t)Y(t)

This means the FD holds at every point in time, not just in the current state.

Normalization in NoSQL

NoSQL databases intentionally denormalize for specific access patterns:

typescript
// DynamoDB: Denormalized single-table design
interface DynamoDBItem {
  PK: string;    // Partition key
  SK: string;    // Sort key
  // All attributes for all entity types in one table
  type: 'Customer' | 'Order' | 'OrderItem';
  customerName?: string;
  orderDate?: string;
  productName?: string;
  quantity?: number;
}

// Access pattern 1: Get customer by ID
// PK = 'CUSTOMER#123', SK = 'PROFILE'

// Access pattern 2: Get all orders for a customer
// PK = 'CUSTOMER#123', SK begins_with 'ORDER#'

// Access pattern 3: Get all items in an order
// PK = 'ORDER#456', SK begins_with 'ITEM#'

This is the antithesis of normalization — data is modeled around access patterns, not entities.

Automated Denormalization with dbt

sql
-- dbt model: automatically denormalize for analytics
-- models/marts/order_details.sql

SELECT
    o.order_id,
    o.order_date,
    c.customer_name,
    c.customer_segment,
    c.city AS customer_city,
    p.product_name,
    p.category AS product_category,
    oi.quantity,
    oi.unit_price,
    oi.quantity * oi.unit_price AS line_total
FROM {​{ ref('stg_orders') }} o
JOIN {​{ ref('stg_customers') }} c ON o.customer_id = c.customer_id
JOIN {​{ ref('stg_order_items') }} oi ON o.order_id = oi.order_id
JOIN {​{ ref('stg_products') }} p ON oi.product_id = p.product_id

Cross-References


Key Takeaway

  • Normalization (1NF through BCNF) eliminates update, insert, and delete anomalies by ensuring every fact is stored exactly once.
  • Denormalization intentionally reintroduces redundancy for read performance -- precomputing joins into flat tables for analytics workloads.
  • OLTP systems should be normalized (3NF+); OLAP/analytics systems should be denormalized (star schemas); the choice is driven by read vs write patterns.
Exercise

Normalize and Then Strategically Denormalize

Given this single denormalized table:

OrderDetails(order_id, order_date, customer_id, customer_name, customer_city,
             product_id, product_name, category_name, unit_price,
             quantity, discount, salesperson_name, salesperson_region)
  1. Identify the functional dependencies
  2. Normalize to 3NF (at least)
  3. Strategically denormalize for a star schema analytics use case
  4. Explain what anomalies the normalization fixes and what performance the denormalization gains
Solution

Functional Dependencies:

  • order_id -> order_date, customer_id, salesperson_name
  • customer_id -> customer_name, customer_city
  • product_id -> product_name, category_name, unit_price
  • salesperson_name -> salesperson_region
  • (order_id, product_id) -> quantity, discount

3NF Tables:

  • Orders(order_id PK, order_date, customer_id FK, salesperson_id FK)
  • Customers(customer_id PK, customer_name, customer_city)
  • Products(product_id PK, product_name, category_name, unit_price)
  • Salespersons(salesperson_id PK, salesperson_name, salesperson_region)
  • OrderItems(order_id FK, product_id FK, quantity, discount) -- composite PK

Star Schema (Denormalized):

  • Fact: fct_order_items(order_id, product_id, customer_key, date_key, salesperson_key, quantity, discount, unit_price, net_amount)
  • Dim: dim_customer(customer_key, customer_id, name, city)
  • Dim: dim_product(product_key, product_id, name, category_name, unit_price)
  • Dim: dim_salesperson(salesperson_key, name, region)
  • Dim: dim_date(date_key, date, month, quarter, year)

Anomalies fixed: Updating a customer's city in 3NF = 1 row change. In the original table = potentially thousands of rows. Performance gained: Analytics query "revenue by category by quarter" goes from a 5-table join to a 2-table join (fact + dim_product + dim_date).

Common Misconceptions

  • "Normalization always improves performance." Normalization improves write performance and data integrity but can hurt read performance (more joins). Analytics workloads are read-heavy, so denormalization is appropriate.
  • "Denormalization is just sloppy design." Strategic denormalization is a deliberate design choice for read-heavy workloads. It trades storage and write complexity for query simplicity and speed.
  • "3NF is always sufficient." 3NF does not eliminate all anomalies. BCNF eliminates anomalies from non-trivial functional dependencies where the determinant is not a superkey.
  • "You should normalize everything first, then denormalize." For OLAP systems, start by designing the star schema directly using dimensional modeling principles. Normalization-then-denormalization is an academic exercise, not a practical design process.

In Production

  • Airbnb maintains normalized operational databases (PostgreSQL) for their booking system while running denormalized star schemas in their data warehouse for analytics -- two different models for two different purposes.
  • Uber uses heavily denormalized tables in their analytics warehouse to avoid the latency of multi-table joins across their petabyte-scale datasets.
  • Spotify denormalizes listening event data by embedding artist and track attributes directly into the event record for sub-second analytics queries.
  • LinkedIn follows a strict normalization policy for their operational databases but builds wide, denormalized tables in their data lake for ML feature engineering.
Quiz

1. What is the primary goal of normalization?

A) To improve query performance B) To eliminate data redundancy and the update, insert, and delete anomalies it causes C) To reduce storage costs D) To make data easier to understand

Answer

B) Normalization decomposes tables so that each fact is stored exactly once. This eliminates update anomalies (inconsistent changes), insert anomalies (can't add data without unrelated data), and delete anomalies (losing data when deleting unrelated records).

2. What distinguishes 3NF from 2NF?

A) 3NF requires all fields to be atomic B) 3NF eliminates transitive dependencies -- non-key attributes must depend on the key directly, not through another non-key attribute C) 3NF requires a primary key D) 3NF supports multiple tables

Answer

B) In 2NF, salesperson_region might depend on salesperson_name which depends on order_id. This transitive dependency means updating a salesperson's region requires updating every order they are on. 3NF removes this by putting salesperson data in its own table.

3. When is denormalization the RIGHT design choice?

A) When the database is too slow for all workloads B) When the workload is read-heavy (analytics/OLAP) and join performance is a bottleneck C) When you want to save storage space D) When the schema has too many tables

Answer

B) Denormalization precomputes joins by storing redundant data in fewer tables. This trades write complexity and storage for faster reads. It is the standard practice for analytics/data warehouse workloads.

4. What is an update anomaly?

A) A database crash during an update B) When changing a fact requires updating multiple rows because the fact is stored redundantly, risking inconsistency if some rows are missed C) A slow-running update query D) An update that violates a foreign key constraint

Answer

B) If a customer's city is stored in every order row, updating the city requires updating ALL of those rows. If one row is missed, the database contains contradictory information about the customer's city.

5. What is BCNF and how does it differ from 3NF?

A) BCNF is a less strict form of normalization B) In BCNF, every determinant (left side of a functional dependency) must be a superkey -- this catches edge cases that 3NF misses C) BCNF adds encryption to the schema D) BCNF is the same as 3NF with an index

Answer

B) 3NF allows non-key attributes to determine other attributes if they are part of a candidate key. BCNF is stricter: EVERY functional dependency must have a superkey as its determinant. This eliminates subtle anomalies that 3NF permits.

:::


One-Liner Summary: Normalize for correctness in OLTP, denormalize for speed in OLAP -- the right level of normalization depends entirely on whether your workload is write-heavy or read-heavy.

"What I cannot create, I do not understand." — Richard Feynman