Skip to content
Unverified — AI-generated content. Help verify this page

SQL Cheat Sheet

Quick reference for SQL query patterns, joins, window functions, CTEs, indexing strategies, and reading EXPLAIN output. Examples use PostgreSQL syntax.

Deep dive: Database Section | Query Optimization


Basic Queries

sql
-- Select with filtering
SELECT id, name, email
FROM users
WHERE status = 'active'
  AND created_at > '2025-01-01'
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;

-- Insert
INSERT INTO users (name, email, status)
VALUES ('Alice', 'alice@example.com', 'active')
RETURNING id;

-- Insert multiple rows
INSERT INTO users (name, email) VALUES
  ('Alice', 'alice@ex.com'),
  ('Bob', 'bob@ex.com'),
  ('Charlie', 'charlie@ex.com');

-- Upsert (INSERT ... ON CONFLICT)
INSERT INTO users (email, name)
VALUES ('alice@ex.com', 'Alice Updated')
ON CONFLICT (email)
DO UPDATE SET name = EXCLUDED.name, updated_at = NOW();

-- Update
UPDATE users
SET status = 'inactive', updated_at = NOW()
WHERE last_login < NOW() - INTERVAL '90 days';

-- Delete
DELETE FROM users WHERE status = 'deleted';

-- Truncate (fast delete all rows)
TRUNCATE TABLE logs;

Joins

Join Types Visual Reference

INNER JOIN       LEFT JOIN        RIGHT JOIN       FULL OUTER JOIN
 A [AB] B        [A AB] B         A [AB B]         [A AB B]
   ██               ███              ███             █████

Join Syntax

sql
-- INNER JOIN: rows that match in both tables
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id;

-- LEFT JOIN: all rows from left, matched from right
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;

-- Multiple joins
SELECT u.name, o.id, p.name AS product
FROM users u
JOIN orders o ON o.user_id = u.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id;

-- Self join (hierarchical data)
SELECT e.name, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;

-- Cross join (cartesian product)
SELECT s.size, c.color
FROM sizes s
CROSS JOIN colors c;

-- Lateral join (per-row subquery)
SELECT u.name, latest.total
FROM users u
CROSS JOIN LATERAL (
  SELECT total
  FROM orders
  WHERE user_id = u.id
  ORDER BY created_at DESC
  LIMIT 1
) latest;

Anti-Join (rows NOT in other table)

sql
-- Method 1: LEFT JOIN + IS NULL (usually fastest)
SELECT u.*
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;

-- Method 2: NOT EXISTS
SELECT u.*
FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);

-- Method 3: NOT IN (avoid if subquery can return NULL)
SELECT u.*
FROM users u
WHERE u.id NOT IN (SELECT user_id FROM orders);

Aggregations

sql
-- Basic aggregates
SELECT
  status,
  COUNT(*) AS total,
  COUNT(DISTINCT email) AS unique_emails,
  SUM(amount) AS total_amount,
  AVG(amount) AS avg_amount,
  MIN(amount) AS min_amount,
  MAX(amount) AS max_amount
FROM orders
GROUP BY status
HAVING COUNT(*) > 10
ORDER BY total DESC;

-- Grouping sets (multiple groupings in one query)
SELECT
  COALESCE(country, 'ALL') AS country,
  COALESCE(city, 'ALL') AS city,
  COUNT(*)
FROM users
GROUP BY GROUPING SETS (
  (country, city),
  (country),
  ()
);

-- Filter aggregate (PostgreSQL)
SELECT
  COUNT(*) AS total,
  COUNT(*) FILTER (WHERE status = 'active') AS active,
  COUNT(*) FILTER (WHERE status = 'inactive') AS inactive
FROM users;

Window Functions

Window functions perform calculations across a set of rows related to the current row, without collapsing rows.

Syntax

sql
function_name() OVER (
  [PARTITION BY column]
  [ORDER BY column]
  [ROWS/RANGE frame_spec]
)

Common Window Functions

sql
SELECT
  name,
  department,
  salary,

  -- Ranking
  ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
  RANK() OVER (ORDER BY salary DESC) AS rank,
  DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
  NTILE(4) OVER (ORDER BY salary DESC) AS quartile,

  -- Ranking within partition
  ROW_NUMBER() OVER (
    PARTITION BY department ORDER BY salary DESC
  ) AS dept_rank,

  -- Aggregates as windows
  SUM(salary) OVER (PARTITION BY department) AS dept_total,
  AVG(salary) OVER (PARTITION BY department) AS dept_avg,
  COUNT(*) OVER (PARTITION BY department) AS dept_count,

  -- Running total
  SUM(salary) OVER (ORDER BY hire_date) AS running_total,

  -- Moving average (last 3 rows)
  AVG(salary) OVER (
    ORDER BY hire_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_avg,

  -- Access other rows
  LAG(salary, 1) OVER (ORDER BY hire_date) AS prev_salary,
  LEAD(salary, 1) OVER (ORDER BY hire_date) AS next_salary,
  FIRST_VALUE(name) OVER (
    PARTITION BY department ORDER BY salary DESC
  ) AS highest_paid,
  LAST_VALUE(name) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS lowest_paid

FROM employees;

Ranking Functions Compared

FunctionTiesGapsExample (values: 100, 100, 90)
ROW_NUMBER()NoNo1, 2, 3
RANK()YesYes1, 1, 3
DENSE_RANK()YesNo1, 1, 2

Common Table Expressions (CTEs)

sql
-- Basic CTE
WITH active_users AS (
  SELECT id, name, email
  FROM users
  WHERE status = 'active'
)
SELECT au.name, COUNT(o.id) AS order_count
FROM active_users au
LEFT JOIN orders o ON o.user_id = au.id
GROUP BY au.name;

-- Multiple CTEs
WITH
monthly_revenue AS (
  SELECT
    DATE_TRUNC('month', created_at) AS month,
    SUM(total) AS revenue
  FROM orders
  GROUP BY 1
),
monthly_growth AS (
  SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
    ROUND(
      (revenue - LAG(revenue) OVER (ORDER BY month))
      / LAG(revenue) OVER (ORDER BY month) * 100, 2
    ) AS growth_pct
  FROM monthly_revenue
)
SELECT * FROM monthly_growth ORDER BY month;

-- Recursive CTE (hierarchical data)
WITH RECURSIVE org_tree AS (
  -- Base case: top-level managers
  SELECT id, name, manager_id, 0 AS depth
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive case
  SELECT e.id, e.name, e.manager_id, ot.depth + 1
  FROM employees e
  INNER JOIN org_tree ot ON ot.id = e.manager_id
)
SELECT * FROM org_tree ORDER BY depth, name;

Indexing Rules

When to Create Indexes

ScenarioIndex Type
WHERE clause equalityB-tree (default)
WHERE clause range (<, >, BETWEEN)B-tree
Full-text searchGIN
JSONB field queriesGIN
Geospatial queriesGiST
Array containmentGIN
Pattern matching (LIKE 'prefix%')B-tree (with text_pattern_ops)

Index Creation

sql
-- Single column
CREATE INDEX idx_users_email ON users (email);

-- Unique index
CREATE UNIQUE INDEX idx_users_email ON users (email);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at DESC);

-- Partial index (smaller, faster)
CREATE INDEX idx_active_users ON users (email) WHERE status = 'active';

-- Expression index
CREATE INDEX idx_users_lower_email ON users (LOWER(email));

-- Covering index (INCLUDE)
CREATE INDEX idx_orders_user ON orders (user_id) INCLUDE (total, status);

-- GIN index for JSONB
CREATE INDEX idx_data ON events USING GIN (payload);

-- Concurrent index creation (no table lock)
CREATE INDEX CONCURRENTLY idx_name ON table (column);

Composite Index Column Order

The leftmost prefix rule: a composite index on (A, B, C) can serve queries on:

  • A alone
  • A, B together
  • A, B, C together
  • But NOT B alone or C alone

When NOT to Index

  • Small tables (< 1000 rows) -- sequential scan is faster
  • Columns with very low cardinality (e.g., boolean with 50/50 split)
  • Heavily written tables where index maintenance cost > query benefit
  • Columns rarely used in WHERE, JOIN, or ORDER BY

EXPLAIN Reading Guide

sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM users WHERE email = 'alice@ex.com';

Key Metrics

MetricMeaning
cost=0.00..8.17Startup cost..total cost (arbitrary units)
rows=1Estimated rows returned
actual time=0.01..0.02Actual time in ms (with ANALYZE)
loops=1Times this node executed
Buffers: shared hit=3Pages read from cache
Buffers: shared read=1Pages read from disk

Scan Types (Best to Worst)

ScanMeaningPerformance
Index Only ScanAll data from indexBest
Index ScanIndex lookup + heap fetchGood
Bitmap Index ScanIndex bitmap + heap scanGood for many rows
Seq ScanFull table scanBad for large tables

Join Types

JoinMeaning
Nested LoopFor each row in outer, scan inner. Good for small sets
Hash JoinBuild hash table from smaller set. Good for equality joins
Merge JoinSort both, then merge. Good for large sorted sets

Red Flags in EXPLAIN

Red FlagMeaning
Seq Scan on large tableMissing index
High rows estimate vs actualStale statistics, run ANALYZE
Nested Loop with large innerMay need hash or merge join
Sort with large sort_method: external mergeNeed more work_mem
Many Buffers: shared readCold cache or table too large for memory

Useful Patterns

Pagination

sql
-- Offset pagination (simple but slow for large offsets)
SELECT * FROM posts ORDER BY id LIMIT 20 OFFSET 100;

-- Keyset pagination (fast for any page)
SELECT * FROM posts
WHERE id > 12345  -- last ID from previous page
ORDER BY id
LIMIT 20;

Deduplication

sql
-- Keep latest per group
DELETE FROM events a
USING (
  SELECT DISTINCT ON (user_id)
    id, user_id
  FROM events
  ORDER BY user_id, created_at DESC
) b
WHERE a.user_id = b.user_id AND a.id != b.id;

Date Ranges

sql
-- Generate date series
SELECT d::date
FROM generate_series('2025-01-01', '2025-12-31', '1 day'::interval) d;

-- Fill in missing dates
SELECT d.date, COALESCE(o.count, 0) AS orders
FROM generate_series('2025-01-01', '2025-01-31', '1 day'::interval) d(date)
LEFT JOIN (
  SELECT DATE(created_at) AS date, COUNT(*) AS count
  FROM orders
  GROUP BY 1
) o ON o.date = d.date;

JSON Operations (PostgreSQL)

sql
-- Extract value
SELECT payload->>'name' FROM events;           -- text
SELECT payload->'address'->>'city' FROM events; -- nested text
SELECT payload#>>'{address,city}' FROM events;  -- path extraction

-- Query JSONB
SELECT * FROM events WHERE payload @> '{"type": "click"}';
SELECT * FROM events WHERE payload ? 'name';           -- key exists
SELECT * FROM events WHERE payload ?| ARRAY['a', 'b']; -- any key exists

-- Aggregate to JSON
SELECT json_agg(row_to_json(u)) FROM users u;

Troubleshooting

ProblemSolution
Query is slowRun EXPLAIN ANALYZE, check for Seq Scan
Table bloatRun VACUUM FULL table
Stale statisticsRun ANALYZE table
Lock contentionCheck pg_stat_activity for blocked queries
Connection exhaustionUse connection pooling (PgBouncer)
DeadlockCheck pg_locks, ensure consistent lock ordering
Index not being usedCheck if query pattern matches index columns
sql
-- Check running queries
SELECT pid, query, state, wait_event, query_start
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;

-- Kill a query
SELECT pg_cancel_backend(pid);    -- graceful
SELECT pg_terminate_backend(pid); -- force

-- Check table sizes
SELECT
  relname,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
  pg_size_pretty(pg_relation_size(relid)) AS table_size,
  pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Check index usage
SELECT
  indexrelname,
  idx_scan AS scans,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan;

Test Yourself
  1. What SQL clause inserts a row or updates it if a unique constraint is violated (PostgreSQL)?INSERT ... ON CONFLICT ... DO UPDATE

  2. What is the difference between RANK() and DENSE_RANK() for tied values?RANK() leaves gaps after ties (1,1,3), DENSE_RANK() does not (1,1,2).

  3. How do you get the previous row's value in a window function?LAG(column, 1) OVER (ORDER BY ...)

  4. What keyword makes a CTE reference itself for hierarchical queries?RECURSIVE

  5. What anti-join pattern is usually fastest: LEFT JOIN + IS NULL, NOT EXISTS, or NOT IN?LEFT JOIN + IS NULL is usually fastest.

  6. What command creates an index without locking the table in PostgreSQL?CREATE INDEX CONCURRENTLY

  7. What does a Seq Scan on a large table indicate in EXPLAIN output? A missing index -- the database is scanning every row.

  8. How do you perform keyset (cursor) pagination instead of OFFSET?WHERE id > last_seen_id ORDER BY id LIMIT 20

  9. What function generates a series of dates in PostgreSQL?generate_series('start'::date, 'end'::date, '1 day'::interval)

  10. What is the leftmost prefix rule for composite indexes? An index on (A, B, C) can serve queries on A, A+B, or A+B+C, but NOT B alone or C alone.

Common Gotchas

  • NOT IN returns zero rows if the subquery contains a NULL. Use NOT EXISTS instead -- it handles NULLs correctly.
  • Applying functions on indexed columns kills index usage. WHERE DATE(created_at) = '2025-01-01' cannot use an index on created_at. Rewrite as a range query.
  • OFFSET pagination gets slower on every page. The database still scans and discards all skipped rows. Use keyset/cursor pagination for large datasets.
  • Forgetting ANALYZE after bulk data loads. Stale statistics cause the query planner to pick bad plans. Run ANALYZE table_name after large inserts.

One-Liner Summary

SQL is the universal language for querying relational data -- master JOINs, window functions, CTEs, and EXPLAIN ANALYZE and you can answer any data question efficiently.

"What I cannot create, I do not understand." — Richard Feynman