
Git Internals & Workflows

Git is the most widely used version control system in the world, and most software companies, open-source projects, and developers rely on it daily. Yet many engineers treat Git as a black box: they memorize a handful of commands (commit, push, pull, merge) and panic when anything goes wrong. A merge conflict becomes a crisis. A detached HEAD becomes a mystery. A rebase gone wrong becomes a "just clone it again" moment.

This happens because most people learn Git's commands without learning Git's data model. And Git's data model is beautiful in its simplicity: it is a content-addressable filesystem with a graph of commits layered on top. Once you understand the data model, every Git command becomes a logical operation on a well-defined data structure, and recovering from mistakes becomes routine rather than frightening.

Why Understanding Git Internals Matters

| Scenario | Surface-Level Knowledge | Internal Knowledge |
| --- | --- | --- |
| Merge conflict | Panic, start over | Understand why it happened, resolve confidently |
| Accidentally committed secrets | git revert (still in history) | git filter-repo to rewrite history |
| Need to undo a rebase | "Just clone again" | git reflog to find the pre-rebase commit |
| Branch diverged from main | Confused about merge vs. rebase | Understand the commit graph and choose deliberately |
| CI build is slow | No idea why | Understand packfiles, shallow clones, sparse checkout |
| Monorepo gets slow | "Git doesn't scale" | Understand pack-objects, partial clones, and git maintenance |

Git's Data Model: Content-Addressable Storage

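At its core, Git is a content-addressable key-value store. Every piece of data (file contents, directory structures, commits, tags) is stored as an object, and the key is a hash computed over a short type-and-size header followed by the object's contents. The hash is SHA-1 by default; newer Git versions can also create SHA-256 repositories.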
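
To make the addressing scheme concrete, here is a minimal sketch that reproduces a blob ID by hand. It assumes a `sha1sum` binary is available (on macOS, `shasum` plays the same role) and uses the sample string `hello`, which is not from any real repository.

```bash
set -eu

content='hello'

# Git hashes a short header ("blob <byte count>" plus a NUL byte)
# followed by the raw bytes. The stored bytes here are "hello\n".
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')
manual=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)
echo "manual: $manual"

# If git is installed, it should report the same ID for the same bytes.
if command -v git >/dev/null 2>&1; then
  via_git=$(printf '%s\n' "$content" | git hash-object --stdin)
  echo "git:    $via_git"
fi
```

Both lines should print `ce013625030ba8dba906f756967f9e9ca394464a`. The filename never enters the calculation, which is why identical contents always map to the same object.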

There are four types of Git objects:

  1. Blob: Raw file contents. No filename and no permissions, just bytes. Two files with identical contents share the same blob, regardless of filename or location.

  2. Tree: A directory listing. Maps filenames to blob hashes (for files) or other tree hashes (for subdirectories), and records the file mode of each entry.

  3. Commit: A snapshot in time. Points to a tree (the root directory at that moment), zero or more parent commits (none for the first commit, two or more for a merge), and metadata (author, committer, timestamp, message).

  4. Tag: An annotated tag object is a named pointer to a specific object (usually a commit), with a tagger, a message, and an optional GPG or SSH signature. Lightweight tags are plain references and do not create an object.
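
The four object types can be inspected directly with `git cat-file`. The sketch below builds a throwaway repository in a temporary directory; the file name, identity, and tag name are made up for illustration.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false
git config tag.gpgsign false

mkdir docs
echo 'hello' > docs/readme.txt
git add docs/readme.txt
git commit -q -m "Add readme"
git tag -a v0.1 -m "First tag"

# Resolve one object of each kind and ask Git what type it is.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:docs/readme.txt)
tag_type=$(git cat-file -t v0.1)
echo "$commit_type $tree_type $blob_type $tag_type"

# Pretty-print each object to see what it actually stores.
git cat-file -p HEAD                  # tree, parents (none here), author, committer, message
git cat-file -p 'HEAD^{tree}'         # mode, type, hash, and name of each entry
git cat-file -p HEAD:docs/readme.txt  # the raw file bytes
git cat-file -p v0.1                  # object, type, tag name, tagger, message
```

Note that the first commit prints no `parent` line at all, and that the blob output contains the bytes of the file but not its name.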

This content-addressing has profound implications:

  • Deduplication is automatic. If 100 commits include the same file, only one blob is stored.
  • Integrity is guaranteed. If any byte changes, the object's hash changes, and so does the hash of every object that references it: the containing trees, the commit, and all descendant commits. Tampering is detectable.
  • Branching is cheap. A branch is just a 41-byte file containing a commit hash. Creating 1,000 branches costs ~40 KB of disk space.
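
A short experiment illustrates the deduplication and branching claims. This is a sketch in a scratch repository with invented file names; the byte count assumes SHA-1 and Git's default "files" ref storage, where a new branch is written as a loose file under `.git/refs/heads/`.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

# Two files with different names and directories but identical bytes.
mkdir a b
echo 'same content' > a/one.txt
echo 'same content' > b/two.txt
git add .
git commit -q -m "Two identical files"

blob_one=$(git rev-parse HEAD:a/one.txt)
blob_two=$(git rev-parse HEAD:b/two.txt)
echo "a/one.txt -> $blob_one"
echo "b/two.txt -> $blob_two"

# A new branch is only a new ref pointing at an existing commit.
git branch feature
head_commit=$(git rev-parse HEAD)
feature_commit=$(git rev-parse feature)

# Assumption: loose ref file, 40 hex characters plus a newline.
ref_bytes=$(wc -c < .git/refs/heads/feature | tr -d ' ')
echo "feature ref: $ref_bytes bytes"
```

The two paths resolve to the same blob ID, and creating the branch wrote no new objects, only the small ref file.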

How Git Differs from Other VCS

| Feature | Git | SVN (Subversion) | Perforce |
| --- | --- | --- | --- |
| Architecture | Distributed: every clone is a full repo | Centralized: single server | Centralized: single server |
| Branching cost | O(1): pointer to a commit | O(1): cheap server-side copy, but a branch is a directory path rather than a pointer | O(1) for streams |
| Offline work | Full capability | Read-only | Read-only |
| History storage | Snapshots (full tree per commit) | Deltas (changes per commit) | Deltas |
| Data integrity | SHA-1 hash chain | Revision numbers | Change numbers |
| Scale (files) | ~1M files (struggles beyond) | Millions of files | Millions of files |
| Scale (repo size) | ~5 GB (struggles beyond without LFS) | Hundreds of GB | Terabytes |

What This Section Covers

Git Internals

The object model in depth: blobs, trees, commits, and tags. How references (branches, HEAD, tags) work. Packfiles and delta compression. The reflog. How merge and rebase work at the object level. After this page, you will be able to recover from most Git mistakes.

Branching Strategies

Trunk-based development, GitHub Flow, GitFlow, and release branches. When to use feature branches vs. feature flags. A comparison table to help you choose the right strategy for your team.

Monorepo Management

Why companies like Google, Meta, and Microsoft use monorepos. Tooling comparison (Nx, Turborepo, Bazel, Rush). Task orchestration, caching, affected/changed detection, and the challenges that emerge at scale.

The Git Mental Model

The single most important mental model for Git: a commit is a snapshot, not a diff. Every commit contains a complete picture of your entire repository at that moment (via the tree object). Diffs are computed on the fly by comparing two snapshots.

This means:

  • Checking out a branch is fast — Git just swaps the working directory to match the commit's tree
  • Merging compares two snapshots and a common ancestor — it does not replay changes
  • History is a directed acyclic graph (DAG) of snapshots, not a sequence of patches

Every node in this graph is a full snapshot. The edges represent parent-child relationships. A merge commit has two parents: it records the fact that two lines of development were combined, and its tree is the merged result.
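
Both points can be checked in a scratch repository. The following sketch uses made-up file and branch names, creates two diverging lines of history, merges them, and then inspects the merge commit.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false
base=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

echo 'one' > stable.txt
git add stable.txt
git commit -q -m "Add stable.txt"

git checkout -q -b feature
echo 'two' > feature.txt
git add feature.txt
git commit -q -m "Add feature.txt"

git checkout -q "$base"
echo 'three' > main.txt
git add main.txt
git commit -q -m "Add main.txt"

# The histories have diverged, so this creates a real merge commit.
git merge -q --no-edit feature

# The merge commit's tree lists every file, including stable.txt,
# which was only touched in the very first commit.
files_in_snapshot=$(git ls-tree -r --name-only HEAD | wc -l | tr -d ' ')

# The commit object itself records one "parent" line per parent.
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')

echo "files: $files_in_snapshot, parents: $parent_count"
git log --oneline --graph --all
```

The snapshot holds three files and the commit has two parents. Nothing in the merge commit describes "what changed"; any diff you see is computed by comparing its tree against a parent's tree.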

Essential Git Configuration

Before diving into internals, ensure your Git is configured for a professional workflow:

bash
# Identity
git config --global user.name "Your Name"
git config --global user.email "you@company.com"

# Default branch name
git config --global init.defaultBranch main

# Rebase by default on pull (avoid merge commits for remote sync)
git config --global pull.rebase true

# Auto-stash before rebase (avoid "dirty tree" errors)
git config --global rebase.autoStash true

# Sign commits with SSH key (modern, simpler than GPG)
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub
git config --global commit.gpgsign true

# Better diff algorithm
git config --global diff.algorithm histogram

# Global gitignore
git config --global core.excludesfile ~/.gitignore_global

# Credential caching (the helper and its option must be quoted as one value)
git config --global credential.helper 'cache --timeout=3600'

# Enable rerere (reuse recorded resolution) — remembers how you
# resolved merge conflicts and applies the same resolution automatically
git config --global rerere.enabled true
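
If you would rather try settings before touching your real `~/.gitconfig`, one option is to write them to a scratch file with `--file` and read them back. This is a sketch only; the temporary path is arbitrary, and only three of the settings above are shown.

```bash
set -eu

cfg=$(mktemp)

# --file writes to the given path instead of the global config.
git config --file "$cfg" pull.rebase true
git config --file "$cfg" rerere.enabled true
# A helper with options has to reach git config as ONE argument, hence the quotes.
git config --file "$cfg" credential.helper 'cache --timeout=3600'

pull_rebase=$(git config --file "$cfg" --get pull.rebase)
helper=$(git config --file "$cfg" --get credential.helper)
echo "pull.rebase=$pull_rebase"
echo "credential.helper=$helper"

# Show the whole file as Git parses it.
git config --file "$cfg" --list
```

Once the values look right, the same keys can be set with `--global`.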

Quick Command Reference

| Task | Command | Notes |
| --- | --- | --- |
| See what changed | git status, git diff | diff --staged for staged changes |
| Undo last commit (keep changes) | git reset --soft HEAD~1 | Changes go back to staging |
| Undo last commit (discard) | git reset --hard HEAD~1 | Uncommitted changes are lost for good; the commit itself stays recoverable via git reflog until it expires |
| Find a lost commit | git reflog | Shows all HEAD movements |
| See commit graph | git log --oneline --graph --all | Visual branch topology |
| Stash changes temporarily | git stash push -m "description" | git stash pop to restore |
| Cherry-pick a commit | git cherry-pick <sha> | Copies commit to current branch |
| Blame a line | git blame <file> | Shows who last changed each line |
| Search history for text | git log -S "search term" | Finds commits that added/removed term |
| Bisect a bug | git bisect start, git bisect bad/good | Binary search for the breaking commit |
| Clean untracked files | git clean -fd | -n for dry run first |
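
As an illustration of the reset and reflog rows, the sketch below discards a commit with `git reset --hard` and then brings it back. It runs in a scratch repository with an invented file, and assumes the reflog is enabled, which is the default for non-bare repositories.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo 'v1' > notes.txt
git add notes.txt
git commit -q -m "First"
echo 'v2' > notes.txt
git commit -q -am "Second"
lost=$(git rev-parse HEAD)

# "Discard" the last commit. The branch moves back,
# but the commit object stays in the object database.
git reset -q --hard HEAD~1

# The reflog still records where HEAD pointed before the reset.
git reflog
previous=$(git rev-parse 'HEAD@{1}')

# Point the branch back at that commit.
git reset -q --hard "$previous"
restored=$(git rev-parse HEAD)
content=$(cat notes.txt)
echo "restored $restored with notes.txt=$content"
```

This only rescues committed work. Edits that were never committed or stashed are not in the object database, so a hard reset removes them for good.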

Further Reading

"What I cannot create, I do not understand." — Richard Feynman