Git — Deep Dive

Git's internal object model, the DAG that powers branching, why rebase rewrites history and when that's dangerous, and the parts of Git that will trip up even experienced developers.

Under the Hood: Git’s Object Database

Git stores every file, directory, and commit of your project in .git/objects/. Right after git init, that directory holds no objects, only empty info/ and pack/ subdirectories. After your first commit, it contains the complete representation of your project, and it’s built from only four object types.

The Four Object Types

Blob — raw file content. Nothing else. No filename, no timestamp, no permissions. Just bytes. If two files in your project have identical content, they share one blob object.

Tree — a directory listing. Contains references to blobs (files) and other trees (subdirectories), along with filenames and permissions. A tree is how Git knows “this blob named README.md lives at the root.”

Commit — metadata: author, timestamp, commit message, and a pointer to a tree (the root of your project at that moment). Also has zero or more parent commit hashes, which is how history is represented.

Tag — an annotated tag. A named pointer to a commit with extra metadata. Lightweight tags are just refs, not objects.

Every object is named by a SHA-1 hash (Git’s default) computed over its type, size, and content. Run this and you’ll see:

$ echo 'hello' | git hash-object --stdin
ce013625030ba8dba906f756967f9e9ca394464a

That hash is deterministic — the same content produces the same hash, everywhere, forever. Two Git repos on opposite ends of the world with the same content will have identical object hashes. This content-addressability is what makes distributed Git work: when you push commits, Git on the server can verify integrity by recomputing hashes.
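
The hash covers a short header as well as the content, which is why it differs from a plain SHA-1 of the same bytes. A minimal Python sketch, assuming the default SHA-1 object format (the function name git_blob_hash is ours, not part of Git):

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # echo appends a newline, so the equivalent content is b"hello\n".
    print(git_blob_hash(b"hello\n"))
    # A plain SHA-1 of the same bytes gives a different value.
    print(hashlib.sha1(b"hello\n").hexdigest())
```

For b"hello\n" the first line printed is ce013625030ba8dba906f756967f9e9ca394464a, the same ID git hash-object reports for that input.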

Walking the Object Graph

# See what's inside a commit
$ git cat-file -p HEAD
tree 8f4e2c1d...
parent 3a7b9f2e...
author Jane Doe <jane@example.com> 1711526400 +0000
committer Jane Doe <jane@example.com> 1711526400 +0000

Add login feature

# See what's inside that tree
$ git cat-file -p 8f4e2c1d
100644 blob a8c3f1... README.md
040000 tree 2d9e8b... src
100644 blob 7f2a4c... package.json
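
What git cat-file does for a loose object can be approximated in a few lines: find the file by hash, inflate it, and split the header from the body. The sketch below is an illustration under assumptions (SHA-1 object format, loose objects only, helper names of our own choosing); it writes into a temporary directory rather than a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, body: bytes) -> str:
    """Store an object the way Git stores loose objects and return its ID."""
    size = str(len(body)).encode("ascii")
    raw = obj_type.encode("ascii") + b" " + size + b"\0" + body
    oid = hashlib.sha1(raw).hexdigest()
    # The first two hex digits name a subdirectory; the other 38 name the file.
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return oid


def read_loose_object(objects_dir: Path, oid: str) -> tuple[str, bytes]:
    """Return (type, body) for a loose object, roughly what cat-file shows."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("corrupt object: size mismatch")
    return obj_type, body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        objects = Path(tmp) / "objects"
        oid = write_loose_object(objects, "blob", b"hello\n")
        print(oid, read_loose_object(objects, oid))
```

Objects that have already been packed are not reachable this way; they live inside pack files, covered later.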

The history of your project is a directed acyclic graph (DAG) of commit objects. Each commit points to its parents (usually one, two for merges, zero for the initial commit). A branch is just a file containing a hash, a movable pointer into this graph. HEAD is a file that normally names the current branch, and holds a bare hash only when detached.

$ cat .git/HEAD
ref: refs/heads/main

$ cat .git/refs/heads/main
a9f3e2c1d8b4...

That’s it. A branch is a 41-byte file: 40 hex characters and a newline.
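
Following HEAD to a commit is therefore just two file reads. Here is a small sketch, assuming loose ref files only (refs that Git has moved into .git/packed-refs are not handled) and a fake repository layout with a made-up hash; resolve_head is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to a commit hash, reading only loose ref files."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Symbolic ref: HEAD names a branch file, which holds the hash.
        return (git_dir / head[len("ref: "):]).read_text().strip()
    # Detached HEAD: the file holds the hash directly.
    return head


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)
        # Fake layout with a made-up 40-character hash.
        (git_dir / "refs" / "heads").mkdir(parents=True)
        (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
        (git_dir / "refs" / "heads" / "main").write_text("a" * 40 + "\n")
        print(resolve_head(git_dir))
```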

Merge vs Rebase: What Actually Happens

Three-Way Merge

When two branches have diverged, Git finds the common ancestor commit (the “merge base”) and performs a three-way merge:

      A---B---C  feature
     /
D---E---F---G    main

Git looks at the state at E (merge base), applies changes from E→C and E→G, and combines them. If the same lines changed in both branches, you get a conflict. The result is a new merge commit H with two parents (C and G).

The merge commit preserves the actual history — when branches diverged, when they rejoined. This is valuable for auditing but can make the log noisy.
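
Finding the merge base is a graph problem: take the common ancestors of both tips and keep the ones that are not ancestors of another common ancestor. The brute-force sketch below uses the letters from the diagram as stand-ins for commit hashes; Git’s own implementation is far more efficient, and the function names here are ours.

```python
def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """Return the commit itself plus everything reachable through parent links."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(a: str, b: str, parents: dict[str, list[str]]) -> set[str]:
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {
        c for c in common
        if not any(c != other and c in ancestors(other, parents) for other in common)
    }


if __name__ == "__main__":
    # The graph from the diagram: feature is A-B-C, main is D-E-F-G.
    parents = {
        "D": [], "E": ["D"], "F": ["E"], "G": ["F"],
        "A": ["E"], "B": ["A"], "C": ["B"],
    }
    print(merge_bases("C", "G", parents))  # {'E'}
```

The function returns a set because criss-cross merges can produce more than one merge base.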

Rebase

Rebase takes the commits on your branch and replays them on top of another branch:

git checkout feature
git rebase main

This doesn’t move commits — it creates new commits with the same changes but different parent hashes (and therefore different SHA-1s). The original commits are orphaned and eventually garbage-collected.

Before:                         After rebase:
      A---B---C  feature                      A'--B'--C'  feature (new commits)
     /                                       /
D---E---F---G    main           D---E---F---G             main

The result: a cleaner linear history. git log reads like a story without merge commits cluttering it up.
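
To see why replayed commits cannot keep their old hashes, note that the parent hash is part of the bytes being hashed. The sketch below builds a commit object by hand with made-up tree and parent IDs and a fixed example author (commit_id is a hypothetical helper). In a real rebase the tree and committer timestamp usually change too, which only adds to the difference.

```python
import hashlib


def commit_id(tree: str, parent: str, message: str) -> str:
    """Hash a commit object built from the given fields (fixed example author)."""
    who = "Jane Doe <jane@example.com> 1711526400 +0000"
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n"
        f"{message}\n"
    ).encode("utf-8")
    # Same scheme as blobs: "<type> <size>\0" followed by the body.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    tree = "1" * 40       # made-up tree ID
    old_parent = "2" * 40  # stands in for E
    new_parent = "3" * 40  # stands in for G
    print(commit_id(tree, old_parent, "Add login feature"))
    print(commit_id(tree, new_parent, "Add login feature"))
```

Same tree, same message, same author, and the two IDs still differ because the parent line changed.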

The golden rule of rebase: never rebase commits that others have pulled.

When you rebase, you rewrite history. If a teammate pulled your feature branch before you rebased it, they have the original commits. After your rebase, their copy and your copy have diverged — same changes, completely different commit hashes. Reconciling this is painful. It’s technically recoverable, but you’ll make enemies.

Interactive Rebase: The History Surgeon’s Tool

git rebase -i HEAD~5

This opens an editor showing your last 5 commits. You can:

pick — keep as-is
squash — merge into previous commit
fixup — squash, but discard the commit message
reword — edit the commit message
drop — delete the commit entirely
edit — pause rebase here so you can amend the commit

This is how professional developers clean up their work before opening a pull request. You might make 20 small “fix typo,” “oops,” “try again” commits while working — interactive rebase lets you compress that into 3 logical commits before anyone else sees the mess.
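
As a rough mental model of the todo list, here is a toy Python simulation. It tracks commit messages only, not diffs, and treats reword and edit like pick because the interactive pauses are not modeled; apply_todo and the sample messages are invented for illustration.

```python
def apply_todo(todo: list[tuple[str, str]]) -> list[str]:
    """Fold (action, message) pairs, oldest first, into the surviving messages."""
    result: list[str] = []
    for action, message in todo:
        if action in ("pick", "reword", "edit"):
            # The commit survives as its own entry.
            result.append(message)
        elif action in ("squash", "fixup"):
            if not result:
                raise ValueError(f"cannot {action} without a previous commit")
            if action == "squash":
                # squash combines messages; fixup discards this one.
                result[-1] += "\n\n" + message
        elif action != "drop":
            raise ValueError(f"unknown action: {action}")
    return result


if __name__ == "__main__":
    todo = [
        ("pick", "Add login form"),
        ("fixup", "fix typo"),
        ("fixup", "oops"),
        ("pick", "Add session handling"),
        ("squash", "Handle expired sessions"),
        ("drop", "debug logging"),
    ]
    for message in apply_todo(todo):
        print(repr(message))
```

Six commits go in and two come out: the fixups vanish into the first commit, the squash merges its message into the second, and the dropped commit disappears.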

Internals That Most Developers Never Touch

The Reflog: Your Safety Net

Every time HEAD moves — commit, checkout, merge, rebase — Git appends an entry to .git/logs/HEAD. This is the reflog, and it’s separate from the commit history.

$ git reflog
a9f3e2c HEAD@{0}: rebase finished: returning to refs/heads/feature
7b3d1f8 HEAD@{1}: rebase: add user authentication
3c2a9e4 HEAD@{2}: rebase: start rebasing
f1e8b27 HEAD@{3}: checkout: moving from main to feature

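Did you just accidentally git reset --hard and lose commits? The reflog has them. Find the hash from before the reset and run git reset --hard <hash> to move your branch back, or git branch <name> <hash> to park the commits on a new branch. You’ve “lost” nothing, as long as it was committed. Uncommitted changes wiped by the reset never reached the reflog and are gone.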

The reflog is local only. It doesn’t push. And its entries expire: after 90 days by default, or after 30 days for entries that are no longer reachable from the current branch tip. But in the short term, it’s very hard to permanently destroy committed work.
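
The reflog file itself is plain text, one line per HEAD movement: old hash, new hash, identity, timestamp, then a tab and the message. A small parsing sketch follows, with made-up hashes and helper names of our own; it assumes reset entries start with "reset:", as they do in current Git versions.

```python
from typing import NamedTuple, Optional


class ReflogEntry(NamedTuple):
    old: str
    new: str
    message: str


def parse_reflog(text: str) -> list[ReflogEntry]:
    """Parse the contents of .git/logs/HEAD (oldest entry first)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<old> <new> <identity> <time> <tz>\t<message>".
        meta, _, message = line.partition("\t")
        old, new = meta.split(" ")[:2]
        entries.append(ReflogEntry(old, new, message))
    return entries


def hash_before_last_reset(entries: list[ReflogEntry]) -> Optional[str]:
    """Return where HEAD pointed just before the most recent reset, if any."""
    for entry in reversed(entries):
        if entry.message.startswith("reset:"):
            return entry.old
    return None


if __name__ == "__main__":
    who = "Jane Doe <jane@example.com> 1711526400 +0000"
    sample = (
        f"{'a' * 40} {'b' * 40} {who}\tcommit: Add login feature\n"
        f"{'b' * 40} {'a' * 40} {who}\treset: moving to HEAD~1\n"
    )
    print(hash_before_last_reset(parse_reflog(sample)))
```

The value printed is the hash you would hand to git reset --hard or git branch to get the commit back.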

Pack Files and Garbage Collection

Initially, Git stores every object as a loose file: one zlib-compressed file per object under .git/objects/. This is fine for small repos. For large ones, it’s slow and wasteful.

Periodically (or when you run git gc), Git compresses loose objects into pack files — .git/objects/pack/*.pack. Pack files use delta compression: instead of storing each version of a large file from scratch, they store one full copy and deltas for the rest. This is why even repos with decades of history stay manageable in size.
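
The idea behind a delta is to describe one version as "copy these byte ranges from the base, insert these new bytes." The toy below shows that idea only: it uses Python’s difflib to find the matching ranges and a list of tuples as the encoding, whereas Git’s real pack format uses its own matching heuristics and a compact binary encoding.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list[tuple]:
    """Describe target as copies from base plus literal inserts."""
    ops: list[tuple] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no op: those base bytes are simply never copied.
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the target from the base and a delta produced by make_delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    base = b"def login(user):\n    check(user)\n    return True\n"
    target = b"def login(user):\n    check(user)\n    audit(user)\n    return True\n"
    delta = make_delta(base, target)
    print(delta)
    print(apply_delta(base, delta) == target)
```

Only the inserted bytes need storing in full; everything else is a reference into the base.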

The Linux kernel’s .git folder is about 3.5GB. Without pack files, it would be orders of magnitude larger.

Shallow Clones

git clone --depth=1 https://github.com/torvalds/linux

A shallow clone fetches only the most recent N commits (here, one), not the full history. The clone is a fraction of the size and downloads much faster. The actions/checkout action used by most GitHub Actions workflows fetches a single commit by default, since full history is usually wasteful for a CI run.

The tradeoff: you lose the ability to git log into the past, compare old commits, or do operations that need ancestor information, such as finding a merge base. Git records the oldest fetched commits in .git/shallow and treats them as having no parents, so tools don’t try to traverse further.

# Deepen a shallow clone after the fact
git fetch --unshallow
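
The effect of a shallow boundary on history walking can be modeled in a few lines. This sketch assumes the boundary commits are available as a set (Git lists them in .git/shallow) and follows first parents only; the letters and the visible_history name are illustrative.

```python
def visible_history(start: str, parents: dict[str, list[str]], shallow: set[str]) -> list[str]:
    """Walk first-parent history from start, stopping at shallow boundary commits."""
    history = []
    current = start
    while True:
        history.append(current)
        if current in shallow or not parents.get(current):
            # Boundary or root commit: nothing further can be traversed locally.
            break
        current = parents[current][0]
    return history


if __name__ == "__main__":
    parents = {"D": [], "E": ["D"], "F": ["E"], "G": ["F"]}
    print(visible_history("G", parents, shallow=set()))   # full clone
    print(visible_history("G", parents, shallow={"G"}))   # --depth=1
    print(visible_history("G", parents, shallow={"F"}))   # --depth=2
```

After git fetch --unshallow the boundary set is empty and the walk reaches the root again.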

Edge Cases That Will Bite You

Submodules vs subtrees — both let you embed one repo inside another. Submodules are a pointer (the parent repo stores a commit hash, not the content); the embedded repo is a separate checkout. Subtrees actually merge the foreign repo’s history into yours. Submodules are more common; they’re also famously finicky because forgetting to git submodule update after pulling is extremely easy.

CRLF line endings — Windows uses \r\n, Unix uses \n. Git can auto-convert on checkout/commit via core.autocrlf, but teams with mixed OS developers regularly run into commits that change every line of a file due to line ending changes. .gitattributes is the proper fix: it forces consistent line ending behavior regardless of each developer’s OS settings.

Binary files and LFS — Git handles binary files (images, models, executables) badly. Storing a 100MB video file in a repo means every clone downloads that 100MB forever. Git Large File Storage (LFS) replaces binary files in the repo with small pointer files, storing the actual content on a separate server. GitHub provides 1GB of LFS storage free; after that it’s $5/month per 50GB.
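
An LFS pointer file is a few lines of text: a spec version, the SHA-256 of the real content, and its size in bytes. A short sketch of what gets committed in place of the binary, following the version 1 pointer layout (lfs_pointer is a hypothetical helper, not part of the git-lfs tooling):

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS pointer file for the given content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    # Stand-in for a large binary: one megabyte of zero bytes.
    fake_video = bytes(1024 * 1024)
    pointer = lfs_pointer(fake_video)
    print(pointer)
    print(len(pointer), "bytes stored in the repo instead of", len(fake_video))
```

The pointer stays around 130 bytes no matter how large the real file is, which is what keeps clones small.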

Worktrees — a lesser-known feature: git worktree add ../hotfix-branch hotfix. Creates a second working directory attached to the same .git folder, checked out to a different branch. You can have the main branch running in one window and test a hotfix in another, without stashing or committing half-done work. Introduced in Git 2.5 (2015), still rarely used.

One Thing to Remember

Git’s entire history is a graph of immutable content-addressed objects. Nothing gets deleted immediately — branches are just pointers, rebase creates new commits (old ones linger), and the reflog tracks every HEAD movement. If you committed it, you can almost certainly get it back.
