Git — Core Concepts

Git isn't just version control — it's a distributed graph database of your project's entire history. Here's how it actually works, and why it's designed the way it is.

Git: The Tool Everyone Uses, Few Truly Understand

Git was written by Linus Torvalds, the creator of Linux, in 2005. The Linux kernel had been using BitKeeper, a proprietary version control tool, until a licensing dispute ended the project’s free use of it. Torvalds had a working first version of Git within about 10 days. He needed something that could handle 1,000+ contributors making simultaneous changes to the Linux kernel.

The result is now used by virtually every software project on Earth. GitHub alone hosts over 420 million repositories. And yet, most developers interact with maybe 10% of what Git can actually do.

What Git Is Actually Doing

The Three Places Your Code Lives

When you’re working with Git, your code exists in three distinct states:

Working Directory — your files as they actually are on disk right now. You type, you edit, it changes here.

Staging Area (Index) — a holding zone where you assemble changes before committing. This is the part most beginners skip over and then wonder why their commits are messy.

Repository (.git folder) — the permanent historical record. Once something is committed here, it’s essentially immutable.

The staging area is one of Git’s most misunderstood features. It exists so you can make 15 changes to your project but commit them in logical groups — “these three changes fix the login bug, these five changes add the new feature” — rather than one giant blob.
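To make the three areas concrete, here is a toy Python model. It is only an illustration, not how Git is implemented: the names working_dir, index, commits, stage, and commit are made up for this sketch, and real Git stores hashed objects on disk rather than Python dictionaries. It shows two edits made at the same time but committed as two separate logical changes.

```python
# Toy model of Git's three areas. Illustration only: real Git stores
# hashed objects on disk, not Python dicts.
working_dir = {"login.py": "v1", "feature.py": "v1"}
index = dict(working_dir)        # staging area starts in sync with the last commit
commits = [dict(index)]          # repository: a list of snapshots

def stage(path):
    """Copy one file's current content into the staging area (like `git add`)."""
    index[path] = working_dir[path]

def commit():
    """Record the staged snapshot as a new commit (like `git commit`)."""
    commits.append(dict(index))

# Edit two files, but commit them as two separate logical changes.
working_dir["login.py"] = "v2: fix login bug"
working_dir["feature.py"] = "v2: add new feature"

stage("login.py")
commit()                         # this commit contains only the login fix

stage("feature.py")
commit()                         # this commit adds the feature
```

After the first commit() call, the snapshot still holds the old feature.py even though the file on disk has already changed. That gap between what is on disk and what gets committed is exactly what the staging area is for.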

Commits Are Snapshots, Not Diffs

A common misconception: commits store the difference between versions (like a patch file). They don’t.

Each Git commit records a complete snapshot of every tracked file in your project at that moment. What makes this practical is that Git is smart about storage: if a file hasn’t changed since the last commit, the new snapshot simply points to the object that is already stored rather than duplicating the content. Internally, Git uses a content-addressable storage system where every object is named by the SHA-1 hash of its content (a 40-character hexadecimal fingerprint; newer Git versions can also use SHA-256). If the content is identical, the hash is identical, so nothing is stored twice. Git does delta-compress objects when it packs them, but that is a storage optimization underneath the snapshot model, not what a commit means.
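As a small sketch of content addressing: in a SHA-1 repository, the ID of a file’s content (a “blob”) is the SHA-1 of a short header, made of the word blob, the byte length, and a NUL byte, followed by the content itself. The helper name git_blob_hash is made up for this example.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

a = git_blob_hash(b"hello world\n")
b = git_blob_hash(b"hello world\n")
c = git_blob_hash(b"hello world!\n")

print(a)        # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(a == b)   # True: identical content, identical ID, stored once
print(a == c)   # False: different content, different ID
```

If Git is installed, the first value should match what git hash-object prints for a file containing the same bytes.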

This is why Git operations are fast even on massive projects. “What changed between these two commits?” is just comparing two snapshots — no complex patching required.
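A rough sketch of why snapshot comparison is cheap: if each snapshot is treated as a mapping from path to content hash, finding what changed is a matter of comparing hashes, never file contents. The function name diff_snapshots and the shortened hash values are placeholders invented for this example; real Git compares tree objects, which also lets it skip whole unchanged directories.

```python
def diff_snapshots(old: dict, new: dict) -> dict:
    """Compare two snapshots given as {path: content_hash} mappings."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and old[p] != new[p]),
    }

# Hypothetical snapshots; the hash values are shortened placeholders.
commit_a = {"README": "a1b2", "app.py": "c3d4", "old.py": "e5f6"}
commit_b = {"README": "a1b2", "app.py": "9f8e", "new.py": "7a7b"}

changes = diff_snapshots(commit_a, commit_b)
print(changes)
# {'added': ['new.py'], 'removed': ['old.py'], 'modified': ['app.py']}
```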

Branches Are Almost Free

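In older version control systems, branching was a heavier operation. CVS had to tag every file on the branch, and SVN models a branch as a copy of a directory on the server, so working on one typically meant a fresh checkout or a switch. Branches were used sparingly.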

In Git, a branch is just a pointer — a file containing 41 bytes (a 40-character hash plus a newline) pointing to a specific commit. Creating a branch is essentially instantaneous, regardless of project size.
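The arithmetic and the idea can be sketched in a few lines of Python. This is a toy model: the refs dictionary and create_branch function are invented for illustration, and the commit ID is a placeholder value.

```python
commit_id = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"  # placeholder 40-character ID

# A branch stored as its own file holds the ID plus a newline.
ref_file_content = commit_id + "\n"
print(len(ref_file_content.encode("ascii")))  # 41

# Toy ref table: branch name -> commit ID.
refs = {"main": commit_id}

def create_branch(name: str, start: str) -> None:
    """Create a branch by copying one commit ID; no project files are touched."""
    refs[name] = refs[start]

create_branch("fix-login", "main")
print(refs["fix-login"] == refs["main"])  # True: both names point at the same commit
```

The cost of create_branch does not depend on how many files the project has, which is the point of the pointer design.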

This changes how developers work. Instead of treating branches as a heavyweight operation, Git teams branch constantly. New feature? Branch. Fixing a bug? Branch. Experimenting with a weird idea? Branch. The main codebase stays stable while work happens in parallel.

When work is done, branches get integrated back. Git offers several ways to do this: a fast-forward when the target branch hasn’t moved since the branch was created, a three-way merge when the two histories have diverged, and rebase, which is not a merge at all but replays your commits as if they had been made on top of the current code.
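The choice between a fast-forward and a three-way merge comes down to a question about the commit graph: is one commit an ancestor of the other? Here is a minimal sketch of that check, assuming a made-up history stored as a dictionary of parents. The names parents, ancestors, and merge_kind are hypothetical, not Git’s API.

```python
# Hypothetical history: commit name -> list of parent commits.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],      # main continued from B
    "D": ["B"],      # feature branched off at B
}

def ancestors(commit: str) -> set:
    """Return the commit itself plus everything reachable through its parents."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def merge_kind(target: str, source: str) -> str:
    """Classify how `source` could be integrated into `target`."""
    if source in ancestors(target):
        return "already up to date"
    if target in ancestors(source):
        return "fast-forward"      # just move the target pointer forward
    return "three-way merge"       # histories diverged; a merge commit is needed

print(merge_kind("B", "D"))   # fast-forward
print(merge_kind("C", "D"))   # three-way merge
```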

How Teams Use Git

The Pull Request Workflow

The dominant workflow on GitHub and GitLab: nobody pushes directly to the main branch. Instead:

Create a branch
Make your changes, commit them
Open a Pull Request (GitHub’s term) or Merge Request (GitLab’s term)
Teammates review the code, leave comments
Once approved, merge to main

This sounds bureaucratic. In practice, it catches an enormous number of bugs and bad decisions before they can hit production. The code review step alone, just having a second human look at changes, is one of the highest-ROI practices in software development.

Distributed Means Everyone Has Everything

Unlike older centralized systems where a central server held “the real code,” Git is distributed. By default, every clone of a repository is a complete copy, full history included. This means:

You can work offline with no connection to any server
If GitHub goes down, every developer still has the full history locally
There’s no single point of failure

In practice, teams still use a shared remote (usually GitHub) as the canonical source of truth. But the architecture doesn’t require it.

Common Misconceptions

“Git and GitHub are the same thing.” Git is the tool; GitHub is a website that hosts Git repositories. You can use Git without GitHub, and there are alternatives (GitLab, Bitbucket, self-hosted Gitea).

“Deleted branches mean deleted commits.” Commits don’t disappear when you delete a branch. The branch was just a pointer; the commits remain until Git’s garbage collector eventually prunes unreferenced objects. With default settings, reflog entries for such commits are kept for about 30 days, so you can find “deleted” work with git reflog and restore it by creating a new branch at that commit.
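A toy model of why this works: deleting a branch removes one entry from the table of pointers, while the commits stay in the object store until something cleans them up. The objects and refs dictionaries and the reachable function below are invented for illustration.

```python
# Hypothetical object store: commit name -> parent commits.
objects = {"A": [], "B": ["A"], "C": ["B"], "X": ["B"], "Y": ["X"]}
refs = {"main": "C", "experiment": "Y"}

def reachable(tips) -> set:
    """Collect every commit reachable from the given starting points."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(objects[commit])
    return seen

del refs["experiment"]            # delete the branch: only the pointer goes away

unreferenced = set(objects) - reachable(refs.values())
print(sorted(unreferenced))       # ['X', 'Y']
print("Y" in objects)             # True: the commit itself still exists

refs["recovered"] = "Y"           # recovery is just creating a pointer again
```

In this model, X and Y are what a garbage collector would eventually remove. Until then, pointing a new branch at Y brings the work back.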

“Merge conflicts are dangerous.” They’re just Git saying “both sides changed the same lines and I need a human to decide which is correct.” It’s not data corruption — it’s Git asking for help.
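The decision rule behind a three-way merge can be sketched for a single piece of content, such as one line. This is a simplification under stated assumptions: real Git works on regions of changed lines rather than isolated values, and merge_value is a made-up name for this example.

```python
def merge_value(base, ours, theirs):
    """Apply the three-way merge rule to one piece of content (for example, one line)."""
    if ours == theirs:
        return ours          # both sides agree, or neither side changed
    if ours == base:
        return theirs        # only their side changed
    if theirs == base:
        return ours          # only our side changed
    raise ValueError("conflict: both sides changed this differently")

print(merge_value("timeout = 30", "timeout = 30", "timeout = 60"))  # timeout = 60

try:
    merge_value("timeout = 30", "timeout = 45", "timeout = 60")
except ValueError as err:
    print(err)  # conflict: both sides changed this differently
```

Only the last case needs a human. Nothing is lost in it either: both versions are still available, and the tool is declining to guess.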

One Thing to Remember

Git stores complete snapshots, not differences — and branches are just pointers to commits. Once you understand that, the rest of the tool starts to make sense.
