What a Commit Actually Contains
Someone on the team asks: "Why does rebasing change commit SHAs? I didn't change the code, I just rebased." The senior answers "because the parent is different." The junior blinks. "But I didn't change the parent either, I just rebased onto a different branch." And that is exactly the point: rebasing is changing the parent. A commit's identity — its SHA — is a hash of its contents, and its contents include the parent pointer. Change the parent, change the hash, change the commit identity.
Every weird Git behavior that starts with "but I only..." is explained by examining what a commit actually is, byte by byte. This lesson opens commits with git cat-file and shows you the six fields that make a commit's identity, why they matter, and why every rewrite operation (amend, rebase, filter-repo) produces new SHAs even for "unchanged" commits.
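The opening exchange can be reproduced in a throwaway repository. The sketch below is an illustration rather than a prescribed workflow: the directory comes from mktemp, and the identity and file names are placeholders. It rebases a one-commit branch and compares the commit before and after.

```shell
# Throwaway repo; name, email, and file names are placeholders for this sketch.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)   # "main" or "master", depending on config

# One commit on a feature branch
git checkout -q -b feature
echo feat > feat.txt && git add feat.txt && git commit -q -m "feature work"
sha_before=$(git rev-parse HEAD)
diff_before=$(git diff HEAD~1 HEAD)

# Meanwhile the main branch moves on
git checkout -q "$main_branch"
echo main > main.txt && git add main.txt && git commit -q -m "main work"
main_tip=$(git rev-parse HEAD)

# Rebase: same change, new parent
git checkout -q feature
git rebase -q "$main_branch"
sha_after=$(git rev-parse HEAD)
diff_after=$(git diff HEAD~1 HEAD)
new_parent=$(git rev-parse HEAD~1)

echo "before rebase: $sha_before"
echo "after rebase:  $sha_after"
```

The diff introduced by the commit is byte-for-byte the same before and after, yet the SHA differs, because the parent line inside the commit object now names the tip of the main branch.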
A Commit Is Plain Text
Every commit object in .git/objects/ is a short text file (compressed with zlib). Here is one:
git log -1 --format=%H
# a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
git cat-file -p a1b2c3d4
# tree 84f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9
# parent 789abc0def1234567890abcdef1234567890abcd
# author Sharon Sahadevan <sharon@example.com> 1713600000 +0000
# committer Sharon Sahadevan <sharon@example.com> 1713600000 +0000
#
# feat: add login endpoint
#
# Implements the POST /login route with bcrypt password verification.
# Returns a JWT on success; 401 otherwise.
Six fields, separated by newlines:
- tree <sha>: the tree (snapshot of files) for this commit.
- parent <sha>: zero, one, or many parent commits.
- author <name> <email> <timestamp> <timezone>: who wrote the changes, and when.
- committer <name> <email> <timestamp> <timezone>: who applied/recorded the commit. Often the same as author.
- Blank line.
- Commit message: free text, arbitrarily long.
And optionally:
- gpgsig ...: a PGP signature (if the commit is signed, see Module 6).
The SHA of the commit is SHA1 of the entire byte sequence (with commit <size>\0 prefix). Change one character of any field → new SHA → new commit identity.
A commit's identity = SHA1 of its contents. Contents include parent SHAs, tree SHA, author info, timestamps, and message. Changing ANY of these — including rebasing (new parent) or amending the message — produces a new commit with a new SHA. The old commit is not deleted; it just becomes unreferenced.
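To check the hashing rule by hand, the sketch below rebuilds the commit ID outside of Git. It assumes a repository using the default SHA-1 object format and a sha1sum binary on the PATH (on macOS, shasum would be the likely substitute); the identity and file are placeholders.

```shell
# Throwaway repo with one commit (placeholder identity).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
echo "hi" > a.txt
git add a.txt
git commit -q -m "first"

git_sha=$(git rev-parse HEAD)
size=$(git cat-file -s HEAD)   # byte length of the commit text

# Hash "commit <size>\0" followed by the raw commit text.
manual_sha=$( { printf 'commit %s\0' "$size"; git cat-file commit HEAD; } | sha1sum | cut -d' ' -f1 )

echo "git:    $git_sha"
echo "manual: $manual_sha"
```

Both lines should print the same 40 hex characters. Editing any byte of the commit text (a name, a timestamp, a letter of the message) before hashing would give a different result.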
The Six Fields in Detail
Tree
tree 84f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9
The root tree — a complete snapshot of your project at the time of the commit. Navigate it:
git cat-file -p 84f1a2b3
# 100644 blob 3b18e512... README.md
# 040000 tree 7a1c4d5e... src
# 100644 blob 2f1ead7c... package.json
git cat-file -p 7a1c4d5e # the src/ subtree
# 100644 blob ef5a6b7c... main.py
# 100644 blob dc8e9f0a... util.py
Two commits with exactly the same files have the same tree SHA. If you commit a "no-op" change that gets reverted to the original state, the resulting commit has the same tree SHA as the pre-change commit (but a different commit SHA because the parent is different).
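The change-and-revert case can be sketched as follows, in a temporary repository with a placeholder identity. It makes a change, undoes it in a later commit, and compares tree and commit IDs of the first and last commits.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "original" > a.txt
git add a.txt && git commit -q -m "original"
echo "a temporary change" > a.txt
git commit -q -am "change"
echo "original" > a.txt
git commit -q -am "change back"

tree_first=$(git rev-parse 'HEAD~2^{tree}')   # tree of the first commit
tree_last=$(git rev-parse 'HEAD^{tree}')      # tree of the last commit
commit_first=$(git rev-parse HEAD~2)
commit_last=$(git rev-parse HEAD)

echo "trees:   $tree_first / $tree_last"
echo "commits: $commit_first / $commit_last"
```

The two tree IDs match because the file contents are identical; the commit IDs differ because the parent, message, and timestamps are not.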
Parent
parent 789abc0def1234567890abcdef1234567890abcd
Zero, one, or multiple parents:
- Zero parents — the first (root) commit.
- One parent — a normal commit.
- Two parents — a merge commit. The first parent is "the branch we were on"; the second is "the branch we merged in."
- Three+ parents — an octopus merge (multiple branches merged at once). Rare but valid.
# A merge commit
git cat-file -p <merge-commit-sha>
# tree ...
# parent abc123... ← first parent (what main was)
# parent def456... ← second parent (the branch we merged)
# author ...
# committer ...
#
# Merge branch 'feature' into main
The first-parent chain is how git log --first-parent walks only the "main development line" — it follows only the first parent at every merge.
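A small sketch of parent ordering, using a temporary repository, a placeholder identity, and a branch named feature chosen for illustration. It creates a real merge and then reads the two parents back with the ^1 and ^2 suffixes.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)   # "main" or "master", depending on config

git checkout -q -b feature
echo feat > feat.txt && git add feat.txt && git commit -q -m "feature work"
feature_tip=$(git rev-parse HEAD)

git checkout -q "$main_branch"
echo main > main.txt && git add main.txt && git commit -q -m "main work"
main_tip=$(git rev-parse HEAD)

git merge -q --no-ff --no-edit feature

first_parent=$(git rev-parse HEAD^1)    # the branch we were on
second_parent=$(git rev-parse HEAD^2)   # the branch we merged in
all_commits=$(git rev-list --count HEAD)
first_parent_commits=$(git rev-list --count --first-parent HEAD)

echo "first parent:  $first_parent"
echo "second parent: $second_parent"
echo "all: $all_commits, first-parent only: $first_parent_commits"
```

Here the full history has four commits, while the first-parent walk sees three: the merge, "main work", and "base". The commit made on feature is only reachable through the second parent.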
Author vs Committer
author Sharon <sharon@example.com> 1713600000 +0000
committer Sharon <sharon@example.com> 1713600000 +0000
Usually the same. They diverge when:
- Someone rebases your commits (committer becomes the rebaser; author stays you).
- Someone amends your commit on your behalf (same).
- You use a patch-email workflow (Linux kernel style): you mail the patch; the maintainer applies it. Author is you, committer is them.
# Showing both
git log --format='%h %an / %cn %s' # abbrev hash, author name / committer name, subject
# a1b2c3d Sharon / Alice fix: off-by-one in pagination
# (Sharon wrote it; Alice rebased it onto main)
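Timestamps: seconds since the Unix epoch plus a timezone offset. The epoch value is an absolute point in time, which is why log output shows "2 days ago" correctly across time zones; the offset only records what the local clock read when the commit was made.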
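The "amended on your behalf" case can be simulated on one machine. In this sketch both identities are made-up placeholders, and the second person is represented by the standard GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL environment variables.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "hi" > a.txt
git add a.txt
git commit -q -m "fix: off-by-one in pagination"
before=$(git log -1 --format='%an / %cn')

# A different person amends the commit: only the committer identity is overridden.
GIT_COMMITTER_NAME="Alice Example" GIT_COMMITTER_EMAIL="alice@example.com" \
  git commit -q --amend --no-edit
after=$(git log -1 --format='%an / %cn')

echo "before: $before"   # Example Author / Example Author
echo "after:  $after"    # Example Author / Alice Example
```

The amend keeps the author line from the existing commit and writes a fresh committer line, which is the same split a rebase by a teammate produces.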
Message
feat: add login endpoint

Implements the POST /login route with bcrypt password verification.
Returns a JWT on success; 401 otherwise.
Free text. Convention (not enforced by Git):
- First line = subject (~50 chars, imperative mood: "add", not "added").
- Blank line after the subject.
- Body = arbitrary paragraphs with wrapping ~72 chars.
Tools that display commit history rely on this — git log --oneline shows just the subject, git log shows the full text. Conventional commits (feat:, fix:, chore: prefixes) build on this for automated changelog generation.
Investment in commit message hygiene pays off years later. "Fix bug" as a subject tells your future self nothing. "fix(auth): reject expired JWT tokens with 401 instead of 500" tells them everything they need. Git log + good messages = living documentation for your codebase.
What Is NOT in a Commit
A commit does not contain:
- Diffs — diffs are computed by comparing trees of two commits. The commit itself stores a snapshot, not a change.
- File history: there is no "this file was renamed from X" flag. Git infers renames at diff time, from identical blob SHAs (exact renames) or from content similarity (renames with edits).
- Branch name — a commit does not know which branch it is on. Branches are just pointers to commits, not labels on commits.
- Directory entries as separate fields — all directory info is inside the tree object, not in the commit.
- Authorship proof — without a signature, the author/committer fields are whatever you set them to. Signing (Module 6) is the tamper-evident layer on top.
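The rename point in the list above can be checked directly. The sketch below uses a temporary repository with placeholder file names: it renames a file, then looks for any trace of the old name in the commit object and asks the diff machinery what happened.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'line one\nline two\nline three\n' > old.txt
git add old.txt && git commit -q -m "add file"
git mv old.txt new.txt
git commit -q -m "rename file"

raw_commit=$(git cat-file -p HEAD)                   # tree, parent, author, committer, message
status=$(git diff -M --name-status HEAD~1 HEAD)      # rename detection happens here
blob_before=$(git rev-parse HEAD~1:old.txt)
blob_after=$(git rev-parse HEAD:new.txt)

echo "$raw_commit"
echo "$status"
```

The commit object mentions neither file name. The R100 status (a rename at 100% similarity) is computed when the diff runs, from the fact that both paths point at the same blob.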
The author field is trivially forgeable. git config user.name "Linus Torvalds" + git config user.email "torvalds@linux-foundation.org" will happily tag your next commit as Linus. The commit is not authentic until it is signed and the signature is verified against a known key. Treat unsigned commits as "this is what someone wrote in this field"; only verified-signed commits are forensically meaningful.
Watching the SHA Change
A tiny experiment:
mkdir /tmp/shatest && cd /tmp/shatest
git init
echo "hi" > a.txt
git add a.txt
# Make a commit, note the SHA
git commit -m "first" --date=2026-01-01T00:00:00Z
git log -1 --format=%H
# a1b2c3d... (your SHA will differ)
# Amend with same message, same content — no source changes
git commit --amend --date=2026-01-01T00:00:00Z --no-edit
git log -1 --format=%H
# DIFFERENT SHA
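What changed? The committer timestamp, which defaults to "now." Same content, same message, same author date, but a different committer timestamp → different commit SHA. (If the amend happens to run within the same clock second as the original commit, the timestamps match and so does the SHA.)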
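Now the interesting case: pin the committer timestamp as well, and the result becomes reproducible:
# Freeze the committer timestamp too
GIT_AUTHOR_DATE='2026-01-01T00:00:00Z' \
GIT_COMMITTER_DATE='2026-01-01T00:00:00Z' \
git commit --amend --no-edit
git log -1 --format=%H
# A third SHA (the earlier commits recorded "now" as committer time)
# Repeat the exact same amend
GIT_AUTHOR_DATE='2026-01-01T00:00:00Z' \
GIT_COMMITTER_DATE='2026-01-01T00:00:00Z' \
git commit --amend --no-edit
git log -1 --format=%H
# SAME SHA as the previous amend
With all fields identical → same bytes → same SHA. Commit creation is deterministic: pin every field, and any machine produces the same commit ID.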
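The same property holds across repositories, not just across amends. The sketch below defines a hypothetical helper, make_commit, that builds the same one-file commit in a fresh temporary directory with every identity and date field pinned through Git's standard environment variables. It assumes no global configuration that alters commit contents (for example forced signing).

```shell
# Hypothetical helper: create one fully pinned commit in a new repo, print its SHA.
make_commit() {
  dir=$(mktemp -d)
  (
    cd "$dir"
    git init -q
    echo "hi" > a.txt
    git add a.txt
    GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com" \
    GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com" \
    GIT_AUTHOR_DATE='2026-01-01T00:00:00Z' \
    GIT_COMMITTER_DATE='2026-01-01T00:00:00Z' \
    git commit -q -m "first"
    git rev-parse HEAD
  )
}

sha_one=$(make_commit)
sha_two=$(make_commit)
echo "repo one: $sha_one"
echo "repo two: $sha_two"
```

Two unrelated directories end up with the same commit ID because tree, parents (none), author, committer, and message are byte-identical.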
Commit IDs in Practice
Short SHAs
Git abbreviates SHAs to at least 7 characters by default (recent versions automatically pick a longer length in large repositories):
git log --oneline -3
# a1b2c3d feat: add login
# 789abc0 chore: bump deps
# d4e5f6a fix: null pointer in parser
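7 hex chars of SHA-1 = 28 bits ≈ 268 million possible values. That sounds like plenty, but abbreviations must be unique across every object in the repo (commits, trees, and blobs), and by the birthday bound a repo with ~100k objects should already expect a handful of 7-char collisions. Git lengthens the abbreviations it prints when that happens, but a short SHA written down today can become ambiguous later.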
For anything you will reference in durable documentation (changelog, security advisory, PR description), use the full 40-char SHA. Short SHAs are for conversation.
git rev-parse a1b2c3d # expand short to full
# a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
git rev-parse --short=12 HEAD # a longer short form
# a1b2c3d4e5f6
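For a rough sense of how abbreviation length affects collisions, the birthday approximation says that n random IDs drawn from a space of S values contain about n(n-1)/(2S) colliding pairs. The sketch below evaluates that for 100,000 objects; the helper name expected_pairs and the object count are illustrative choices, and real object IDs are only approximately uniform random.

```shell
# expected_pairs <object-count> <hex-digits>
# Birthday approximation: n * (n - 1) / (2 * 16^digits) colliding pairs.
expected_pairs() {
  awk -v n="$1" -v hexlen="$2" \
    'BEGIN { printf "%.1f\n", n * (n - 1) / (2 * 16 ^ hexlen) }'
}

pairs_7=$(expected_pairs 100000 7)
pairs_12=$(expected_pairs 100000 12)

echo "7 hex chars:  about $pairs_7 colliding pairs expected"    # 18.6
echo "12 hex chars: about $pairs_12 colliding pairs expected"   # 0.0
```

At 7 characters a repository of this size should expect several ambiguous prefixes; at 12 the expected count is effectively zero, which is why longer forms are a reasonable middle ground when the full 40 characters are impractical.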
Referencing commits
You can refer to a commit by:
- Full or short SHA.
- Branch name (tip commit).
- Tag name.
- Relative ref (HEAD~3, main^).
- Reflog entry (HEAD@{2}, main@{yesterday}).
- Search (:/feat add login matches the youngest commit, reachable from any ref, whose message matches the text as a regular expression).
All of these resolve to the same underlying commit object. The name is your convenience; the SHA is the identity.
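A quick way to see several names collapse to one identity is to pass each through git rev-parse. The sketch below uses a temporary repository with a placeholder identity and an illustrative tag name, v1.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo one > a.txt && git add a.txt && git commit -q -m "feat: add login"
target=$(git rev-parse HEAD)
git tag v1
echo two >> a.txt && git commit -q -am "chore: bump deps"

by_relative=$(git rev-parse HEAD~1)                                # relative ref
by_tag=$(git rev-parse 'v1^{commit}')                              # tag name
by_search=$(git rev-parse ':/feat: add login')                     # message search
by_short=$(git rev-parse "$(git rev-parse --short "$target")")     # short SHA, expanded

printf '%s\n' "$target" "$by_relative" "$by_tag" "$by_search" "$by_short"
```

All five lines print the same full SHA: the names differ, the object they resolve to does not.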
git show — The Commit Inspector
git show a1b2c3d
# commit a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
# Author: Sharon Sahadevan <sharon@example.com>
# Date: Sat Apr 20 08:00:00 2024 +0000
#
# feat: add login endpoint
#
# Implements the POST /login route with bcrypt password verification.
#
# diff --git a/src/auth.py b/src/auth.py
# index e5f6a7b8..c9d0e1f2 100644
# --- a/src/auth.py
# +++ b/src/auth.py
# @@ -1,5 +1,12 @@
# ... (the diff) ...
git show = commit metadata + diff against the parent. Variants:
git show HEAD # current commit
git show HEAD~3 # 3 commits back
git show HEAD:src/auth.py # the file content at HEAD
git show a1b2c3d:src/auth.py # file content at specific commit
git show --stat HEAD # file-change summary, no diff
git show --no-patch HEAD # metadata only
git show HEAD:path/to/file is especially useful: it prints what that file looked like at that commit, regardless of what your working directory currently has.
Commit Metadata Quirks
The % format codes
git log --format='%h %an %ad %s'
# | | | |
# | | | +- subject
# | | +----- author date
# | +--------- author name
# +------------- abbreviated SHA
# Comprehensive table:
# %H = full SHA %h = abbrev SHA
# %P = full parent SHAs %p = abbrev parent SHAs
# %an = author name %ae = author email %ad = author date
# %cn = committer name %ce = committer email %cd = committer date
# %s = subject %b = body %B = subject + body
# %GS = signer (for signed commits)
Custom formats are how you script git log output for reports, changelogs, or integrations.
# One-line-per-commit with most of what you need
git log --format='%h %ad %an | %s' --date=short
# a1b2c3d 2024-04-20 Sharon | feat: add login
# 789abc0 2024-04-19 Sharon | chore: bump deps
Commit count
git rev-list --count HEAD
# 142
git rev-list --count main..feature # how far feature is ahead of main
# 5
git shortlog -sn # commits per author
# 87 Sharon Sahadevan
# 42 Alice
# 13 Bob
Commits and Trees: The Dedup Again
Since a commit's SHA depends on its tree SHA (among other things), and a tree's SHA depends on its blob SHAs: if a commit "changes nothing," it has the same tree SHA as its parent, but a different commit SHA.
Why? The commit object still differs from its parent — it has a parent pointer, potentially a different message, definitely a different committer timestamp. Only the tree happens to be identical.
# An empty commit
git commit --allow-empty -m "no op"
git log -1 --format='tree: %T / commit: %H'
# tree: 84f1a2b3... ← same as parent
# commit: new sha ← different from parent
This is sometimes useful — deploy triggers, retry markers, CI re-runs — though purists dislike empty commits. If you use them, comment the reason in the message.
Key Concepts Summary
- A commit is six fields of plain text plus (optional) a GPG signature.
- Fields: tree SHA, parent SHAs, author, committer, blank line, message.
- Commit SHA = SHA1 of the full commit bytes. Change any field → new SHA.
- Rebase changes parents → new SHAs for every rebased commit.
- Amend changes committer timestamp → new SHA even with "no changes."
- Merge commits have 2+ parents. First parent is "where we were"; others are "what we brought in."
- Author ≠ committer when patches are applied by someone else (rebase, patch-email, PR merge).
- Nothing in a commit says which branch it was made on. Branches are refs at commits, not labels.
- git cat-file -p <sha> shows any commit's raw content. git show <ref> adds the diff against the parent.
- Short SHAs are conversational; full SHAs are canonical.
- Identical bytes → identical SHA. Control all six fields to produce deterministic commits.
Common Mistakes
- Assuming two "identical" commits on different branches have the same SHA. Parent differs → SHA differs.
- Thinking git rebase preserves SHAs. Every rebased commit has a new SHA by definition.
- Using 7-char short SHAs in changelogs or documentation that outlive the repo's current size. They can collide; use full SHAs for durable references.
- Writing bad commit messages ("fix", "wip") that future-you cannot parse. Subject + body is a gift to yourself and teammates.
- Forgetting that git commit --amend creates a new commit. If you amended a pushed commit, your local now diverges from the remote and force-push is required (with team coordination).
- Trusting the author field on unsigned commits. Easily forged. Sign commits (Module 6) for real authenticity.
- Looking at git log and thinking the chronology is strict. Commits have two timestamps (author + committer) and branches merge out of order; --topo-order or --reverse may be what you want.
- Treating empty commits as a workflow tool without reason. They pollute history; use them sparingly, document when.
- Confusing merge-commit parents: first parent ≠ "the merged branch." It is the branch you were on when you merged.
- Using git log HEAD..feature and being surprised by extra output when HEAD is detached. Always know what HEAD is pointing at before running log ranges.
You have a clean commit with an excellent message, authored 3 days ago. Someone on your team rebases your branch onto main and pushes. Looking at the rebased commit, you notice the author date and name are still yours, but the committer name is different and the SHA changed. What happened and is your authorship preserved?