Git Internals for Engineers

Debugging with Git

A production bug traces to a single suspicious line of code. Nobody on the team remembers writing it. The line looks intentional but feels wrong. The developer who wrote it left the company a year ago. In most environments this is where debugging stalls: "guess we'll experiment." In Git, it takes ninety seconds: git blame -L 42,42 app.py shows the commit that last touched line 42; git show <sha> shows the full change with its message and context; git log --follow app.py shows the history of the file, including across renames. The mystery resolves into a coherent story: the line was added for reason X, reviewed in PR Y, and is now out of date because of change Z. The fix comes from understanding, not guessing.

Every production codebase is an archaeological site. Git is your archaeological toolkit. This lesson covers blame, log -p, log -S (the "pickaxe"), and related commands that turn any line of code into a fully-contextualized story. The engineers who master these tools solve bugs in minutes that their teammates spend hours on.


git blame: Who Changed This Line?

git blame src/auth.py
# a1b2c3d4 (Sharon     2026-04-20 10:00:00 +0000   1) def login(user, password):
# 789abc0d (Alice      2026-01-15 14:22:00 +0000   2)     if not user:
# 789abc0d (Alice      2026-01-15 14:22:00 +0000   3)         raise ValueError("no user")
# a1b2c3d4 (Sharon     2026-04-20 10:00:00 +0000   4)
# a1b2c3d4 (Sharon     2026-04-20 10:00:00 +0000   5)     hash = bcrypt.hashpw(password)

For each line: the commit that last touched it, the author, the date, the line number, and the content. Running git show on the commit SHA gives you the full change with its message.
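A minimal sketch of that hop from a blamed line to its commit, run in a throwaway repository so it is safe to paste. The file name, authors, and commit messages are made up for illustration; --line-porcelain is blame's script-friendly output format.

```shell
set -eu
# Throwaway repository; auth.py, the authors, and the messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'def login(user):\n    return user\n' > auth.py
git add auth.py
git -c user.name=Alice -c user.email=alice@example.com commit -q -m 'add login'

printf 'def login(user):\n    return user.strip()\n' > auth.py
git -c user.name=Sharon -c user.email=sharon@example.com commit -q -am 'strip whitespace from user'

# Blame only line 2. The first line of porcelain output starts with the commit SHA.
sha=$(git blame --line-porcelain -L 2,2 auth.py | head -n 1 | cut -d ' ' -f 1)

# Hop from the SHA to the author and message without printing the diff.
blamed_author=$(git show -s --format=%an "$sha")
blamed_subject=$(git show -s --format=%s "$sha")
echo "line 2 last changed by $blamed_author: $blamed_subject"
```

Dropping -s from git show prints the full diff as well, which is usually the next thing you want to read.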

Blame options

git blame -L 40,60 src/auth.py        # only lines 40-60
git blame -L :login src/auth.py         # the function whose name matches 'login'
git blame -w src/auth.py                # ignore whitespace-only changes
git blame -C src/auth.py                # detect code copied from other files
git blame -C -C -C src/auth.py          # -C 3x: aggressive detection (slow)
git blame --reverse abc123..HEAD -- src/auth.py   # walk forward: the last commit in which each line still existed
git blame --ignore-revs-file=.git-blame-ignore-revs src/auth.py
# Skip mass-reformat commits; blame falls through to the real author

The --ignore-revs-file trick (critical for teams)

If you ever run a mass reformatter (prettier, black, gofmt rollout), every blame points at that reformat commit. Useless. Fix: create .git-blame-ignore-revs with the SHAs of those commits:

# .git-blame-ignore-revs
# SHAs of mass-reformat commits to skip when blaming
# prettier rollout (entries must be full 40-character SHAs)
abc123def456abc123def456abc123def456abc1
# black formatting
789abc0def123456789abc0def123456789abc0d

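Tell Git to use it in this repository:

git config blame.ignoreRevsFile .git-blame-ignore-revs
# Set this per repository rather than with --global: a global setting makes
# git blame fail in any repository that does not contain the file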

Now git blame falls through the reformat commit to the real author. GitHub honors this file automatically in its blame UI.
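To see the fall-through in action, here is a small sandbox sketch. It assumes Git 2.23 or newer (where the ignore-revs feature exists); the file, the authors, and the "format-bot" identity are hypothetical.

```shell
set -eu
# Throwaway repository; every name below is made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'def login(user):\n  if not user:\n    return None\n  return user\n' > auth.py
git add auth.py
git -c user.name=Alice -c user.email=alice@example.com commit -q -m 'add login'

# Simulate a formatter rollout: re-indent from 2 spaces to 4, no logic changes.
printf 'def login(user):\n    if not user:\n        return None\n    return user\n' > auth.py
git -c user.name=format-bot -c user.email=bot@example.com commit -q -am 'reformat'

# Record the reformat commit by its full SHA.
git rev-parse HEAD > .git-blame-ignore-revs

# Author of line 2 without and with the ignore file.
before=$(git blame --line-porcelain -L 2,2 auth.py | sed -n 's/^author //p')
after=$(git blame --line-porcelain -L 2,2 \
          --ignore-revs-file=.git-blame-ignore-revs auth.py | sed -n 's/^author //p')
echo "plain blame: $before"
echo "with ignore file: $after"
```

Git matches the reformatted lines to similar lines in the parent commit, so a formatter that rewrites lines beyond recognition can still leave a few lines attributed to the reformat commit.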

KEY CONCEPT

Add .git-blame-ignore-revs to any repo before or just after you do a mass reformat. It costs nothing while the file is empty, and once a reformat lands it is the difference between "blame says prettier-bot" and "blame says the actual author". Every team doing routine formatter rollouts should maintain this file.


git log -L: History of Specific Lines or Functions

# History of lines 40-60 in a file
git log -L 40,60:src/auth.py
# Shows every commit that touched those lines, with diffs

# History of an entire function (by matching a regex)
git log -L :login:src/auth.py
# Finds the function called 'login' and shows its full history across time

# Multiple ranges
git log -L 40,60:src/auth.py -L 80,100:src/auth.py

-L is extremely powerful for understanding why a specific piece of code evolved. Much more targeted than git log on the whole file — it shows only changes to the lines/function you care about.

Function history across renames

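git log -L :login:src/auth.py
# -L traces the range through renames on its own: if src/auth.py was renamed from
# src/authentication.py, the history includes the old filename too.
# --follow is for whole-file history (git log --follow -- src/auth.py); it expects
# a single path argument and is not combined with -L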
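A sandbox sketch of function-level history, assuming a hypothetical auth.py with two functions. It shows that a commit touching only the neighbouring function does not appear in the history of login.

```shell
set -eu
# Throwaway repository; file contents and commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'def login(user):\n    return user\n\ndef helper():\n    return 1\n' > auth.py
git add auth.py
git commit -q -m 'add auth module'

# Touches helper() only.
printf 'def login(user):\n    return user\n\ndef helper():\n    return 2\n' > auth.py
git commit -q -am 'tweak helper'

# Touches login() only.
printf 'def login(user):\n    return user.strip()\n\ndef helper():\n    return 2\n' > auth.py
git commit -q -am 'strip user in login'

# Trace login() by name. Each commit is printed together with the diff of the
# traced range, so a marker in the format string makes the commit lines easy to pick out.
login_history=$(git log -L :login:auth.py --format='COMMIT %s' | grep '^COMMIT ')
echo "$login_history"
```

The 'tweak helper' commit is absent from the output because it changed nothing between the def login line and the next function header.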

git log -S: The Pickaxe

"Pickaxe" is Git's term for searching commits by content change. It answers: "in which commit did this specific string of text appear (or disappear)?"

# Find the commits that added or removed this string (the oldest hit is where it was introduced)
git log -S 'SECRET_KEY' --source --all
# abc123 (feature) commit message
# def456 (main)    commit message

# With regex
git log -G 'SECRET_.*_KEY' --all

# Combine with diff
git log -p -S 'SECRET_KEY' --all
# Shows the actual diffs in commits that added/removed that string
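The difference between -S and -G is easy to miss, so here is a sandbox sketch with a hypothetical settings.py. -S reports commits where the number of occurrences of the string changed; -G reports commits whose added or removed lines match the regex, which also catches edits to a line that keeps the string.

```shell
set -eu
# Throwaway repository; settings.py and the messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'SECRET_KEY = "alpha"' > settings.py
git add settings.py
git commit -q -m 'add secret key'

# The string is still present exactly once, only the value changes.
echo 'SECRET_KEY = "beta"' > settings.py
git commit -q -am 'rotate secret key'

echo 'KEY_FROM_ENV = True' > settings.py
git commit -q -am 'drop hard-coded key'

# -S: occurrence count changed (introduced in the first commit, removed in the last).
s_hits=$(git log -S 'SECRET_KEY' --format=%s)
# -G: any added or removed line matches, so the rotation shows up too.
g_hits=$(git log -G 'SECRET_KEY' --format=%s)

printf -- '-S hits:\n%s\n\n-G hits:\n%s\n' "$s_hits" "$g_hits"
```

Reach for -S when you want to know when something appeared or vanished, and for -G when you want every commit that edited a matching line.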

When to use pickaxe

  • "Where did this function first appear?": git log -S 'def my_function'
  • "When was this magic number changed?": git log -S '42' -- src/config.py
  • "When was this import added?": git log -S 'import tensorflow'
  • "Who first wrote this regex?": git log -S 'some_specific_regex'
  • "What commits removed this code?": git log -S 'old_function' --all

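Pickaxe searches every commit reachable from the refs you give it. For each commit it looks only at the files that commit changed, but it still has to read their contents, so a search over a very large history can take a while. Narrow it with a path or a date range when you can.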

# Pickaxe + content + all branches + time range
git log -S 'deprecated_api' --since='6 months ago' --all --oneline

PRO TIP

Pickaxe (-S) is the single most under-used Git debugging tool. It is how you answer "when did this code first exist?" or "who wrote this comment?" — questions that have always been hard in other VCSes. Try it on your repo for the next weird piece of code you encounter. You will reach for it constantly afterward.


git log -p: Commits With Their Diffs

# Every commit that touched this file, with diffs
git log -p -- src/auth.py

# Last 5 commits + diffs
git log -p -5

# Combined with pickaxe
git log -p -S 'debug_mode' --all

# Only commits affecting specific paths
git log -p -- src/auth/ tests/auth/

# With file rename tracking
git log -p --follow -- src/auth.py

git log -p is slow on huge histories but invaluable for "show me the full story of this file."

git log --stat: Summary Instead of Full Diff

git log --stat -5
# 5 most recent commits with file-change summaries

# Combined with author filter
git log --stat --author=Sharon --since='1 week ago'

# Compact
git log --oneline --stat -- src/
# One-line subjects + which files were touched

# Machine-readable
git log --numstat
# insertions deletions path (one line per file)
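Because --numstat is tab-separated, it feeds straight into awk. The sketch below totals added and deleted lines per file in a throwaway repository; the files and messages are hypothetical.

```shell
set -eu
# Throwaway repository; file names and messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'a\nb\nc\n' > app.py
git add app.py
git commit -q -m 'add app'

printf 'a\nB\nc\nd\n' > app.py
printf 'x\n' > util.py
git add -A
git commit -q -m 'edit app, add util'

# Each numstat line is "<added><TAB><deleted><TAB><path>"; --format= drops the
# commit headers so only those lines (and blank separators) remain.
totals=$(git log --numstat --format= |
  awk -F '\t' 'NF == 3 { add[$3] += $1; del[$3] += $2 }
               END { for (p in add) print p, add[p], del[p] }' |
  sort)
echo "$totals"
```

Sorting the result by the second column instead gives a rough "most churned files" list.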

git log with Search and Filters

# Commits by author (extended regex; without -E the | is a literal character)
git log -E --author='Sharon|Alice'

# Commits with specific word in the message
git log --grep='security'
git log --grep='CVE-' --all

# Exclude commits by author
git log --perl-regexp --author='^((?!BotUser).*)$'

# Date range
git log --since='2 weeks ago' --until='1 week ago'
git log --since='2026-04-01' --until='2026-04-30'

# By file
git log -- path/to/file
git log -- ':(glob)src/**/*.py'    # :(glob) gives ** its recursive meaning, including files directly in src/

# By path AND time AND author combined
git log --author='Sharon' --since='1 month ago' -- src/auth.py
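A sandbox sketch of the author filters, with three hypothetical authors committing to one file. It shows two ways to ask for "Sharon or Alice" and one combined author-plus-path filter.

```shell
set -eu
# Throwaway repository; the authors and file are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q

for name in Sharon Alice Bob; do
  echo "$name" >> authors.txt
  git add authors.txt
  git -c user.name="$name" -c user.email="$name@example.com" \
    commit -q -m "change by $name"
done

# Extended regex: "|" means alternation.
either=$(git log -E --author='Sharon|Alice' --format=%an | sort)

# Repeating --author is an OR that needs no regex syntax at all.
repeated=$(git log --author=Sharon --author=Alice --format=%an | sort)

# Filters of different kinds combine with AND: this author, this path.
bob_on_file=$(git log --author=Bob --format=%s -- authors.txt)

echo "$either"
echo "$bob_on_file"
```

The repeated --author form is the safer habit in scripts, since it does not depend on which regex flavour is in effect.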

git log --graph: See Branches Visually

git log --graph --oneline --all
# *   c5a6e7f (HEAD -> main) Merge branch 'feature'
# |\
# | * 9b3d2a1 (feature) add feature X
# * | 7e1c4b0 fix typo on main
# |/
# * 4f8a6d2 baseline
# * a1b7e0d initial

# For complex histories, this is the most readable view
git log --graph --all --oneline --decorate --color=always | less -R

# --first-parent: walk only main line, treat each merge as a single step
git log --graph --first-parent --oneline main
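To make the effect of --first-parent concrete, the sketch below builds a tiny history with one merged feature branch and counts commits both ways. The branch names and messages are hypothetical.

```shell
set -eu
# Throwaway repository; branch names and messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make the first branch "main" regardless of defaults
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > file.txt
git add file.txt
git commit -q -m 'baseline'

git checkout -q -b feature
echo one > feature.txt
git add feature.txt
git commit -q -m 'feature: part 1'
echo two >> feature.txt
git commit -q -am 'feature: part 2'

git checkout -q main
echo fix >> file.txt
git commit -q -am 'fix typo on main'
git merge -q --no-ff -m "Merge branch 'feature'" feature

# Every commit reachable from main, versus only the mainline steps.
all_commits=$(git rev-list --count main)
mainline=$(git rev-list --count --first-parent main)
echo "all: $all_commits, first-parent: $mainline"

git log --graph --oneline --first-parent main
```

With --first-parent the two feature commits collapse into the single merge step, which is often the right granularity for "what landed on main, and in what order".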

Set up an alias for the big graph:

git config --global alias.lg 'log --graph --oneline --all --decorate'

git lg
# Now `git lg` is your go-to history view

Finding Where Something Was Introduced

Three-phase approach:

1. Find the line's current commit

git blame -L <line>,<line> -- <file>
# Tells you the commit that last touched it

2. Trace its history

git log -L <line>,<line>:<file>
# Walks back through every commit that changed those lines

3. If renamed, follow across files

git log --follow -- <file>
# Whole-file history across renames; git log -L already traces its range through renames on its own

Combined, you can usually trace any line back to its origin in a few minutes.


Finding When a Bug Was Introduced

Already covered in detail in Module 5 Lesson 1. Quick recap:

git bisect start
git bisect bad HEAD
git bisect good <old-working-sha>
git bisect run ./test.sh
# ... Git binary-searches to the exact bad commit

Combine with blame/log on the files you suspect:

# I have a suspect file; when did it last change that might explain the bug?
git log -p --since='1 week ago' -- suspect.py | head -100

git whatchanged and git show

git log --raw -5             # last 5 commits with changed files (git whatchanged is the legacy spelling of this)
git show <sha>               # commit metadata + diff
git show <sha> --stat        # file summary only
git show <sha>:path/to/file  # show file content at that commit

# Show a file at HEAD~5 and redirect to a file
git show HEAD~5:src/auth.py > /tmp/auth_old.py
diff src/auth.py /tmp/auth_old.py

git show <commit>:<path> is the cleanest way to "see what a file looked like at commit X" without checking anything out.


Binary Search With git bisect

For bugs where you cannot pinpoint which file:

git bisect start
git bisect bad HEAD                 # current is broken
git bisect good v1.0.0              # last known-good release

# Option A: manual. Git checks out middle commit. You test.
git bisect bad   # or git bisect good, depending on result
# Git narrows the range and checks out another middle commit. Repeat.

# Option B: automated. Provide a script.
git bisect run ./scripts/test-repro.sh
# Git runs the script at each step; exit 0 = good, 1-127 (except 125) = bad, 125 = skip; anything else aborts the bisect
# Finishes with "<sha> is the first bad commit"

git bisect reset

For 200 commits, bisect resolves in ~8 test runs. This is unreasonably fast compared to reading commit diffs.
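The sketch below runs an automated bisect end to end in a throwaway repository. It creates 20 commits, plants a hypothetical regression in commit 13 (a status file flips from "pass" to "fail"), and lets git bisect run find it. The inline sh -c check stands in for a real test script.

```shell
set -eu
# Throwaway repository; the status.txt convention is made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

i=1
while [ "$i" -le 20 ]; do
  if [ "$i" -ge 13 ]; then echo fail > status.txt; else echo pass > status.txt; fi
  echo "$i" > counter.txt
  git add -A
  git commit -q -m "commit $i"
  i=$((i + 1))
done

first=$(git rev-list --max-parents=0 HEAD)

# bad = HEAD, good = the very first commit.
git bisect start HEAD "$first" >/dev/null
# Exit 0 means good, exit 1 means bad.
git bisect run sh -c 'grep -qx pass status.txt' >/dev/null

# When bisect finishes, refs/bisect/bad points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```

With 19 candidate commits the search needs about five test runs, in line with the roughly log2(N) growth described above.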


git grep: Search the Current Repo (or Any Commit)

# Search tracked files in the working tree (like grep, but untracked and ignored files are skipped)
git grep 'SECRET_KEY'

# In a specific commit
git grep 'SECRET_KEY' HEAD~5

# Across every commit reachable from any ref
git grep 'SECRET_KEY' $(git rev-list --all)     # slow on big repos
# Across the tip of every local branch only (much cheaper)
git grep 'SECRET_KEY' $(git for-each-ref --format='%(refname)' refs/heads)

# Only specific file types
git grep 'SECRET_KEY' -- '*.py' '*.yaml'

# Case-insensitive, with line numbers
git grep -in 'secret'

# Just file names (like grep -l)
git grep -l 'TODO'

git grep is usually faster than grep -r . because it takes the list of tracked files from Git's index instead of walking the directory tree, so build output and other untracked files are never scanned.
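A short sandbox sketch of both behaviours: untracked files are skipped, and an old commit can be searched without checking it out. The file names and the build/ directory are hypothetical.

```shell
set -eu
# Throwaway repository; all names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'def old_name(): return 1' > app.py
git add app.py
git commit -q -m 'add app'

echo 'def new_name(): return 1  # TODO: docs' > app.py
git commit -q -am 'rename function'

# An untracked build artifact that plain grep -r would also match.
mkdir build
echo 'TODO: generated' > build/out.txt

# Only tracked files are searched.
tracked_hits=$(git grep -l 'TODO')

# Searching a commit prefixes each hit with the revision.
old_hits=$(git grep -l 'old_name' HEAD~1)

echo "$tracked_hits"
echo "$old_hits"
```

Pass --untracked if you do want untracked files included in a working-tree search.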


Comparing Versions

# Diff between two commits for a file
git diff abc123 def456 -- path/to/file

# Diff between branches
git diff main feature -- path/to/file

# Diff with renames
git diff --find-renames main feature

# Three-dot vs two-dot
git diff main..feature       # from main to feature
git diff main...feature      # from their common ancestor to feature (useful for PR diffs)

For understanding what a PR actually changes:

git diff main...feature      # what feature added on top of where it branched from main
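The two forms are easiest to tell apart once main has moved on after the branch point. A sandbox sketch with hypothetical file names:

```shell
set -eu
# Throwaway repository; file and branch names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make the first branch "main" regardless of defaults
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add base.txt
git commit -q -m 'base'

git checkout -q -b feature
echo f > feature.txt
git add feature.txt
git commit -q -m 'feature work'

git checkout -q main
echo m > main_only.txt
git add main_only.txt
git commit -q -m 'main moves on'

# Two dots: compare the two tips directly, so main's newer file shows up as a difference.
two_dot=$(git diff --name-only main..feature)

# Three dots: compare feature against the merge base, so only feature's own work appears.
three_dot=$(git diff --name-only main...feature)

printf 'two-dot:\n%s\n\nthree-dot:\n%s\n' "$two_dot" "$three_dot"
```

In the two-dot output main_only.txt appears as a deletion, which is exactly the noise a PR reviewer does not want.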

Finding All Commits in a Time Range Touching a Hot File

git log --since='1 month ago' --pretty='%h %an %s' -- src/checkout.py
# abc123 Sharon     feat: add discount logic
# def456 Alice      fix: null check on discount
# 789abc Bob        refactor: extract discount calculator

One line per commit, author and subject. Perfect for writing incident post-mortems ("who touched this file in the last month and when?").


Finding the Context of a Strange Comment

# I see a weird TODO: "// FIXME remove when v2 ships"
# But we shipped v3 last year. Why is this still here?

# Find when the comment was added
git log -S 'FIXME remove when v2 ships' --all

# Shows the commit. Read its message and diff.
git show <sha>
# Message: "hack for v1 launch; remove once v2 is stable"
# Ah — it was a hack for a deadline; nobody remembered to remove it.

# Blame confirms (a /regex/ start locates the line by its content):
git blame -L '/FIXME remove when v2 ships/,+1' src/legacy.py

Ninety seconds. The "weird" code is now a legacy artifact with known origin; you can confidently remove it (or fix forward).


Seeing What a Commit Contains Without Checkout

# Show files a commit changed
git show --stat abc123 --pretty=format:

# List files in the tree
git ls-tree -r abc123

# Get one specific file
git show abc123:path/to/file > /tmp/version.txt

# Full content of a specific file at a commit
git cat-file -p abc123:path/to/file

You never need to git checkout to see historical content.
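A quick sandbox check of that claim, using a hypothetical src/config.py. The old content is read straight from the object database while the working tree stays on the newest version.

```shell
set -eu
# Throwaway repository; the file and its contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir src
echo 'version = 1' > src/config.py
git add src
git commit -q -m 'v1'

echo 'version = 2' > src/config.py
git commit -q -am 'v2'

# Two equivalent ways to read the file as of the previous commit.
via_show=$(git show HEAD~1:src/config.py)
via_cat_file=$(git cat-file -p HEAD~1:src/config.py)

# List every file that existed in that commit.
files_then=$(git ls-tree -r --name-only HEAD~1)

# The working tree was never touched.
current=$(cat src/config.py)

echo "then: $via_show"
echo "now:  $current"
```

Redirecting either command to a temporary file gives you something to open in an editor or feed to diff.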


Useful Aliases

git config --global alias.lg \
  "log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit"

git config --global alias.unstage 'reset HEAD --'
git config --global alias.last 'log -1 HEAD'
git config --global alias.visual '!gitk'
git config --global alias.who '!git log --format=%an | sort | uniq -c | sort -rn'
git config --global alias.churn \
  '!git log --all -M -C --name-only --format=format: | grep . | sort | uniq -c | sort -rn'

Good aliases are invisible — they disappear into muscle memory and make daily work faster.


A Realistic Debug Session

Production issue: a specific charge was calculated incorrectly.

# 1. Find the file responsible
git grep -n 'calculate_charge' -- src/
# src/billing/charge.py:42:def calculate_charge(amount, discount):

# 2. Blame the suspicious function
git blame -L :calculate_charge src/billing/charge.py
# 42  789abc0 (Alice    2026-03-15)  def calculate_charge(amount, discount):
# 43  789abc0 (Alice    2026-03-15)      return amount * (1 - discount/100)
# 44  a1b2c3d (Bob      2026-04-01)      # handle negative discounts
# 45  a1b2c3d (Bob      2026-04-01)      if discount < 0:
# ...

# 3. What did Bob change on 04-01?
git show a1b2c3d
# Shows the full commit with diff and message
# "fix: handle promotional negative discounts"

# 4. Is there a PR we can reference?
git log --oneline --grep='promotional' --all
# (finds the PR commit, possibly with the PR number in the message)

# 5. Walk the function's full history
git log -L :calculate_charge:src/billing/charge.py
# Every change to that function, with diffs

# 6. Who else worked on this file recently?
git log --since='3 months ago' --pretty='%h %an %s' -- src/billing/charge.py
# Understand the collaboration context

# 7. If I think this is a regression from Bob's commit, bisect
git bisect start
git bisect bad HEAD
git bisect good a1b2c3d~1    # just before Bob's commit
git bisect run ./test_charge.sh
# Confirms: first bad commit is a1b2c3d (Bob's fix introduced a bug)
git bisect reset             # return to the branch you started on

Total time: 5-10 minutes. A thorough understanding of who, when, why, and what needs fixing — without ever asking anyone.

WAR STORY

An engineer inherited a five-year-old codebase. They spent their first week confused by strange patterns. A senior suggested: "before you change anything, blame everything. Every puzzling line has a commit, a message, a reason. Find the story." Two days later, the engineer was confidently making changes. The code had not changed; their understanding had. Git tools turn legacy code from "mystery" into "document you can read." Every time you catch yourself spending 30 minutes guessing at why code exists, stop and run blame + log + show. Ten minutes with those commands usually solves the mystery.


Key Concepts Summary

  • git blame shows the commit, author, and date for each line. -L scopes it to a line range; --ignore-revs-file skips reformat commits.
  • git log -L follows the history of specific lines or functions across time.
  • git log -S (pickaxe) finds commits that added or removed a specific string. Underrated power tool.
  • git log -p shows commits with their diffs; scoped by path / author / date / message.
  • git log --graph visualizes branching and merging history.
  • git show <sha> shows a commit's metadata + diff; <sha>:<path> extracts file content.
  • git grep searches the tracked files of the current tree or of any commit; it skips untracked and ignored files and is fast.
  • git bisect finds regression commits via binary search.
  • .git-blame-ignore-revs excludes mass-reformat commits from blame output.
  • Combine commands: blame + log + show is a reliable archaeology recipe.

Common Mistakes

  • Using grep -r when git grep is faster and searches only tracked files.
  • Not using .git-blame-ignore-revs after reformats. git blame becomes useless.
  • Forgetting --follow on git log <file>. Without it, history stops at the commit that renamed the file to its current name.
  • Running git blame on the whole file when scoping to a range with -L is faster and far more focused.
  • Trying to remember the pickaxe syntax. git log -S <string> is the basic form; pin it above your desk.
  • Using git log without --all and wondering why "obvious" commits on other branches are missing.
  • Not setting up git lg alias. Every day you lose 30 seconds typing the full flags.
  • Treating commit messages as disposable. Future-you is the main consumer; invest in them.
  • Assuming git blame shows the original author. It shows the LAST person to touch the line. Use git log -L or pickaxe for full history.
  • Never using git bisect because manual isn't that bad. Manual checking grows linearly with the number of commits; bisect grows logarithmically. Try automated bisect once.

KNOWLEDGE CHECK

You see a comment in src/legacy.py: `# DO NOT REMOVE, needed for client X`. Client X no longer uses the product. You want to understand when and why this comment was added before deciding whether to remove it. What is the most efficient Git command sequence?