Exit Codes and Error Propagation
Every command in Unix returns an exit code between 0 and 255. Zero means success; anything else means failure. That is the entire interface Bash uses to decide whether a command worked.
Most Bash error-handling bugs come down to one of three things: not checking the exit code, checking the wrong one, or losing it inside a pipeline. This lesson covers $?, the && / || conditional chains, and how to propagate exit codes out of functions and scripts correctly.
Exit code 0 = success. Non-zero = failure. That is the only convention. Everything Bash builds on top — set -e, if, &&, || — uses this single number to decide what to do.
$? — the exit code of the last command
ls /tmp
echo "ls exit code: $?"
ls /nonexistent
echo "ls exit code: $?"
Output:
ls exit code: 0
ls: cannot access '/nonexistent': No such file or directory
ls exit code: 2
$? is updated after every command. It reflects only the most recent command, so it gets overwritten quickly — typically you either capture it immediately or use it in an if.
# Capture it into a variable before it changes
some_command
rc=$?
# Capture inline with ||. Initialize rc first so a stale value
# from an earlier command can't leak through when some_command succeeds:
rc=0
some_command || rc=$?
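To see how quickly $? gets overwritten, note that even the command used to print it resets it (the path below is just for the demo):

```shell
ls /nonexistent_path_for_demo 2>/dev/null
first=$?           # non-zero: ls failed
echo "ls exit: $first"
second=$?          # 0: the echo above succeeded and overwrote $?
echo "after echo: $second"
```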
What exit codes actually mean
Standard conventions (not always followed):
- 0: success
- 1: general/catch-all error
- 2: misuse of shell builtins or invalid arguments
- 126: command found but not executable
- 127: command not found
- 128+N: terminated by signal N (e.g. 130 = SIGINT/Ctrl-C)
Some tools use their own conventions — grep returns 0 for match, 1 for no match, 2 for error. diff returns 0 for identical, 1 for different, >1 for error. Read each tool's manpage; the "non-zero = failure" rule has nuance per tool.
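These per-tool conventions are easy to check directly; the temp-file path here is illustrative:

```shell
printf 'hello\n' > /tmp/exit_demo.txt   # demo file (illustrative path)

grep -q hello  /tmp/exit_demo.txt; rc_match=$?                  # 0: pattern found
grep -q absent /tmp/exit_demo.txt; rc_nomatch=$?                # 1: no match
grep -q hello  /tmp/exit_demo_missing 2>/dev/null; rc_error=$?  # 2: file unreadable

echo "$rc_match $rc_nomatch $rc_error"   # 0 1 2
```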
&& and || — conditional chains
Two operators that use exit codes:
cmd1 && cmd2 # run cmd2 only if cmd1 succeeded (exit 0)
cmd1 || cmd2 # run cmd2 only if cmd1 failed (non-zero)
The guard pattern
# Require success — exit if any step fails
mkdir -p /opt/app && \
cp config.yml /opt/app/ && \
systemctl restart app
The fallback pattern
# Try the primary; fall back on failure
primary_dns_lookup "$host" || backup_dns_lookup "$host"
The assert-or-die pattern
# Fail loudly if a command fails
[[ -d /opt/app ]] || { echo "/opt/app missing" >&2; exit 1; }
The { cmd; } grouping puts several commands behind one ||. Note the semicolon before } — required.
Chains of chains — watch out
check && action_1 || action_2
This looks like "if check then action_1 else action_2" — but it isn't. If check succeeds and action_1 fails, action_2 also runs. Safe if action_1 never fails (e.g. echo), dangerous otherwise.
For real branching, use if:
if check; then
action_1
else
action_2
fi
a && b || c is not a ternary. If b has any chance of failing, use if/else instead.
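A minimal reproduction of the pitfall: when the middle command fails, the || branch fires even though the check succeeded, while the if/else version does not.

```shell
r1=""
true && false || r1="fallback ran"   # check ok, action failed: fallback runs!

r2=""
if true; then
    false          # a failure here does NOT trigger the else branch
else
    r2="fallback ran"
fi

echo "chain: '$r1'  if/else: '$r2'"   # chain: 'fallback ran'  if/else: ''
```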
Exit codes and pipelines
Here's the subtle one. By default, a pipeline's exit code is the exit code of the LAST command:
false | true
echo $? # 0 — because true succeeded
This is almost never what you want. If anything in the pipeline failed, you probably want to know.
set -o pipefail
pipefail changes this: the pipeline's exit code is the last non-zero exit from any command, or 0 if all succeeded:
set -o pipefail
false | true
echo $? # 1 — false failed, so pipeline fails
This is the flag you want on. Without it, cat nonexistent.log | grep ERROR | wc -l happily returns 0 even though cat failed.
The PIPESTATUS array
Even better: PIPESTATUS is an array of each pipeline command's exit code:
false | true | false
echo "${PIPESTATUS[@]}" # 1 0 1
# Check if cat specifically failed
cat file | grep pattern | sort
if (( PIPESTATUS[0] != 0 )); then
echo "cat failed"
fi
Useful when you need to distinguish which part of a pipeline failed.
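One caveat worth demonstrating: like $?, PIPESTATUS is replaced by the very next command, so copy it into your own array first if you need it more than once.

```shell
false | true | false
codes=("${PIPESTATUS[@]}")   # copy immediately; the next command replaces it

echo "stage 1: ${codes[0]}"  # 1
echo "stage 2: ${codes[1]}"  # 0
echo "stage 3: ${codes[2]}"  # 1
```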
Propagating exit codes from functions
do_work() {
some_command || return 1
another_command || return 2
# default: last command's exit code
}
do_work
rc=$?
Return from a function with a specific code to signal what failed. Callers can then check $? or branch on it.
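A sketch of a caller branching on those codes; the steps inside do_work are stand-in commands, not real work:

```shell
do_work() {
    true  || return 1   # step 1 (stand-in; always succeeds here)
    false || return 2   # step 2 (stand-in; always fails here)
}

do_work
rc=$?
case $rc in
    0) msg="ok" ;;
    1) msg="step 1 failed" ;;
    2) msg="step 2 failed" ;;
esac
echo "$msg"   # step 2 failed
```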
The "return the failing command's code" idiom
do_work() {
some_command
local rc=$?
if (( rc != 0 )); then
echo "some_command failed with $rc" >&2
return $rc
fi
another_command
return $?
}
Or more compactly:
do_work() {
some_command || return
another_command
}
Bare return with no argument returns the exit code of the last command. Combined with ||, this chains failures cleanly.
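The bare-return behavior is easy to verify with a minimal sketch:

```shell
do_work() {
    false || return    # bare return propagates false's exit code
    echo "unreachable"
}

do_work
rc=$?
echo "$rc"   # 1
```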
Writing scripts that exit correctly
A script's exit code is whatever $? shows in the parent shell after the script finishes. For a top-level script:
#!/usr/bin/env bash
set -euo pipefail
# ... work ...
exit 0 # explicit success
Or let the script fall through — the last command's exit code becomes the script's exit code.
Different exit codes for different failure modes
# Documented conventions at the top
# Exit codes:
# 0 - success
# 1 - generic failure
# 2 - invalid arguments
# 3 - missing dependency
# 4 - configuration error
# 5 - network error
if [[ $# -ne 1 ]]; then
echo "usage: $0 <hostname>" >&2
exit 2
fi
command -v jq >/dev/null || { echo "jq required" >&2; exit 3; }
[[ -f config.yml ]] || { echo "config.yml missing" >&2; exit 4; }
# ... etc
When a script is called from another script (or a CI pipeline), distinct exit codes let the caller react differently: a missing dependency can be retried after installing it, while invalid arguments should alert a human.
For scripts in CI/CD or cron, always document and use meaningful exit codes. Alert routing based on exit code is common; "exit 1 for everything" gives you no way to tell "missing config" from "network timeout."
Common exit-code mistakes
Mistake 1: ignoring exit codes silently
# BROKEN — silently continues on failure
download_file
process_file
# With set -e, this is fixed automatically.
# Without, add explicit checks:
download_file || exit 1
process_file || exit 1
Mistake 2: using exit codes for numeric data
# DON'T
count_items() {
return $# # breaks at 256
}
# DO
count_items() {
echo $#
}
n=$(count_items "$@")
Exit codes are 0-255 and wrap. Use stdout for numbers.
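The wraparound is observable directly in bash, which truncates the return value to 8 bits:

```shell
wraps() { return 300; }   # 300 does not fit in 8 bits
wraps
rc=$?
echo "$rc"   # 44 (300 % 256)
```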
Mistake 3: losing exit codes in local
From the Functions lesson:
# BROKEN — $? is always 0 (local succeeded)
my_func() {
local x=$(risky_call)
if (( $? == 0 )); then ...
}
# CORRECT
my_func() {
local x
x=$(risky_call)
if (( $? == 0 )); then ...
}
Mistake 4: pipelines without pipefail
# Without pipefail, only grep's status matters
if cat nonexistent.log | grep -q ERROR; then
echo "found errors"
fi
# If cat fails, grep returns 1 (no input, no match), so the branch isn't taken
# — but you never know cat failed.
# With pipefail, the pipeline fails if cat fails.
Mistake 5: set -e false sense of security
set -e
# This does NOT exit — the || branch catches the failure
risky_command || echo "oh no"
# And this does NOT exit — commands in if-conditions don't trigger -e
if risky_command; then ...
# And this does NOT exit — subshells have their own -e
(subshell_that_fails) || true
Read man bash for the full list of -e exemptions. Strict mode is a safety net, not a guarantee.
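A compact way to see the if-condition exemption in action, run in a subshell so set -e doesn't leak out:

```shell
out=$( set -e
       if false; then :; fi   # failing if-condition is exempt from -e
       echo "still running" )
echo "$out"   # the subshell did not exit early
```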
Propagating exit codes across pipes
A common pattern: compute something in a pipeline, preserve the inner exit code.
# Scenario: download → extract → process, and we want to know which failed
set -o pipefail
curl -fsSL "$url" | tar -xz | process_data
rc=$?
case $rc in
0) echo "success" ;;
22) echo "curl HTTP error (likely 404)" ;;
28) echo "curl timeout" ;;
*) echo "unknown failure $rc" ;;
esac
With pipefail, rc is the rightmost non-zero exit code from the pipeline. One subtlety: if curl fails, tar usually fails too (truncated input), so rc may be tar's code rather than curl's 22. When you need to know exactly which stage failed, check PIPESTATUS.
trap ... ERR for automatic error context
From the Traps lesson:
set -e
trap 'echo "FAIL: $BASH_COMMAND (exit $?) at line $LINENO" >&2' ERR
curl -fsSL "$url" | tar -xz | process_data
# If anything fails, you get:
# FAIL: curl -fsSL ... (exit 22) at line 5
Combine set -e -o pipefail with an ERR trap and you get precise failure messages automatically.
Checking exit codes explicitly
Sometimes you need to handle a specific non-zero code:
# Example: grep returns 1 for no match, 2 for error
grep -q pattern file
rc=$?
case $rc in
0) echo "match" ;;
1) echo "no match" ;;
*) echo "grep error" ;;
esac
Or simpler with if/elif:
if grep -q pattern file; then
echo "match"
elif [[ $? -eq 1 ]]; then
echo "no match"
else
echo "error"
fi
Careful: $? changes after every command. The elif above works only because nothing runs between the failed grep and the [[ test; if you need the code for multiple checks, capture it first.
A real-world example: deploy script
#!/usr/bin/env bash
set -euo pipefail
trap 'echo "FAIL: line $LINENO: $BASH_COMMAND (exit $?)" >&2' ERR
# Exit code convention:
# 0 - success
# 10 - git pull failed
# 20 - tests failed
# 30 - build failed
# 40 - deploy failed
# 50 - smoke test failed
git pull --ff-only || exit 10
npm test || exit 20
npm run build || exit 30
rsync -a --delete ./build/ "$DEPLOY_HOST:/var/www/app/" || exit 40
curl -fsSL "https://$DEPLOY_HOST/health" || exit 50
echo "deploy ok"
exit 0
CI can key off the exit code to know which stage failed. Alerts can route differently for "tests failed" vs "deploy host unreachable."
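On the calling side, a CI wrapper might look like this sketch; deploy is a stand-in function simulating the script above exiting 20, where real code would run ./deploy.sh:

```shell
deploy() { exit 20; }    # stand-in for ./deploy.sh failing its test stage

( deploy )               # subshell, so the stand-in's exit doesn't kill this shell
rc=$?
case $rc in
    0)  msg="deployed" ;;
    10) msg="git pull failed: check the branch" ;;
    20) msg="tests failed: notify the author" ;;
    *)  msg="stage failed with exit $rc" ;;
esac
echo "$msg"   # tests failed: notify the author
```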
Quiz
You run: set -e -o pipefail; cat missing_file | grep ERROR. The file does not exist. What happens?
What to take away
- Exit codes: 0 = success, non-zero = failure. Bash builds everything on this.
- $? is the last command's exit code — it changes after every command. Capture it immediately.
- && / || are short-circuit operators keyed to exit codes. They are not full if/else substitutes.
- cmd1 && cmd2 || cmd3 is NOT a ternary. Use if/else.
- Pipeline default: the exit code is only the LAST command's. Use set -o pipefail to change this.
- The PIPESTATUS array holds every pipeline command's exit code individually.
- Use distinct exit codes for distinct failure modes (invalid args, missing deps, network errors). Document them.
- Combine set -euo pipefail with trap ... ERR for automatic failure diagnostics.
- Never return numeric data via return — use stdout.
Next module: production-ready scripts — argument parsing, handling weird filenames, and when Bash is the wrong tool.