Bash & Shell Scripting for Engineers

Exit Codes and Error Propagation

Every command in Unix returns an exit code between 0 and 255. Zero means success; anything else means failure. That is the entire interface Bash uses to decide whether a command worked.

Most Bash error-handling bugs come down to one of three things: not checking the exit code, checking the wrong one, or losing it inside a pipeline. This lesson covers $?, conditional chains (&& / ||), and propagating exit codes out of scripts correctly.

KEY CONCEPT

Exit code 0 = success. Non-zero = failure. That is the only convention. Everything Bash builds on top — set -e, if, &&, || — uses this single number to decide what to do.


$? — the exit code of the last command

ls /tmp
echo "ls exit code: $?"

ls /nonexistent
echo "ls exit code: $?"

Output:

ls exit code: 0
ls: cannot access '/nonexistent': No such file or directory
ls exit code: 2

$? is updated after every command. It reflects only the most recent command, so it gets overwritten quickly — typically you either capture it immediately or use it in an if.

# Capture it into a variable before it changes
some_command
rc=$?

# Capture inline with || (pre-set rc so a stale value can't leak through)
rc=0
some_command || rc=$?

What exit codes actually mean

Standard conventions (not always followed):

  0       success
  1       generic failure (most common)
  2       misuse / syntax error (e.g. bad options)
  126     command found but not executable
  127     command not found
  128+N   terminated by signal N (e.g. 130 = 128 + 2, killed by Ctrl-C / SIGINT)

Some tools use their own conventions — grep returns 0 for match, 1 for no match, 2 for error. diff returns 0 for identical, 1 for different, >1 for error. Read each tool's manpage; the "non-zero = failure" rule has nuance per tool.
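A quick demonstration of grep's three-way convention (the file paths here are throwaway names for illustration):

```shell
# Create a scratch file so each branch of grep's convention is visible
printf 'hello world\n' > /tmp/grep_demo.txt

grep -q hello /tmp/grep_demo.txt; echo $?             # 0 — match
grep -q zzzzz /tmp/grep_demo.txt; echo $?             # 1 — no match
grep -q hello /tmp/no_such_file 2>/dev/null; echo $?  # 2 — error (file missing)
```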


&& and || — conditional chains

Two operators that use exit codes:

cmd1 && cmd2    # run cmd2 only if cmd1 succeeded (exit 0)
cmd1 || cmd2    # run cmd2 only if cmd1 failed (non-zero)

The guard pattern

# Require success — exit if any step fails
mkdir -p /opt/app && \
cp config.yml /opt/app/ && \
systemctl restart app

The fallback pattern

# Try the primary; fall back on failure
primary_dns_lookup "$host" || backup_dns_lookup "$host"

The assert-or-die pattern

# Fail loudly if a command fails
[[ -d /opt/app ]] || { echo "/opt/app missing" >&2; exit 1; }

The { cmd; } grouping puts several commands behind one ||. Note the semicolon before } — required.

Chains of chains — watch out

check && action_1 || action_2

This looks like "if check then action_1 else action_2" — but it isn't. If check succeeds and action_1 fails, action_2 also runs. Safe if action_1 never fails (e.g. echo), dangerous otherwise.
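You can see the trap in one line — here check succeeds and action_1 fails, yet the "else" branch fires anyway:

```shell
# true stands in for check, false for a failing action_1
true && false || echo "fallback ran anyway"
# prints: fallback ran anyway
```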

For real branching, use if:

if check; then
  action_1
else
  action_2
fi
WARNING

a && b || c is not a ternary. If b has any chance of failing, use if/else instead.


Exit codes and pipelines

Here's the subtle one. By default, a pipeline's exit code is the exit code of the LAST command:

false | true
echo $?    # 0 — because true succeeded

This is almost never what you want. If anything in the pipeline failed, you probably want to know.

set -o pipefail

pipefail changes this: the pipeline's exit code is the last non-zero exit from any command, or 0 if all succeeded:

set -o pipefail
false | true
echo $?    # 1 — false failed, so pipeline fails

This is the flag you want on. Without it, cat nonexistent.log | grep ERROR | wc -l happily returns 0 even though cat failed.
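A side-by-side sketch of the difference (the path is deliberately nonexistent):

```shell
# Without pipefail, wc's success masks cat's failure
cat /no/such/file 2>/dev/null | wc -l
echo "default:  $?"    # 0 — wc succeeded, cat's failure is lost

set -o pipefail
cat /no/such/file 2>/dev/null | wc -l
echo "pipefail: $?"    # 1 — cat's exit code surfaces
```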

The PIPESTATUS array

Even better: PIPESTATUS is an array of each pipeline command's exit code:

false | true | false
echo "${PIPESTATUS[@]}"    # 1 0 1

# Check if cat specifically failed
cat file | grep pattern | sort
if (( PIPESTATUS[0] != 0 )); then
  echo "cat failed"
fi

Useful when you need to distinguish which part of a pipeline failed.


Propagating exit codes from functions

do_work() {
  some_command || return 1
  another_command || return 2

  # default: last command's exit code
}

do_work
rc=$?

Return from a function with a specific code to signal what failed. Callers can then check $? or branch on it.
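A caller branching on those codes might look like this — step_one and step_two are stand-ins for real commands so the sketch is runnable:

```shell
step_one() { true; }    # stub: succeeds
step_two() { false; }   # stub: fails

do_work() {
  step_one || return 1
  step_two || return 2
}

do_work
case $? in
  0) echo "all steps succeeded" ;;
  1) echo "step_one failed" ;;
  2) echo "step_two failed" ;;
esac
# prints: step_two failed
```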

The "return the failing command's code" idiom

do_work() {
  some_command
  local rc=$?
  if (( rc != 0 )); then
    echo "some_command failed with $rc" >&2
    return $rc
  fi

  another_command
  return $?
}

Or more compactly:

do_work() {
  some_command || return
  another_command
}

Bare return with no argument returns the exit code of the last command. Combined with ||, this chains failures cleanly.


Writing scripts that exit correctly

A script's exit code is whatever the parent shell sees in $? after the script finishes. For a top-level script:

#!/usr/bin/env bash
set -euo pipefail

# ... work ...

exit 0    # explicit success

Or let the script fall through — the last command's exit code becomes the script's exit code.

Different exit codes for different failure modes

# Documented conventions at the top
# Exit codes:
#   0 - success
#   1 - generic failure
#   2 - invalid arguments
#   3 - missing dependency
#   4 - configuration error
#   5 - network error

if [[ $# -ne 1 ]]; then
  echo "usage: $0 <hostname>" >&2
  exit 2
fi

command -v jq >/dev/null || { echo "jq required" >&2; exit 3; }

[[ -f config.yml ]] || { echo "config.yml missing" >&2; exit 4; }

# ... etc

When a script is called from another script (or a CI pipeline), distinct exit codes let the caller react differently: a missing dependency can trigger an install-and-retry, while invalid arguments should alert a human.

PRO TIP

For scripts in CI/CD or cron, always document and use meaningful exit codes. Alert routing based on exit code is common; "exit 1 for everything" gives you no way to tell "missing config" from "network timeout."


Common exit-code mistakes

Mistake 1: ignoring exit codes silently

# BROKEN — silently continues on failure
download_file
process_file

# With set -e, this is fixed automatically.
# Without, add explicit checks:
download_file || exit 1
process_file || exit 1

Mistake 2: using exit codes for numeric data

# DON'T
count_items() {
  return $#    # breaks at 256
}

# DO
count_items() {
  echo $#
}
n=$(count_items "$@")

Exit codes are 0-255 and wrap. Use stdout for numbers.
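The wraparound is easy to demonstrate with a subshell:

```shell
# Exit codes are taken modulo 256, so large values wrap
( exit 300 ); echo $?    # 44 — 300 % 256
( exit 256 ); echo $?    # 0  — a failure that looks like success
```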

Mistake 3: losing exit codes in local

From the Functions lesson:

# BROKEN — $? is always 0 (local succeeded)
my_func() {
  local x=$(risky_call)
  if (( $? == 0 )); then ...
}

# CORRECT
my_func() {
  local x
  x=$(risky_call)
  if (( $? == 0 )); then ...
}

Mistake 4: pipelines without pipefail

# Without pipefail, only grep's status matters
if cat nonexistent.log | grep -q ERROR; then
  echo "found errors"
fi
# If cat fails, grep returns 1 (no input, no match), so the branch isn't taken
# — but you never know cat failed.

# With pipefail, the pipeline fails if cat fails.

Mistake 5: set -e false sense of security

set -e

# This does NOT exit — the || branch catches the failure
risky_command || echo "oh no"

# And this does NOT exit — commands in if-conditions don't trigger -e
if risky_command; then ...

# And this does NOT exit — a command negated with ! never triggers -e
! risky_command

Read man bash for the full list of -e exemptions. Strict mode is a safety net, not a guarantee.


Propagating exit codes across pipes

A common pattern: compute something in a pipeline, preserve the inner exit code.

# Scenario: download → extract → process, and we want to know which failed

set -o pipefail
curl -fsSL "$url" | tar -xz | process_data
rc=$?

case $rc in
  0) echo "success" ;;
  22) echo "curl HTTP error (likely 404)" ;;
  28) echo "curl timeout" ;;
  *)  echo "unknown failure $rc" ;;
esac

With pipefail, rc is the last non-zero exit of any pipeline command. If curl failed (22 = HTTP error), that's what rc captures.

trap ... ERR for automatic error context

From the Traps lesson:

set -e
trap 'echo "FAIL: $BASH_COMMAND (exit $?) at line $LINENO" >&2' ERR

curl -fsSL "$url" | tar -xz | process_data
# If anything fails, you get:
# FAIL: curl -fsSL ... (exit 22) at line 5

Combine set -e -o pipefail with an ERR trap and you get precise failure messages automatically.


Checking exit codes explicitly

Sometimes you need to handle a specific non-zero code:

# Example: grep returns 1 for no match, 2 for error
grep -q pattern file
rc=$?
case $rc in
  0) echo "match" ;;
  1) echo "no match" ;;
  *) echo "grep error" ;;
esac

Or simpler with if/elif:

if grep -q pattern file; then
  echo "match"
elif [[ $? -eq 1 ]]; then
  echo "no match"
else
  echo "error"
fi

Careful: $? changes after every command. If you need it for multiple checks, capture it first.


A real-world example: deploy script

#!/usr/bin/env bash
set -euo pipefail
trap 'echo "FAIL: line $LINENO: $BASH_COMMAND (exit $?)" >&2' ERR

# Exit code convention:
# 0  - success
# 10 - git pull failed
# 20 - tests failed
# 30 - build failed
# 40 - deploy failed
# 50 - smoke test failed

git pull --ff-only || exit 10

npm test || exit 20

npm run build || exit 30

rsync -a --delete ./build/ "$DEPLOY_HOST:/var/www/app/" || exit 40

curl -fsSL "https://$DEPLOY_HOST/health" || exit 50

echo "deploy ok"
exit 0

CI can key off the exit code to know which stage failed. Alerts can route differently for "tests failed" vs "deploy host unreachable."


Quiz

KNOWLEDGE CHECK

You run: set -e -o pipefail; cat missing_file | grep ERROR. The file does not exist. What happens?


What to take away

  • Exit codes: 0 = success, non-zero = failure. Bash builds everything on this.
  • $? is the last command's exit code — changes after every command. Capture it immediately.
  • && / || are short-circuit operators keyed to exit codes. They are not full if/else substitutes.
  • cmd1 && cmd2 || cmd3 is NOT a ternary. Use if/else.
  • Pipeline default: exit code is only the LAST command's. Use set -o pipefail to change this.
  • PIPESTATUS array holds every pipeline command's exit code individually.
  • Use distinct exit codes for distinct failure modes (invalid args, missing deps, network errors). Document them.
  • Combine set -euo pipefail with trap ... ERR for automatic failure diagnostics.
  • Never return numeric data via return — use stdout.

Next module: production-ready scripts — argument parsing, handling weird filenames, and when Bash is the wrong tool.