Bash & Shell Scripting for Engineers

Exit Codes and Error Propagation

Every command in Unix returns an exit code between 0 and 255. Zero means success; anything else means failure. That is the entire interface Bash uses to decide whether a command worked.

Most Bash error-handling bugs come down to one of three things: not checking the exit code, checking the wrong one, or losing it inside a pipeline. This lesson covers $?, conditional chains (&& / ||), and propagating exit codes out of scripts correctly.

KEY CONCEPT

Exit code 0 = success. Non-zero = failure. That is the only convention. Everything Bash builds on top — set -e, if, &&, || — uses this single number to decide what to do.


$? — the exit code of the last command

ls /tmp
echo "ls exit code: $?"

ls /nonexistent
echo "ls exit code: $?"

Output:

ls exit code: 0
ls: cannot access '/nonexistent': No such file or directory
ls exit code: 2

$? is updated after every command. It reflects only the most recent command, so it gets overwritten quickly — typically you either capture it immediately or use it in an if.

# Capture it into a variable before it changes
some_command
rc=$?

# Capture inline with || (pre-set rc so a stale value can't leak through)
rc=0
some_command || rc=$?

What exit codes actually mean

Standard conventions (not always followed):

  0       success
  1       generic failure (most common)
  2       misuse / syntax error (e.g. bad options)
  126     command found but not executable
  127     command not found
  128+N   terminated by signal N (e.g. 130 = 128 + 2, killed by Ctrl-C / SIGINT)

Some tools use their own conventions — grep returns 0 for match, 1 for no match, 2 for error. diff returns 0 for identical, 1 for different, >1 for error. Read each tool's manpage; the "non-zero = failure" rule has nuance per tool.
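A quick demonstration of grep's three-way convention (the file paths here are throwaway names for illustration):

```shell
# Create a scratch file so each branch of grep's convention is visible
printf 'hello world\n' > /tmp/grep_demo.txt

grep -q hello /tmp/grep_demo.txt; echo $?             # 0 — match
grep -q zzzzz /tmp/grep_demo.txt; echo $?             # 1 — no match
grep -q hello /tmp/no_such_file 2>/dev/null; echo $?  # 2 — error (file missing)
```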


&& and || — conditional chains

Two operators that use exit codes:

cmd1 && cmd2    # run cmd2 only if cmd1 succeeded (exit 0)
cmd1 || cmd2    # run cmd2 only if cmd1 failed (non-zero)

The guard pattern

# Require success — exit if any step fails
mkdir -p /opt/app && \
cp config.yml /opt/app/ && \
systemctl restart app

The fallback pattern

# Try the primary; fall back on failure
primary_dns_lookup "$host" || backup_dns_lookup "$host"

The assert-or-die pattern

# Fail loudly if a command fails
[[ -d /opt/app ]] || { echo "/opt/app missing" >&2; exit 1; }

The { cmd; } grouping puts several commands behind one ||. Note the semicolon before } — required.

Chains of chains — watch out

check && action_1 || action_2

This looks like "if check then action_1 else action_2" — but it isn't. If check succeeds and action_1 fails, action_2 also runs. Safe if action_1 never fails (e.g. echo), dangerous otherwise.
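You can see the trap in one line — here check succeeds and action_1 fails, yet the "else" branch fires anyway:

```shell
# true stands in for check, false for a failing action_1
true && false || echo "fallback ran anyway"
# prints: fallback ran anyway
```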

For real branching, use if:

if check; then
  action_1
else
  action_2
fi
WARNING

a && b || c is not a ternary. If b has any chance of failing, use if/else instead.


Exit codes and pipelines

Here's the subtle one. By default, a pipeline's exit code is the exit code of the LAST command:

false | true
echo $?    # 0 — because true succeeded

This is almost never what you want. If anything in the pipeline failed, you probably want to know.

set -o pipefail

pipefail changes this: the pipeline's exit code is the last non-zero exit from any command, or 0 if all succeeded:

set -o pipefail
false | true
echo $?    # 1 — false failed, so pipeline fails

This is the flag you want on. Without it, cat nonexistent.log | grep ERROR | wc -l happily returns 0 even though cat failed.
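A side-by-side sketch of the difference (the path is deliberately nonexistent):

```shell
# Without pipefail, wc's success masks cat's failure
cat /no/such/file 2>/dev/null | wc -l
echo "default:  $?"    # 0 — wc succeeded, cat's failure is lost

set -o pipefail
cat /no/such/file 2>/dev/null | wc -l
echo "pipefail: $?"    # 1 — cat's exit code surfaces
```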

The PIPESTATUS array

Even better: PIPESTATUS is an array of each pipeline command's exit code:

false | true | false
echo "${PIPESTATUS[@]}"    # 1 0 1

# Check if cat specifically failed
cat file | grep pattern | sort
if (( PIPESTATUS[0] != 0 )); then
  echo "cat failed"
fi

Useful when you need to distinguish which part of a pipeline failed.


Propagating exit codes from functions

do_work() {
  some_command || return 1
  another_command || return 2

  # default: last command's exit code
}

do_work
rc=$?

Return from a function with a specific code to signal what failed. Callers can then check $? or branch on it.
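A caller branching on those codes might look like this — step_one and step_two are stand-ins for real commands so the sketch is runnable:

```shell
step_one() { true; }    # stub: succeeds
step_two() { false; }   # stub: fails

do_work() {
  step_one || return 1
  step_two || return 2
}

do_work
case $? in
  0) echo "all steps succeeded" ;;
  1) echo "step_one failed" ;;
  2) echo "step_two failed" ;;
esac
# prints: step_two failed
```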

The "return the failing command's code" idiom

do_work() {
  some_command
  local rc=$?
  if (( rc != 0 )); then
    echo "some_command failed with $rc" >&2
    return $rc
  fi

  another_command
  return $?
}

Or more compactly:

do_work() {
  some_command || return
  another_command
}

Bare return with no argument returns the exit code of the last command. Combined with ||, this chains failures cleanly.


Writing scripts that exit correctly

A script's exit code is whatever the parent shell sees in $? after the script finishes. For a top-level script:

#!/usr/bin/env bash
set -euo pipefail

# ... work ...

exit 0    # explicit success

Or let the script fall through — the last command's exit code becomes the script's exit code.

Different exit codes for different failure modes

# Documented conventions at the top
# Exit codes:
#   0 - success
#   1 - generic failure
#   2 - invalid arguments
#   3 - missing dependency
#   4 - configuration error
#   5 - network error

if [[ $# -ne 1 ]]; then
  echo "usage: $0 <hostname>" >&2
  exit 2
fi

command -v jq >/dev/null || { echo "jq required" >&2; exit 3; }

[[ -f config.yml ]] || { echo "config.yml missing" >&2; exit 4; }

# ... etc

When a script is called from another script (or a CI pipeline), distinct exit codes let the caller react differently: a missing dependency can trigger an install-and-retry, while invalid arguments should alert a human.

PRO TIP

For scripts in CI/CD or cron, always document and use meaningful exit codes. Alert routing based on exit code is common; "exit 1 for everything" gives you no way to tell "missing config" from "network timeout."


Common exit-code mistakes

Mistake 1: ignoring exit codes silently

# BROKEN — silently continues on failure
download_file
process_file

# With set -e, this is fixed automatically.
# Without, add explicit checks:
download_file || exit 1
process_file || exit 1

Mistake 2: using exit codes for numeric data

# DON'T
count_items() {
  return $#    # breaks at 256
}

# DO
count_items() {
  echo $#
}
n=$(count_items "$@")

Exit codes are 0-255 and wrap. Use stdout for numbers.
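The wraparound is easy to demonstrate with a subshell:

```shell
# Exit codes are taken modulo 256, so large values wrap
( exit 300 ); echo $?    # 44 — 300 % 256
( exit 256 ); echo $?    # 0  — a failure that looks like success
```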

Mistake 3: losing exit codes in local

From the Functions lesson:

# BROKEN — $? is always 0 (local succeeded)
my_func() {
  local x=$(risky_call)
  if (( $? == 0 )); then ...
}

# CORRECT
my_func() {
  local x
  x=$(risky_call)
  if (( $? == 0 )); then ...
}

Mistake 4: pipelines without pipefail

# Without pipefail, only grep's status matters
if cat nonexistent.log | grep -q ERROR; then
  echo "found errors"
fi
# If cat fails, grep returns 1 (no input, no match), so the branch isn't taken
# — but you never know cat failed.

# With pipefail, the pipeline fails if cat fails.

Mistake 5: set -e false sense of security

set -e

# This does NOT exit — the || branch catches the failure
risky_command || echo "oh no"

# And this does NOT exit — commands in if-conditions don't trigger -e
if risky_command; then ...

# And this does NOT exit — a command negated with ! never triggers -e
! risky_command

Read man bash for the full list of -e exemptions. Strict mode is a safety net, not a guarantee.


Propagating exit codes across pipes

A common pattern: compute something in a pipeline, preserve the inner exit code.

# Scenario: download → extract → process, and we want to know which failed

set -o pipefail
curl -fsSL "$url" | tar -xz | process_data
rc=$?

case $rc in
  0) echo "success" ;;
  22) echo "curl HTTP error (likely 404)" ;;
  28) echo "curl timeout" ;;
  *)  echo "unknown failure $rc" ;;
esac

With pipefail, rc is the last non-zero exit of any pipeline command. If curl failed (22 = HTTP error), that's what rc captures.

trap ... ERR for automatic error context

From the Traps lesson:

set -e
trap 'echo "FAIL: $BASH_COMMAND (exit $?) at line $LINENO" >&2' ERR

curl -fsSL "$url" | tar -xz | process_data
# If anything fails, you get:
# FAIL: curl -fsSL ... (exit 22) at line 5

Combine set -e -o pipefail with an ERR trap and you get precise failure messages automatically.


Checking exit codes explicitly

Sometimes you need to handle a specific non-zero code:

# Example: grep returns 1 for no match, 2 for error
grep -q pattern file
rc=$?
case $rc in
  0) echo "match" ;;
  1) echo "no match" ;;
  *) echo "grep error" ;;
esac

Or simpler with if/elif:

if grep -q pattern file; then
  echo "match"
elif [[ $? -eq 1 ]]; then
  echo "no match"
else
  echo "error"
fi

Careful: $? changes after every command. If you need it for multiple checks, capture it first.


A real-world example: deploy script

#!/usr/bin/env bash
set -euo pipefail
trap 'echo "FAIL: line $LINENO: $BASH_COMMAND (exit $?)" >&2' ERR

# Exit code convention:
# 0  - success
# 10 - git pull failed
# 20 - tests failed
# 30 - build failed
# 40 - deploy failed
# 50 - smoke test failed

git pull --ff-only || exit 10

npm test || exit 20

npm run build || exit 30

rsync -a --delete ./build/ "$DEPLOY_HOST:/var/www/app/" || exit 40

curl -fsSL "https://$DEPLOY_HOST/health" || exit 50

echo "deploy ok"
exit 0

CI can key off the exit code to know which stage failed. Alerts can route differently for "tests failed" vs "deploy host unreachable."


Quiz

KNOWLEDGE CHECK

You run: set -e -o pipefail; cat missing_file | grep ERROR. The file does not exist. What happens?


What to take away

  • Exit codes: 0 = success, non-zero = failure. Bash builds everything on this.
  • $? is the last command's exit code — changes after every command. Capture it immediately.
  • && / || are short-circuit operators keyed to exit codes. They are not full if/else substitutes.
  • cmd1 && cmd2 || cmd3 is NOT a ternary. Use if/else.
  • Pipeline default: exit code is only the LAST command's. Use set -o pipefail to change this.
  • PIPESTATUS array holds every pipeline command's exit code individually.
  • Use distinct exit codes for distinct failure modes (invalid args, missing deps, network errors). Document them.
  • Combine set -euo pipefail with trap ... ERR for automatic failure diagnostics.
  • Never return numeric data via return — use stdout.

Next module: production-ready scripts — argument parsing, handling weird filenames, and when Bash is the wrong tool.