Bash & Shell Scripting for Engineers

The set Flags That Save You

Bash defaults are wrong for scripts. By default, a script that hits an error keeps running. By default, an unset variable is silently empty. By default, a failure inside a pipeline is ignored. Each of these is a production outage waiting to happen.

The fix is three options on one line. Putting set -euo pipefail at the top of every script you write is the difference between a script that fails safely and one that corrupts data in interesting new ways.

KEY CONCEPT

set -euo pipefail is not optional. It is the minimum. Every production Bash script starts with it. Scripts without it are scripts that will fail silently, eventually, at the worst time.


The canonical prelude

Every Bash script should start like this:

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

Let's unpack each piece.


set -e — exit on error

Without -e, Bash barrels through errors:

#!/bin/bash
cp important.txt /tmp/      # suppose this fails (disk full, permission denied)...
rm important.txt            # ...Bash carries on and deletes the only copy
cp /tmp/important.txt ./    # ...so there is nothing left to restore from
# Every line ran regardless of the first failure. Nothing stopped the script before the damage.

With -e:

#!/bin/bash
set -e
cp important.txt /tmp/    # if this fails, script exits here
rm important.txt           # never reached

The entire script now behaves "fail fast" — any unhandled non-zero exit aborts the script.
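To see the difference without risking any files, here is a minimal sketch that runs the same two commands in child shells, once without -e and once with it. The command false stands in for any failing command; the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Run the same command string in two child shells.
# `false` stands in for any command that exits non-zero.

# Without -e the child shell keeps going after the failure.
without_e=$(bash -c 'false; echo "still running"')

# With -e the child shell stops at `false`, so echo never runs.
# `|| true` only keeps this demo going after the child exits non-zero.
with_e=$(bash -e -c 'false; echo "still running"' || true)

echo "without -e: [${without_e}]"
echo "with -e:    [${with_e}]"
```

The second pair of brackets comes out empty because the child shell exited before reaching echo.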

When -e does NOT trigger

-e has caveats. It does NOT exit on a failed command in these cases:

  1. Commands in if, while, until conditions. Failure is part of the condition:
if grep pattern file; then ...   # grep failing is fine: that's just "no match"
  2. Commands on the left of && or ||. The whole compound has its own logic:
cmd || echo "cmd failed"   # doesn't exit; the || catches the failure
  3. Commands whose result is negated with !.
if ! test_something; then ...    # failure is the expected path
  4. Pipelines (by default). Only the LAST command's status matters without pipefail.

  5. Command substitutions whose status is masked by the command around them. Builtins such as local, export, and echo report their own status, not the substitution's:
local out=$(cmd)   # local succeeds, so cmd's failure is hidden; declare first, then assign
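The exemptions are easier to trust once you have watched them not fire. The sketch below runs under set -e and records each case it survives; false stands in for a failing command, and the survived array and masked function are hypothetical names used only for this demo. The last case shows a command substitution whose failure is hidden by local.

```shell
#!/usr/bin/env bash
set -e

# Each entry added to this array marks a failure that did NOT abort the script.
survived=()

if false; then :; fi                  # failure inside an if condition
survived+=("if-condition")

false || survived+=("or-list")        # failure on the left of ||

if ! false; then                      # status negated with !
  survived+=("negation")
fi

false | true                          # pipeline: only the last status counts without pipefail
survived+=("pipeline")

masked() {
  # `local` returns its own status (0), so the substitution's failure is hidden.
  local value="$(false)"
  survived+=("local-assignment")
}
masked

printf '%s\n' "${survived[@]}"
```

All five markers are printed. Splitting the declaration from the assignment (local value; value=$(cmd)) is the usual way to let -e see the failure in the last case.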

The "explicit" failure pattern

Sometimes you want to allow a failure and react to it:

# This would exit with -e
risky_command

# This doesn't — || catches the failure
risky_command || handle_failure

# This doesn't either — part of an if condition
if risky_command; then ...; fi

So -e doesn't take away your ability to handle errors. It takes away the ability to accidentally ignore them.

WARNING

set -e has surprising edge cases. It works for most scripts, but if you rely on it to catch every possible failure, read the bash man page for -e's full list of exemptions. When in doubt, use explicit || exit 1 on commands that must succeed.


set -u — error on unset variables

Without -u:

rm -rf "$prefix/$dir"      # if both $prefix and $dir are unset: rm -rf /

That is the canonical production disaster. Missing variable → empty string → catastrophic path. -u prevents it:

set -u
rm -rf "$prefix/$dir"      # ERROR: prefix: unbound variable

Now the script exits with a clear error instead of doing something irreversible.

Handling variables that might be unset

-u errors on any unset variable reference. For variables that are legitimately optional, use a default:

# ERROR under -u if DEBUG is not set
if [[ "$DEBUG" == "true" ]]; then ...

# OK — falls back to empty string if unset
if [[ "${DEBUG:-}" == "true" ]]; then ...

Or use ${var:-default} to supply a default value:

log_level="${LOG_LEVEL:-info}"
port="${PORT:-8080}"

Checking if a variable is set

# Check if variable is set, even to empty (needs Bash 4.2+; not available in macOS's system Bash 3.2)
if [[ -v CONFIG_FILE ]]; then ...

# Check if set AND non-empty
if [[ -n "${CONFIG_FILE:-}" ]]; then ...

Both work with -u. Use them when you need to know whether to apply a default.
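The two checks answer different questions, which matters when an empty value is meaningful. A small sketch, assuming Bash 4.2+ for -v; the describe helper is hypothetical and CONFIG_FILE is the example variable from above:

```shell
#!/usr/bin/env bash
set -u

# describe NAME (hypothetical helper): report whether the variable called NAME
# is unset, set but empty, or set to a non-empty value.
describe() {
  local var_name="$1"
  if [[ ! -v "$var_name" ]]; then
    echo "unset"
  elif [[ -z "${!var_name}" ]]; then   # ${!var_name} reads the variable named by var_name
    echo "empty"
  else
    echo "set"
  fi
}

unset CONFIG_FILE
state_unset=$(describe CONFIG_FILE)

CONFIG_FILE=""
state_empty=$(describe CONFIG_FILE)

CONFIG_FILE="/etc/app.conf"
state_set=$(describe CONFIG_FILE)

echo "$state_unset $state_empty $state_set"
```

The -n "${CONFIG_FILE:-}" form treats the first two states the same; -v is the one that tells them apart.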


set -o pipefail — propagate pipeline failures

Without pipefail, only the last command's exit status in a pipeline matters:

# Without pipefail, the pipeline's status is grep's status alone.
# cat fails (no such file), grep reads empty input and reports "no match".
cat nonexistent.log | grep ERROR

echo $?    # 1: grep's "no match" status, which says nothing about cat failing

Without pipefail, a common broken pattern:

errors=$(grep ERROR "$log" | wc -l)
# If the log doesn't exist, grep fails. wc succeeds (it just counts 0 lines).
# $errors is "0", the script continues as if everything is fine.

With pipefail:

set -o pipefail
errors=$(grep ERROR "$log" | wc -l)
# Now if grep fails (e.g. file not found), $? is non-zero.
# Combined with -e, the script exits.
# Caveat: grep also exits 1 when no line matches, so a log with zero errors aborts too.
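Since grep uses exit status 1 for "no match" and 2 or higher for real errors such as a missing file, a counting helper can tell the two apart. This is one possible sketch, not the only way to do it: it swaps the grep | wc -l pipeline for grep -c, and count_errors plus the log file names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_errors FILE (hypothetical helper): print how many lines contain ERROR.
# Status 1 from grep means "no match" and is fine; 2 or more is a real failure.
count_errors() {
  local file="$1" count status=0
  count=$(grep -c ERROR "$file") || status=$?
  if (( status > 1 )); then
    echo "cannot read $file" >&2
    return "$status"
  fi
  echo "$count"
}

# Demo with throwaway files.
tmp=$(mktemp -d)
printf 'ok\nERROR one\nERROR two\n' > "$tmp/app.log"
printf 'ok\n' > "$tmp/clean.log"

with_errors=$(count_errors "$tmp/app.log")
clean=$(count_errors "$tmp/clean.log")          # zero matches is not a failure

missing_status=0
count_errors "$tmp/missing.log" 2>/dev/null || missing_status=$?

rm -rf "$tmp"
echo "app.log: $with_errors, clean.log: $clean, missing.log status: $missing_status"
```

A clean log now yields a count of 0 and the script carries on, while a missing log still surfaces as a failure the caller has to handle.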

Interaction with -e

-e + pipefail is the combination you want. Without pipefail, -e cannot catch mid-pipeline failures. Without -e, pipefail just sets the exit code but doesn't exit.

You almost always want both.
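A quick way to convince yourself is to run one failing pipeline under all four combinations. In this sketch false | true is a pipeline whose first command fails and whose last succeeds; the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# The same command string under each combination of -e and pipefail.
# `|| true` only keeps the demo running when a child shell exits non-zero.
pipeline='false | true; echo reached'

plain=$(bash -c "$pipeline" || true)
only_e=$(bash -e -c "$pipeline" || true)
only_pipefail=$(bash -o pipefail -c "$pipeline" || true)
both=$(bash -e -o pipefail -c "$pipeline" || true)

echo "neither:       [$plain]"
echo "-e only:       [$only_e]"
echo "pipefail only: [$only_pipefail]"
echo "both:          [$both]"
```

Only the last line prints empty brackets: that is the one combination where the failure of false stops the child shell before echo.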


set -x — trace every command

For debugging, -x is gold:

#!/bin/bash
set -x
name="alice"
echo "hello $name"

# Output:
# + name=alice
# + echo 'hello alice'
# hello alice

Every command is printed (with + prefix) AFTER all expansions. This shows you exactly what Bash is running — post-variable-expansion, post-globbing.

Do not leave set -x on in production scripts: it floods logs and prints expanded values, including any secrets held in variables. Use it to debug, then remove.

Conditional tracing

A common pattern:

# Enable tracing only when DEBUG env var is set
[[ "${DEBUG:-}" == "true" ]] && set -x

Now DEBUG=true ./script.sh runs with trace; regular runs don't.

Prettier traces with PS4

Bash's PS4 is the prefix for each traced line. Customize it for useful context:

export PS4='+${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]:-main}: '
set -x

# Now traces look like:
# +./script.sh:5:main: name=alice
# +./script.sh:6:main: echo 'hello alice'

File, line number, function name. Much easier to find the offending line in a big script.
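To see what the function field adds, here is a sketch that writes a tiny script, runs it with tracing on, and keeps only the trace. The script name trace_demo.sh and the greet function are made up for the demo.

```shell
#!/usr/bin/env bash
# Write a small script to a temp directory, run it, and capture its trace.
tmp=$(mktemp -d)
cat > "$tmp/trace_demo.sh" <<'EOF'
PS4='+${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]:-main}: '
greet() {
  echo "hello $1"
}
set -x
greet alice
EOF

# Trace output goes to stderr: capture stderr and discard the script's stdout.
trace=$(bash "$tmp/trace_demo.sh" 2>&1 >/dev/null)
rm -rf "$tmp"

echo "$trace"
```

The call made at top level is tagged main, because FUNCNAME is only set while a function is running, and the echo inside the function is tagged greet.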


The IFS=$'\n\t' discipline

Not a set flag, but it pairs with them. Default IFS is space, tab, newline. Setting it to just tab and newline means:

  • Word splitting on newlines/tabs still works (so you can iterate over $(ls) output if you must).
  • Spaces in filenames no longer cause word splitting.

IFS=$'\n\t'

files=$(ls /tmp)
for f in $files; do         # still word-splits on newline, but spaces are preserved
  echo "[$f]"
done

This is a defensive default — it does not fix the root problem (quote your variables), but it reduces the blast radius when you forget.
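The effect is easy to measure by counting how many words an unquoted expansion produces under each IFS. A minimal sketch with a made-up two-line listing; count_words is a hypothetical helper, and the IFS assignments live inside command substitutions so they do not leak into the rest of the script.

```shell
#!/usr/bin/env bash
# A hypothetical listing: two names, one per line, the first containing a space.
listing=$'my report.txt\nnotes.txt'

# count_words TEXT: print how many words TEXT splits into under the current IFS.
count_words() {
  # Unquoted on purpose so that word splitting applies.
  set -- $1
  echo "$#"
}

default_count=$(IFS=$' \t\n'; count_words "$listing")   # space, tab, newline
strict_count=$(IFS=$'\n\t'; count_words "$listing")     # newline and tab only

echo "default IFS: $default_count words"
echo "strict IFS:  $strict_count words"
```

Under the default IFS the name with a space is torn into two words, giving three in total; under the strict setting the listing splits into its two lines.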

PRO TIP

IFS=$'\n\t' is recommended by Aaron Maxwell's "unofficial strict mode" article. It's a small extra layer of safety for scripts that process lines of filenames or similar.


Putting it all together — the strict mode prelude

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

# Optional: enable tracing under DEBUG
[[ "${DEBUG:-}" == "true" ]] && set -x

This is the canonical safe-Bash prelude. Copy it into every new script.

Why #!/usr/bin/env bash and not #!/bin/bash

env looks up bash in $PATH. This means your script uses the Bash the user has (e.g. an updated one from Homebrew on macOS), not the system one (which on macOS is still 3.2). For scripts relying on Bash 4+ features (associative arrays, ${var^^} and ${var,,} case conversion), this matters.
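If a script depends on Bash 4+ features, it can check the version up front rather than fail halfway through with a syntax error. A minimal sketch using the built-in BASH_VERSINFO array; require_bash is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# require_bash MAJOR (hypothetical helper): succeed only if the running Bash
# is at least the given major version.
require_bash() {
  local needed="$1"
  if (( BASH_VERSINFO[0] < needed )); then
    echo "this script needs Bash ${needed}+, found ${BASH_VERSION}" >&2
    return 1
  fi
}

if require_bash 4; then
  echo "Bash ${BASH_VERSION}: associative arrays are available"
fi
```

In a real script the call would typically be require_bash 4 || exit 1, placed right after the prelude.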


The mental model

  • set -e: Exit immediately if a command fails (with caveats). Without: script silently continues past errors.
  • set -u: Error on references to unset variables. Without: rm -rf $prefix/$dir silently becomes rm -rf /
  • set -o pipefail: Pipeline fails if ANY command in it fails. Without: cat x | grep y returns grep's status only.
  • set -x: Print every command (post-expansion) to stderr. Enable conditionally via DEBUG env var. Do not leave on in prod.

Undoing set flags locally

Sometimes you genuinely need to allow a failure inside an otherwise-strict script:

set -e

# Temporarily allow failure for one command
set +e
risky_cmd
rc=$?
set -e

if [[ $rc -ne 0 ]]; then
  handle_failure
fi

set +e turns off errexit; set -e turns it back on. Similar for +u and +o pipefail.

Cleaner alternative using ||:

set -e
rc=0                      # reset first so a stale value from earlier code cannot leak in
risky_cmd || rc=$?

if [[ $rc -ne 0 ]]; then
  handle_failure
fi

cmd || rc=$? captures the exit code without breaking strict mode.


Real-world failure modes

Failure mode 1: unbound variable in a path

# Without -u
backup_dir="/backups/${USER}"
rm -rf "$backup_dir"/*

# If $USER is somehow unset, this becomes:
#   rm -rf "/backups/"/*
# which deletes EVERY backup. Catastrophe.

# With -u
# ERROR: USER: unbound variable
# Script stops before damage.
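One gap remains: -u only catches variables that are unset. A variable that is set but empty still expands to nothing and produces the same dangerous path. The standard ${var:?message} expansion fails on both. A sketch, with backup_path as a hypothetical helper:

```shell
#!/usr/bin/env bash
set -u

# backup_path USER_NAME (hypothetical helper): print the backup directory for
# a user, refusing to build a path from an unset or empty name.
backup_path() {
  # ${1:?...} aborts the (sub)shell with the message if $1 is unset OR empty.
  local user_name="${1:?user name must not be empty}"
  echo "/backups/${user_name}"
}

good=$(backup_path "alice")

# Run the failing case in a command substitution so that only the
# subshell is aborted, not this demo script.
bad_status=0
bad=$(backup_path "" 2>/dev/null) || bad_status=$?

echo "good: $good"
echo "empty name: status $bad_status, path [${bad}]"
```

With an empty name no path is produced at all, so there is nothing for rm to act on.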

Failure mode 2: unchecked cd

# Without -e
cd /critical/dir
rm -rf *    # if cd failed (dir doesn't exist, permission denied),
             # we're still in cwd. This deletes the wrong things.

# With -e
# cd fails -> script exits before rm.
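For an operation this destructive, it is reasonable to guard the cd explicitly as well, so the protection does not depend on -e being active in the calling context. A sketch, where clean_dir is a hypothetical helper and the demo runs in a throwaway directory:

```shell
#!/usr/bin/env bash
# clean_dir DIR (hypothetical helper): delete the visible contents of DIR,
# but only if cd into it succeeds.
clean_dir() {
  local dir="$1"
  (
    cd -- "$dir" || exit 1    # a failed cd ends the subshell before rm runs
    rm -rf -- ./*             # ./* skips dotfiles and cannot be read as an option
  )
}

# Demo in a throwaway directory.
work=$(mktemp -d)
mkdir "$work/target"
touch "$work/keep.txt" "$work/target/old.txt"

clean_dir "$work/target"                      # cd succeeds: old.txt is removed
target_contents=$(ls "$work/target")

# Ask for a directory that does not exist while standing next to keep.txt.
missing_status=0
( cd "$work" && clean_dir "missing" ) 2>/dev/null || missing_status=$?
remaining=$(ls "$work")

rm -rf "$work"
echo "status for missing dir: $missing_status"
echo "left in work dir: $remaining"
```

Because the cd and rm run inside a subshell, the caller's working directory never changes, and a failed cd cannot leave rm running in the wrong place.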

Failure mode 3: silent pipeline failure

# Without pipefail — hidden bug
if ! grep ERROR file.log | head -1; then
  echo "no errors"
fi
# grep's exit code is lost. The condition sees only head's exit code (0 here),
# so "no errors" is never printed, even when file.log has no ERROR lines or does not exist.
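When you need to know which stage of a pipeline failed, Bash records every stage's status in the PIPESTATUS array. A sketch with a made-up log file; note that PIPESTATUS is overwritten by the next command, so it is copied immediately.

```shell
#!/usr/bin/env bash
# A throwaway log file with no ERROR lines.
tmp=$(mktemp -d)
log="$tmp/file.log"
printf 'INFO start\nINFO done\n' > "$log"

grep ERROR "$log" | head -1
statuses=("${PIPESTATUS[@]}")       # copy right away: the next command resets it

echo "grep exited ${statuses[0]}, head exited ${statuses[1]}"

# The same pipeline in a child shell with pipefail reports grep's failure.
pipefail_status=0
bash -o pipefail -c 'grep ERROR "$1" | head -1' _ "$log" || pipefail_status=$?
echo "with pipefail the pipeline exits $pipefail_status"

rm -rf "$tmp"
```

Here grep exits 1 (no match) and head exits 0, which is exactly the information the plain pipeline status throws away.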

Failure mode 4: sourced config with unset vars

# config.sh might not define every var
source config.sh
echo "app runs on $APP_PORT"

# If APP_PORT was not defined and -u is on: error.
# Better: explicit default.
APP_PORT="${APP_PORT:-8080}"
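For settings that have no sensible default, one option is to validate them in a single place right after sourcing the config, so the error names everything that is missing at once. A sketch: require_vars is a hypothetical helper, and APP_HOST is a made-up variable alongside APP_PORT from the example above.

```shell
#!/usr/bin/env bash
set -u

# require_vars NAME... (hypothetical helper): fail, listing every named
# variable that is unset or empty.
require_vars() {
  local name missing=()
  for name in "$@"; do
    # ${!name:-} reads the variable called $name, tolerating unset under -u.
    [[ -n "${!name:-}" ]] || missing+=("$name")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing config: ${missing[*]}" >&2
    return 1
  fi
}

# Stand-in for "source config.sh": the config defines the host but not the port.
APP_HOST="localhost"
unset APP_PORT

check_status=0
require_vars APP_HOST APP_PORT 2>/dev/null || check_status=$?
echo "before default: status $check_status"

APP_PORT="${APP_PORT:-8080}"        # optional setting: supply the default
require_vars APP_HOST APP_PORT && echo "config complete: $APP_HOST:$APP_PORT"
```

Optional settings get a default; required ones go through the check. Either way, no later line has to wonder whether a variable exists.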

When set -e is not enough

set -e is a safety net, not a fence. It catches many errors but not all. For critical operations, add explicit checks:

# Belt-and-suspenders
set -e

dest="/backup/$(date +%Y%m%d)"
[[ -d "$dest" ]] || { echo "destination missing: $dest" >&2; exit 1; }

cp -r source/ "$dest"/ || { echo "copy failed" >&2; exit 1; }

verify_backup "$dest" || { echo "verification failed" >&2; exit 1; }

Even with -e, adding explicit error messages on the critical paths makes failures actionable. "copy failed" is more useful than "exited with status 1."
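The repeated { echo ... >&2; exit 1; } groups are often folded into a small helper. A common convention, shown here as a sketch, is a function named die; the name and message format are a matter of taste, not something Bash provides.

```shell
#!/usr/bin/env bash
# die MESSAGE... (conventional helper, not a builtin): print the message to
# stderr and exit with status 1.
die() {
  echo "error: $*" >&2
  exit 1
}

# Demo: run die in a command substitution so that only the subshell exits,
# and capture what it wrote to stderr.
status=0
message=$(die "copy failed" 2>&1) || status=$?

echo "status: $status"
echo "message: $message"
```

With it, the critical path reads as cp -r source/ "$dest"/ || die "copy failed", one line per operation.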


Quiz

KNOWLEDGE CHECK

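You write this script: set -e; failing_cmd | grep pattern; echo done. failing_cmd exits non-zero but grep finds a match. What happens?

Answer: the script prints the matching line, then done, and exits 0. Without pipefail, the pipeline's status is grep's (0), so -e has nothing to react to and failing_cmd's failure goes unnoticed.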


What to take away

  • set -euo pipefail is the canonical strict mode. Not optional for production scripts.
  • -e exits on unhandled command failures.
  • -u errors on unset variables — prevents the rm -rf $var/$dir disaster.
  • -o pipefail makes pipelines fail if any command in them fails. Pair it with -e so that failure actually stops the script.
  • -x traces commands for debugging. Conditional via [[ "${DEBUG:-}" == "true" ]] && set -x.
  • Customize PS4 for traces that include file, line number, and function.
  • Use #!/usr/bin/env bash to pick up user's Bash (useful for Bash 4+ features on macOS).
  • Supply defaults for optional env vars: ${VAR:-default}.
  • Temporarily relax with cmd || rc=$? when you need to react to a failure without aborting.

Next lesson: traps — running cleanup code on exit, signals, and errors.