Bash & Shell Scripting for Engineers

Traps and Cleanup

A well-written script cleans up after itself. Temporary files get deleted. Lock files are released. In-progress transactions get rolled back. Subprocesses are stopped.

A poorly-written script hopes none of those things are necessary because nothing bad ever happens. Something bad always happens. The power gets cut, the user Ctrl-Cs at exactly the wrong moment, a downstream command fails, a parent script kills the child. Traps are how you make cleanup happen anyway.

KEY CONCEPT

trap 'cleanup' EXIT at the top of every non-trivial script. It runs the cleanup function when the script exits — normally, abnormally, or from a signal. Without it, partially-executed scripts leave garbage behind.


What trap does

trap registers a command (or function call) to be run when a signal or special event occurs:

trap 'echo cleanup' EXIT
echo "working..."
exit 0
# Output:
# working...
# cleanup

The quoted string is the command to run. The event name(s) after it are when to run it.

Events you care about:

  • EXIT — the script is exiting (for any reason). The most important trap.
  • ERR — a command failed, in the same situations where set -e would exit (the trap works with or without set -e).
  • INT — the user pressed Ctrl-C (SIGINT).
  • TERM — another process (kill, systemd, Kubernetes) sent SIGTERM.
  • HUP — terminal closed (SIGHUP).

The EXIT trap — the one you almost always want

Register cleanup:

#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# ...do work in $tmpdir...

No matter how the script exits — successful exit 0, failing command under set -e, Ctrl-C, killed by another process — the EXIT trap runs. The tempdir gets cleaned up.

This pattern — create a temp resource, install an EXIT trap, use the resource — is the single most useful trap idiom.
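A minimal, runnable check of that claim — the child script below fails on purpose with `false`, and we confirm the trap fired anyway (the script file and tempdir are throwaway temp paths):

```shell
# Demo: the EXIT trap fires even when set -e aborts the script early.
script=$(mktemp)
cat > "$script" <<'EOF'
set -euo pipefail
tmpdir=$(mktemp -d)
trap 'echo "cleanup: removing $tmpdir"; rm -rf "$tmpdir"' EXIT
false                 # fails -> set -e exits -> EXIT trap still runs
echo "never reached"
EOF
out=$(bash "$script" 2>&1 || true)
rm -f "$script"
echo "$out"           # prints: cleanup: removing /tmp/tmp.XXXXXX
```

The script never reaches its last line, yet the cleanup ran and the tempdir is gone.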

Multiple cleanup actions

You can register multiple things to clean up:

tmpdir=$(mktemp -d)
lockfile="/var/lock/myapp.lock"

cleanup() {
  rm -rf "$tmpdir"
  rm -f "$lockfile"
  # also kill background processes, etc.
}

trap cleanup EXIT

touch "$lockfile"
# ...work...

Using a named function is cleaner than a long quoted string.


The ERR trap — fires on command failure

trap '...' ERR runs whenever a command returns non-zero in a context where set -e would have exited (the trap itself fires with or without set -e enabled):

trap 'echo "error at line $LINENO"' ERR
set -e

cmd1
cmd2        # if this fails, ERR trap fires before exit
cmd3

A useful pattern: log the failing command

set -e
trap 'echo "FAIL: command \"${BASH_COMMAND}\" failed at line $LINENO" >&2' ERR

do_something    # if this fails:
                # FAIL: command "do_something" failed at line 4

$BASH_COMMAND is the command that just ran. $LINENO is the line it was on. Together they tell you exactly what failed.

ERR doesn't fire everywhere

Same caveats as set -e: the ERR trap does not fire on failures inside:

  • if, while, until conditions.
  • Commands on the left of &&, ||.
  • Commands negated with !.

If you care about those cases, explicit error handling is required.
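A quick sketch of those caveats, with deliberately trivial commands — only the bare failure at the end reaches the ERR trap:

```shell
# Demo: ERR stays silent for failures in "checked" contexts.
script=$(mktemp)
cat > "$script" <<'EOF'
trap 'echo "ERR fired: $BASH_COMMAND"' ERR
if ! false; then :; fi     # failure inside if/!: no ERR
false || true              # left side of ||: no ERR
false && true              # left side of &&: no ERR
false                      # bare failure: ERR fires
EOF
out=$(bash "$script" || true)
rm -f "$script"
echo "$out"                # prints: ERR fired: false
```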


Signal traps — handling Ctrl-C and SIGTERM

Users press Ctrl-C. Kubernetes sends SIGTERM. Your script should respond gracefully:

#!/usr/bin/env bash
set -euo pipefail

should_exit=0

trap 'should_exit=1' INT TERM

while (( ! should_exit )); do
  # do work
  sleep 1
done

echo "shutting down cleanly"

When the user presses Ctrl-C, the INT trap fires, sets should_exit, and the loop exits next iteration. This is much better than dying mid-operation.
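You can exercise the loop above from the outside. Here's a sketch that runs a copy of it in the background and sends it SIGTERM (what a process manager would send); the temp-file paths and the 0.2s work interval are demo choices:

```shell
# Demo: the flag-based loop shuts down cleanly on SIGTERM.
script=$(mktemp); log=$(mktemp)
cat > "$script" <<'EOF'
should_exit=0
trap 'should_exit=1' INT TERM
while (( ! should_exit )); do
  sleep 0.2        # stands in for "do work"
done
echo "shutting down cleanly"
EOF
bash "$script" > "$log" & pid=$!
sleep 0.5                  # let the loop start
kill -TERM "$pid"
wait "$pid"
out=$(cat "$log")
rm -f "$script" "$log"
echo "$out"                # prints: shutting down cleanly
```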

The "kill children" pattern

If your script spawns background processes, you need to kill them on exit:

pids=()

start_worker() {
  worker_process &
  pids+=("$!")
}

cleanup() {
  for pid in "${pids[@]}"; do
    kill "$pid" 2>/dev/null || true
  done
  wait   # wait for them to actually exit
}

trap cleanup EXIT

start_worker
start_worker
# ...

Without this, background processes keep running after the script dies — the script appears to have exited, but its orphaned children are still consuming resources.
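Here's a runnable sketch of the same pattern, with `sleep 60` standing in for the hypothetical worker_process, plus a check at the end that the workers really are gone:

```shell
# Demo: kill-and-wait cleanup for background workers.
pids=()

start_worker() {
  sleep 60 &            # stand-in for worker_process
  pids+=("$!")
}

cleanup() {
  local pid
  for pid in "${pids[@]}"; do
    kill "$pid" 2>/dev/null || true
  done
  wait 2>/dev/null || true   # reap them so nothing lingers
}

start_worker
start_worker
cleanup                 # normally registered via: trap cleanup EXIT

alive=0
for pid in "${pids[@]}"; do
  kill -0 "$pid" 2>/dev/null && alive=$((alive + 1))
done
echo "workers still alive: $alive"   # prints: workers still alive: 0
```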


What signals actually are

A signal is a small interrupt the OS sends to a process. Common ones:

  • SIGHUP (1) — terminal closed. Historical meaning: hang-up on a modem. Often used now to ask a daemon to reload its config.
  • SIGINT (2) — Ctrl-C, user interrupt. Trap this to clean up before exiting when the user aborts.
  • SIGTERM (15) — polite kill: "please stop". Kubernetes sends this first, then SIGKILL if you do not stop within the grace period.
  • SIGKILL (9) — unconditional kill. CANNOT BE TRAPPED: the OS kills the process immediately; no cleanup is possible.
  • SIGUSR1 (10) / SIGUSR2 (12) — user-defined, script-specific purposes. Your choice what they mean; often: toggle debug, dump stats.
  • SIGPIPE (13) — wrote to a closed pipe: the pipe's reader exited before the writer finished (e.g. cmd | head).

SIGKILL and SIGSTOP cannot be trapped. This is deliberate — it's the OS's escape hatch when a process is misbehaving. Scripts should rely on SIGTERM-and-trap for cleanup and accept that SIGKILL means "I failed; the process is gone."
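You can observe the difference directly. In this sketch the child writes a marker file from its EXIT trap; the marker appears after SIGTERM but not after SIGKILL (the child traps TERM and backgrounds its sleep so the trap runs promptly; all paths are temp files):

```shell
# Demo: EXIT trap runs on SIGTERM, never on SIGKILL.
marker=$(mktemp -u)       # a path only; the child's trap creates the file
script=$(mktemp)
cat > "$script" <<EOF
trap 'touch "$marker"' EXIT
trap 'exit 0' TERM        # catch TERM so the EXIT trap fires promptly
sleep 5 & wait            # wait is interruptible by the trapped signal
EOF

bash "$script" & pid=$!; sleep 0.3
kill -TERM "$pid"; wait "$pid" 2>/dev/null || true
term_ran=$([ -e "$marker" ] && echo yes || echo no); rm -f "$marker"

bash "$script" & pid=$!; sleep 0.3
kill -KILL "$pid"; wait "$pid" 2>/dev/null || true
kill_ran=$([ -e "$marker" ] && echo yes || echo no)

rm -f "$script" "$marker"
echo "TERM ran cleanup: $term_ran / KILL ran cleanup: $kill_ran"
```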


The graceful shutdown pattern

The canonical "handle SIGTERM" script:

#!/usr/bin/env bash
set -euo pipefail

should_exit=0
trap 'should_exit=1' TERM INT

main_loop() {
  while (( ! should_exit )); do
    process_next_batch
    (( should_exit )) && break
    sleep 1
  done
}

cleanup() {
  echo "flushing state..."
  flush_in_progress_work
  release_locks
  echo "done"
}

trap cleanup EXIT

main_loop

Kubernetes sends SIGTERM when a pod is terminating. If you respond within terminationGracePeriodSeconds (default 30s), you get a clean shutdown. If not, SIGKILL follows.

Scripts running in containers should always trap SIGTERM.


Handling cleanup in subshells

A trap registered in the parent shell does NOT fire in a subshell. If your script forks or pipes, you might need traps in both:

cleanup_parent() {
  echo "parent cleanup"
}

trap cleanup_parent EXIT

(
  trap 'echo subshell cleanup' EXIT
  do_subshell_work
)   # EXIT trap fires when subshell exits
# At script end, parent cleanup also fires.

For most scripts, just trap EXIT in the top shell. Only worry about subshell traps if you're doing something unusual with backgrounded pipelines.
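A sketch you can run to confirm the inheritance rule — the echoes stand in for real cleanup:

```shell
# Demo: a ( ) subshell does not inherit or run the parent's EXIT trap.
script=$(mktemp)
cat > "$script" <<'EOF'
trap 'echo "parent exit trap"' EXIT
( exit 0 )          # prints nothing: subshell has no EXIT trap
( trap 'echo "subshell exit trap"' EXIT; exit 0 )
echo "main body done"
EOF
out=$(bash "$script")
rm -f "$script"
echo "$out"
# prints, in order:
#   subshell exit trap
#   main body done
#   parent exit trap
```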


The temp file idiom — the one you will reuse most

#!/usr/bin/env bash
set -euo pipefail

tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT

# ...use $tmpfile...

Or for a whole directory:

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# ...use $tmpdir for intermediate files...

mktemp creates a file/dir with a unique name in a safe location (usually $TMPDIR or /tmp). The trap guarantees cleanup even if the script dies unexpectedly.

PRO TIP

Do not put rm -rf $tmpdir (unquoted) in the trap. If $tmpdir contains whitespace it splits into multiple arguments, and a variant like rm -rf $tmpdir/* becomes rm -rf /* when $tmpdir is empty. Always quote: rm -rf "$tmpdir".


The lock file idiom

Preventing two copies of a script from running simultaneously:

#!/usr/bin/env bash
set -euo pipefail

lockfile="/var/lock/myscript.lock"

if ! (set -C; echo "$$" > "$lockfile") 2>/dev/null; then
  echo "already running (lock: $lockfile)" >&2
  exit 1
fi

trap 'rm -f "$lockfile"' EXIT

# ...work...

set -C (noclobber) makes > fail if the file exists. Combined with echo $$ > lockfile, this is an atomic check-and-create: the first process creates the lock; subsequent ones fail.
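A sketch of that atomicity, using a throwaway temp path in place of a real lock file — the first create succeeds, the second fails:

```shell
# Demo: noclobber makes lock-file creation an atomic check-and-create.
lockfile=$(mktemp -u)    # -u: generate a name without creating the file

try_lock() {
  (set -C; echo "$$" > "$lockfile") 2>/dev/null && echo acquired || echo busy
}

first=$(try_lock)
second=$(try_lock)
rm -f "$lockfile"
echo "first: $first, second: $second"   # prints: first: acquired, second: busy
```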

For production-grade locking, use flock:

exec 200>"$lockfile"
if ! flock -n 200; then
  echo "already running" >&2
  exit 1
fi
trap 'flock -u 200; rm -f "$lockfile"' EXIT

# ...work...

flock is the robust approach — it uses OS-level file locking that the kernel releases automatically when the process dies, so even the explicit flock -u in the trap is just belt and braces.
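You can watch the contention happen within a single shell, because flock locks belong to the open file description rather than the process (Linux with util-linux flock assumed; the lock path is a throwaway temp file):

```shell
# Demo: a second open of the same lock file cannot take a held flock.
lockfile=$(mktemp)

exec 9>"$lockfile"
holder=$(flock -n 9 && echo acquired || echo busy)

exec 8>"$lockfile"       # a second, independent open file description
contender=$(flock -n 8 && echo acquired || echo busy)

exec 8>&- 9>&-           # closing the fds releases the lock
rm -f "$lockfile"
echo "holder: $holder, contender: $contender"
```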


Composing multiple traps

trap replaces previous handlers by default. If you want to add to an existing trap:

# Define a registry
CLEANUP_CMDS=()

add_cleanup() {
  CLEANUP_CMDS+=("$1")
}

run_cleanup() {
  local cmd
  for cmd in "${CLEANUP_CMDS[@]}"; do
    eval "$cmd"     # eval is risky — only use with strings YOU control
  done
}

trap run_cleanup EXIT

# Register cleanups as you go
tmpfile=$(mktemp)
add_cleanup "rm -f \"$tmpfile\""

tmpdir=$(mktemp -d)
add_cleanup "rm -rf \"$tmpdir\""

Or keep it simple — have one cleanup() function that does everything:

cleanup() {
  [[ -n "${tmpfile:-}" ]] && rm -f "$tmpfile"
  [[ -n "${tmpdir:-}" ]] && rm -rf "$tmpdir"
  [[ -n "${pid:-}" ]] && kill "$pid" 2>/dev/null
}

trap cleanup EXIT

Debugging traps — the DEBUG trap

Bash also has DEBUG, which fires before every command:

trap 'echo "about to run: $BASH_COMMAND"' DEBUG

cd /tmp
ls

# Output:
# about to run: cd /tmp
# about to run: ls
# file1 file2 ...

Useful for detailed tracing when set -x is too noisy. Not something you leave on in production.


Inspecting current traps

trap -p prints registered traps:

trap 'echo bye' EXIT
trap -p
# trap -- 'echo bye' EXIT

Useful for debugging — is my trap actually registered? Did something overwrite it?

Clearing a trap

trap - EXIT    # remove the EXIT trap

Common trap mistakes

Mistake 1: double-quoting the trap body

# BROKEN
trap "rm -rf $tmpdir" EXIT
# $tmpdir was expanded NOW, at registration time.
# If it later changes, the trap still uses the old value.

# CORRECT
trap 'rm -rf "$tmpdir"' EXIT
# Single quotes defer expansion until the trap fires.
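A tiny runnable illustration of registration-time expansion — the paths are made up, and the trap only echoes instead of deleting:

```shell
# Demo: double quotes freeze the variable's value when the trap is set.
script=$(mktemp)
cat > "$script" <<'EOF'
tmpdir="/tmp/old-value"
trap "echo removing $tmpdir" EXIT    # expands NOW, at registration
tmpdir="/tmp/new-value"
EOF
out=$(bash "$script")
rm -f "$script"
echo "$out"     # prints: removing /tmp/old-value
```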

Mistake 2: trap that depends on a local variable

do_work() {
  local tmpdir=$(mktemp -d)
  trap 'rm -rf "$tmpdir"' EXIT    # $tmpdir is local
  # ...
}
# When the trap fires (at SCRIPT exit, not function exit),
# the local $tmpdir is gone.

Fix: use global variables for anything referenced by a trap, or clean up at function end with a separate mechanism.

Mistake 3: heavy work in the trap

cleanup() {
  upload_logs_to_s3    # Network call. If this hangs, script never exits.
  notify_slack
  rm -f "$tmpfile"
}

trap cleanup EXIT

Cleanup should be fast and local. Network calls, database writes, anything that can hang — do those in the main body, not in the trap.

Mistake 4: overriding default signal behavior unintentionally

trap 'echo ignoring' INT
# Now Ctrl-C doesn't actually exit.

Unless you specifically want to ignore Ctrl-C, make sure your trap either exits or sets a flag that causes the main loop to exit.


#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

# State
tmpdir=""
lockfile="/var/lock/myapp.lock"
should_exit=0

# Cleanup runs on any exit path
cleanup() {
  local rc=$?
  [[ -n "$tmpdir" ]] && rm -rf "$tmpdir"
  [[ -f "$lockfile" ]] && rm -f "$lockfile"
  (( rc != 0 )) && echo "script exited with code $rc" >&2
  return 0   # keep a short-circuited && from making cleanup itself report failure
}
trap cleanup EXIT

# Graceful shutdown on SIGTERM / SIGINT
trap 'should_exit=1' TERM INT

# Error context on failures
trap 'echo "error at line $LINENO: $BASH_COMMAND" >&2' ERR

# Acquire lock
exec 200>"$lockfile"
flock -n 200 || { echo "already running" >&2; exit 1; }

# Create scratch space
tmpdir=$(mktemp -d)

# Main work
while (( ! should_exit )); do
  do_one_batch
  (( should_exit )) && break
  sleep 1
done

echo "done"

Every safety net in one place. Copy this as a starting point.


Quiz

KNOWLEDGE CHECK

You register a trap with double quotes around the body: trap "rm -f $tmpfile" EXIT. Later the tmpfile variable gets reassigned to a new path. When the script exits, what gets deleted?


What to take away

  • trap 'cleanup' EXIT — the essential cleanup idiom. Registers cleanup that runs no matter how the script exits.
  • Use single quotes for the trap body so variables are expanded at trap-fire time, not registration time.
  • Signal traps: INT (Ctrl-C), TERM (kubernetes stop), HUP (terminal close). Trap them for graceful shutdown.
  • SIGKILL can't be trapped. Plan for it at a higher level (restart logic, externally-tracked state).
  • Temp files: mktemp + trap 'rm -rf "$tmp"' EXIT is the pattern.
  • Lock files: use flock for robust mutual exclusion.
  • Keep cleanup fast. No network calls. No heavy I/O.
  • In long-running scripts, set a should_exit flag in SIGTERM/SIGINT handlers and check it in your main loop.

Next lesson: exit codes in depth — what $? really is, chaining with &&/||, and propagating status correctly.