Traps and Cleanup
A well-written script cleans up after itself. Temporary files get deleted. Lock files are released. In-progress transactions get rolled back. Subprocesses are stopped.
A poorly-written script hopes none of those things are necessary because nothing bad ever happens. Something bad always happens. The power gets cut, the user Ctrl-Cs at exactly the wrong moment, a downstream command fails, a parent script kills the child. Traps are how you make cleanup happen anyway.
Put trap 'cleanup' EXIT at the top of every non-trivial script. It runs the cleanup function whenever the script exits: normally, on an error, or because of a catchable signal. Without it, a script that stops partway through leaves garbage behind.
What trap does
trap registers a command (or function call) to be run when a signal or special event occurs:
trap 'echo cleanup' EXIT
echo "working..."
exit 0
# Output:
# working...
# cleanup
The quoted string is the command to run. The event name(s) after it are when to run it.
Events you care about:
- EXIT: the script is exiting (for any reason). The most important trap.
- ERR: a command failed (the same failures that would stop a script under set -e).
- INT: the user pressed Ctrl-C (SIGINT).
- TERM: another process (kill, a process manager, Kubernetes) sent SIGTERM.
- HUP: the terminal was closed (SIGHUP).
The EXIT trap — the one you almost always want
Register cleanup:
#!/usr/bin/env bash
set -euo pipefail
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
# ...do work in $tmpdir...
No matter how the script exits (a successful exit 0, a failing command under set -e, Ctrl-C, SIGTERM from another process), the EXIT trap runs and the tempdir gets cleaned up. The one exception is SIGKILL, which ends the process before any trap can run.
This pattern — create a temp resource, install an EXIT trap, use the resource — is the single most useful trap idiom.
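You can check these exit paths yourself by running a few throwaway child scripts and looking at what they print. A minimal sketch, assuming bash is on PATH; the one-line scripts are hypothetical test cases, not part of any real program:

```bash
#!/usr/bin/env bash
# Each case runs in its own child bash, so its traps and set -e
# do not leak into this script.

# Case 1: normal end of script.
out_clean=$(bash -c 'trap "echo cleanup" EXIT; echo working')

# Case 2: explicit non-zero exit. The trap runs and the exit code survives.
rc_exit3=0
out_exit3=$(bash -c 'trap "echo cleanup" EXIT; exit 3') || rc_exit3=$?

# Case 3: set -e aborts on a failing command before reaching the echo.
rc_errexit=0
out_errexit=$(bash -c 'set -e; trap "echo cleanup" EXIT; false; echo unreachable') || rc_errexit=$?

printf '%s\n' "clean: $out_clean" \
  "exit 3: $out_exit3 (rc $rc_exit3)" \
  "errexit: $out_errexit (rc $rc_errexit)"
```

Note that the trap does not change the exit code: the caller still sees 3 and 1.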
Multiple cleanup actions
You can register multiple things to clean up:
tmpdir=$(mktemp -d)
lockfile="/var/lock/myapp.lock"
cleanup() {
rm -rf "$tmpdir"
rm -f "$lockfile"
# also kill background processes, etc.
}
trap cleanup EXIT
touch "$lockfile"
# ...work...
Using a named function is cleaner than a long quoted string.
The ERR trap — fires on command failure
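trap '...' ERR runs whenever a command returns non-zero in a spot where set -e would exit the script. It fires whether or not set -e is actually enabled; with set -e on, it runs just before the exit: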
trap 'echo "error at line $LINENO"' ERR
set -e
cmd1
cmd2 # if this fails, ERR trap fires before exit
cmd3
A useful pattern: log the failing command
set -e
trap 'echo "FAIL: command \"${BASH_COMMAND}\" failed at line $LINENO" >&2' ERR
do_something # if this fails:
# FAIL: command "do_something" failed at line 15
$BASH_COMMAND is the command that just ran. $LINENO is the line it was on. Together they tell you exactly what failed.
ERR doesn't fire everywhere
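Same caveats as set -e: the ERR trap does not fire on failures in:
- if, while, and until conditions.
- Commands in a && or || list, except the command after the final && or ||.
- Commands negated with !.
By default the ERR trap is also not inherited by shell functions, command substitutions, or subshells; set -E (errtrace) changes that.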
If you care about those cases, explicit error handling is required.
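A minimal sketch of which failures reach the ERR trap. Each case is a tiny hypothetical script run in a child bash; the trap prints "ERR:" plus $BASH_COMMAND, and "reached" means the script carried on. Cases 3 and 4 show that, by default, a function does not inherit the ERR trap, and that set -E (errtrace) makes it inherit:

```bash
#!/usr/bin/env bash
# run_case is a hypothetical helper: run one child script, capture all output.
run_case() { bash -c "$1" 2>&1 || true; }

# 1. Failure inside an if condition: no ERR trap.
out_if=$(run_case 'trap "echo ERR:\$BASH_COMMAND" ERR; if false; then :; fi; echo reached')

# 2. Failure on the left of ||: no ERR trap.
out_or=$(run_case 'trap "echo ERR:\$BASH_COMMAND" ERR; false || true; echo reached')

# 3. Failure inside a function, errtrace off: f does not inherit the trap,
#    so set -e ends the script without any report.
out_fn=$(run_case 'set -e; trap "echo ERR:\$BASH_COMMAND" ERR; f() { false; }; f; echo reached')

# 4. Same function with set -E: the trap fires inside f, then set -e exits.
out_fn_e=$(run_case 'set -eE; trap "echo ERR:\$BASH_COMMAND" ERR; f() { false; }; f; echo reached')

printf '%s\n' "if: $out_if" "||: $out_or" \
  "function: ${out_fn:-<nothing>}" "function with -E: $out_fn_e"
```

Case 3 is the dangerous one: the script dies and the ERR trap never tells you why, which is why scripts that rely on ERR for diagnostics usually start with set -Eeuo pipefail.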
Signal traps — handling Ctrl-C and SIGTERM
Users press Ctrl-C. Kubernetes sends SIGTERM. Your script should respond gracefully:
#!/usr/bin/env bash
set -euo pipefail
should_exit=0
trap 'should_exit=1' INT TERM
while (( ! should_exit )); do
# do work
sleep 1 || true  # Ctrl-C also kills sleep; || true stops set -e from ending the script here
done
echo "shutting down cleanly"
When the user presses Ctrl-C, the INT trap fires and sets should_exit, and the loop exits at its next check. This is much better than dying mid-operation.
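To watch the flag-based loop respond, the sketch below runs it in a background child bash, sends SIGTERM, and records what the child prints. It assumes a sleep that accepts fractional seconds (GNU coreutils and BSD sleep do); the short sleeps stand in for real work:

```bash
#!/usr/bin/env bash
# The child's output goes to a temp file because it runs in the background.
out=$(mktemp)
bash -c '
  set -euo pipefail
  should_exit=0
  trap "should_exit=1" INT TERM
  while (( ! should_exit )); do
    sleep 0.1 || true   # stand-in for one unit of real work
  done
  echo "shutting down cleanly"
' >"$out" &
child=$!

sleep 0.5                # give the child time to install its trap
kill -TERM "$child"
status=0
wait "$child" || status=$?
result=$(<"$out")
rm -f "$out"
echo "child said: $result (exit status $status)"
```

Because the signal only sets a flag, the child finishes its current unit of work, leaves the loop, and exits with status 0 instead of being cut off.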
The "kill children" pattern
If your script spawns background processes, you need to kill them on exit:
pids=()
start_worker() {
worker_process &
pids+=("$!")
}
cleanup() {
for pid in "${pids[@]}"; do
kill "$pid" 2>/dev/null || true
done
wait # wait for them to actually exit
}
trap cleanup EXIT
start_worker
start_worker
# ...
Without this, background processes keep running after the script dies: the script appears to have exited, but its orphaned children are still consuming resources.
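A minimal way to check the pattern: a child script starts two stand-in "workers" (plain sleep 30 here, in place of the hypothetical worker_process), writes their PIDs to a file, and kills them from its EXIT trap. Afterwards the caller verifies that none of those PIDs is still alive:

```bash
#!/usr/bin/env bash
pidfile=$(mktemp)
bash -c '
  pids=()
  cleanup() {
    for pid in "${pids[@]}"; do
      kill "$pid" 2>/dev/null || true
    done
    wait            # reap the workers; with no arguments wait returns 0
  }
  trap cleanup EXIT
  sleep 30 >/dev/null & pids+=("$!")
  sleep 30 >/dev/null & pids+=("$!")
  printf "%s\n" "${pids[@]}" >"$1"
' _ "$pidfile"

mapfile -t worker_pids <"$pidfile"
rm -f "$pidfile"
alive=0
for pid in "${worker_pids[@]}"; do
  # kill -0 sends no signal; it only checks whether the PID still exists.
  if kill -0 "$pid" 2>/dev/null; then alive=$((alive + 1)); fi
done
echo "workers started: ${#worker_pids[@]}, still alive: $alive"
```

Remove the trap line and the two sleeps outlive the child script by half a minute.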
What signals actually are
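A signal is a small asynchronous notification the OS sends to a process. Common ones:
- SIGHUP (1): the controlling terminal was closed.
- SIGINT (2): interrupt, sent by Ctrl-C.
- SIGKILL (9): forced kill. Cannot be trapped.
- SIGTERM (15): a polite request to terminate. The default signal for kill, and what Kubernetes sends.
- SIGSTOP: pause the process. Cannot be trapped.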
SIGKILL and SIGSTOP cannot be trapped. This is deliberate — it's the OS's escape hatch when a process is misbehaving. Scripts should rely on SIGTERM-and-trap for cleanup and accept that SIGKILL means "I failed; the process is gone."
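The difference is easy to see with two throwaway child scripts that each install an EXIT trap and then signal themselves. A minimal sketch, assuming bash; other shells (dash, for example) may not run EXIT traps on signals at all:

```bash
#!/usr/bin/env bash
# Child 1 sends itself SIGTERM, child 2 sends itself SIGKILL.
status_term=0
out_term=$(bash -c 'trap "echo cleanup ran" EXIT; kill -TERM $$; sleep 1') || status_term=$?

status_kill=0
out_kill=$(bash -c 'trap "echo cleanup ran" EXIT; kill -KILL $$; sleep 1') || status_kill=$?

echo "SIGTERM: '${out_term}' (status $status_term)"
echo "SIGKILL: '${out_kill}' (status $status_kill)"
```

The SIGTERM child prints its cleanup message; the SIGKILL child prints nothing and reports status 137 (128 + 9).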
The graceful shutdown pattern
The canonical "handle SIGTERM" script:
#!/usr/bin/env bash
set -euo pipefail
should_exit=0
trap 'should_exit=1' TERM INT
main_loop() {
while (( ! should_exit )); do
process_next_batch
(( should_exit )) && break
sleep 1 || true  # Ctrl-C also kills sleep; don't let set -e abort the loop
done
}
cleanup() {
echo "flushing state..."
flush_in_progress_work
release_locks
echo "done"
}
trap cleanup EXIT
main_loop
Kubernetes sends SIGTERM when a pod is terminating. If you respond within terminationGracePeriodSeconds (default 30s), you get a clean shutdown. If not, SIGKILL follows.
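Scripts running in containers should always trap SIGTERM. This matters most when the script is the container's main process (PID 1): the kernel only delivers SIGTERM to PID 1 if it has installed a handler, so a script without one can ignore the stop request entirely and sit there until the SIGKILL arrives.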
Handling cleanup in subshells
A trap registered in the parent shell does NOT fire in a subshell. If your script forks or pipes, you might need traps in both:
cleanup_parent() {
echo "parent cleanup"
}
trap cleanup_parent EXIT
(
trap 'echo subshell cleanup' EXIT
do_subshell_work
) # EXIT trap fires when subshell exits
# At script end, parent cleanup also fires.
For most scripts, just trap EXIT in the top shell. Only worry about subshell traps if you're doing something unusual with backgrounded pipelines.
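A small sketch of the ordering: a child script with a parent EXIT trap, one subshell that installs its own trap, and one that does not. Only the subshell's own trap runs when it exits; the parent's trap runs once, at the very end:

```bash
#!/usr/bin/env bash
out=$(bash -c '
  trap "echo parent cleanup" EXIT
  ( trap "echo subshell cleanup" EXIT; echo "in subshell" )
  ( echo "plain subshell" )   # no trap of its own: nothing runs when it exits
  echo "back in parent"
')
printf '%s\n' "$out"
```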
The temp file idiom — the one you will reuse most
#!/usr/bin/env bash
set -euo pipefail
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
# ...use $tmpfile...
Or for a whole directory:
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
# ...use $tmpdir for intermediate files...
mktemp creates a file/dir with a unique name in a safe location (usually $TMPDIR or /tmp). The trap guarantees cleanup even if the script dies unexpectedly.
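Always quote the variable in the trap: rm -rf "$tmpdir", never rm -rf $tmpdir. Unquoted, a path containing spaces or glob characters is split and expanded, so rm -rf can end up deleting paths you never created. Be just as careful never to append to it, as in rm -rf $tmpdir/*: if $tmpdir is ever empty, that becomes rm -rf /*.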
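To confirm the trap survives a failure, the sketch below runs a throwaway child script that creates a scratch directory, reports its path, and then hits a simulated failure under set -e. The caller checks that the directory is gone:

```bash
#!/usr/bin/env bash
# The child script text; read -d '' returns non-zero at end of input, hence || true.
read -r -d '' child_script <<'EOF' || true
set -euo pipefail
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
echo "$tmpdir"          # tell the caller where the scratch dir lives
touch "$tmpdir/partial-output"
false                   # simulated failure: set -e aborts the script here
EOF

tmp_path=$(bash -c "$child_script") || true   # a non-zero exit is expected

if [[ -n $tmp_path && ! -e $tmp_path ]]; then
  echo "scratch dir $tmp_path was removed"
else
  echo "scratch dir $tmp_path is still there"
fi
```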
The lock file idiom
Preventing two copies of a script from running simultaneously:
#!/usr/bin/env bash
set -euo pipefail
lockfile="/var/lock/myscript.lock"
if ! (set -C; echo "$$" > "$lockfile") 2>/dev/null; then
echo "already running (lock: $lockfile)" >&2
exit 1
fi
trap 'rm -f "$lockfile"' EXIT
# ...work...
set -C (noclobber) makes > fail if the file exists. Combined with echo $$ > lockfile, this is an atomic check-and-create: the first process creates the lock; subsequent ones fail.
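A quick sketch of the noclobber check in isolation. The lock path lives in a throwaway temp directory rather than /var/lock, so it needs no special permissions, and try_lock is a hypothetical helper wrapping the check-and-create step:

```bash
#!/usr/bin/env bash
lockdir=$(mktemp -d)
lockfile="$lockdir/demo.lock"

# The ( ) keeps set -C from leaking into the caller.
try_lock() { (set -C; echo "$$" >"$lockfile") 2>/dev/null; }

try_lock && first=acquired || first=busy     # file absent: creation succeeds
try_lock && second=acquired || second=busy   # file present: > refuses to clobber
rm -f "$lockfile"                            # what the EXIT trap would do
try_lock && third=acquired || third=busy

rm -rf "$lockdir"
echo "first: $first, second: $second, third: $third"
```

The weak spot is the rm: if the holder dies from SIGKILL, no trap removes the file and the stale lock blocks every later run until someone deletes it by hand.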
For production-grade locking, use flock:
exec 200>"$lockfile"
if ! flock -n 200; then
echo "already running" >&2
exit 1
fi
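# No trap needed: the kernel releases the lock when fd 200 closes at exit.
# Leave the file in place; deleting it lets two later runs lock two different files.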
# ...work...
flock is the robust approach — it uses OS-level file locking that is released automatically if the process dies.
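To see the automatic release, the sketch below keeps the lock in a background holder for two seconds, tries to take it while held, then tries again after the holder has exited. It assumes the util-linux flock command (standard on most Linux systems, not on stock macOS); try_flock is a hypothetical helper:

```bash
#!/usr/bin/env bash
lockfile=$(mktemp)

# A child bash opens the lock file on fd 200 and tries a non-blocking lock.
try_flock() { bash -c 'exec 200>"$1"; flock -n 200' _ "$lockfile"; }

# Holder: takes the lock and keeps it (via the inherited fd) for 2 seconds.
bash -c 'exec 200>"$1"; flock -n 200 && sleep 2' _ "$lockfile" &
holder=$!
sleep 0.5                          # let the holder grab the lock first

try_flock && second=acquired || second=busy
wait "$holder"                     # holder exits; the kernel drops its lock
try_flock && third=acquired || third=busy

rm -f "$lockfile"
echo "while held: $second, after holder exited: $third"
```

No cleanup code released the lock; closing the descriptor at process exit was enough.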
Composing multiple traps
Each trap call replaces the previous handler for that event; bash has no built-in way to append to it. To accumulate cleanup steps, keep a registry of your own:
# Define a registry
CLEANUP_CMDS=()
add_cleanup() {
CLEANUP_CMDS+=("$1")
}
run_cleanup() {
local cmd
for cmd in "${CLEANUP_CMDS[@]}"; do
eval "$cmd" # eval is risky — only use with strings YOU control
done
}
trap run_cleanup EXIT
# Register cleanups as you go
tmpfile=$(mktemp)
add_cleanup "rm -f \"$tmpfile\""
tmpdir=$(mktemp -d)
add_cleanup "rm -rf \"$tmpdir\""
Or keep it simple — have one cleanup() function that does everything:
cleanup() {
[[ -n "${tmpfile:-}" ]] && rm -f "$tmpfile"
[[ -n "${tmpdir:-}" ]] && rm -rf "$tmpdir"
[[ -n "${pid:-}" ]] && { kill "$pid" 2>/dev/null || true; }  # process may already be gone
}
trap cleanup EXIT
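A short sketch of both points in this section: a second trap call silently replaces the first, and a single guarded cleanup() copes with resources that were never created, even under set -u. Both cases run as throwaway child scripts:

```bash
#!/usr/bin/env bash
# 1. Only the last registered EXIT trap runs.
out_replace=$(bash -c 'trap "echo first" EXIT; trap "echo second" EXIT')

# 2. The ${var:-} guards keep set -u happy when a resource was never set up.
out_guarded=$(bash -c '
  set -u
  cleanup() {
    [[ -n "${tmpfile:-}" ]] && rm -f "$tmpfile"
    [[ -n "${tmpdir:-}" ]] && rm -rf "$tmpdir"   # tmpdir is never set below
    echo "cleanup ran"
  }
  trap cleanup EXIT
  tmpfile=$(mktemp)
  exit 0
')
echo "replace: $out_replace / guarded: $out_guarded"
```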
Debugging traps — the DEBUG trap
Bash also has DEBUG, which fires before every command:
trap 'echo "about to run: $BASH_COMMAND"' DEBUG
cd /tmp
ls
# Output:
# about to run: cd /tmp
# about to run: ls
# file1 file2 ...
Useful for detailed tracing when set -x is too noisy. Not something you leave on in production.
Inspecting current traps
trap -p prints registered traps:
trap 'echo bye' EXIT
trap -p
# trap -- 'echo bye' EXIT
Useful for debugging — is my trap actually registered? Did something overwrite it?
Clearing a trap
trap - EXIT # remove the EXIT trap
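Inspecting and clearing can be combined in a small sketch (run as a throwaway child script): trap -p EXIT shows just the EXIT handler, and after trap - EXIT nothing is registered, so nothing runs at exit:

```bash
#!/usr/bin/env bash
out=$(bash -c '
  trap "echo bye" EXIT
  trap -p EXIT       # show the registered EXIT trap
  trap - EXIT        # remove it
  trap -p EXIT       # prints nothing now
  echo end           # and no "bye" appears when the script exits
')
printf '%s\n' "$out"
```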
Common trap mistakes
Mistake 1: double-quoting the trap body
# BROKEN
trap "rm -rf $tmpdir" EXIT
# $tmpdir was expanded NOW, at registration time.
# If it later changes, the trap still uses the old value.
# CORRECT
trap 'rm -rf "$tmpdir"' EXIT
# Single quotes defer expansion until the trap fires.
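The difference is easy to observe. In this sketch, two throwaway child scripts register the same trap with different quoting and then reassign the variable (target is a hypothetical name):

```bash
#!/usr/bin/env bash
read -r -d '' early_script <<'EOF' || true
target=first
trap "echo $target" EXIT      # double quotes: $target is expanded right now
target=second
EOF

read -r -d '' late_script <<'EOF' || true
target=first
trap 'echo "$target"' EXIT    # single quotes: expanded when the trap fires
target=second
EOF

early=$(bash -c "$early_script")
late=$(bash -c "$late_script")
echo "double-quoted trap printed: $early"
echo "single-quoted trap printed: $late"
```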
Mistake 2: trap that depends on a local variable
do_work() {
local tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT # $tmpdir is local
# ...
}
# When the trap fires (at SCRIPT exit, not function exit),
# the local $tmpdir is gone.
Fix: use global variables for anything referenced by an EXIT trap, or clean up when the function itself ends, for example by running the function body in a subshell with its own EXIT trap.
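One way to give a function its own cleanup scope is to write its body in ( ) instead of { }. The body then runs in a subshell, its EXIT trap fires when the function returns, and the script's own EXIT trap is untouched. A minimal sketch, with do_work as a hypothetical function; the trade-off is that variables it sets never reach the caller:

```bash
#!/usr/bin/env bash
do_work() (
  set -euo pipefail
  tmpdir=$(mktemp -d)
  trap 'rm -rf "$tmpdir"' EXIT   # fires when the subshell, i.e. the function, ends
  touch "$tmpdir/scratch"
  echo "$tmpdir"                 # report the path so the caller can check it
)

report=$(mktemp)
do_work >"$report"               # called directly, not inside $( )
path=$(<"$report")
rm -f "$report"
[[ -n $path && ! -e $path ]] && echo "removed at function return: $path"
```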
Mistake 3: heavy work in the trap
cleanup() {
upload_logs_to_s3 # Network call. If this hangs, script never exits.
notify_slack
rm -f "$tmpfile"
}
trap cleanup EXIT
Cleanup should be fast and local. Network calls, database writes, anything that can hang — do those in the main body, not in the trap.
Mistake 4: overriding default signal behavior unintentionally
trap 'echo ignoring' INT
# Now Ctrl-C doesn't actually exit.
Unless you specifically want to ignore Ctrl-C, make sure your trap either exits or sets a flag that causes the main loop to exit.
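A common shape for handlers that should stop the script: exit with the conventional status of 128 plus the signal number, and let the EXIT trap do the cleanup. In this sketch the child script signals itself with SIGTERM (SIGINT can be ignored by scripts started in the background, which would make the demo unreliable); the INT handler has the same shape:

```bash
#!/usr/bin/env bash
status=0
out=$(bash -c '
  trap "echo cleanup" EXIT
  trap "exit 130" INT     # 128 + 2
  trap "exit 143" TERM    # 128 + 15
  kill -TERM $$           # simulate a stop request
  sleep 1
') || status=$?
echo "output: $out, status: $status"
```

The exit inside the signal handler triggers the EXIT trap, so cleanup still happens once, and the caller can tell from status 143 that the script was terminated rather than finishing.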
A full-featured script template
#!/usr/bin/env bash
set -Eeuo pipefail  # -E: shell functions inherit the ERR trap below
IFS=$'\n\t'
# State
tmpdir=""
lockfile="/var/lock/myapp.lock"
should_exit=0
# Cleanup runs on any exit path
cleanup() {
local rc=$?
[[ -n "$tmpdir" ]] && rm -rf "$tmpdir"
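# No lockfile removal: flock drops the lock when this process exits. Removing the
# file here would also run when a second copy fails to get the lock, deleting the
# file the running copy holds.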
(( rc != 0 )) && echo "script exited with code $rc" >&2
}
trap cleanup EXIT
# Graceful shutdown on SIGTERM / SIGINT
trap 'should_exit=1' TERM INT
# Error context on failures
trap 'echo "error at line $LINENO: $BASH_COMMAND" >&2' ERR
# Acquire lock
exec 200>"$lockfile"
flock -n 200 || { echo "already running" >&2; exit 1; }
# Create scratch space
tmpdir=$(mktemp -d)
# Main work
while (( ! should_exit )); do
do_one_batch
(( should_exit )) && break
sleep 1 || true  # Ctrl-C also kills sleep; don't let set -e abort the loop
done
echo "done"
Every safety net in one place. Copy this as a starting point.
Quiz
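You register a trap with double quotes around the body: trap "rm -f $tmpfile" EXIT. Later the tmpfile variable gets reassigned to a new path. When the script exits, what gets deleted?
Answer: the original path. The double quotes expanded $tmpfile when trap ran, so the handler still holds the old value, and the new file is left behind.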
What to take away
- trap 'cleanup' EXIT: the essential cleanup idiom. Registers cleanup that runs no matter how the script exits.
- Use single quotes for the trap body so variables are expanded at trap-fire time, not registration time.
- Signal traps: INT (Ctrl-C), TERM (Kubernetes stop), HUP (terminal close). Trap them for graceful shutdown.
- SIGKILL can't be trapped. Plan for it at a higher level (restart logic, externally-tracked state).
- Temp files: mktemp + trap 'rm -rf "$tmp"' EXIT is the pattern.
- Lock files: use flock for robust mutual exclusion.
- Keep cleanup fast. No network calls. No heavy I/O.
- In long-running scripts, set a should_exit flag in SIGTERM/SIGINT handlers and check it in your main loop.
Next lesson: exit codes in depth — what $? really is, chaining with &&/||, and propagating status correctly.