Structuring Larger Scripts
Every long Bash script started as a short Bash script. Somewhere around 200 lines, the structure starts to strain. Somewhere around 500 lines, it becomes hard to change without breaking something. And somewhere between 500 and 2000 lines, the right answer is not "write more Bash" but "stop writing Bash."
This lesson is about keeping a growing Bash script manageable, organizing shared helpers into libraries, and recognizing the signs that it's time to switch languages.
Bash is a glue language, not an application language. It excels at stringing together OS tools. It does not excel at business logic, complex data transformations, or anything that would be a class in another language. Know when to stop.
The shape of a well-structured Bash script
A Bash script over 100 lines benefits from a consistent structure:
#!/usr/bin/env bash
# Description, version, author.

# 1. Strict mode
set -euo pipefail
IFS=$'\n\t'

# 2. Constants
readonly VERSION="1.0.0"
readonly SCRIPT_NAME="${0##*/}"
# Assign and mark readonly separately so a failed command substitution
# isn't masked by readonly's own exit status (ShellCheck SC2155).
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly SCRIPT_DIR

# 3. Source libraries
source "$SCRIPT_DIR/lib/logging.sh"
source "$SCRIPT_DIR/lib/utils.sh"

# 4. Global state (minimize)
VERBOSE=false
DRY_RUN=false

# 5. Functions — in order of abstraction, top-down
usage() {
  cat <<EOF
$SCRIPT_NAME — does the thing
Usage: $SCRIPT_NAME [options] <input>
...
EOF
}

parse_args() {
  while [[ $# -gt 0 ]]; do
    # ...
  done
}

do_work() {
  # ...
}

cleanup() {
  # ...
}

# 6. main — the orchestration
main() {
  parse_args "$@"
  trap cleanup EXIT
  do_work
}

# 7. Entry point
main "$@"
The structure matters less than having some structure. Scripts with no structure — where state, config, and logic are interleaved — rot the fastest.
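The parse_args stub in the skeleton can be fleshed out with the usual while/case loop. A sketch, assuming the VERBOSE and DRY_RUN globals from the skeleton; the ARGS array for positional arguments is a name introduced here:

```bash
VERBOSE=false
DRY_RUN=false
ARGS=()  # positional arguments collected during parsing

parse_args() {
  while [[ $# -gt 0 ]]; do
    case "$1" in
      -v|--verbose) VERBOSE=true ;;
      -n|--dry-run) DRY_RUN=true ;;
      --)           shift; ARGS+=("$@"); break ;;  # end of options
      -*)           echo "unknown option: $1" >&2; return 2 ;;
      *)            ARGS+=("$1") ;;
    esac
    shift
  done
}

parse_args --verbose -n input.txt
```

Note the `--` case: everything after a bare `--` is treated as positional, which lets callers pass filenames that start with a dash.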
The main function idiom
Wrapping the top-level flow in a main function has several benefits:
- Local variables — variables in main are function-local, not script-global.
- Testability — you can source the script without running it, then call main or individual functions for testing.
- Readability — reading main tells you the whole flow in 10 lines.
main() {
  parse_args "$@"
  validate_input
  do_work
}

# Only run main if script is executed directly, not sourced
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
  main "$@"
fi
The if [[ "${BASH_SOURCE[0]}" == "${0}" ]] check is Bash's equivalent of Python's if __name__ == "__main__":. It lets the file be sourced (for testing or composition) without running main.
Organizing into libraries
Once you have multiple scripts in a project, shared helpers should live in a library file:
myproject/
├── bin/
│   ├── deploy
│   ├── rollback
│   └── status
└── lib/
    ├── logging.sh
    ├── config.sh
    └── git.sh
Each script in bin/ sources what it needs:
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/logging.sh"
source "$SCRIPT_DIR/../lib/git.sh"

main() {
  log_info "starting deploy"
  ensure_clean_repo
  # ...
}

main "$@"
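Library files often get sourced more than once — a bin/ script may source two libraries that each source a third. An include guard at the top of each library makes re-sourcing harmless. A sketch, with _LIB_GIT_SH as a hypothetical guard variable:

```bash
# lib/git.sh
# Include guard: if this file was already sourced, stop here.
[[ -n "${_LIB_GIT_SH:-}" ]] && return 0
_LIB_GIT_SH=1

git_current_branch() {
  git branch --show-current
}
```

The `${_LIB_GIT_SH:-}` default keeps the check safe under set -u.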
What belongs in a library
- Logging and error helpers — log_info, die, warn.
- Common OS checks — require_command jq, require_file config.yml.
- Retry / timeout wrappers — retry 3 'some_command'.
- Protocol-level helpers — git_current_branch, k8s_current_context.
What does NOT belong in a library
- Constants specific to one script.
- Business logic specific to one workflow.
- Stuff that only has one caller.
If a helper only has one caller, leave it inline. Extract to a library when you have 2+ callers.
A sample lib/logging.sh
# lib/logging.sh
# Logging helpers with levels and a standard timestamped format.

LOG_LEVEL="${LOG_LEVEL:-info}"  # debug, info, warn, error

_log() {
  local level="$1"; shift
  local msg="$*"
  printf '[%s] [%-5s] %s\n' "$(date -u +'%Y-%m-%dT%H:%M:%SZ')" "$level" "$msg" >&2
}

_should_log() {
  # Return 0 if $1 is at or above LOG_LEVEL
  local priority=(debug info warn error)
  local want=-1 have=-1 i
  for i in "${!priority[@]}"; do
    [[ "${priority[i]}" == "$LOG_LEVEL" ]] && want=$i
    [[ "${priority[i]}" == "$1" ]] && have=$i
  done
  (( have >= want ))
}

# The trailing '|| true' matters: without it, a suppressed message
# (e.g. log_debug while LOG_LEVEL=info) returns nonzero and trips
# set -e in the calling script.
log_debug() { _should_log debug && _log DEBUG "$@" || true; }
log_info()  { _should_log info  && _log INFO  "$@" || true; }
log_warn()  { _should_log warn  && _log WARN  "$@" || true; }
log_error() { _should_log error && _log ERROR "$@" || true; }

die() {
  log_error "$@"
  exit 1
}
Usage:
source lib/logging.sh
log_info "starting up"
log_debug "config file: $cfg"
log_warn "disk usage high: ${pct}%"
log_error "failed to connect to $host"
die "unrecoverable error: cannot continue"
Now every script in your project has consistent logging.
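One design choice worth calling out: the helpers log to stderr (the >&2 in _log), so a function can log freely while still "returning" data on stdout. A minimal sketch with an inlined stand-in for _log; fetch_count is an invented example function:

```bash
# Minimal stand-in for the _log helper: messages go to stderr.
_log() { printf '[%s] %s\n' "$1" "$2" >&2; }

fetch_count() {
  _log INFO "querying backend"  # stderr — visible, but not captured
  echo 42                       # stdout — the function's "return value"
}

count=$(fetch_count)  # command substitution captures only stdout
```

If the logger wrote to stdout instead, every command substitution would swallow the log lines into its result.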
A sample lib/utils.sh
# lib/utils.sh
# Generic helpers.

require_command() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null || die "required command not found: $cmd"
  done
}

require_file() {
  local f
  for f in "$@"; do
    [[ -f "$f" ]] || die "required file not found: $f"
  done
}

retry() {
  local attempts="$1"; shift
  local delay="${RETRY_DELAY:-1}"
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}

confirm() {
  local prompt="${1:-Continue?}"
  local reply
  read -r -p "$prompt [y/N] " reply
  [[ "${reply,,}" == "y" || "${reply,,}" == "yes" ]]
}

# Run a command with a timeout. Relies on coreutils' timeout(1),
# present on most Linux systems but not stock macOS.
with_timeout() {
  local secs="$1"; shift
  timeout "$secs" "$@"
}
Usage:
require_command jq curl kubectl
require_file config.yml secrets.env
retry 3 curl -fsSL "$url" -o output.txt
confirm "Delete everything?" && rm -rf /stuff
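To see retry's contract in action, here is a self-contained sketch: a command that fails twice and succeeds on the third attempt. The flaky function and its counter are inventions for the demo; retry is the same shape as the lib/utils.sh version above.

```bash
retry() {
  local attempts="$1"; shift
  local delay="${RETRY_DELAY:-1}"
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}

n=0
flaky() { n=$((n + 1)); (( n >= 3 )); }  # fails until the third call

RETRY_DELAY=0  # no need to wait between attempts in a demo
retry 3 flaky && echo "succeeded on attempt $n"
```

Because retry calls the command directly (no subshell), the counter in flaky persists between attempts — which is also why retry works for functions, not just external commands.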
Signs your script is outgrowing Bash
Not every script should become a library; not every library should stay in Bash. Red flags that Bash is the wrong tool:
- The script is past ~500 lines and still growing.
- You need nested data structures — Bash arrays don't nest.
- Most functions shell out to jq or awk for data transformations Bash can't express.
- You need real tests, and bats has stopped being enough.
- Performance matters — Bash forks a process for nearly everything.
When Bash IS the right tool
Despite the red flags, Bash is the right tool for:
- Wrapping a handful of commands with some error handling.
- CI pipeline glue that shells out to existing tools.
- OS-level orchestration: systemd units, cron jobs, container entrypoints.
- Ad-hoc one-offs that will probably never run again.
- Deployment scripts that call ssh, docker, kubectl, git.
- Anything under ~100 lines where the control flow is sequential.
Bash's superpower is zero-dependency, ubiquitous availability. On nearly any Unix-like system you can count on a Bourne-compatible shell, and on most Linux systems that means Bash itself. Python and Go bring a runtime or a binary to distribute; Bash just runs.
The "rewrite in Python" migration pattern
When a Bash script hits its limits, the migration usually looks like:
Step 1: identify the core logic
Separate the "orchestration" (which external tools to call, in what order) from the "logic" (data transformation, decisions, state).
Step 2: rewrite the logic in Python, keep the orchestration shell-like
Python's subprocess module runs shell commands, and libraries like sh (or invoke, or plumbum) make it feel shell-like:
import subprocess

def git_current_branch():
    result = subprocess.run(
        ["git", "branch", "--show-current"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def deploy(env):
    branch = git_current_branch()
    if branch != "main" and env == "prod":
        raise ValueError(f"prod deploys must come from main, not {branch}")
    # ...
You get: real data types, real tests, real error handling, real packaging. The orchestration of external tools still works the same way.
Step 3: remove the old script
Keep it around for a release as a two-line wrapper — #!/usr/bin/env bash followed by exec python newimpl.py "$@" — then delete it.
Do not rewrite working, stable Bash scripts just because they're Bash. Rewrite when there's a concrete pain point (hard to change, hard to test, hard to debug). "It's ugly" is not a reason; "it broke last week and no one could figure out why" is.
Testing Bash scripts — bats
bats (Bash Automated Testing System) is the closest Bash gets to a test framework:
#!/usr/bin/env bats

setup() {
  # utils.sh calls die, which lives in logging.sh — source both.
  source "$BATS_TEST_DIRNAME/../lib/logging.sh"
  source "$BATS_TEST_DIRNAME/../lib/utils.sh"
  tmpdir=$(mktemp -d)
}

teardown() {
  rm -rf "$tmpdir"
}

@test "require_command succeeds for existing command" {
  run require_command ls
  [ "$status" -eq 0 ]
}

@test "require_command fails for missing command" {
  run require_command nonexistent_command_xyz
  [ "$status" -ne 0 ]
  [[ "$output" == *"not found"* ]]
}

@test "retry succeeds on second try" {
  i=0
  fake() { i=$((i + 1)); (( i >= 2 )); }
  run retry 3 fake
  [ "$status" -eq 0 ]
}
Run with bats test/. It works, but writing tests for complex Bash scripts is painful compared to pytest in Python. If you find yourself needing more than a handful of tests, consider migrating.
Documentation in scripts
Good scripts are self-documenting at the top:
#!/usr/bin/env bash
#
# deploy.sh — deploy app to a given environment
#
# Usage: deploy.sh [options] <env>
#
# Options:
#   -v, --verbose   verbose output
#   -n, --dry-run   show what would happen
#   -f, --force     skip confirmation
#
# Environment variables:
#   DEPLOY_HOST   target host (required)
#   DEPLOY_KEY    path to SSH key (default: ~/.ssh/id_rsa)
#
# Exit codes:
#   0    success
#   2    invalid arguments
#   10   git state is not clean
#   20   tests failed
#   30   deploy failed
#
# Examples:
#   deploy.sh prod
#   deploy.sh --dry-run staging
#   DEPLOY_KEY=~/.ssh/deploy_key deploy.sh staging
Then make --help print the same information:
usage() {
  # Print the header comment block: from the first lone '#' down to,
  # but not including, the first non-comment line.
  sed -n '/^#$/,/^[^#]/{ /^[^#]/d; s/^# //; s/^#//; p; }' "$0"
}
This keeps --help and the top-of-file docs in sync — one source of truth.
Common structural mistakes
Mistake 1: global state everywhere
# BROKEN — state scattered across the script
current_env=""
deploy_target=""
extra_flags=()

set_env() {
  current_env="$1"
  # ...
}

set_target() {
  deploy_target="$1"
  # ...
}
If the global variables are set in one function and read in another with nothing in between showing the flow, good luck debugging.
Fix: keep state local when possible, and pass explicitly between functions.
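A sketch of the same idea with state passed explicitly — each function takes its inputs as arguments and hands results back on stdout. The function names and the host naming scheme are invented for the example:

```bash
validate_env() {
  case "$1" in prod|staging|dev) return 0 ;; *) return 1 ;; esac
}

build_target() {
  # Derive the deploy host from the environment name (hypothetical scheme).
  printf 'deploy-%s.example.com\n' "$1"
}

main() {
  local env="$1" target
  validate_env "$env" || { echo "unknown env: $env" >&2; return 2; }
  target=$(build_target "$env")
  echo "deploying to $target"
}

main staging
```

Reading main now shows the whole data flow; nothing is set "somewhere else".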
Mistake 2: functions with side effects on globals
do_thing() {
  result="computed_value"  # modifies a global
}

do_thing
echo "$result"
Breaks the moment you call do_thing from a pipeline (subshell) or rename anything. Prefer returning via stdout or nameref.
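Both fixes in a sketch — returning via stdout (survives subshells) and via a nameref (bash 4.3+, no subshell or fork needed). The function names here are illustrative:

```bash
# Option 1: return via stdout; the caller captures with $( ).
compute() { printf '%s\n' "computed_value"; }
result=$(compute)

# Option 2: return via nameref — the caller names the destination
# variable. The underscore prefix on _out reduces the chance of a
# name collision with the caller's variable.
compute_into() {
  local -n _out="$1"   # _out is an alias for the caller's variable
  _out="computed_value"
}
compute_into answer
```

The stdout style composes with pipelines; the nameref style avoids a fork, which matters in hot loops.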
Mistake 3: no separation between orchestration and work
# 200 lines of commands interleaved with if/else and echos
# No functions. No structure.
Break it up. Even if each function is called once, named functions turn a wall of code into a readable outline.
Mistake 4: copy-pasting between scripts
The moment you copy code between scripts, extract it to a library. Every copy will drift, and by the third copy they'll all be slightly different.
Quiz
Your Bash deploy script has grown to 600 lines, uses associative arrays for config, computes JSON with jq in most functions, and broke three times last quarter during incidents. What is the right move?
What to take away
- Structure growing scripts: strict mode, constants, sourced libraries, functions, main function, entry point guard.
- main "$@" guarded by if [[ "${BASH_SOURCE[0]}" == "${0}" ]] lets the script be sourced without running.
- Extract shared helpers into lib/*.sh. Source them at the top of each script.
- Minimum libraries to have: logging.sh, utils.sh (require_command, retry, confirm).
- Document usage at the top of the file. Make --help print the same docs.
- Know the signs to leave Bash: >500 lines, nested data structures, heavy jq/awk dependence, real tests required, performance bottlenecks.
- When rewriting, keep the orchestration style; move the logic to Python or Go.
- bats exists for testing Bash, but needing it is a signal you're nearing the language's limits.
Next module: debugging shell scripts — the tools, the common failure modes, and ShellCheck.