Bash & Shell Scripting for Engineers

Functions and Scope

Bash functions look like functions in other languages until you try to "return" something. Then you discover that Bash's notion of "return" is an exit code (0-255), variables leak to the caller by default, and the idiomatic way to hand data back is through stdout and command substitution.

This lesson is about the specific patterns that make Bash functions behave — local variables, return codes vs stdout output, error propagation, and the best ways to structure a function that needs to "return" complex data.

KEY CONCEPT

Bash functions return an exit code, not a value. Data flows out through stdout (captured by $(func)) or through a named output variable (via local -n). Expecting them to behave like Python or Go functions is a common source of bugs.


Defining a function

Two syntaxes — both work, the first is more common:

greet() {
  echo "Hello, $1!"
}

function greet() {
  echo "Hello, $1!"
}

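Stick with name() { ... }. It is the form POSIX specifies, so it works in every Bash version and in other POSIX shells. The function keyword is an extension that shells such as dash reject.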

Arguments are positional

Arguments to a function work like arguments to a script:

greet() {
  local name="$1"
  local age="$2"
  echo "Hello $name, you are $age"
}

greet alice 30
# Hello alice, you are 30

# Inside a function:
#   $0  — still the script name, NOT the function name
#   $1, $2, ... — function arguments
#   $#  — number of arguments
#   $@  — all arguments (use "$@")

Note: $0 inside a function is the script name, not the function name. To get the function name, use ${FUNCNAME[0]}.
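
A short sketch of the difference. The function names here (whoami_func, log_here, deploy) are hypothetical and exist only for illustration. FUNCNAME is an array: element 0 is the current function and element 1 is its caller, which makes it handy for log prefixes.

```shell
# Hypothetical helper: show what $0 and FUNCNAME hold inside a function.
whoami_func() {
  echo "script:   $0"
  echo "function: ${FUNCNAME[0]}"
}

# Prefix a message with the name of the function that called log_here.
# FUNCNAME[0] is log_here itself; FUNCNAME[1] is its caller.
log_here() {
  echo "[${FUNCNAME[1]}] $*" >&2
}

deploy() {
  log_here "starting"
}

whoami_func
deploy    # prints "[deploy] starting" on stderr
```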


local — the single most important function discipline

Variables in Bash are global by default. Unless you explicitly declare them local, they leak to the caller.

counter=0

count_up() {
  counter=$((counter + 1))    # modifies the GLOBAL counter
  temp="intermediate value"    # ALSO leaks to global scope
}

count_up
count_up
echo "counter=$counter"   # counter=2
echo "temp=$temp"         # temp=intermediate value  <- leaked

This is almost never what you want. Fix it with local:

count_up() {
  local temp="intermediate value"
  counter=$((counter + 1))    # still global — fine, that was intentional
}

Inside a function, anything that is not supposed to affect the caller should be local. This is a habit worth building:

process() {
  local input="$1"
  local tmpfile
  local i

  tmpfile=$(mktemp)
  for (( i = 0; i < 10; i++ )); do
    echo "$i" >> "$tmpfile"
  done

  cat "$tmpfile"
  rm "$tmpfile"
}

Every variable the function uses is explicitly declared local, so the caller's variables are untouched.

WARNING

Forgetting local on loop variables (i, j, file) is the most common leak. If a loop variable was already in use by the caller, the function silently trashes it.
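
A minimal sketch of that leak, using two hypothetical functions that differ only in whether the loop variable is declared local. Both are called directly rather than through $(...), because a subshell would discard the leaked variable and hide the problem.

```shell
# Leaky: i is not declared local, so the caller's i is overwritten.
print_three_leaky() {
  for i in 1 2 3; do
    echo "  inner $i"
  done
}

# Safe: i is local to the function.
print_three_safe() {
  local i
  for i in 1 2 3; do
    echo "  inner $i"
  done
}

i="caller-value"

print_three_safe
after_safe=$i
echo "after safe call:  i=$i"    # i=caller-value

print_three_leaky
echo "after leaky call: i=$i"    # i=3
```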

The local and exit code trap

Watch out for this:

get_value() {
  echo "42"
  return 0
}

my_func() {
  local value=$(get_value)     # exit code here is the exit code of `local`, NOT get_value!
  if [[ $? -eq 0 ]]; then       # $? is always 0 here because local succeeded
    echo "got $value"
  fi
}

local x=... is a single command. Its exit code is the success of local, which is essentially always 0. The exit code of get_value is lost.

Fix: split the declaration from the assignment.

my_func() {
  local value
  value=$(get_value)            # now $? is the exit code of get_value
  if [[ $? -eq 0 ]]; then
    echo "got $value"
  fi
}

This is a subtle trap. ShellCheck (covered later) will flag it for you.
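
To see the trap with a command that really fails, here is a small sketch. fails_loudly is a made-up stand-in for any command that prints something and exits non-zero.

```shell
# Stand-in for a command that produces output but fails with status 3.
fails_loudly() {
  echo "partial output"
  return 3
}

combined() {
  local value=$(fails_loudly)   # $? becomes the status of `local`: 0
  echo "combined: status=$?"
}

split() {
  local value
  local status=0
  value=$(fails_loudly) || status=$?   # status of fails_loudly survives
  echo "split: status=$status"
}

combined   # combined: status=0
split      # split: status=3
```

Saving the status with || status=$? is one way to keep it around; checking the assignment directly in an if works just as well.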


Returning values — three patterns

Bash's return takes a 0-255 exit code, not a value. To return data, pick a pattern.

Pattern 1: stdout + command substitution

The most common pattern:

get_hostname() {
  hostname -s
}

host=$(get_hostname)

Pro: clean, composable, works anywhere. Con: you fork a subshell on each call; variables set inside do not escape.
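
The "variables do not escape" part is easy to miss. In this sketch, next_id is a hypothetical function that both prints a value and tries to keep a running count; the count is lost whenever the function runs inside $(...).

```shell
calls=0

# Hypothetical function: prints an ID and counts how often it was called.
next_id() {
  calls=$((calls + 1))
  echo "id-$calls"
}

id=$(next_id)                 # runs in a subshell
echo "id=$id calls=$calls"    # id=id-1 calls=0  (the increment was lost)

next_id > /dev/null           # runs in the current shell
echo "calls=$calls"           # calls=1
```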

Pattern 2: a named output variable (nameref)

Bash 4.3+ has namerefs:

get_hostname() {
  local -n result=$1
  result=$(hostname -s)
}

get_hostname host
echo "$host"

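Pro: no subshell, faster, can return arrays. Con: requires Bash 4.3+, callers must name a target variable, and that name must differ from the nameref's own name (result here) or Bash reports a circular name reference.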
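
As a sketch of the "can return arrays" point, here is a hypothetical split_path function that fills an array in the caller. The nameref is given an underscore-prefixed name (_out) on the assumption that callers are unlikely to use the same name for their own variable.

```shell
# Hypothetical example: split a colon-separated string into the caller's array.
# Requires Bash 4.3+ for `local -n`.
split_path() {
  local -n _out=$2            # _out now refers to the caller's variable
  local -a parts
  IFS=: read -r -a parts <<< "$1"
  _out=("${parts[@]}")        # assigns to the caller's array
}

split_path "/usr/local/bin:/usr/bin:/bin" dirs
echo "count=${#dirs[@]}"      # count=3
echo "second=${dirs[1]}"      # second=/usr/bin
```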

Pattern 3: set a global

# By convention, give the global a distinctive prefix so it is unlikely to collide
get_hostname() {
  _HOST=$(hostname -s)
}

get_hostname
echo "$_HOST"

Pro: simple. Con: ugly, pollutes the global namespace, and requires a calling convention that every caller must know.

PRO TIP

For simple values, use stdout + $(). For large data or performance-critical code, use a nameref. Avoid raw globals — they are the pattern that rots scripts over time.


Returning multiple values

Three options:

Option A: space-separated, split on the call side

get_dimensions() {
  echo "1920 1080"
}

read -r w h < <(get_dimensions)
echo "width=$w height=$h"

Option B: global array

get_dimensions() {
  DIMENSIONS=(1920 1080)
}

get_dimensions
echo "width=${DIMENSIONS[0]} height=${DIMENSIONS[1]}"

Option C: namerefs

get_dimensions() {
  local -n w_var=$1
  local -n h_var=$2
  w_var=1920
  h_var=1080
}

get_dimensions w h
echo "width=$w height=$h"

All three are clunky compared to real languages. For anything more complex than 2-3 values, consider whether Bash is the right tool.


Return codes — exit status as "success/failure"

return sets the function's exit code:

is_valid_email() {
  local email="$1"
  if [[ "$email" =~ ^[^@]+@[^@]+\.[^@]+$ ]]; then
    return 0    # success
  else
    return 1    # failure
  fi
}

if is_valid_email "alice@example.com"; then
  echo "valid"
fi

The function itself is used as a condition. This is the most "pythonic" (well, bash-onic) way to do predicates.

Returning numeric data — DON'T

# DON'T do this:
get_count() {
  return 42   # exit codes are 0-255; "42" is ambiguous with error codes
}

get_count
echo "count=$?"

Exit codes are for success/failure. They wrap around at 256. They conflict with error codes. If you need a number, echo it and capture via $().

get_count() {
  echo 42
}
count=$(get_count)
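
The wrap-around is worth seeing once. In this sketch the two functions are hypothetical; Bash keeps only the low 8 bits of the value passed to return.

```shell
# Hypothetical functions that misuse return for data.
returns_256() { return 256; }
returns_300() { return 300; }

status=0
returns_256 || status=$?
echo "256 -> $status"    # 256 -> 0   (indistinguishable from success)

status=0
returns_300 || status=$?
echo "300 -> $status"    # 300 -> 44  (300 modulo 256)
```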

Chaining success/failure

Because functions return exit codes, you chain them with && and ||:

is_prod() { [[ "$ENV" == "prod" ]]; }
is_admin() { [[ "$USER" == "root" ]]; }

if is_prod && is_admin; then
  echo "running in prod as admin"
fi

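# Or as a guard (is_installed is a small helper, defined here for illustration):
is_installed() { command -v "$1" >/dev/null 2>&1; }
is_installed "jq" || { echo "jq required" >&2; exit 1; }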

The last command's exit code is the function's exit code (unless you explicitly return). So:

# This function returns the exit code of grep
search() {
  grep -q "$1" "$2"
}

if search "ERROR" app.log; then
  echo "found errors"
fi

No explicit return 0/return 1 needed — the function's exit code is grep's.
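
The same rule can bite when a line is added after the command whose status matters. A sketch, with hypothetical function names and a throwaway log file:

```shell
# Broken: the function's status is now echo's (0), not grep's.
search_logged_broken() {
  grep -q "$1" "$2"
  echo "searched $2 for $1" >&2
}

# Fixed: remember grep's status and return it explicitly.
search_logged() {
  local status=0
  grep -q "$1" "$2" || status=$?
  echo "searched $2 for $1" >&2
  return "$status"
}

log_file=$(mktemp)
echo "INFO all good" > "$log_file"

search_logged_broken "ERROR" "$log_file" && echo "broken: claims a match"
search_logged "ERROR" "$log_file" || echo "fixed: reports no match"

rm -f "$log_file"
```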


Pattern: the "fail loudly" function

A function that validates and fails with a clear error message:

require_file() {
  local file="$1"
  if [[ ! -f "$file" ]]; then
    echo "error: file not found: $file" >&2
    return 1
  fi
  if [[ ! -r "$file" ]]; then
    echo "error: file not readable: $file" >&2
    return 1
  fi
  return 0
}

require_file "/etc/config.yml" || exit 1

Note:

  • Error messages go to >&2 (stderr), so they don't pollute stdout that the caller might capture.
  • Return 0 on success, non-zero on failure.
  • Caller uses || to exit the script or handle the failure.

KEY CONCEPT

Always send diagnostic/error output to stderr (>&2). Functions that print diagnostics on stdout alongside their data break when callers capture the output with $().


Recursion

Bash supports recursion. It is almost never the right choice for real work (slow, easy to blow the stack), but occasionally useful:

factorial() {
  local n="$1"
  if (( n <= 1 )); then
    echo 1
    return
  fi
  echo $(( n * $(factorial $((n - 1))) ))
}

factorial 5    # 120

Each recursive call is a subshell (due to $()), so it is slow. For real recursion, use Python.
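
If the computation has to stay in Bash, a loop avoids the subshell per call. A minimal iterative sketch (factorial_iter is a name chosen here, not part of the lesson's code):

```shell
# Iterative factorial: one function call, no command substitution inside.
factorial_iter() {
  local n="$1"
  local result=1
  local i
  for (( i = 2; i <= n; i++ )); do
    result=$((result * i))
  done
  echo "$result"
}

factorial_iter 5     # 120
factorial_iter 10    # 3628800
```

Bash arithmetic uses fixed-width integers, so large inputs overflow silently in either version.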


Sourcing vs executing — how functions propagate

Functions defined in a script are available only within that script. To use a library of functions across scripts, source the file:

# lib.sh
log()  { echo "[$(date +%H:%M:%S)] $*" >&2; }
die()  { log "ERROR: $*"; exit 1; }

# myscript.sh
source ./lib.sh        # or:  . ./lib.sh
log "starting up"
some_command || die "something failed"

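source (or .) reads the file and runs it in the current shell, so the functions and variables it defines are available afterwards. Aliases are defined too, but Bash does not expand aliases in non-interactive scripts unless shopt -s expand_aliases is set.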
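
The difference between executing and sourcing can be checked directly. This sketch writes a throwaway library to a temporary file (a stand-in for lib.sh) and uses declare -F to test whether the function exists in the current shell.

```shell
# Write a throwaway library file (stand-in for lib.sh).
lib=$(mktemp)
cat > "$lib" <<'EOF'
log() { echo "[lib] $*" >&2; }
EOF

# Executing runs the file in a child process; its functions vanish with it.
bash "$lib"
if ! declare -F log > /dev/null; then
  echo "after executing: log is not defined"
fi

# Sourcing runs the file in the current shell; its functions stay.
source "$lib"
if declare -F log > /dev/null; then
  echo "after sourcing: log is defined"
fi

rm -f "$lib"
```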

PRO TIP

Create a small lib.sh for your team's frequently-used helpers (logging, error handling, retries). Source it at the top of every script. Much cleaner than copy-pasting the same helpers.
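
As one example of what such a library might hold, here is a sketch of a retry helper. The name retry, its argument order, and the RETRY_DELAY variable are all assumptions made for this illustration, not an established convention.

```shell
# retry MAX_ATTEMPTS CMD [ARGS...]
# Runs CMD until it succeeds or MAX_ATTEMPTS is reached.
# RETRY_DELAY (seconds, default 1) is an assumed knob, not a Bash built-in.
retry() {
  local max_attempts="$1"
  shift
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-1}"
  done
}

# Demo: a command that fails twice, then succeeds.
tries=0
flaky() {
  tries=$((tries + 1))
  (( tries >= 3 ))
}

RETRY_DELAY=0 retry 5 flaky && echo "succeeded after $tries tries"
```

Because retry runs the command in the current shell rather than in $(...), the demo's counter survives between attempts.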


Function naming conventions

Bash has no access modifiers, but conventions help:

# Public: shortest, most-used names
log()  { ... }
die()  { ... }

# Private: leading underscore (convention only)
_parse_config() { ... }

# Namespaced: prefix by module
config::load()    { ... }
config::validate() { ... }

Bash allows :: in function names. Using it for a namespace is helpful in larger scripts.


Common function mistakes

Mistake 1: No local

do_work() {
  file="my_file"    # leaks to caller
  for i in {1..10}; do   # leaks
    echo "$file $i"
  done
}
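
Fix: declare both variables local. A sketch of the corrected function, followed by a check that nothing reached the caller:

```shell
do_work() {
  local file="my_file"
  local i
  for i in {1..10}; do
    echo "$file $i"
  done
}

do_work > /dev/null

# ${var-default} expands to the default only when var is unset.
echo "file=${file-<unset>} i=${i-<unset>}"   # file=<unset> i=<unset>
```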

Mistake 2: Mixing stdout and errors

get_config_value() {
  if [[ ! -f config ]]; then
    echo "config missing"       # goes to stdout, breaks $()
    return 1
  fi
  echo "value"
}

v=$(get_config_value)
# If config is missing, v = "config missing" — not an empty string and not detected as an error.

Fix: send errors to stderr.

get_config_value() {
  if [[ ! -f config ]]; then
    echo "config missing" >&2
    return 1
  fi
  echo "value"
}

v=$(get_config_value) || { echo "failed" >&2; exit 1; }

Mistake 3: local x=$(cmd) losing exit code

Covered above. Separate local x from x=$(cmd) when you care about the command's exit code.

Mistake 4: returning numeric data via return

count_items() {
  local n=0
  for _ in "$@"; do
    n=$((n + 1))
  done
  return "$n"     # BAD: n may exceed 255; confuses success/failure
}

Fix: echo "$n" and capture with $().
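
A sketch of the corrected function. It keeps the loop from the version above for comparison, although in practice "$#" already holds the argument count.

```shell
count_items() {
  local n=0
  local item
  for item in "$@"; do
    n=$((n + 1))
  done
  echo "$n"     # data goes to stdout; the exit status stays 0
}

count=$(count_items {1..300})
echo "count=$count"    # count=300
```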

Mistake 5: argument quoting errors

run_for_each() {
  local cmd="$1"
  local arg
  shift
  for arg in "$@"; do
    "$cmd" "$arg"
  done
}

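Quoting "$@" and "$arg" passes each argument through intact, even when it contains spaces. But "$cmd" can only hold a bare command name. If the command needs arguments of its own, collect them into an array up to a sentinel: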

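run_for_each() {
  local -a cmd_args=()
  local -a items
  local item
  # Collect the command and its own arguments up to the "--" sentinel
  while [[ $# -gt 0 && "$1" != "--" ]]; do
    cmd_args+=("$1"); shift
  done
  # No "--" found (or no command given): refuse to run anything
  if [[ $# -eq 0 || ${#cmd_args[@]} -eq 0 ]]; then
    echo "usage: run_for_each CMD [ARGS...] -- ITEM..." >&2
    return 2
  fi
  shift  # consume the --
  items=("$@")

  for item in "${items[@]}"; do
    "${cmd_args[@]}" "$item"
  done
}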

run_for_each grep -r pattern -- dir1 dir2 dir3

Quiz

KNOWLEDGE CHECK

You want a function to return the home directory of a given username — not through stdout but via a named variable, for performance. Which pattern is correct in Bash 4.3+?
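
One possible shape for an answer, sketched under assumptions: the function name lookup_home is made up, and the lookup reads /etc/passwd directly, which only covers accounts listed in that file (not LDAP or similar directories). The part the question is after is the nameref: the result is written into the variable the caller names, with no $(...) involved.

```shell
# lookup_home USER OUT_VAR
# Assumes the account is listed in /etc/passwd. Requires Bash 4.3+.
lookup_home() {
  local -n _home_ref=$2       # alias for the caller's output variable
  local name pw uid gid gecos home shell
  while IFS=: read -r name pw uid gid gecos home shell; do
    if [[ "$name" == "$1" ]]; then
      _home_ref=$home
      return 0
    fi
  done < /etc/passwd
  echo "lookup_home: no such user: $1" >&2
  return 1
}

if lookup_home root root_home; then
  echo "root's home: $root_home"
fi
```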


What to take away

  • function_name() { ... } — the portable function definition.
  • local every variable that should not escape the function. Doing this consistently avoids a whole category of bugs.
  • return is for exit codes (0-255), not for data. Use stdout + $() for data, or a nameref for performance.
  • Separate local x from x=$(cmd) when you need the command's exit code.
  • Diagnostic output goes to >&2. Keep stdout clean so callers can capture it.
  • Source a lib.sh for shared helpers; don't copy-paste.
  • Bash functions return exit codes, making if my_func; then natural.

Next module: error handling — set -euo pipefail, traps, cleanup, and turning a script into something that fails loudly and cleans up after itself.