Bash & Shell Scripting for Engineers

Arrays and Associative Arrays

Most Bash scripts are broken — in a slow, rare, not-yet-noticed way — because they use strings where they should use arrays. Space in a filename? Quote containing a quote? Empty argument that needs to be preserved? All of these break strings-as-argument-lists. Arrays are the fix.

This lesson covers indexed and associative arrays, how to expand them safely, and the one distinction, "$@" versus "$*", that catches out almost every engineer the first time they hit it.

KEY CONCEPT

If your variable holds a list of arguments, it must be an array. Always. Strings with spaces-as-separators cannot survive round-trips. Arrays preserve element boundaries through quoting.


Why arrays exist — the case a string cannot handle

# "A list of arguments" stored as a string:
args="--verbose --filename=my report.txt --retries=3"
some_command $args   # Fails: word-splits into
                     # ["--verbose", "--filename=my", "report.txt", "--retries=3"]
                     # The space inside "my report.txt" destroyed the boundary.

some_command "$args" # Fails: passes one giant argument
                     # ["--verbose --filename=my report.txt --retries=3"]

No amount of quoting on a string fixes this. The information "my report.txt is one argument" is gone the moment it becomes part of a space-separated string.

The array solution:

args=(--verbose "--filename=my report.txt" --retries=3)
some_command "${args[@]}"
# Passes three arguments:
# ["--verbose", "--filename=my report.txt", "--retries=3"]

Element boundaries are preserved. The space inside the second element (index 1) is part of that element, so it is never split in two.
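
One way to see the difference is to print what the command actually receives. The sketch below uses a hypothetical helper, show_args (not part of the lesson's code), as a stand-in for some_command:

```shell
#!/usr/bin/env bash
# show_args is a hypothetical stand-in for some_command:
# it prints how many arguments it received, each wrapped in brackets.
show_args() {
  printf '%d args:' "$#"
  printf ' [%s]' "$@"
  printf '\n'
}

as_string="--verbose --filename=my report.txt --retries=3"
as_array=(--verbose "--filename=my report.txt" --retries=3)

show_args $as_string          # 4 args: [--verbose] [--filename=my] [report.txt] [--retries=3]
show_args "$as_string"        # 1 args: [--verbose --filename=my report.txt --retries=3]
show_args "${as_array[@]}"    # 3 args: [--verbose] [--filename=my report.txt] [--retries=3]
```

Only the quoted array expansion delivers the three arguments that were intended.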


Declaring and using indexed arrays

# Declare and assign
fruits=(apple banana cherry)

# Add later
fruits+=(date)

# Access by index
echo "${fruits[0]}"       # apple
echo "${fruits[1]}"       # banana
echo "${fruits[-1]}"      # date (negative index counts from the end; not supported in Bash 3.2)

# Length
echo "${#fruits[@]}"      # 4

# All elements
echo "${fruits[@]}"       # apple banana cherry date

# All indexes
echo "${!fruits[@]}"      # 0 1 2 3

# Slice
echo "${fruits[@]:1:2}"   # banana cherry (from index 1, 2 elements)

Building up an array

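shopt -s nullglob        # without this, an unmatched glob stays in the list as the literal pattern
files=()
for path in /var/log/*.log; do
  files+=("$path")       # append each matching file
done
shopt -u nullglob

echo "Found ${#files[@]} log files"
do_something "${files[@]}"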

+= on an array appends. On a scalar, it appends to the string.
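
The parentheses are what make += an element append. A minimal sketch (the variable name list is only for illustration) of what happens when they are left off an array variable:

```shell
#!/usr/bin/env bash
list=(a b)

list+=(c)     # with parentheses: appends a new element
echo "${#list[@]}: ${list[*]}"    # 3: a b c

list+=d       # without parentheses: string-appends to element 0
echo "${#list[@]}: ${list[*]}"    # 3: ad b c
```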

Iterating

# Iterate elements
for fruit in "${fruits[@]}"; do
  echo "$fruit"
done

# Iterate indexes (useful when you need both)
for i in "${!fruits[@]}"; do
  echo "$i: ${fruits[$i]}"
done

Always quote "${arr[@]}" when iterating. Without quotes, every element undergoes word splitting and glob expansion.


The "$@" vs "$*" distinction

This is the single concept every Bash engineer must understand.

$@ and $* are the positional parameter lists. They behave differently when quoted:

# Script invoked as: ./script.sh "one arg" "two arg" "three"

echo "$#"                   # 3

# Unquoted: both forms word-split into five words
printf '[%s] ' $@; echo     # [one] [arg] [two] [arg] [three]
printf '[%s] ' $*; echo     # [one] [arg] [two] [arg] [three]

# Quoted: the two forms differ
printf '[%s] ' "$@"; echo   # [one arg] [two arg] [three]
printf '[%s] ' "$*"; echo   # [one arg two arg three]   (joined by the first character of $IFS)

The rule

  • "$@" — expands to N separate words, one per argument, each individually quoted. Use this for passing args through.
  • "$*" — expands to ONE word containing all arguments joined by the first character of $IFS. Use this when you genuinely want a single joined string (rare).

The unquoted forms ($@ and $*) are essentially always wrong: both are subject to word splitting and glob expansion, so the original argument boundaries are lost.
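
The "first character of $IFS" detail is easy to check. In this sketch, show_join is a hypothetical demo function; IFS is declared local so the change does not outlive the call:

```shell
#!/usr/bin/env bash
# show_join is a hypothetical demo function.
show_join() {
  local IFS=','              # local: the change ends when the function returns
  printf '%s\n' "$*"         # one word, joined with ","
  printf '[%s]' "$@"         # still one word per argument
  printf '\n'
}

show_join a "b c" d
# a,b c,d
# [a][b c][d]
```

Changing IFS affects how "$*" joins, but "$@" keeps producing one word per argument.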

Wrapper script example

# Correct — passes args through unchanged
wrapper() {
  echo "About to run: $*"
  the_real_command "$@"        # use "$@" for argument forwarding
}

wrapper --input "my file.txt" --retries 3
# About to run: --input my file.txt --retries 3
# the_real_command gets 4 distinct arguments

If you used "$*" instead, the_real_command would get one argument: the literal string "--input my file.txt --retries 3".
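
A log line built with "$*" hides where one argument ends and the next begins. If that matters, one option is printf with Bash's %q format, which prints each argument in a shell-quoted form. A sketch, assuming a hypothetical wrapper named run_logged that logs to stderr:

```shell
#!/usr/bin/env bash
# run_logged is a hypothetical wrapper: it logs the command to stderr, then runs it.
run_logged() {
  printf 'running:' >&2
  printf ' %q' "$@" >&2      # %q quotes each argument so it could be pasted back into a shell
  printf '\n' >&2
  "$@"                       # forward every argument unchanged
}

run_logged printf '<%s>\n' --input "my file.txt"
# stdout:
# <--input>
# <my file.txt>
```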

KEY CONCEPT

Forwarding arguments? Use "$@". Joining arguments into a message? Use "$*". Using either one for the other job, or dropping the quotes, gives the wrong result.


Arrays work the same way

The same @ vs * distinction applies to named arrays:

args=(--verbose "my file.txt" --retries 3)

some_command "${args[@]}"    # 4 arguments, boundaries preserved
some_command "${args[*]}"    # 1 argument, joined by the first character of $IFS

Always "${arr[@]}" for passing arguments. Always.


Building a command dynamically

A common real-world pattern — conditionally include flags:

args=(--input "$INPUT" --output "$OUTPUT")

if [[ "$VERBOSE" == "true" ]]; then
  args+=(--verbose)
fi

if [[ -n "$TIMEOUT" ]]; then
  args+=(--timeout "$TIMEOUT")
fi

if [[ "$DRY_RUN" == "true" ]]; then
  args+=(--dry-run)
fi

my_command "${args[@]}"

This is clean, each element stays its own argument, and you can add/remove flags without string concatenation nightmares.
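
Arrays also keep empty values, which a flattened string silently drops. A small sketch, using a hypothetical count_args function in place of my_command and illustrative values for INPUT and OUTPUT:

```shell
#!/usr/bin/env bash
# count_args is a hypothetical stand-in for my_command; it only reports how many arguments arrive.
count_args() { echo "$#"; }

INPUT=""                 # deliberately empty
OUTPUT="result.txt"

args=(--input "$INPUT" --output "$OUTPUT")
count_args "${args[@]}"  # 4: the empty value is still passed as an argument

flat="--input $INPUT --output $OUTPUT"
count_args $flat         # 3: the empty value has vanished
```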

Compare to the (wrong) string approach:

cmd="my_command --input $INPUT --output $OUTPUT"
[[ "$VERBOSE" = "true" ]] && cmd="$cmd --verbose"
[[ -n "$TIMEOUT" ]] && cmd="$cmd --timeout $TIMEOUT"
eval $cmd    # eval is an anti-pattern; also breaks on spaces

Use arrays. Never eval.


Associative arrays

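Bash 4.0 introduced associative arrays (hash maps). Declare them with declare -A before first use; without it, Bash treats the variable as an indexed array and evaluates string keys as arithmetic expressions, so they typically all land on index 0: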

declare -A ages

ages[alice]=30
ages[bob]=25
ages["charlie brown"]=42   # keys can contain spaces if quoted

echo "${ages[alice]}"       # 30
echo "${ages[charlie brown]}"   # 42

# Keys (order is unspecified; do not rely on it)
echo "${!ages[@]}"           # e.g. alice bob charlie brown
# Values (order is likewise unspecified)
echo "${ages[@]}"            # e.g. 30 25 42

# Iterate
for name in "${!ages[@]}"; do
  echo "$name is ${ages[$name]}"
done
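
Because key order is not defined, output that needs to be stable has to sort the keys first. A sketch under two assumptions: keys contain no newlines, and the external sort command is available. It also shows one way to test for a missing key:

```shell
#!/usr/bin/env bash
declare -A ages=([alice]=30 [bob]=25 ["charlie brown"]=42)

# Sort the keys first; mapfile -t keeps keys that contain spaces intact.
mapfile -t names < <(printf '%s\n' "${!ages[@]}" | sort)

for name in "${names[@]}"; do
  echo "$name is ${ages[$name]}"
done
# alice is 30
# bob is 25
# charlie brown is 42

# Test whether a key exists without confusing "unset" and "empty":
if [[ -n "${ages[dave]+set}" ]]; then
  echo "dave is known"
else
  echo "dave is missing"
fi
```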

Associative array use cases

Configuration maps:

declare -A regions=(
  [us-east-1]="Virginia"
  [us-west-2]="Oregon"
  [eu-west-1]="Ireland"
)

echo "Region ${1} is in ${regions[$1]}"

Deduplication:

declare -A seen
for item in "${items[@]}"; do
  if [[ -z "${seen[$item]}" ]]; then
    seen[$item]=1
    echo "unique: $item"
  fi
done
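
A variation on the same idea collects the unique items into a second array, keeping first-seen order. This is a sketch with sample data; it assumes no element is the empty string, since Bash rejects an empty associative-array subscript:

```shell
#!/usr/bin/env bash
items=(b a b "c d" a)      # sample input; elements are assumed to be non-empty

declare -A seen=()
unique=()
for item in "${items[@]}"; do
  [[ -n "${seen[$item]+x}" ]] && continue   # already recorded: skip
  seen[$item]=1
  unique+=("$item")
done

printf '[%s]' "${unique[@]}"; echo          # [b][a][c d]
```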

Counting:

declare -A counts
for word in one two three one two one; do
  counts[$word]=$(( ${counts[$word]:-0} + 1 ))
done
# counts[one]=3, counts[two]=2, counts[three]=1

WARNING

Associative arrays require Bash 4.0 or later. macOS still ships Bash 3.2 as /bin/bash, so a script that must run on a stock macOS install cannot use them. Either install a newer Bash (for example via Homebrew) and point the shebang at it, or rewrite the script in a language such as Python.
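
A script that depends on associative arrays can fail early with a clear message instead of a confusing syntax error. A minimal sketch, using a hypothetical helper named require_bash built on the BASH_VERSINFO variable:

```shell
#!/usr/bin/env bash
# require_bash is a hypothetical helper: succeeds if the running Bash
# is at least the given major version.
require_bash() {
  local need=$1
  if (( BASH_VERSINFO[0] < need )); then
    echo "error: Bash $need+ required, found $BASH_VERSION" >&2
    return 1
  fi
}

require_bash 4 || exit 1
declare -A ages=()       # safe: associative arrays are available from here on
```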


Common array mistakes

Mistake 1: unquoted expansion

# Broken — splits elements containing spaces
for f in ${files[@]}; do
  echo "$f"
done

# Correct — preserves spaces in elements
for f in "${files[@]}"; do
  echo "$f"
done

Mistake 2: $array instead of ${array[@]}

arr=(a b c)
echo $arr              # prints "a" — bare $arr is ${arr[0]}
echo "${arr[@]}"       # prints "a b c"

Bare $arr references only the first element. This trips people up once and then they remember forever.

Mistake 3: using strings when you need arrays

# Tempting
TAGS="--tag=v1 --tag=v2 --tag=v3"
docker build $TAGS .         # breaks as soon as any value contains a space or glob character

# Correct
TAGS=(--tag=v1 --tag=v2 --tag=v3)
docker build "${TAGS[@]}" .

Mistake 4: "${arr[*]}" when you meant "${arr[@]}"

arr=(one "two words" three)
some_command "${arr[*]}"    # passes ONE argument: "one two words three"
some_command "${arr[@]}"    # passes THREE arguments: "one" "two words" "three"

The * form is rarely what you want when passing arguments.


Array patterns that matter

Reading lines from a file into an array

# Bash 4+: mapfile (also available as readarray)
mapfile -t lines < file.txt
echo "${lines[0]}"      # first line

# Alternative with a loop (for older Bash)
lines=()
while IFS= read -r line || [[ -n "$line" ]]; do   # the || keeps a final line that lacks a trailing newline
  lines+=("$line")
done < file.txt

-t strips the trailing newline from each line.
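
When the lines come from a command rather than a file, feed mapfile through process substitution instead of a pipe. The sketch below uses printf as a stand-in for a real command and shows the pipe pitfall, assuming Bash's default settings (lastpipe off):

```shell
#!/usr/bin/env bash
# Reading from a command: use process substitution, not a pipe.
mapfile -t lines < <(printf '%s\n' "first line" "second line")
echo "${#lines[@]}"      # 2

# Pitfall: in a pipeline, mapfile runs in a subshell, so the array it fills is discarded.
piped=()
printf '%s\n' "first line" "second line" | mapfile -t piped
echo "${#piped[@]}"      # 0
```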

Splitting a string on a delimiter

csv="alice,bob,charlie"

IFS=',' read -ra parts <<< "$csv"
# parts=(alice bob charlie)
echo "${parts[1]}"   # bob

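read -ra (short for -r -a) reads the fields into an indexed array. Prefixing the command with IFS=',' changes IFS for that one read only. Note that read consumes a single line, so this splits only the first line of its input.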
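
The same pattern works for any single-character delimiter. A sketch with an illustrative colon-separated value (path_like is a made-up variable), which also checks that IFS is untouched afterwards, assuming it held its default value beforehand:

```shell
#!/usr/bin/env bash
path_like="/usr/local/bin:/usr/bin:/opt/my tools/bin"

IFS=':' read -ra dirs <<< "$path_like"

echo "${#dirs[@]}"       # 3
echo "${dirs[2]}"        # /opt/my tools/bin  (the space is not a separator here)

# IFS is still the default (space, tab, newline) after the read:
[[ "$IFS" == $' \t\n' ]] && echo "IFS unchanged"
```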

Joining an array on a delimiter

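Bash has no built-in join. The idiomatic workaround relies on "${arr[*]}" joining with the first character of IFS:

arr=(one two three)
joined=$(IFS=','; printf '%s' "${arr[*]}")   # "one,two,three"; the subshell keeps the IFS change from leaking
# printf alternative: it appends the delimiter after every element, so trim the last one
printf -v joined '%s,' "${arr[@]}"
joined=${joined%,}                            # "one,two,three"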
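
The IFS trick only uses a single character as the separator. For a multi-character delimiter, a small loop is one option. A sketch, where join_by is a hypothetical helper name rather than a Bash built-in:

```shell
#!/usr/bin/env bash
# join_by is a hypothetical helper: join_by DELIM ITEM...
join_by() {
  local delim=$1 out="" item
  shift
  if (( $# > 0 )); then
    out=$1                   # first item goes in without a leading delimiter
    shift
    for item in "$@"; do
      out+="$delim$item"
    done
  fi
  printf '%s\n' "$out"
}

arr=(one "two words" three)
join_by ', ' "${arr[@]}"     # one, two words, three
```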

Removing an element by index

unset 'arr[2]'    # removes index 2 — but leaves a gap in indexes

# To actually compact the array
arr=("${arr[@]}")   # reassignment rebuilds the index
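
Printing the indexes before and after makes the gap visible. A short sketch with sample data:

```shell
#!/usr/bin/env bash
arr=(a b c d)

unset 'arr[1]'
echo "${!arr[@]}"    # 0 2 3   (index 1 is gone; nothing shifted down)
echo "${#arr[@]}"    # 3

arr=("${arr[@]}")    # rebuild from the remaining elements
echo "${!arr[@]}"    # 0 1 2
echo "${arr[1]}"     # c
```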

Passing arrays to functions

Arrays are not first-class values — you can't just pass them as a single argument. Two patterns:

Pattern 1: pass as "$@"

process_items() {
  for item in "$@"; do
    echo "processing: $item"
  done
}

items=(a "b with spaces" c)
process_items "${items[@]}"

This is the simplest and most portable.

Pattern 2: nameref (Bash 4.3+)

process_array() {
  local -n arr="$1"   # nameref: arr now aliases the caller's variable (which must not itself be named arr)
  for item in "${arr[@]}"; do
    echo "$item"
  done
}

items=(a "b with spaces" c)
process_array items   # pass the NAME, not the expansion

Useful when you need to mutate the caller's array.
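
A sketch of that mutation case, assuming Bash 4.3+ and a hypothetical helper named push_unique that appends a value to the caller's array only if it is not already there:

```shell
#!/usr/bin/env bash
# push_unique is a hypothetical helper: push_unique ARRAY_NAME VALUE
# Requires Bash 4.3+ for local -n.
push_unique() {
  local -n _list=$1          # unusual name to avoid clashing with the caller's variable
  local value=$2 existing
  for existing in "${_list[@]}"; do
    [[ "$existing" == "$value" ]] && return 0   # already present: nothing to do
  done
  _list+=("$value")          # modifies the caller's array in place
}

tags=(v1 v2)
push_unique tags v2
push_unique tags "release candidate"
printf '[%s]' "${tags[@]}"; echo    # [v1][v2][release candidate]
```

With Pattern 1 the function would only see copies of the elements, so the append could not reach the caller's array.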


Quiz

KNOWLEDGE CHECK

You have files=("a.txt" "my report.pdf" "c.log"). Which loop correctly iterates over all three files, preserving the space in the second one?


What to take away

  • Lists of arguments must be arrays, not space-separated strings.
  • "${arr[@]}" — quoted, with @ — is the universal safe expansion. One word per element, quoting preserved.
  • "${arr[*]}" joins elements into a single string. Use only when you specifically want that.
  • Bare $arr is ${arr[0]} — not the whole array.
  • Declare associative arrays with declare -A (Bash 4+). Use them for maps, counts, and dedup.
  • Build commands as arrays, not as strings: args+=(...), then cmd "${args[@]}". Never use eval.
  • "$@" for forwarding arguments; "$*" when you want a joined string message.

Next lesson: the quoting rules you actually need — single vs double vs none — and the specific cases where getting it wrong breaks things.