Bash & Shell Scripting for Engineers

Loops That Do Not Break

Looping in Bash is where "it works on my machine" meets production. A loop that iterates over filenames works fine when every filename is a.txt, b.txt, c.txt. It breaks the moment someone names a file my report.txt or the data contains a newline.

This lesson covers the loop forms that stay correct under real-world input — filenames with spaces, lines ending in backslashes, CSV rows with embedded newlines. The goal is loops you never have to come back to because edge-case input broke them.

KEY CONCEPT

The default for x in $(cmd); do is almost always wrong. It word-splits and globs. Use while IFS= read -r or for x in glob/* instead. A few memorized idioms cover 95% of real loops.


The for loop — two flavors

Flavor 1: list iteration

for fruit in apple banana cherry; do
  echo "$fruit"
done

Iterates over the literal words after in. Works well when the list is static or comes from a safe source (a glob, an array).

Flavor 2: C-style

for (( i = 0; i < 10; i++ )); do
  echo "iteration $i"
done

C-style syntax for counting. Useful when you need an index. No $ is needed on variables inside the (( )).

Iterating over arrays

files=("a.txt" "my report.pdf" "c.log")

for f in "${files[@]}"; do
  echo "$f"
done

The quoted "${arr[@]}" form is mandatory. Without quotes, each element gets word-split again.
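To see exactly what the quotes buy you, here is a minimal sketch counting iterations with and without them:

```shell
files=("a.txt" "my report.pdf")

# Unquoted: "my report.pdf" word-splits into two items
count=0
for f in ${files[@]}; do
  count=$((count + 1))
done
echo "unquoted: $count items"   # unquoted: 3 items

# Quoted: element boundaries preserved
count=0
for f in "${files[@]}"; do
  count=$((count + 1))
done
echo "quoted: $count items"     # quoted: 2 items
```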

Iterating over glob matches

for file in /var/log/*.log; do
  echo "$file"
done

This is the idiomatic way to iterate files. The shell expands the glob; each match becomes one iteration. Spaces in filenames work correctly because glob expansion preserves element boundaries.

# Enable nullglob so the loop doesn't run with "*.log" literal when no matches
shopt -s nullglob
for file in /var/log/*.log; do
  process "$file"
done

Without nullglob, if no .log files exist, the loop runs once with $file set to the literal pattern /var/log/*.log. That is almost never what you want.


Why for x in $(cmd) is a trap

The most common broken loop:

# BROKEN for filenames with spaces or newlines
for file in $(ls /tmp); do
  rm "$file"
done

What breaks:

  1. $(ls) word-splits on every IFS character (space, tab, newline).
  2. A filename like my report.txt splits into two "files": my and report.txt.
  3. A filename containing a * gets globbed against the current directory.

The fixes

Fix 1: use a glob directly.

for file in /tmp/*; do
  rm "$file"
done

Fix 2: use while read with null-delimited input for maximum safety.

while IFS= read -r -d '' file; do
  rm "$file"
done < <(find /tmp -maxdepth 1 -print0)

Fix 3: read into an array with readarray/mapfile.

mapfile -t files < <(ls /tmp)
for file in "${files[@]}"; do
  rm "/tmp/$file"
done

The glob fix is almost always the simplest. The mapfile fix handles spaces but still breaks on filenames containing newlines, because ls output is newline-delimited. Reach for find + null-delimited read when you need recursive traversal, filtering, or full safety against exotic names.


The while read loop — the workhorse

For reading input line by line:

while IFS= read -r line; do
  echo "got: $line"
done < input.txt

Three pieces that matter:

1. IFS= (empty)

Normally, read strips leading/trailing whitespace. IFS= on the read command line disables that, preserving the line exactly.

# Without IFS=
line="   hello   "
echo "$line" | while read line; do echo "[$line]"; done
# [hello]     <- leading and trailing spaces gone

# With IFS=
echo "$line" | while IFS= read line; do echo "[$line]"; done
# [   hello   ]   <- preserved

2. -r (no backslash interpretation)

Without -r, read treats \ as an escape character. This breaks lines that legitimately contain backslashes (Windows paths, regex patterns, and so on).

# Without -r
echo 'hello\nworld' | while read line; do echo "[$line]"; done
# [hellonworld]   <- backslash interpreted as escape, "\n" became "n"

# With -r
echo 'hello\nworld' | while read -r line; do echo "[$line]"; done
# [hello\nworld]  <- backslash preserved as literal

Always use -r. There is no scenario where you want the backslash interpretation.

3. done < input.txt (not piping)

If you pipe into the while loop (cat file | while ...), the loop runs in a subshell and variable changes don't escape. Redirecting (done < file) runs the loop in the current shell.

See the earlier lesson on subshells for the details.
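A minimal demonstration of the difference (a heredoc stands in for the input file):

```shell
# Pipe form: the while body runs in a subshell, so count is lost
count=0
printf 'a\nb\nc\n' | while IFS= read -r line; do
  count=$((count + 1))
done
echo "pipe:     $count"   # pipe:     0

# Redirect form: the loop runs in the current shell, so count survives
count=0
while IFS= read -r line; do
  count=$((count + 1))
done <<'EOF'
a
b
c
EOF
echo "redirect: $count"   # redirect: 3
```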

WARNING

while IFS= read -r line is the canonical line-reading idiom. Memorize it. Anything missing -r or IFS= is subtly wrong for real-world input.


Reading into multiple fields

read can split a line into multiple variables:

echo "alice 30 engineer" | while read -r name age role; do
  echo "name=$name age=$age role=$role"
done
# name=alice age=30 role=engineer

The last variable gets the remainder of the line (any extra fields).
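A quick sketch of the remainder behavior, using a here-string as the input:

```shell
# Three variables, four fields: "rest" absorbs everything after "age"
read -r name age rest <<< "alice 30 engineer remote"
echo "name=$name"   # name=alice
echo "age=$age"     # age=30
echo "rest=$rest"   # rest=engineer remote
```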

With a delimiter

while IFS=: read -r user _ uid _ _ _ shell; do
  echo "$user uses $shell"
done < /etc/passwd

Placing IFS=: directly before read applies it to that one command: each line splits on :, fields are assigned to the named variables left to right, and _ is the conventional name for fields you want to ignore.

Reading CSV (approximately)

# Simple case — no embedded commas or quotes
while IFS=',' read -r col1 col2 col3; do
  echo "$col1 / $col2 / $col3"
done < data.csv

For real CSV (embedded commas, quotes, multi-line fields), don't use Bash. Use csvkit, Python's csv module, or another dedicated CSV tool. Bash's read cannot parse CSV correctly.
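As an illustration of handing the parsing off (this assumes python3 is installed; the sample file and field separator are made up for the demo):

```shell
# A row with an embedded comma that IFS=',' read would split wrongly
printf '%s\n' 'name,title' 'alice,"engineer, senior"' > data.csv

# Delegate parsing to Python's csv module; print fields separated by " | "
python3 -c '
import csv, sys
for row in csv.reader(sys.stdin):
    print(" | ".join(row))
' < data.csv
# name | title
# alice | engineer, senior
```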


Loops over command output — the right pattern

When you need to iterate over command output (not just files), use while read with process substitution:

# Process users from a command
while IFS= read -r user; do
  echo "processing $user"
done < <(getent passwd | awk -F: '{print $1}')

This avoids both:

  1. The for x in $(cmd) word-splitting problem.
  2. The cmd | while subshell-variable-loss problem.

Null-delimited loops — the bulletproof version

For maximum safety against unusual filenames (newlines in names, any special characters), use null-delimited input:

# find: -print0 emits null-separated output
# read: -d '' reads until null byte
while IFS= read -r -d '' file; do
  echo "processing: $file"
done < <(find /path -type f -print0)

This handles any filename except ones with literal \0 in them — which is impossible on Unix filesystems.

When to reach for this:

  • Untrusted input where filenames might contain newlines.
  • Scripts that absolutely must not fail on edge cases.
  • High-integrity operations (rm, chmod) where a wrong filename is catastrophic.

For casual scripts, glob patterns are usually enough.


The continue and break keywords

Both work as expected:

for file in /var/log/*.log; do
  [[ ! -r "$file" ]] && continue   # skip unreadable

  if [[ "$(head -1 "$file")" == "DONE" ]]; then
    break                           # stop after finding a completed one
  fi

  process "$file"
done

With nested loops, continue N / break N jumps N levels:

for a in 1 2 3; do
  for b in x y z; do
    if [[ "$b" == "y" ]]; then
      continue 2   # continue the OUTER loop
    fi
    echo "$a $b"
  done
done
# 1 x
# 2 x
# 3 x

Rarely needed, but useful when you need it.


The until loop

Like while, but loops until the condition is true:

until ping -c1 -W1 server >/dev/null 2>&1; do
  echo "server not ready, waiting..."
  sleep 5
done
echo "server is up"

Equivalent to while ! cond; do — use whichever reads better.


Reading a file into an array — mapfile / readarray

Bash 4+ has mapfile (alias readarray) for reading lines into an array:

mapfile -t lines < file.txt
echo "${#lines[@]} lines"
echo "${lines[0]}"        # first line
echo "${lines[-1]}"       # last line

-t strips the trailing newline from each line. Without -t, each array element includes its \n.

# Filter through another command first
mapfile -t logs < <(grep ERROR app.log)

mapfile is faster than a manual while read loop for large files.


Progress reporting in long loops

total=$(find /data -type f | wc -l)
i=0
while IFS= read -r -d '' file; do
  i=$((i + 1))
  if (( i % 100 == 0 )); then
    printf '\r%d / %d (%d%%)' "$i" "$total" "$(( i * 100 / total ))"
  fi
  process "$file"
done < <(find /data -type f -print0)
echo

\r returns to the start of the line so you overwrite instead of scrolling. Update every N iterations, not every iteration — otherwise you're paying printf cost per item.


Loop performance tips

1. Minimize forks

# SLOW — forks a process every iteration
for f in *.txt; do
  length=$(wc -l "$f" | awk '{print $1}')
  echo "$f: $length"
done

# FASTER — read once, process in Bash where possible
for f in *.txt; do
  mapfile -t lines < "$f"
  echo "$f: ${#lines[@]}"
done

Inside a hot loop, every subshell or command substitution is a fork. Cut them where you can.

2. Use mapfile for whole-file reads

# SLOW
while IFS= read -r line; do
  lines+=("$line")
done < file.txt

# FASTER
mapfile -t lines < file.txt

3. Batch external calls

# SLOW — invokes grep per file
for f in *.log; do
  grep ERROR "$f"
done

# FAST — one grep invocation over all files
grep ERROR *.log

For real CPU-bound work, Bash is the wrong tool. But for I/O-bound loops, the above tricks help.

PRO TIP

When a Bash loop feels too slow, the question is usually not "how can I speed up the loop" but "why is there a subprocess inside this loop?" Remove the fork; the loop speeds up 10-100x.


A worked example — processing a log directory

A script that rotates old logs:

#!/bin/bash
set -euo pipefail
shopt -s nullglob

log_dir="${1:-/var/log/myapp}"
retention_days=30
archive_dir="$log_dir/archive"

mkdir -p "$archive_dir"

# Find .log files older than retention_days
found=0
while IFS= read -r -d '' file; do
  base="$(basename "$file")"
  dest="$archive_dir/$base.$(date +%Y%m%d).gz"
  gzip -c "$file" > "$dest"
  rm -f "$file"
  found=$((found + 1))
done < <(find "$log_dir" -maxdepth 1 -name "*.log" -mtime +$retention_days -print0)

echo "archived $found log files"

Every idiom from this lesson appears here: nullglob, while IFS= read -r -d '', null-delimited find, safe quoting throughout, a counter incremented in the parent shell (no pipeline subshell), and set -euo pipefail for error safety.


Quiz

KNOWLEDGE CHECK

You want to iterate over every .txt file in the current directory, including ones with spaces in the name. Which loop is safest?


What to take away

  • for file in glob/* is the safe, idiomatic file-iteration loop. Add shopt -s nullglob for zero-match safety.
  • while IFS= read -r line; do ...; done < file is the canonical line-reading loop.
  • Never for x in $(cmd). Never cmd | while. Both have quiet bugs.
  • For maximum safety against exotic filenames, use null-delimited: while IFS= read -r -d '' f; do ...; done < <(find ... -print0).
  • mapfile -t arr < file is the fastest way to slurp lines into an array.
  • for (( i = 0; i < N; i++ )) for C-style counting loops.
  • Remove subprocess forks from hot loops — they dominate performance.

Next lesson: functions — local scope, return codes vs stdout, and the pattern for "returning" strings from a function.