Command Substitution and Subshells
Command substitution is how you capture the output of a command into a variable or an argument. Subshells are what Bash silently creates to run many of those commands. Understanding the relationship between them explains a whole class of bugs that otherwise look like "Bash is broken."
The canonical example of a question engineers ask after an hour of debugging:
"Why does my variable keep its old value after the loop? I set it inside the pipeline!"
The answer is always "because a subshell ran your loop." This lesson is about when that happens and how to work around it.
Command substitution always runs in a subshell. A subshell is a separate process with its own memory. Changes it makes cannot reach the parent. If you want to return data from a subshell, it has to come out through stdout.
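A minimal sketch of that rule in plain Bash. The assignment inside $(...) changes only the subshell's copy of counter, and only what the subshell prints comes back to the parent. The helper next_id is hypothetical, made up for this illustration.

```shell
#!/usr/bin/env bash
counter=0

# The assignment runs in the subshell; only its stdout ("5") is captured.
inside=$(counter=5; echo "$counter")

printf 'inside the subshell: %s\n' "$inside"   # 5
printf 'parent still has:    %s\n' "$counter"  # 0

# Returning data from a function works the same way: print it, capture stdout.
# next_id is a hypothetical helper for illustration.
next_id() {
  echo $(( $1 + 1 ))
}
id=$(next_id 41)
printf 'next id: %s\n' "$id"                   # 42
```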
$(cmd) vs backticks
Two syntaxes for the same thing:
# Modern — preferred
files=$(ls /tmp)
# Legacy — avoid in new scripts
files=`ls /tmp`
$(...) is better because:
- It nests cleanly. $(a $(b)) just works. Backtick nesting gets uglier with every level.
- Quoting inside is intuitive. Inside $(...), quotes behave as they do in a normal command. Inside backticks, backslashes have to be escaped in a way that is hard to reason about.
- It cannot be mistaken for a single quote. In many monospace fonts, a backtick and a single quote look nearly identical.
Always use $(...) for new code. Backticks work, but they are legacy.
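To see the nesting difference concretely, here is a small sketch that computes the same value both ways. The path /usr/local/bin is just an illustrative string; nothing is read from disk.

```shell
#!/usr/bin/env bash
path=/usr/local/bin

# $(...) nests without escaping; the inner quotes are independent of the outer ones.
modern=$(basename "$(dirname "$path")")

# The same nesting with backticks: the inner pair must be backslash-escaped,
# and quoting the inner result would need yet another layer of escaping.
legacy=`basename \`dirname $path\``

printf 'modern: %s\nlegacy: %s\n' "$modern" "$legacy"   # both print "local"
```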
What command substitution actually does
date_str=$(date +%Y-%m-%d)
Three things happen:
- Bash forks a subshell.
- The subshell runs date +%Y-%m-%d.
- The subshell's stdout is captured (with trailing newlines stripped) and becomes the value of date_str.
The subshell then exits. The variable gets the output. Done.
echo "$date_str" # 2026-04-21
Two important details:
- Trailing newlines are stripped. If the command's output ends with one or more \n characters, none of them end up in the variable. Embedded newlines inside the output are preserved.
- In a plain assignment such as var=$(cmd), $? immediately afterwards holds the inner command's exit status. If the substitution is an argument to another command, $? reflects that outer command instead.
files=$(ls /nonexistent 2>&1)
echo "exit: $?" # nonzero — ls failed
echo "output: $files"
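The exit-status rule has a trap worth seeing in code: $? reports the inner command only when the substitution is a plain assignment. A short sketch comparing four forms, where exit 3 stands in for any failing command and the function names masked and unmasked are made up for the demo:

```shell
#!/usr/bin/env bash
# Plain assignment: $? is the exit status of the command substitution.
captured=$(exit 3)
plain_status=$?                      # 3

# Substitution used as an argument: $? belongs to echo, not the inner command.
echo "$(exit 3)" > /dev/null
arg_status=$?                        # 0

# 'local' is itself a command, so its own status (0) masks the inner one.
masked() {
  local inner=$(exit 3)
  masked_status=$?                   # 0
}
masked

# Declaring first and assigning separately keeps the inner status visible.
unmasked() {
  local inner
  inner=$(exit 3)
  unmasked_status=$?                 # 3
}
unmasked

echo "plain=$plain_status arg=$arg_status masked=$masked_status unmasked=$unmasked_status"
```

The same masking applies to declare, export, and readonly when they are combined with a substitution on one line.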
Subshells — when Bash forks without you asking
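Command substitution always creates a subshell. So do several other constructs you might not expect:
- every command in a multi-command pipeline (a | b), unless lastpipe applies to the last one;
- commands grouped in parentheses, (...);
- background jobs, cmd &;
- process substitution, <(cmd) and >(cmd).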
Not a subshell: { commands; } (braces). Braces group commands but run them in the current shell. This is the construct to reach for when you want grouping but need variable changes to stick.
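A quick way to check which constructs fork is to assign a variable inside each one and look at it afterwards in the parent. A minimal sketch, assuming a non-interactive script where lastpipe is off (the default):

```shell
#!/usr/bin/env bash
v=original

( v=set_in_parens )              # parentheses: subshell
after_parens=$v

true | v=set_in_pipeline         # pipeline segment: subshell
after_pipeline=$v

v=set_in_background &            # background job: subshell
wait
after_background=$v

{ v=set_in_braces; }             # braces: current shell
after_braces=$v

printf '%-11s %s\n' parens "$after_parens" pipeline "$after_pipeline" \
  background "$after_background" braces "$after_braces"
```

Only the brace group changes v; the other three leave the parent's value as original.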
The classic pipeline bug
Here is the canonical case that catches every Bash engineer exactly once:
count=0
ls *.txt | while read -r file; do
count=$((count + 1))
done
echo "processed $count files"
# Output: processed 0 files
What happened:
- The pipeline runs while read ... in a subshell.
- Inside the subshell, count is incremented.
- The subshell exits, and its changes are gone.
- The parent shell's count is still 0.
Fix 1: process substitution
count=0
while read -r file; do
count=$((count + 1))
done < <(ls *.txt)
echo "processed $count files"
# Output: processed 3 files
<(cmd) runs cmd asynchronously and exposes its output as a file name (typically under /dev/fd). The < redirection feeds that file to the loop, so the while loop itself runs in the current shell. This is almost always the right fix.
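A self-contained sketch of Fix 1, assuming a throwaway directory from mktemp with three made-up files (a.txt, b.txt, c.txt). It feeds the loop with printf over a glob rather than ls, which avoids parsing ls output but keeps the same shape:

```shell
#!/usr/bin/env bash
# Scratch directory with three .txt files so the example runs anywhere.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"

count=0
while IFS= read -r file; do
  count=$((count + 1))           # runs in the current shell, so the change sticks
done < <(printf '%s\n' "$workdir"/*.txt)

echo "processed $count files"   # processed 3 files

# The process substitution itself expands to a file name (system-dependent form).
echo "process substitution path:" <(true)

rm -r "$workdir"
```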
Fix 2: shopt lastpipe (bash only)
shopt -s lastpipe
count=0
ls *.txt | while read -r file; do
count=$((count + 1))
done
echo "processed $count files"
# Output: processed 3 files
lastpipe makes the last command in a pipeline run in the current shell instead of a subshell. It only takes effect when job control is off, which is the default in non-interactive scripts, and it requires Bash 4.2 or later.
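A runnable version of Fix 2, with printf standing in for ls *.txt so it does not depend on files in the current directory (the three names are made up):

```shell
#!/usr/bin/env bash
# Requires Bash 4.2+; has no effect while job control is on (set -m, interactive shells).
shopt -s lastpipe

count=0
printf '%s\n' a.txt b.txt c.txt | while read -r file; do
  count=$((count + 1))           # last pipeline element: runs in the current shell
done

echo "processed $count files"   # processed 3 files
```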
Fix 3: restructure to glob directly
count=0
for file in *.txt; do
count=$((count + 1))
done
Often the cleanest — skip the pipeline entirely.
ls *.txt | while ... is a code smell. Every time you see this pattern, check whether the body is modifying variables. If it is, refactor.
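One caveat with Fix 3: when no file matches, Bash leaves the pattern unexpanded, so the loop runs once with the literal string as file. A sketch of the difference, assuming an empty scratch directory from mktemp; shopt -s nullglob makes an unmatched pattern expand to nothing:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)              # empty scratch directory: no .txt files

count=0
for file in "$workdir"/*.txt; do  # no match: the pattern stays literal
  count=$((count + 1))
done
without_nullglob=$count            # 1

shopt -s nullglob                  # unmatched patterns now expand to nothing
count=0
for file in "$workdir"/*.txt; do
  count=$((count + 1))
done
with_nullglob=$count               # 0
shopt -u nullglob

echo "without nullglob: $without_nullglob, with nullglob: $with_nullglob"
rmdir "$workdir"
```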
The "variable set but empty" mystery
A variant of the pipeline bug:
first_line=""
head -1 somefile.txt | read first_line
echo "got: [$first_line]"
# Output: got: []
read ran in a subshell because it is part of a pipeline. The variable got set in the subshell. The parent sees nothing.
Fix:
first_line=$(head -n 1 somefile.txt)
Or:
read -r first_line < <(head -n 1 somefile.txt)
Or, when the data is already in a variable, use a here-string:
read -r first_line <<< "$some_variable"
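The broken version and two of the fixes side by side, as a sketch that writes its own temporary file in place of somefile.txt:

```shell
#!/usr/bin/env bash
tmpfile=$(mktemp)                             # stands in for somefile.txt
printf 'first line\nsecond line\n' > "$tmpfile"

first_line=""
head -n 1 "$tmpfile" | read -r first_line     # read runs in a pipeline subshell
broken=$first_line                            # still empty

read -r first_line < <(head -n 1 "$tmpfile")  # read runs in the current shell
fixed=$first_line

captured=$(head -n 1 "$tmpfile")              # or capture stdout directly

printf 'broken=[%s] fixed=[%s] captured=[%s]\n' "$broken" "$fixed" "$captured"
rm -f "$tmpfile"
```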
Nesting command substitution
$(...) nests cleanly:
# Inside-out: basename "$(pwd)" gets evaluated, result used by echo
echo "Current dir: $(basename "$(pwd)")"
Note the double quotes around the inner substitution: "$(pwd)" preserves spaces in the path. The inner quotes do not clash with the outer ones, because each $(...) starts a fresh quoting context. This is where many engineers mis-quote; when in doubt, quote every expansion.
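A sketch of what goes wrong without the inner quotes, assuming a scratch directory whose name ("my project") contains a space. Each $(...) is its own subshell, so the cd calls inside do not move the parent shell:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
dir="$workdir/my project"          # a path containing a space
mkdir "$dir"

quoted=$(cd "$dir" && basename "$(pwd)")     # "my project"
unquoted=$(cd "$dir" && basename $(pwd))     # split in two: basename treats "project" as a suffix

printf 'quoted:   [%s]\nunquoted: [%s]\n' "$quoted" "$unquoted"
rm -r "$workdir"
```

The unquoted form prints my, because word splitting hands basename two arguments (NAME and SUFFIX).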
Multi-level example:
# Last 5 lines of the largest log file in /var/log
tail -n 5 "$(ls -S /var/log/*.log | head -1)"
Readable enough. With backticks the same nesting gets ugly and error-prone to escape.
The trailing-newline gotcha
Command substitution strips trailing newlines. This is sometimes what you want and sometimes a surprise:
filename_with_newline=$(echo -e "hello\n")
# Value is "hello": echo -e prints hello plus two newlines, and every trailing newline is stripped
filename_with_newline_in_middle=$(echo -e "hello\nworld\n")
# Trailing newlines are stripped, but the embedded newline between hello and world is kept
If you specifically need to preserve a trailing newline (rare), append a sentinel:
output=$(some_command; printf x)   # some_command is a placeholder for your real command
output=${output%x}                 # remove the sentinel
The same trick matters whenever trailing whitespace in the output is significant. It does not make command substitution safe for binary data, though: Bash variables cannot hold NUL bytes. For most take-the-output-of-a-command cases, the default behavior is what you want.
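A sketch that measures the stripping with ${#var} (string length), using printf instead of echo -e so the exact bytes are unambiguous:

```shell
#!/usr/bin/env bash
plain=$(printf 'hello\n\n\n')          # every trailing newline is stripped
middle=$(printf 'hello\nworld\n')      # embedded newline kept, trailing one stripped

# Sentinel trick: print a marker after the real output, then remove it.
with_sentinel=$(printf 'hello\n\n'; printf x)
with_sentinel=${with_sentinel%x}

printf 'plain length:    %d\n' "${#plain}"          # 5
printf 'middle length:   %d\n' "${#middle}"         # 11
printf 'sentinel length: %d\n' "${#with_sentinel}"  # 7: both newlines survive
```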
Arithmetic substitution
Related in syntax, but different in behavior: $((expr)).
count=$((count + 1))
half=$((total / 2))
x=$((x * 2 + 1))
This is NOT command substitution. It is arithmetic expansion: Bash evaluates it itself, with no subshell. Prefer it over $(expr ...), which forks a subshell and runs the external expr program, making it much slower.
# SLOW (old style) — forks for expr
count=$(expr $count + 1)
# FAST — arithmetic expansion
count=$((count + 1))
Inside $(( ... )) you also do not need $ in front of variable names:
total=$((sum + 5)) # same as $((${sum} + 5))
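A sketch comparing the two styles. It also uses $BASH_SUBSHELL (covered later in this lesson) to show that arithmetic expansion stays in the current shell while command substitution goes one level deeper:

```shell
#!/usr/bin/env bash
sum=10
total=$((sum + 5))                   # bare name
total_explicit=$((${sum} + 5))       # explicit $, same result

# The same counter, once with arithmetic expansion and once with expr.
count=0
legacy=0
for _ in 1 2 3 4 5; do
  count=$((count + 1))               # evaluated by Bash itself
  legacy=$(expr "$legacy" + 1)       # forks a subshell and runs the external expr
done

# At the top level of a script these print 0 and 1.
arith_level=$((BASH_SUBSHELL))
cmdsub_level=$(echo "$BASH_SUBSHELL")

echo "total=$total total_explicit=$total_explicit count=$count legacy=$legacy"
echo "arith_level=$arith_level cmdsub_level=$cmdsub_level"
```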
(...) — the explicit subshell
Parentheses group commands and run them in a subshell. Useful for:
# Temporarily change directory without affecting the parent
(
cd /tmp
do_stuff_here
)
# We are back in the original directory, no cd .. needed
Compare to braces:
# Braces run in the current shell — the cd WOULD affect the caller
{
cd /tmp
do_stuff_here
}
# Now we are in /tmp, and everything else in the script runs from /tmp
Use (...) when you want isolation. Use { ...; } when you just want grouping.
Braces are syntax for grouping. Parens are execution in a subshell. They look similar; they behave very differently.
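A runnable sketch of the difference, starting from a scratch directory created with mktemp so the result is predictable. Note that brace syntax needs a space after { and a ; (or newline) before }:

```shell
#!/usr/bin/env bash
start=$(mktemp -d)
cd "$start" || exit 1

( cd / )                 # subshell: the parent's directory is untouched
after_parens=$PWD        # still $start

{ cd / || exit 1; }      # braces: current shell, so the cd sticks
after_braces=$PWD        # now /

echo "after ( ): $after_parens"
echo "after { }: $after_braces"
rmdir "$start"
```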
Background jobs and subshells
cmd & runs cmd in the background, in a subshell:
result=""
long_running_command &
pid=$!
# We cannot easily get $result out of the background process;
# it ran in a subshell. Communicate via a file or a pipe instead.
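This is why background jobs cannot hand variables back to the parent. The only thing that comes back automatically is the exit status, which wait "$pid" returns. For data, use a temp file, a named pipe, or a process substitution.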
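A sketch of the temp-file approach. long_running_command here is a hypothetical stand-in function that prints a result and exits with status 7, so both channels are visible:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in: prints a result and exits 7.
long_running_command() {
  sleep 1
  echo "42"
  return 7
}

result_file=$(mktemp)
long_running_command > "$result_file" &
pid=$!

# ... the parent can do other work here ...

wait "$pid"                      # blocks until the job ends; returns its exit status
status=$?
result=$(< "$result_file")       # data comes back through the file, not a variable
rm -f "$result_file"

echo "status=$status result=$result"   # status=7 result=42
```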
The useless-use-of-cat pattern
One of the most common anti-patterns:
# Useless subshell + useless process
grep pattern $(cat file.txt) # grep over the contents as filenames?
# Probably wanted:
grep pattern < file.txt # grep over file.txt contents
# or
grep pattern file.txt # same thing
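Most cat-into-something pipelines (cat file.txt | grep pattern) do not need the cat: it costs an extra process for no benefit. When you genuinely want a file's contents in a variable, Bash's $(< file.txt) reads the file directly instead of running the external cat program.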
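A sketch with a temporary file standing in for file.txt. grep takes the file name directly, and the Bash-specific $(< file) form yields the same contents as $(cat file) without the extra program:

```shell
#!/usr/bin/env bash
tmpfile=$(mktemp)                          # stands in for file.txt
printf 'alpha\nbeta\ngamma\n' > "$tmpfile"

matches=$(grep beta "$tmpfile")            # grep reads the file itself; no cat needed

via_cat=$(cat "$tmpfile")                  # runs the external cat program
via_redirect=$(< "$tmpfile")               # Bash reads the file itself

[[ "$via_cat" == "$via_redirect" ]] && echo "same contents"
echo "matches: $matches"
rm -f "$tmpfile"
```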
Debugging subshell-state issues
When you suspect a subshell is eating your changes, prove it:
# Print the subshell depth: 0 at the top level, higher inside nested subshells
echo "subshell depth: $BASH_SUBSHELL"
$BASH_SUBSHELL is 0 in the top-level shell, 1 inside the first subshell, and so on. Sprinkling this in your script tells you exactly which branch is in a subshell.
count=0
echo "level: $BASH_SUBSHELL count=$count"
ls *.txt | while read -r file; do
echo "level: $BASH_SUBSHELL count=$count"
count=$((count + 1))
done
echo "level: $BASH_SUBSHELL count=$count"
# Output:
# level: 0 count=0
# level: 1 count=0 <- we are in a subshell
# level: 1 count=1
# level: 1 count=2
# level: 0 count=0 <- back in parent, changes gone
Seeing BASH_SUBSHELL flip from 0 to 1 is the smoking gun.
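The trace above depends on .txt files in the current directory. A self-contained variant, as a sketch: printf stands in for ls, and the trace lines go to a temporary file, because anything the loop stored in a variable would be lost with the subshell:

```shell
#!/usr/bin/env bash
log=$(mktemp)

count=0
echo "level: $BASH_SUBSHELL count=$count" >> "$log"
printf '%s\n' a b c | while read -r item; do
  echo "level: $BASH_SUBSHELL count=$count" >> "$log"
  count=$((count + 1))
done
echo "level: $BASH_SUBSHELL count=$count" >> "$log"

trace=$(< "$log")
rm -f "$log"
printf '%s\n' "$trace"
```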
Quiz
You write this script, expecting it to print a count of 3, but it prints 0. Why?
count=0; ls *.txt | while read -r file; do count=$((count+1)); done; echo count=$count
What to take away
- Prefer $(...) to backticks. Same job, nests cleanly, easier to quote.
- Command substitution always runs in a subshell. In a plain assignment, its exit status is available via $? immediately afterwards.
- Subshells cannot modify parent-shell variables. This is the root cause of the pipeline-variable bug.
- cmd | while runs the while in a subshell. Use while ... done < <(cmd) to avoid it.
- (...) runs in a subshell (useful for isolation). { ...; } runs in the current shell (useful for grouping).
- $(( ... )) is arithmetic, not command substitution, so no subshell. Prefer it over expr.
- $BASH_SUBSHELL is a debugging aid that tells you your current subshell depth.
Next lesson covers the full expansion order: brace → tilde, parameter, arithmetic, and command substitution (left to right) → word splitting → globbing. Seeing the order makes several otherwise-confusing behaviors click.