Linux Fundamentals for Engineers

File Descriptors, Pipes, and Redirection

Someone on your team writes a cron job: run_report.sh 2>&1 > /var/log/report.log. It works in staging, where the report never errors. In production, the logs capture the program's normal output but miss every error. They "fix" it by switching to run_report.sh 2> /var/log/report.log. Now the logs capture the errors but none of the normal output. They file a support ticket blaming the scheduler.

This is not a scheduler bug. It is a file-descriptor-order bug, and it comes up in some form — redirection order, pipe direction, a program that silently discards stderr, a tee that drops the wrong stream — in essentially every engineer's first year. The whole issue collapses into nothing once you understand what a file descriptor actually is and how the shell reshuffles them before executing your command. This lesson is that understanding.


What a File Descriptor Really Is

A file descriptor (fd) is a small integer that identifies an open file in a process's file descriptor table. Every process has its own table; the kernel maps each entry to a real kernel file object.

          Your process                    Kernel
     ┌──────────────────┐        ┌─────────────────────┐
     │ fd 0 → ─────────┼──────► │ terminal input      │
     │ fd 1 → ─────────┼──────► │ terminal output     │
     │ fd 2 → ─────────┼──────► │ terminal output     │
     │ fd 3 → ─────────┼──────► │ /etc/passwd open RO │
     │ fd 4 → ─────────┼──────► │ TCP socket to :5432 │
     │ ...              │        │ ...                 │
     └──────────────────┘        └─────────────────────┘

Three are always there by convention:

  • fd 0 — stdin (standard input)
  • fd 1 — stdout (standard output)
  • fd 2 — stderr (standard error)

Every other fd you get by calling open(), pipe(), socket(), accept(), etc. The kernel hands you back the lowest unused integer. Close fd 3 and the next open() will hand you 3 back.

# Your shell's file descriptor table
ls -l /proc/$$/fd
# lrwx------ 1 admin admin 64 Apr 19 10:00 0 -> /dev/pts/1
# lrwx------ 1 admin admin 64 Apr 19 10:00 1 -> /dev/pts/1
# lrwx------ 1 admin admin 64 Apr 19 10:00 2 -> /dev/pts/1
# lrwx------ 1 admin admin 64 Apr 19 10:00 255 -> /dev/pts/1

# An nginx master process
ls -l /proc/$(pgrep -f 'nginx: master' | head -1)/fd | head
# 0 -> /dev/null
# 1 -> /dev/null
# 2 -> /var/log/nginx/error.log
# 3 -> anon_inode:[eventpoll]
# 4 -> socket:[123456]
# 5 -> socket:[123460]
# ...

Everything you see in /proc/[pid]/fd is a file descriptor — and as discussed in Module 1, file includes sockets, pipes, kernel event queues, and so on. There is one uniform mechanism.

KEY CONCEPT

A file descriptor is just a small integer indexing into a per-process table. Everything else — "stdin is keyboard input," "pipelines run in parallel," "redirection sends output to a file" — is the shell arranging those three integers (0, 1, 2) to point at different kernel objects before it runs your program. The program itself only knows about the integers. This is the single most important idea for understanding shell redirection.
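
You can watch this from the program's side. A minimal sketch in plain sh (the /tmp/o path is just a scratch file): the same command asks the kernel whether fd 1 is a terminal, and the answer flips purely because the shell rewired the fd before exec — the program's code is identical in both runs.

```shell
# Same command both times — only the shell's fd setup differs
sh -c 'if [ -t 1 ]; then echo terminal; else echo not-a-terminal; fi'
sh -c 'if [ -t 1 ]; then echo terminal; else echo not-a-terminal; fi' > /tmp/o
cat /tmp/o
# not-a-terminal
```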


stdin, stdout, stderr — Why Three?

Every Unix program inherits three fds from its parent. The convention, which has held since the early 1970s, is:

  • fd 0 (stdin): where the program reads input from
  • fd 1 (stdout): where the program writes normal output
  • fd 2 (stderr): where the program writes errors and diagnostics

Why separate stdout and stderr? Because you want to pipe normal output to the next command but still see errors on your screen. grep should pipe matches into wc, but if grep cannot open a file it should tell you — not shove "grep: file.txt: Permission denied" into the count.

# Happy path — normal output goes through the pipe
grep pattern file.txt | wc -l

# Error — grep writes "No such file" to stderr, which bypasses the pipe
grep pattern nonexistent.txt | wc -l
# grep: nonexistent.txt: No such file or directory    <- stderr, visible
# 0                                                    <- wc got nothing

# Prove it: suppress stderr to see just the piped stdout
grep pattern nonexistent.txt 2>/dev/null | wc -l
# 0

PRO TIP

When you pipe a command and "it is not working but it seems to print something," check whether the output is going to stderr. The pipe only carries stdout. some_command 2>&1 | less (merging stderr into stdout before the pipe) is the standard way to capture everything.


Redirection: The Shell's Fd Shuffle

Redirection operators in the shell (>, <, >>, 2>, 2>&1, <<, <<<, |, |&) are instructions to the shell about how to set up the fd table before execve()-ing your program. The program itself has no idea the shell did anything.

Here is what each operator does in terms of fds.

> file and >> file — redirect stdout to a file

ls > /tmp/out.txt

What the shell does:

  1. Fork.
  2. In the child, open("/tmp/out.txt", O_WRONLY|O_CREAT|O_TRUNC, 0644) — gets fd 3.
  3. dup2(3, 1) — fd 1 now points at the file; old fd 1 is closed.
  4. close(3) — we do not need the extra fd anymore.
  5. execve("ls", ...) — ls runs, writes to fd 1, which is now the file.

>> is identical except with O_APPEND instead of O_TRUNC, so the file is not emptied first.
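
A quick experiment makes the O_TRUNC / O_APPEND difference concrete (using a throwaway /tmp path):

```shell
printf 'one\n'  > /tmp/demo.txt    # O_TRUNC: file starts over from empty
printf 'two\n' >> /tmp/demo.txt    # O_APPEND: "two" lands after "one"
printf 'three\n' > /tmp/demo.txt   # O_TRUNC again: "one" and "two" are gone
cat /tmp/demo.txt
# three
```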

< file — redirect stdin from a file

wc -l < /etc/passwd

Same dance, but on fd 0:

  1. open("/etc/passwd", O_RDONLY) — fd 3.
  2. dup2(3, 0) — fd 0 is now the file.
  3. close(3); execve(...).
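
One visible consequence of the fd 0 dance: wc never sees a filename, only bytes arriving on stdin, so it cannot print one. Assuming a standard /etc/passwd:

```shell
wc -l < /etc/passwd    # wc reads fd 0; the count prints alone
wc -l /etc/passwd      # wc opens the path itself, so it prints the name too
```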

2> file — redirect stderr to a file

make 2> build-errors.log

Same dance, on fd 2:

  1. open(...) — fd 3.
  2. dup2(3, 2) — fd 2 points at the file.
  3. close(3); execve(...).

2>&1 — redirect stderr to wherever stdout currently points

This is the one that trips everyone up. & means "the thing on the right is an fd, not a filename." 2>&1 means dup2(1, 2) — copy fd 1's current destination into fd 2.

Order matters because dup2 copies wherever fd 1 points at the moment it runs.

# CORRECT: send stdout to file, then point stderr at whatever stdout is pointing at (the file)
command > file 2>&1
# Shell does:
#   1. open(file) -> fd 3
#   2. dup2(3, 1)     <- stdout now points at file
#   3. dup2(1, 2)     <- stderr now points at file (wherever fd 1 currently is)

# WRONG: dup stderr to stdout (still the terminal) before stdout is redirected
command 2>&1 > file
# Shell does:
#   1. dup2(1, 2)     <- stderr now points at terminal (wherever fd 1 currently is)
#   2. open(file) -> fd 3
#   3. dup2(3, 1)     <- stdout now points at file; but stderr still points at terminal!
# Result: stdout goes to file, stderr goes to terminal. Probably not what you want.

WARNING

2>&1 means "fd 2 now points to whatever fd 1 currently points at" — it is a snapshot taken at that moment, not an alias. If you later change fd 1, fd 2 does not follow. This is why >file 2>&1 works and 2>&1 >file silently fails. Memorize the correct form: redirect the destination first (stdout), then alias stderr to it.
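
You can prove the order rule to yourself in a few lines. both() is a throwaway helper defined here, not a real command — it writes one line to stdout and one to stderr:

```shell
both() { echo out; echo err >&2; }   # throwaway helper: one line per stream

both > /tmp/correct.log 2>&1         # redirect stdout first, then alias stderr
both 2>&1 > /tmp/wrong.log           # alias first: "err" escapes to the terminal

wc -l < /tmp/correct.log
# 2
wc -l < /tmp/wrong.log
# 1
```

Run the second form at a terminal and you will see the stray err line printed to your screen instead of landing in the file.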

&>file and |& — shortcuts

Bash and zsh both support &>file as a shortcut for >file 2>&1:

command &> combined.log         # redirect both stdout and stderr to a file
command |& less                 # pipe both stdout and stderr into less
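
The difference is easy to demonstrate; the compound command below writes one line to each stream (|& needs bash 4+ or zsh):

```shell
{ echo out; echo err >&2; } 2>/dev/null | wc -l   # plain |: only stdout crosses
# 1
{ echo out; echo err >&2; } |& wc -l              # |&: both streams cross
# 2
```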

<< (heredoc) and <<< (herestring)

# Heredoc: stdin is the block between the two markers
cat <<EOF
line 1
line 2
EOF

# Herestring: stdin is one line (with a trailing newline)
wc <<< "hello world"
# 1 2 11

The shell implements heredocs by writing the content to a pipe or temp file and dup2-ing it to fd 0.
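
You can peek at that mechanism directly. What fd 0 actually is varies by shell version — older bash uses a deleted temp file, newer bash may use a pipe — so the output below is deliberately left unspecified:

```shell
# Ask the kernel what stdin is while a heredoc is attached —
# expect either a deleted temp file path or pipe:[...], depending on bash version
readlink /proc/self/fd/0 <<EOF
this content is what fd 0 delivers
EOF
```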


Pipes: fd-to-fd Wiring Between Processes

A pipe is a kernel-managed, in-memory circular buffer with two ends. The pipe() syscall creates one and returns two fds: the read end and the write end. Anything written to the write end can be read from the read end.

When you type ls | grep foo, the shell:

  1. Calls pipe() — gets back fds 3 (read end) and 4 (write end).
  2. Forks for ls:
    • In the child: dup2(4, 1) (ls's stdout is now the pipe's write end).
    • close(3) and close(4) in the child (not needed after dup).
    • execve("ls", ...).
  3. Forks for grep:
    • In the child: dup2(3, 0) (grep's stdin is now the pipe's read end).
    • close(3) and close(4).
    • execve("grep", ["grep", "foo"], ...).
  4. In the shell: close(3) and close(4), then wait() for both children.

Two processes, one pipe, two fds — and the shell wrote that choreography for you.


Two critical implications:

  • Pipes run in parallel. ls and grep execute simultaneously. A slow consumer backpressures the producer (which blocks in write() when the pipe buffer fills).
  • Pipe buffer is finite. Linux defaults to 64 KB (adjustable per pipe via fcntl with F_GETPIPE_SZ/F_SETPIPE_SZ). A producer that writes 100 MB before the consumer reads anything will block at 64 KB until the consumer catches up.

# PIPE_BUF is the atomic-write limit — writes up to this size are never
# interleaved with other writers. It is NOT the 64 KB buffer capacity.
getconf PIPE_BUF /
# 4096
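
The parallelism claim is cheap to verify with time: two one-second sleeps in a pipeline finish in roughly one second total, not two, because both processes start immediately.

```shell
time ( sleep 1; sleep 1 )    # sequential: ~2s
time ( sleep 1 | sleep 1 )   # pipeline:   ~1s — both stages run at once
```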

# Pipes can be named (fifos, see Module 1)
mkfifo /tmp/myfifo
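
A fifo behaves like an anonymous pipe with a name: data flows only once both ends are open, so a writer blocks until a reader shows up. A minimal sketch using a throwaway path:

```shell
mkfifo /tmp/demo.fifo
echo hello > /tmp/demo.fifo &    # writer blocks until a reader opens the fifo
cat /tmp/demo.fifo               # reader arrives; both sides unblock
# hello
rm /tmp/demo.fifo
```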

The Classic Redirection Cheat Sheet

# Stdout
cmd > file              # stdout to file (truncate)
cmd >> file             # stdout to file (append)
cmd > /dev/null         # discard stdout

# Stderr
cmd 2> file             # stderr to file
cmd 2>> file            # stderr to file, append
cmd 2> /dev/null        # discard stderr ("shut up")

# Both — three ways
cmd > file 2>&1         # the explicit form
cmd &> file             # bash shortcut
cmd >& file             # csh-style, also supported by bash

# Swap stdout and stderr (useful to grep on stderr only)
cmd 3>&1 1>&2 2>&3 | grep something

# Read from a file (same as stdin < file)
cmd < input.txt

# Heredoc — stdin is the block
cat <<EOF
multi-line content
inline $variables still expand
EOF

# Heredoc without expansion — quote the delimiter
cat <<'EOF'
$variables are literal here
EOF

# Tee — split one stream into a file and stdout
cmd | tee file          # write to file AND continue down the pipe
cmd | tee -a file       # append
cmd |& tee file         # both stdout and stderr

# Process substitution — use a command as if it were a file
diff <(cmd1) <(cmd2)    # compare two commands' outputs without temp files
paste <(cut -f1 a.tsv) <(cut -f2 b.tsv)

PRO TIP

tee is underrated. When you are running a long command and want to see the output and save it, cmd |& tee log.txt is almost always the right answer. When you want to save interesting output to a file mid-pipeline while still piping it downstream, cmd1 | tee interesting.txt | cmd2 works. When you need to write to a root-owned file through sudo without running the whole pipeline as root, cmd | sudo tee /etc/protected.conf > /dev/null is the idiomatic way.


Process Substitution: Commands That Look Like Files

<(cmd) creates a pipe and returns a path (like /dev/fd/63) that refers to it. Reads from that path read the command's output.

# Compare the output of two commands without temp files
diff <(kubectl get pods -A -o name | sort) <(sort expected-pods.txt)

# Feed two streams into paste
paste <(cut -f1 left.csv) <(cut -f2 right.csv)

# Check what a command would print, read by another tool
grep ERROR <(journalctl -u myservice --since '1 hour ago')

Output redirection with >(cmd) is the mirror:

# Write stdout to two destinations at once
cmd > >(gzip > archive.gz) 2> >(logger -t myapp)

Process substitution works because the shell exposes the pipe's fd under /dev/fd/, which is a symlink to /proc/self/fd/ — Linux's standard trick for referring to fds by path.
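
Both halves of that claim are checkable from the shell (process substitution needs bash or zsh). echo just prints the path the shell substituted — the exact fd number varies — and readlink shows where /dev/fd points on Linux:

```shell
echo <(true)        # the "filename" the command actually receives, e.g. /dev/fd/63
readlink /dev/fd
# /proc/self/fd
```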


exec: Permanently Reshuffle the Current Shell's Fds

exec without a command re-wires the shell itself instead of a child:

# In a script: capture everything the rest of the script does
exec > /var/log/myapp-install.log 2>&1
# From here on, every command in the script writes to that log

# Open a new fd for reading
exec 3< /etc/hosts
read -u 3 first_line
echo "first line: $first_line"
exec 3<&-   # close fd 3

# Bash magic: talk to a TCP server through an fd (covered in Module 1)
exec 5<>/dev/tcp/example.com/80
printf 'GET / HTTP/1.0\r\n\r\n' >&5
cat <&5

This is how service scripts and installers typically redirect all their output to a log file: one exec >log 2>&1 at the top, and the rest of the script inherits that redirection.


Debugging Fd Problems

# What fds does this process have open, and where do they point?
ls -l /proc/$PID/fd

# Extra info (position, flags) for a specific fd
cat /proc/$PID/fdinfo/3
# pos:    4096
# flags:  02101002
# mnt_id: 36
# ino:    123456

# Systemwide: every open file on the system
sudo lsof | wc -l

# Who is using a particular file?
sudo lsof /var/log/syslog

# Who is listening on a TCP port?
sudo lsof -iTCP:80 -sTCP:LISTEN

# What fd leaked? Check the count over time
watch "ls /proc/$PID/fd | wc -l"

WAR STORY

An engineer's Go service worked fine under 100 req/s but ran out of file descriptors at 500 req/s. ulimit -n was 1024 — plenty for 500 concurrent connections, they thought. Turns out every HTTP response body that was not explicitly Close()-ed was leaking a file descriptor per request. Watching ls /proc/$PID/fd | wc -l climb monotonically during load was the giveaway. One defer resp.Body.Close() fixed it. Every language has its own version of this bug; ls /proc/[pid]/fd | wc -l is the universal diagnostic.


Key Concepts Summary

  • A file descriptor is a small integer indexing into a per-process table maintained by the kernel. Everything you read, write, or send goes through an fd.
  • fd 0, 1, 2 are stdin, stdout, stderr by convention — every Unix program assumes it.
  • Redirection operators rearrange the fd table before exec. >, <, >>, 2>, 2>&1, | are all shell-side reshuffles, not something the target program knows about.
  • Order matters with 2>&1. It is a snapshot of where fd 1 points at that moment. Put 2>&1 after the stdout redirection.
  • A pipe is two fds around an in-memory buffer. Pipes run the two commands in parallel with kernel-managed backpressure.
  • tee splits a stream; process substitution makes commands look like files. Both are powerful and underused.
  • exec without a command rewires the current shell's fds. Good for logging installers and daemon-style scripts.
  • File descriptor leaks are usually visible in /proc/[pid]/fd. Watch the count over time under load.
  • lsof + ss + /proc/[pid]/fd are the core fd debugging tools. Every "who has this file open" or "why is this port in use" question has an answer in one of them.

Common Mistakes

  • Writing cmd 2>&1 > file when you mean cmd > file 2>&1. The wrong order sends stderr to the terminal and only stdout to the file.
  • Running a pipeline and then asking "why do I not see the error from the first command?" The pipe only carries stdout — use |& or 2>&1 | to include stderr.
  • Using > file to write to a root-owned file with sudo: sudo cmd > /etc/x runs the redirection as you, not as root. Use cmd | sudo tee /etc/x > /dev/null instead.
  • Forgetting that > file truncates the file immediately, even before the command runs. command-that-fails > important.log will leave you with an empty important.log.
  • Avoiding extra pipeline stages because "more pipes must be slower." Pipes cost essentially nothing — the kernel moves the data efficiently. Chain as many stages as clarity requires.
  • Using for f in $(ls) and getting bitten by filenames with spaces. find with -print0 and xargs -0, or for f in *, or reading a null-delimited stream, all avoid this.
  • Leaking file descriptors in long-running programs. Every language has a defer close, try-with-resources, or with block; use them.
  • Not realizing /dev/fd/N is how bash exposes pipes as paths for process substitution. It is a kernel symlink trick, not magic.
  • Using heredocs with <<EOF and then being surprised $VARIABLE expanded. Quote the delimiter (<<'EOF') to suppress expansion.
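
The truncation gotcha from the list above is worth burning in with a two-line experiment — the shell empties the file before the failing command even starts:

```shell
printf 'precious data\n' > /tmp/important.log
/bin/false > /tmp/important.log    # the command fails...
wc -c < /tmp/important.log         # ...but the shell already truncated the file
# 0
```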

KNOWLEDGE CHECK

You run `./my_script.sh > all.log 2>&1 | less`. The script prints to both stdout and stderr. What does `less` actually see, and what ends up in all.log?