Observability Fundamentals for Engineers

Structured Logging

The fastest way to identify a team that will struggle to debug production incidents: look at their logs. If they are unstructured strings like "request finished in 234ms", debugging is a game of grep. If they are JSON with fields, it is a game of queries.

This lesson covers why structured logging has become table stakes, how to do it right in the services you own, and what to include in every log record so that future-you can answer questions you have not yet thought of.

KEY CONCEPT

Logs are a database you write once and query many times. The choice is whether to write them in a format a machine can index, or in a format only a human can read. Structured logging is simply picking the former.


Unstructured vs structured — the difference

Traditional log line:

2026-04-19 14:23:01 INFO  request to /api/orders from user 42 finished in 234ms status 200

Structured equivalent:

{
  "timestamp": "2026-04-19T14:23:01.234Z",
  "level": "info",
  "message": "request completed",
  "service": "api",
  "path": "/api/orders",
  "method": "GET",
  "user_id": "42",
  "duration_ms": 234,
  "status": 200,
  "trace_id": "5e1f9c2b8a6d4e7f",
  "request_id": "req_7Hn2k9L"
}

Both contain the same information. The structured version lets you:

  • Filter by user_id = "42" without regex.
  • Aggregate avg(duration_ms) by path.
  • Correlate with a trace by trace_id.
  • Build dashboards without parsing.

With text logs, all of this requires regex, and regex fails the moment someone adds a new field or changes the wording.
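
Once logs are newline-delimited JSON, the queries above reduce to a few lines of code. A stdlib-only sketch (real systems would run the equivalent query in the log backend; the two log lines are made up):

```python
import json
from collections import defaultdict

# Two hypothetical NDJSON log lines, one per request
raw_logs = """
{"message": "request completed", "path": "/api/orders", "user_id": "42", "duration_ms": 234}
{"message": "request completed", "path": "/api/orders", "user_id": "7", "duration_ms": 180}
""".strip().splitlines()

records = [json.loads(line) for line in raw_logs]

# Filter by user_id without regex
user_42 = [r for r in records if r["user_id"] == "42"]

# Aggregate avg(duration_ms) by path
by_path = defaultdict(list)
for r in records:
    by_path[r["path"]].append(r["duration_ms"])
avg_by_path = {path: sum(d) / len(d) for path, d in by_path.items()}

print(len(user_42))   # 1
print(avg_by_path)    # {'/api/orders': 207.0}
```

Field access is explicit: if a new field appears, nothing above breaks.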


Why unstructured logs are a dead end

The problem with text logs is not that they are hard to read — humans read them fine. The problem is that log volumes at modern scale make human reading impossible. A mid-sized service generates millions of log lines per day. No human is grepping that.

Once you accept that logs must be queried by machines, everything that makes text logs nice for humans (prose, variable formatting, embedded values) becomes a liability.

WARNING

Every regex over production logs is an incident waiting to happen. Regexes break silently when field order changes, when null values appear, when a new version adds a field in the middle. Structured logging makes field access explicit and regex-free.


The anatomy of a good log record

A well-designed log record has five categories of fields:

  1. TIMING: timestamp (RFC 3339, UTC, nanosecond precision if possible)
  2. SEVERITY + MESSAGE: level (debug/info/warn/error), message (short, constant string)
  3. IDENTITY: service, environment, version, host/pod (where this log came from)
  4. CORRELATION: trace_id, span_id, request_id (connect logs across services and to traces)
  5. CONTEXT: the actual event data (user_id, path, duration_ms, status, error_type, ...)

If a log record is missing any of the first four categories, it will be painful to work with during an incident.
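
As a sketch, a record covering all five categories might be assembled like this (make_record is a hypothetical helper; the service/version values are illustrative):

```python
import json
from datetime import datetime, timezone

def make_record(level, message, trace_id, span_id, request_id, **context):
    """Assemble a log record with all five field categories."""
    return {
        # 1. Timing
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # 2. Severity + message
        "level": level,
        "message": message,
        # 3. Identity -- where this log came from
        "service": "api",
        "environment": "production",
        "version": "1.4.2",
        # 4. Correlation
        "trace_id": trace_id,
        "span_id": span_id,
        "request_id": request_id,
        # 5. Context -- the actual event data
        **context,
    }

record = make_record("info", "request completed",
                     trace_id="5e1f9c2b8a6d4e7f", span_id="a1b2c3d4",
                     request_id="req_7Hn2k9L",
                     path="/api/orders", duration_ms=234, status=200)
print(json.dumps(record))
```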


The message field — constant, not formatted

This is the subtlest rule in structured logging. The message field should be a constant string — the same value for every log of that type. All the variable data goes into separate fields.

// WRONG — message contains variable data
{"message": "request to /api/orders from user 42 finished in 234ms"}

// RIGHT — constant message, variable data in fields
{
  "message": "request completed",
  "path": "/api/orders",
  "user_id": "42",
  "duration_ms": 234
}

Why constant? Because you want to count or group by message:

count(message="request completed") by service

If the message has the path embedded, every path is a different message, and you cannot aggregate anything.
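
A quick stdlib illustration of the difference (the messages are made up):

```python
from collections import Counter

# Constant messages: counts aggregate cleanly
constant = ["request completed", "request completed", "db query failed"]
print(Counter(constant)["request completed"])  # 2

# Formatted messages: every request lands in its own bucket
formatted = [
    "request to /api/orders finished in 234ms",
    "request to /api/users finished in 180ms",
]
print(len(Counter(formatted)))  # 2 -- nothing groups
```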

PRO TIP

Think of message as an event type, not a human-readable sentence. "request completed", "db query failed", "cache miss", "deploy started" — short, descriptive, constant.


Correlation IDs — the thing that makes logs useful

A trace_id (or request_id) is a unique identifier attached to every log line produced while handling a single request. It is what lets you go from:

"the service is slow"

to:

"request req_7Hn2k9L hit the API gateway at 14:23:01, queried postgres for 850ms, then waited 400ms on the recommendation service before returning"

Without correlation IDs, you cannot reconstruct what happened to any specific request. With them, every question becomes a query.

How correlation IDs flow

  1. An inbound request hits the first service (usually the API gateway or edge proxy).
  2. If the request already has a trace_id header, use it. Otherwise, generate one.
  3. Attach it to the request context.
  4. Every log line emitted while handling this request includes the trace_id.
  5. Every outbound HTTP call propagates the trace_id as a header.
  6. The next service does the same thing.

KEY CONCEPT

The standard header is traceparent (W3C Trace Context). OpenTelemetry handles this automatically across most frameworks. We will cover this in detail in Module 4.

What a trace_id looks like

// Request arrives at service A
{"service": "api-gateway", "trace_id": "5e1f9c2b8a6d4e7f", "message": "request received", "path": "/orders"}

// A calls B
{"service": "api-gateway", "trace_id": "5e1f9c2b8a6d4e7f", "message": "outbound call", "target": "orders-service"}

// Request arrives at service B
{"service": "orders-service", "trace_id": "5e1f9c2b8a6d4e7f", "message": "request received", "path": "/orders"}

// B queries the database
{"service": "orders-service", "trace_id": "5e1f9c2b8a6d4e7f", "message": "db query executed", "duration_ms": 23}

// Response flows back
{"service": "orders-service", "trace_id": "5e1f9c2b8a6d4e7f", "message": "request completed", "status": 200}
{"service": "api-gateway", "trace_id": "5e1f9c2b8a6d4e7f", "message": "request completed", "status": 200}

Query for trace_id = "5e1f9c2b8a6d4e7f" and you get the full story.


Logging in Go with slog

Go 1.21+ ships log/slog in the standard library. It was designed specifically for structured logging and it is the right default for new Go services:

import (
    "log/slog"
    "os"
)

// At startup — configure the global logger
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
    Level: slog.LevelInfo,
}))
slog.SetDefault(logger)

// Throughout the codebase
slog.Info("request completed",
    "path", r.URL.Path,
    "method", r.Method,
    "status", rw.StatusCode,
    "duration_ms", dur.Milliseconds(),
    "user_id", userID,
)

Output:

{"time":"2026-04-19T14:23:01Z","level":"INFO","msg":"request completed","path":"/api/orders","method":"GET","status":200,"duration_ms":234,"user_id":"42"}

For request-scoped context (like trace_id), use slog.With:

reqLogger := slog.With(
    "trace_id", traceID,
    "request_id", requestID,
)

// Use an unexported key type -- string keys in context.WithValue
// can collide across packages and fail go vet.
type ctxKey struct{}
ctx = context.WithValue(ctx, ctxKey{}, reqLogger)

// Later, anywhere in the request:
reqLogger.Info("cache miss", "key", cacheKey)
// -> includes trace_id + request_id automatically

Logging in Python with structlog

Python's stdlib logging module was designed for text logs. structlog is the de facto choice for structured logging:

import structlog

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.processors.JSONRenderer(),
    ],
)

log = structlog.get_logger()

# At request entry, bind context that stays for the whole request
structlog.contextvars.bind_contextvars(
    trace_id=trace_id,
    request_id=request_id,
    user_id=user_id,
)

# Throughout the request handler
log.info("request completed",
    path=request.path,
    method=request.method,
    status=response.status,
    duration_ms=duration_ms,
)

contextvars works with Python's async model — the bound context follows the request across awaits.
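
A minimal stdlib demonstration of that behavior, independent of structlog: each task binds its own value, and the context follows the task across awaits without leaking between concurrent requests.

```python
import asyncio
import contextvars

trace_id_var = contextvars.ContextVar("trace_id", default=None)

async def handle(trace_id, results):
    trace_id_var.set(trace_id)          # bind at request entry
    await asyncio.sleep(0.01)           # another request runs here
    results.append(trace_id_var.get())  # still this request's value

async def main():
    results = []
    await asyncio.gather(handle("trace-a", results), handle("trace-b", results))
    return results

print(sorted(asyncio.run(main())))  # ['trace-a', 'trace-b'] -- no cross-talk
```

Each task created by asyncio.gather gets its own copy of the context, so a set in one request handler is invisible to the others.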


Logging in Node with pino

import crypto from 'node:crypto';
import pino from 'pino';

const logger = pino({
  level: 'info',
  formatters: {
    level: (label) => ({ level: label }),
  },
  timestamp: pino.stdTimeFunctions.isoTime,
});

// Per-request child logger with trace context
app.use((req, res, next) => {
  req.log = logger.child({
    // traceparent is "version-traceid-spanid-flags"; keep just the trace id
    trace_id: req.headers['traceparent']?.split('-')[1] ?? crypto.randomUUID(),
    request_id: crypto.randomUUID(),
  });
  next();
});

// In handlers
req.log.info({
  path: req.path,
  method: req.method,
  status: res.statusCode,
  duration_ms: Date.now() - start,
}, 'request completed');

What NOT to log

Just because you can log something does not mean you should. Bad things to log:

  1. PII — emails, names, phone numbers, addresses. Violates GDPR / CCPA and ends up in Splunk where everyone can see it.
  2. Secrets — API keys, passwords, tokens, signed URLs. Logs get shipped to third-party services.
  3. Full request / response bodies — unbounded size, often contain PII.
  4. Stack traces at info level — log them at error level, and only once per error, not repeatedly.
  5. Every function entry / exit — use traces, not logs, for that.

WARNING

Treat logs as a public record of what happened. Anything sensitive gets redacted at the source — not in post-processing. Once it is in your log aggregator, assume someone will find it.


Redaction patterns

Most structured loggers support field-level redaction:

// Go — wrap secret types so they serialize as "***"
type Secret string
func (s Secret) LogValue() slog.Value {
    return slog.StringValue("***")
}

slog.Info("user authenticated", "user_id", userID, "token", Secret(token))
// -> {"user_id":"42","token":"***"}

# Python with structlog — a custom processor
def redact_secrets(logger, method_name, event_dict):
    for key in ("password", "token", "api_key", "authorization"):
        if key in event_dict:
            event_dict[key] = "***"
    return event_dict
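
The processor above only checks top-level keys, but real payloads often bury secrets one level down. A recursive variant (a stdlib sketch; the field list is illustrative) might look like:

```python
SENSITIVE = {"password", "token", "api_key", "authorization"}

def redact(value):
    """Recursively replace sensitive fields in nested dicts and lists."""
    if isinstance(value, dict):
        return {k: "***" if k in SENSITIVE else redact(v)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value

event = {"user_id": "42",
         "headers": {"authorization": "Bearer abc"},
         "token": "xyz"}
print(redact(event))
# {'user_id': '42', 'headers': {'authorization': '***'}, 'token': '***'}
```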

The formatting rules

  • One JSON object per line. Never pretty-print. Log shippers expect newline-delimited JSON.
  • UTC timestamps, ISO 8601 / RFC 3339. Nanosecond precision if your language supports it.
  • Stable field names. user_id, not sometimes userId and sometimes user.
  • Consistent casing. snake_case or camelCase, picked once and used project-wide.
  • Types matter. "duration_ms": 234 (number), not "duration_ms": "234" (string).
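
The first and last rules in code form, using the stdlib json module (emit is a hypothetical helper):

```python
import json

def emit(record):
    # One compact JSON object per line -- no pretty-printing,
    # so log shippers can treat each line as a complete record
    print(json.dumps(record, separators=(",", ":")))

# duration_ms stays a number, so avg() and percentiles work downstream
emit({"message": "request completed", "duration_ms": 234})
# {"message":"request completed","duration_ms":234}
```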



What to take away

  • Structured logs (JSON) are the only viable format at scale. Text logs force regex and break silently.
  • A good log record has five parts: timing, severity, identity, correlation, and context.
  • The message field should be a constant string — the event type, not a formatted sentence.
  • Every log emitted during a request should carry the trace_id. That is what makes cross-service debugging possible.
  • Never log PII, secrets, or full bodies. Redact at the source.
  • Go: log/slog. Python: structlog. Node: pino. These are the defaults — do not write your own logger.

Next lesson: log levels — what they mean, what to use when, and how to keep noise out of your alert pipeline.