Identity and Trust for DevOps Engineers

Trust Models: How Systems Decide What to Believe

A new pod starts up and presents a token. Your service has to decide: trust it or reject it. The token has a signature, but signatures only matter if you trust the signer. The signer's certificate was issued by a CA, but CAs only matter if you trust the CA. Your trust has to bottom out somewhere. Where it bottoms out is the trust model.

Every authentication decision is downstream of a trust model. A trust model is the set of assumptions about which entities are authoritative, how their authority is established, and how trust propagates from them to other entities. There are only a handful of trust models in production use, and almost every identity system you encounter is one of them or a hybrid.

Understanding which trust model a system uses is the difference between debugging an authentication issue from first principles and guessing. This lesson covers the four trust models that matter, when each fits, and the failure modes specific to each.

The problem

When system A receives a credential (a token, a certificate, a password) from system B, A has to decide: does this credential prove B is who B claims to be? The answer is "yes if I trust whoever vouched for B." The chain of vouching is the trust model.

Three failure modes drive the importance of getting the trust model right:

1. Trusting too narrowly. Only direct vouching counts. Every pair of systems that needs to talk has to exchange credentials manually. This does not scale: N systems require on the order of N-squared credential exchanges (N(N-1)/2 pairs). Most enterprise environments cannot operate this way for more than a few dozen systems.

2. Trusting too broadly. Anyone vouched for by anyone in a giant pool is trusted. A single compromised voucher contaminates everyone. The 2011 DigiNotar breach was this: a single trusted CA was compromised, the attackers issued fraudulent certificates for domains the CA had never legitimately issued certs for, and every certificate it had ever signed (or could sign in the future) was suddenly suspect.

3. Trust without revocation. A vouching that was correct yesterday might be wrong today (key compromise, intent change, identity rotation). Revocation is structurally hard in distributed systems and most trust models handle it badly.
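
The scaling claim in the first failure mode is easy to check with arithmetic. This is a minimal sketch, assuming one mutual exchange per pair of systems and, for comparison, one enrollment per system when everyone trusts a shared authority. The function names are illustrative, not from any library.

```python
def pairwise_exchanges(n: int) -> int:
    """Credential exchanges needed when every pair of systems vouches directly."""
    return n * (n - 1) // 2


def shared_authority_enrollments(n: int) -> int:
    """Enrollments needed when each system trusts one shared authority instead."""
    return n


# Compare the two models as the fleet grows.
for n in (10, 50, 500):
    print(n, pairwise_exchanges(n), shared_authority_enrollments(n))
```

At 50 systems the direct model already needs 1225 exchanges against 50 enrollments, which is why "a few dozen systems" is roughly where direct vouching stops being workable.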

The motivating scenario at the top is the question every authenticator answers. Token presented; signed by an issuer; issuer's identity proven by a certificate; certificate signed by a CA; CA's authority established by being in a trust store. The chain bottoms out somewhere, and the safety of the system depends on what that "somewhere" is and how trustworthy it actually is.
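
The chain described above can be sketched as a walk from the presented credential up to whatever sits in the trust store. This is only an illustration of where trust bottoms out: the names in `ISSUED_BY` are hypothetical, and a real verifier would also check signatures, validity periods, and constraints at every hop.

```python
from typing import Dict, Optional, Set

# Hypothetical issuer map: each name points at whoever vouched for it.
ISSUED_BY: Dict[str, Optional[str]] = {
    "pod-token": "token-issuer",
    "token-issuer": "intermediate-ca",
    "intermediate-ca": "root-ca",
    "root-ca": None,  # self-signed: nobody vouches for the root
}


def find_trust_anchor(
    subject: str,
    trust_store: Set[str],
    issued_by: Dict[str, Optional[str]],
    max_depth: int = 8,
) -> Optional[str]:
    """Follow the vouching chain upward; return the first trusted name, or None."""
    current: Optional[str] = subject
    for _ in range(max_depth):  # bound the walk so a cyclic map cannot loop forever
        if current in trust_store:
            return current
        current = issued_by.get(current)
        if current is None:  # ran off the top of the chain without finding an anchor
            return None
    return None


print(find_trust_anchor("pod-token", {"root-ca"}, ISSUED_BY))
print(find_trust_anchor("pod-token", {"some-other-root"}, ISSUED_BY))
```

The same token is accepted or rejected depending only on the contents of the trust store, which is the sense in which the trust store defines the trust model.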

KEY CONCEPT

The trust model of a system determines what attacks are even possible against it. PKI is vulnerable to root CA compromise. TOFU is vulnerable to first-connection MITM. Web of trust is vulnerable to social engineering of trusted nodes. Federation is vulnerable to issuer compromise. There is no trust model that is invulnerable; there are only trust models that match the threat environment they were designed for.

How it works

Four trust models cover almost every production system.

The four production trust models, side by side

PKI / hierarchical trust

A small set of root authorities; trust descends from them.

Examples: Web TLS (Let's Encrypt and other CAs), corporate PKI, Kubernetes cluster CA

Trust origin: A root CA whose cert is in your trust store

Scaling: Excellent. One root can sign unlimited subordinates.

Failure mode: Root compromise contaminates everything signed.

Revocation: CRLs and OCSP exist, but clients often skip the check or fail open, so they rarely work in practice.

Federated identity

You trust an issuer; the issuer vouches for users.

Examples: OIDC (Sign in with Google), SAML, Workload Identity (GitHub Actions to AWS)

Trust origin: Configured trust in a specific OIDC issuer or SAML IdP

Scaling: Good. Many relying parties trust a few IdPs.

Failure mode: IdP compromise grants the attacker every account on every relying party.

Revocation: IdP can revoke at the source; relying parties pick it up on next token validation.

The other two trust models, less formally:

Web of Trust (WoT). Each user vouches for others they know. Trust propagates through chains: I trust Alice, Alice trusts Bob, therefore I might extend partial trust to Bob. PGP/GPG email signing is the canonical example. Almost never used in production infrastructure because it does not scale and does not have a clean revocation story. Worth knowing about for completeness; almost certainly not what your system uses.

Trust On First Use (TOFU). First time you see a credential, accept it as valid; on subsequent connections, refuse if it has changed. SSH host-key checking is the canonical example. Cheap to implement, no infrastructure needed; vulnerable to first-connection MITM. Used in casual or low-stakes contexts; rarely in serious production where the first-connection-window risk is unacceptable.
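
The TOFU rule above fits in a few lines. This is a minimal in-memory sketch, assuming the credential is a public key supplied as raw bytes; the `TofuStore` class and its return strings are illustrative and only loosely modeled on how a known-hosts file behaves.

```python
import hashlib
from typing import Dict


class TofuStore:
    """Pins the first key fingerprint seen per host and flags later changes."""

    def __init__(self) -> None:
        self._pins: Dict[str, str] = {}

    def check(self, host: str, public_key: bytes) -> str:
        fingerprint = hashlib.sha256(public_key).hexdigest()
        pinned = self._pins.get(host)
        if pinned is None:
            # First contact: nothing to compare against, so the key is accepted.
            # This is exactly the window a first-connection MITM can exploit.
            self._pins[host] = fingerprint
            return "pinned-on-first-use"
        if pinned == fingerprint:
            return "match"
        return "MISMATCH"


store = TofuStore()
print(store.check("db.internal", b"key-A"))
print(store.check("db.internal", b"key-A"))
print(store.check("db.internal", b"key-B"))
```

Nothing in the first call distinguishes a legitimate key from an attacker's key, so whoever answers first is the one who gets pinned.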

Most production systems mix and match. Kubernetes uses PKI for the cluster CA, federated identity for OIDC humans, and TOFU-like behavior for the kubelet's initial bootstrap (with a bootstrap token serving as the first-use credential).

In practice

Three patterns to recognize in your existing systems.

Pattern 1: PKI with a private root. Your organization has a root CA whose private key lives in an HSM. That CA signs intermediates for different uses (services, employees, devices). Endpoints trust the root. You have full control over issuance and revocation but have to operate the PKI yourself: key ceremonies, intermediate rotation, CRL publication, OCSP responder uptime.

This is the strongest model for service-to-service identity inside an organization. SPIFFE/SPIRE codifies this pattern with workload identities encoded as SPIFFE IDs.

Pattern 2: Federated identity through a SaaS IdP. Your humans authenticate against Okta, Auth0, Google Workspace, or similar. Every internal app integrates as an OIDC relying party (or SAML SP). The IdP is the single source of truth for who exists, who is in which group, and who has MFA enabled.

This is the modern enterprise standard for human authentication. Pros: one place to revoke, one place to enforce MFA, one place to audit. Cons: the IdP is the highest-leverage system in your infrastructure; an IdP outage takes everything down; an IdP compromise is total.

Pattern 3: Cloud workload identity via federation. Your CI runs in GitHub Actions. The CI needs to deploy to AWS. Instead of long-lived AWS credentials in CI secrets, GitHub Actions presents an OIDC token to AWS STS, which validates the token's signature against GitHub's public keys, then issues short-lived AWS credentials. The trust is configured at AWS via the OIDC provider's URL and an IAM role's trust policy.

This is federated identity applied to workloads. Same trust model as human OIDC, applied to machines. The important part: the AWS account explicitly trusts GitHub's OIDC issuer; that trust is the entire basis for accepting any token signed by GitHub.
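
Once the token's signature has been validated, the remaining decision is a claims check. The sketch below approximates what a role trust policy expresses, assuming the claims have already been verified and decoded into a dict. The repository name `example-org/deployer` is hypothetical, and `fnmatchcase` is only a stand-in for the policy engine's own wildcard matching.

```python
from fnmatch import fnmatchcase
from typing import Any, Mapping


def trust_policy_allows(
    claims: Mapping[str, Any],
    issuer: str,
    audience: str,
    subject_pattern: str,
) -> bool:
    """Decide whether already-verified OIDC claims satisfy a trust policy."""
    if claims.get("iss") != issuer:
        return False
    aud = claims.get("aud")
    # The aud claim may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if audience not in audiences:
        return False
    # Restrict which workload may assume the role, not just which issuer signed.
    return fnmatchcase(str(claims.get("sub", "")), subject_pattern)


claims = {
    "iss": "https://token.actions.githubusercontent.com",
    "aud": "sts.amazonaws.com",
    "sub": "repo:example-org/deployer:ref:refs/heads/main",  # hypothetical repo
}
print(trust_policy_allows(
    claims,
    issuer="https://token.actions.githubusercontent.com",
    audience="sts.amazonaws.com",
    subject_pattern="repo:example-org/deployer:*",
))
```

The subject pattern is the design choice that matters: trusting the issuer alone would accept a token minted for any repository that issuer serves.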

Configuration examples

A SPIFFE/SPIRE workload identity (PKI with a private root):

SPIFFE ID for a workload:
spiffe://prod.example.com/ns/payments/sa/api-server

The full chain when this workload presents an SVID:
  1. Workload's leaf cert
  2. Signed by an intermediate CA (issued by SPIRE Server)
  3. Intermediate signed by SPIRE root CA
  4. SPIRE root cert is the trust anchor

An OIDC provider trust configuration in AWS (federated identity):

# Tell AWS to trust GitHub OIDC issuer
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1 \
  --client-id-list sts.amazonaws.com

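The thumbprint is a SHA-1 fingerprint of a certificate in the OIDC server's TLS chain (AWS's documentation describes using the top intermediate CA certificate rather than the leaf). This pins the trust at the network layer; if the certificate chain behind GitHub's endpoint changes (which it did in 2023), every AWS account that pinned only the old thumbprint stops accepting tokens until updated.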
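
The thumbprint value itself is just a SHA-1 digest over the DER encoding of a certificate. This is a minimal sketch, assuming you already have the certificate in PEM form (for instance from `openssl s_client -showcerts`). The helper names are illustrative, and which certificate in the chain to hash is a question for the AWS documentation.

```python
import hashlib
import ssl


def thumbprint_from_der(der: bytes) -> str:
    """Hex-encoded SHA-1 digest of a DER-encoded certificate."""
    return hashlib.sha1(der).hexdigest()


def thumbprint_from_pem(pem: str) -> str:
    """Convert a single PEM certificate to DER, then fingerprint it."""
    return thumbprint_from_der(ssl.PEM_cert_to_DER_cert(pem))


def pinned_thumbprint_matches(pem: str, pinned: str) -> bool:
    """Compare a freshly computed thumbprint with the one pinned in config."""
    return thumbprint_from_pem(pem) == pinned.lower()
```

Running a comparison like `pinned_thumbprint_matches` on a schedule against the live chain is one way to learn about a rotation before it becomes an outage.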

A Kubernetes ServiceAccount token with audience scoping (mixed model):

apiVersion: v1
kind: Pod
metadata:
  name: vault-client
spec:
  serviceAccountName: vault-client
  containers:
    - name: app
      image: my-app
      volumeMounts:
        - name: vault-token
          mountPath: /var/run/secrets/vault
  volumes:
    - name: vault-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 3600
              audience: vault

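Trust here is layered. The Kubernetes API server signs the JWT with the cluster's service account signing key, which is a separate key from the cluster CA (signature trust rooted in the cluster). Vault is configured to trust the cluster as a token issuer (federation). The audience claim restricts where the token can be used (defense in depth against token replay). Three trust mechanisms compose to produce one access decision.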
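
When debugging a rejected token, it helps to look at the `iss` and `aud` claims to see which trust relationship the verifier is being asked to honor. The sketch below decodes a JWT payload for inspection only: it does NOT verify the signature, so its output must never drive an access decision. The issuer URL in the example is a hypothetical value.

```python
import base64
import json
from typing import Any, Dict


def unverified_claims(token: str) -> Dict[str, Any]:
    """Decode a JWT payload WITHOUT verifying the signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(obj: Dict[str, Any]) -> str:
    """Helper that builds an unsigned sample token segment for the demo below."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical sample token; the signature segment is a placeholder.
sample = ".".join([
    _b64url({"alg": "RS256", "typ": "JWT"}),
    _b64url({"iss": "https://kubernetes.default.svc", "aud": ["vault"]}),
    "signature-placeholder",
])
print(unverified_claims(sample))
```

If the decoded `aud` is not the value the verifier expects, the audience restriction is doing its job and the fix is in the pod spec, not in the trust configuration.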

Common mistakes

Adding a CA to the trust store "temporarily" and forgetting. The trust store grows; auditing what is trusted becomes impossible; one of the long-trusted CAs has a bad day.
Pinning thumbprints without monitoring rotations. A pinned thumbprint goes stale when the IdP rotates its cert. Without monitoring, you find out via an outage.
Federating to an IdP that does not have MFA. Federation is only as strong as the IdP. An IdP without MFA is a username-and-password system in front of all your apps.
Using TOFU for production service-to-service. "We will accept whatever cert this service presents the first time" is not a production trust model. Use real PKI or federated identity.
No revocation plan. What happens when a workload is compromised and you need to invalidate its credentials? In a pure PKI model, you have to publish a CRL and hope clients respect it. In a federation model, you revoke at the IdP. Have an answer.
Trusting a CA that should not exist in your context. A web-PKI CA in your private mTLS chain is dangerous: it can issue valid certs for your internal services if compromised. Use private CAs for private use.
Mixing trust models without understanding. "We use PKI for services and OIDC for humans" is fine if the boundaries are clear. "Sometimes services use OIDC tokens too" is where the confusion starts; clarify which trust model applies where.
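
For the revocation item in the list above, one way to "have an answer" is to combine a revocation list with short credential lifetimes, so that a stale list only matters until the credential expires. This is a minimal sketch under that assumption; the function, the serial numbers, and the one-hour lifetime are all illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Set


def credential_acceptable(
    serial: str,
    not_after: datetime,
    revoked: Set[str],
    now: datetime,
) -> bool:
    """Reject expired credentials first, then anything on the revocation list."""
    if now >= not_after:
        return False  # expiry bounds the damage even if revocation never arrives
    return serial not in revoked


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
not_after = issued + timedelta(hours=1)  # short-lived by design

# Thirty minutes in, the credential is valid unless explicitly revoked.
print(credential_acceptable("serial-01", not_after, set(), issued + timedelta(minutes=30)))
print(credential_acceptable("serial-01", not_after, {"serial-01"}, issued + timedelta(minutes=30)))
# Two hours in, it is rejected regardless of what the revocation list says.
print(credential_acceptable("serial-01", not_after, set(), issued + timedelta(hours=2)))
```

The shorter the lifetime, the less the system depends on every client fetching and honoring the revocation list.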

INTERVIEW QUESTION

A new service in your platform needs to authenticate to a third-party API. Walk through the trust-model decision: what are the options, what are the trade-offs, and which would you pick for a service that needs to call the Stripe API?

