Advanced · 18 hours · 36 lessons

Kubernetes Architecture & Chaos

How Kubernetes actually works under the hood, from the API server request lifecycle to etcd's Raft consensus to the scheduling framework, paired with chaos engineering reasoning that turns architectural knowledge into operational confidence. Built for the interview question "walk me through what happens when you create a pod" and the production question "how do we test resilience without breaking customers?"

Text-based, no videos
12 modules, 36 lessons
Lifetime access

What you'll learn

The control-plane / data-plane split and why every Kubernetes component fits into one or the other
API server internals: request lifecycle, admission controllers, performance under load
etcd architecture: Raft consensus, watch streams, the failure modes that take clusters down
The scheduling framework: filter and score plugins (formerly predicates and priorities), profiles, and scaling past 10,000 pods
How controllers and operators reconcile state, and what the controller-runtime library does for you
The kubelet's pod lifecycle, from API watch to running container via CRI
Networking internals: kube-proxy, CNI dataplane (Calico, Cilium), DNS and service discovery at scale
Mapping your cluster's failure domains: what survives an apiserver outage, an etcd loss, or the loss of the entire control plane
Chaos engineering fundamentals: hypothesis-driven experiments, blast-radius scoping, game days
Pod, node, and cluster-level chaos for Kubernetes: kill, network delay, partition, control plane failure
Tools in practice: Chaos Mesh, Litmus, Chaos Monkey for K8s, and CI/CD-integrated chaos
Interview-ready architecture reasoning: "walk me through pod creation", "design a self-healing cluster", "test resilience without breaking production"

Curriculum

01

The Kubernetes Architecture Mental Model

The clean separation of control plane and data plane, the API server as universal bus, and reconciliation loops as the universal pattern.

3 lessons
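
As a rough illustration of the reconciliation pattern this module names, here is a minimal sketch in Python. It is not Kubernetes code: `State`, `reconcile`, `apply`, and `run_until_converged` are hypothetical names, and the "cluster" is just a replica count. The point it tries to show is that the loop is level-triggered, acting only on the current gap between desired and observed state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    """Hypothetical stand-in for a workload: just a replica count."""
    replicas: int


def reconcile(desired: State, observed: State) -> List[str]:
    """Return the actions needed to move observed toward desired.

    Level-triggered: the result depends only on the current difference,
    not on which events produced it.
    """
    diff = desired.replicas - observed.replicas
    if diff > 0:
        return ["create"] * diff
    if diff < 0:
        return ["delete"] * (-diff)
    return []


def apply(observed: State, actions: List[str]) -> State:
    """Apply actions to the toy state (a real controller would call the API server)."""
    delta = sum(1 if action == "create" else -1 for action in actions)
    return State(replicas=observed.replicas + delta)


def run_until_converged(desired: State, observed: State, max_rounds: int = 10) -> State:
    """Repeat observe -> diff -> act until there is nothing left to do."""
    for _ in range(max_rounds):
        actions = reconcile(desired, observed)
        if not actions:
            break
        observed = apply(observed, actions)
    return observed
```

Calling `run_until_converged(State(3), State(1))` in this toy model ends at three replicas, and calling `reconcile` again on a converged state returns no actions, which is the idempotence the pattern relies on.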
02

API Server Internals

Request lifecycle, admission controller pipeline, and the performance characteristics that determine how the apiserver scales.

3 lessons
03

etcd Architecture

Raft consensus mechanics, the watch stream model, and the failure modes that take clusters offline.

3 lessons
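
The quorum arithmetic behind Raft-based failure modes can be sketched in a few lines. This is a standalone calculation, not etcd code, and the function names are illustrative: a Raft cluster needs a strict majority of members to commit writes, so the number of members it can lose follows directly from the cluster size.

```python
def quorum(members: int) -> int:
    """Smallest strict majority of a cluster with the given member count."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 4, 5, 7):
        print(f"members={n} quorum={quorum(n)} tolerates={failure_tolerance(n)}")
```

Under this arithmetic a 4-member cluster tolerates one failure, the same as a 3-member cluster, which is why odd member counts are the usual recommendation.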
04

The Scheduler

The scheduling framework, the filter/score pipeline (formerly predicates and priorities), and the design choices that make the scheduler work past 10,000 pods.

3 lessons
05

Controllers and Operators

The controller pattern that makes Kubernetes a platform, the built-in controllers that run your workloads, and the operator pattern for everything else.

3 lessons
06

kubelet Deep Dive

The pod lifecycle from kubelet's perspective, the CRI shim model, and the kubelet failure modes that take nodes offline.

3 lessons
07

Networking Internals

How Services, CNI, and DNS actually work at the iptables/eBPF layer, with the scaling characteristics that matter at thousands of services.

3 lessons
08

Failure Domains

What survives which kind of failure. The bounded-impact reasoning that turns architecture knowledge into resilience design.

3 lessons
09

Chaos Engineering Fundamentals

The discipline of breaking things on purpose, why it works, and the hypothesis-driven approach that separates chaos engineering from random pod-killing.

3 lessons
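
One way to picture the hypothesis-driven approach is as a small harness: check steady state, inject a fault, check steady state again, and always roll back. The sketch below assumes that shape; `run_experiment` and its three callables are hypothetical and do not correspond to any chaos tool's API.

```python
from typing import Callable


def run_experiment(
    steady_state: Callable[[], bool],
    inject: Callable[[], None],
    rollback: Callable[[], None],
) -> str:
    """Run one chaos experiment and report whether the hypothesis held.

    The hypothesis is "steady state still holds while the fault is active".
    """
    # Never inject a fault into a system that is already unhealthy.
    if not steady_state():
        return "aborted"
    try:
        inject()
        return "hypothesis held" if steady_state() else "hypothesis refuted"
    finally:
        # Roll back whether the check passed, failed, or raised.
        rollback()
```

In practice the steady-state check would be a measurable signal such as an error rate or latency threshold, and the rollback step is what keeps the blast radius bounded when the hypothesis turns out to be wrong.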
10

Chaos Engineering for Kubernetes

The concrete experiments at the pod, node, and cluster level: what each one tests, what bugs it surfaces, and how to scope it.

3 lessons
11

Tools in Practice

The tooling landscape, the game day playbook, and the CI/CD integration that turns chaos from a quarterly drill into a continuous practice.

3 lessons
12

Interview-Ready Architecture Reasoning

The big interview questions answered with the reasoning frameworks the rest of the course built up.

3 lessons

About the Author

Sharon Sahadevan


AI Infrastructure Engineer

Building production GPU clusters on Kubernetes. H100s, large-scale model serving, and end-to-end ML infrastructure across Azure and AWS.

10+ years designing cloud-native platforms with deep expertise in Kubernetes orchestration, GitOps (Argo CD), Terraform, and MLOps pipelines for LLM deployment.

Author of KubeNatives, a weekly newsletter read by 3,000+ DevOps and ML engineers for production insights on K8s internals, GPU scheduling, and model-serving patterns.
