Advanced | 18 hours | 36 lessons

Kubernetes Architecture & Chaos

How Kubernetes actually works under the hood, from API server request lifecycle to etcd Raft to the scheduler framework — paired with chaos engineering reasoning that turns architectural knowledge into operational confidence. Built for the interview question "walk me through what happens when you create a pod" and the production question "how do we test resilience without breaking customers?"

Early Bird Pricing
$59 (regular price $79). Save $20.

One-time payment. Lifetime access.

Text-based, no videos
12 modules, 36 lessons
Lifetime access

What you'll learn

The control-plane / data-plane split and why every Kubernetes component fits into one or the other
API server internals — request lifecycle, admission controllers, performance under load
etcd architecture — Raft consensus, watch streams, the failure modes that take clusters down
The scheduler framework: filter and score plugins (historically called predicates and priorities), profiles, and scaling past 10,000 pods
How controllers and operators reconcile state, and what the controller-runtime library does for you
The kubelet's pod lifecycle — from API watch to running container via CRI
Networking internals — kube-proxy, CNI dataplane (Calico, Cilium), DNS and service discovery at scale
Mapping your cluster's failure domains: what survives an apiserver outage, an etcd loss, or the loss of the entire control plane
Chaos engineering fundamentals — hypothesis-driven experiments, blast-radius scoping, game days
Pod, node, and cluster-level chaos for Kubernetes — kill, network delay, partition, control plane failure
Tools in practice — Chaos Mesh, Litmus, Chaos Monkey for K8s, and CI/CD-integrated chaos
Interview-ready architecture reasoning — "walk me through pod creation", "design a self-healing cluster", "test resilience without breaking production"

Curriculum

12 modules · 36 lessons
01

The Kubernetes Architecture Mental Model

The clean separation of control plane and data plane, the API server as universal bus, and reconciliation loops as the universal pattern.

3 lessons
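
To make the reconciliation-loop idea concrete, here is a minimal sketch in plain Python. It is not Kubernetes code: the `State` type, the `create_pod` / `delete_pod` action names, and the helper functions are all hypothetical, chosen only to show the shape of "compare desired state to observed state, act on the difference, repeat".

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical stand-in for an object's spec or status: just a replica count."""
    replicas: int


def reconcile(desired: State, observed: State) -> list[str]:
    """Return the actions needed to move the observed state toward the desired state."""
    diff = desired.replicas - observed.replicas
    if diff > 0:
        return ["create_pod"] * diff
    if diff < 0:
        return ["delete_pod"] * (-diff)
    return []  # already converged, nothing to do


def apply_actions(observed: State, actions: list[str]) -> State:
    """Simulate the effect of the actions on the observed state."""
    replicas = observed.replicas
    for action in actions:
        replicas += 1 if action == "create_pod" else -1
    return State(replicas)


def run_until_converged(desired: State, observed: State, max_iterations: int = 10) -> State:
    """Loop: observe, diff, act. Stop once reconcile has nothing left to do."""
    for _ in range(max_iterations):
        actions = reconcile(desired, observed)
        if not actions:
            break
        observed = apply_actions(observed, actions)
    return observed
```

The loop is level-triggered in spirit: it only looks at the current gap between desired and observed state, so it reaches the same result whether it starts above or below the target.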
02

API Server Internals

Request lifecycle, admission controller pipeline, and the performance characteristics that determine how the apiserver scales.

3 lessons
03

etcd Architecture

Raft consensus mechanics, the watch stream model, and the failure modes that take clusters offline.

3 lessons
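
As a small worked example of the Raft arithmetic behind those failure modes: a Raft cluster needs a majority of members (a quorum) to commit writes. The helper names below are illustrative, not part of etcd.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"members={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

Note that a 4-member cluster tolerates only one failure, the same as a 3-member cluster, which is one reason odd cluster sizes are the usual recommendation.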
04

The Scheduler

The scheduling framework, the filter/score pipeline (historically predicates and priorities), and the design choices that make the scheduler work past 10,000 pods.

3 lessons
05

Controllers and Operators

The controller pattern that makes Kubernetes a platform, the built-in controllers that run your workloads, and the operator pattern for everything else.

3 lessons
06

kubelet Deep Dive

The pod lifecycle from kubelet's perspective, the CRI shim model, and the kubelet failure modes that take nodes offline.

3 lessons
07

Networking Internals

How Services, CNI, and DNS actually work at the iptables/eBPF layer, with the scaling characteristics that matter at thousands of services.

3 lessons
08

Failure Domains

What survives which kind of failure. The bounded-impact reasoning that turns architecture knowledge into resilience design.

3 lessons
09

Chaos Engineering Fundamentals

The discipline of breaking things on purpose, why it works, and the hypothesis-driven approach that separates chaos engineering from random pod-killing.

3 lessons
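
One way to picture the hypothesis-driven approach is as a small experiment harness. This is a sketch under stated assumptions, not the API of any chaos tool: the `Experiment` fields and the `run_experiment` function are hypothetical names, and the "system" is whatever callables you pass in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    hypothesis: str                    # what we expect to stay true
    steady_state: Callable[[], bool]   # returns True when the system looks healthy
    inject: Callable[[], None]         # the fault, scoped to a small blast radius
    rollback: Callable[[], None]       # undo the fault


def run_experiment(exp: Experiment) -> str:
    """Check steady state, inject the fault, re-check, and always roll back."""
    if not exp.steady_state():
        # Never inject into a system that is already unhealthy.
        return "aborted: system unhealthy before injection"
    exp.inject()
    try:
        held = exp.steady_state()
    finally:
        exp.rollback()
    return "hypothesis held" if held else "hypothesis refuted"
```

A usage sketch with a fake in-memory system (three replicas, one killed, service considered healthy with at least two):

```python
system = {"replicas": 3}
exp = Experiment(
    hypothesis="Service stays healthy when one replica is killed",
    steady_state=lambda: system["replicas"] >= 2,
    inject=lambda: system.update(replicas=system["replicas"] - 1),
    rollback=lambda: system.update(replicas=3),
)
result = run_experiment(exp)
```

The difference from random pod-killing is the explicit hypothesis and the steady-state check before and after the fault.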
10

Chaos Engineering for Kubernetes

The concrete experiments at the pod, node, and cluster level — what each one tests, what bugs it surfaces, and how to scope it.

3 lessons
11

Tools in Practice

The tooling landscape, the game day playbook, and the CI/CD integration that turns chaos from quarterly drill into continuous practice.

3 lessons
12

Interview-Ready Architecture Reasoning

The big interview questions answered with the reasoning frameworks the rest of the course built up.

3 lessons

About the Author

Sharon Sahadevan

AI Infrastructure Engineer

Building production GPU clusters on Kubernetes — H100s, large-scale model serving, and end-to-end ML infrastructure across Azure and AWS.

10+ years designing cloud-native platforms with deep expertise in Kubernetes orchestration, GitOps (Argo CD), Terraform, and MLOps pipelines for LLM deployment.

Author of KubeNatives, a weekly newsletter read by 3,000+ DevOps and ML engineers for production insights on K8s internals, GPU scheduling, and model-serving patterns.

Ready to master this topic?

Start with the free preview lesson and see for yourself.