Kubernetes Self-Healing

Kubernetes is designed with self-healing capabilities that help maintain the health and availability of workloads. It automatically replaces failed containers, reschedules workloads when nodes become unavailable, and ensures that the desired state of the system is maintained.

Self-Healing capabilities

Container-level restarts: If a container inside a Pod fails, Kubernetes restarts it based on the restartPolicy.
Replica replacement: If a Pod in a Deployment or StatefulSet fails, Kubernetes creates a replacement Pod to maintain the specified number of replicas. If a Pod that is part of a DaemonSet fails, the control plane creates a replacement Pod to run on the same node.
Persistent storage recovery: If a node is running a Pod with a PersistentVolume (PV) attached, and the node fails, Kubernetes can reattach the volume to a new Pod on a different node.
Load balancing for Services: If a Pod behind a Service fails, Kubernetes automatically removes it from the Service's endpoints to route traffic only to healthy Pods.
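
The replica replacement and Service endpoint behavior described above can be pictured with a small toy model. This is only an illustrative sketch in plain Python, not Kubernetes code: the Pod class, the reconcile and ready_endpoints functions, and the "web-N" naming are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Pod:
    """Hypothetical stand-in for a Pod: a name and a health flag."""
    name: str
    healthy: bool = True


_ids = count(1)  # source of unique suffixes for replacement Pod names


def reconcile(pods, desired_replicas):
    """Drop failed Pods and create replacements until the healthy
    count matches desired_replicas (a toy reconcile step)."""
    running = [p for p in pods if p.healthy]
    while len(running) < desired_replicas:
        running.append(Pod(name=f"web-{next(_ids)}"))
    return running[:desired_replicas]


def ready_endpoints(pods):
    """Return only the names of healthy Pods, as a Service would route to."""
    return [p.name for p in pods if p.healthy]


pods = reconcile([], desired_replicas=3)  # start with three healthy Pods
pods[1].healthy = False                   # simulate one Pod failing
print(ready_endpoints(pods))              # the failed Pod is no longer listed
pods = reconcile(pods, desired_replicas=3)
print(ready_endpoints(pods))              # a replacement restores three endpoints
```

In a real cluster these two steps are carried out by separate controllers that watch the API server; the sketch only shows the shape of the idea, which is comparing desired state with observed state and correcting the difference.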

Here are some of the key components that provide Kubernetes self-healing:

kubelet: Ensures that containers are running, and restarts those that fail.
ReplicaSet, StatefulSet and DaemonSet controllers: Maintain the desired number of Pod replicas.
PersistentVolume controller: Manages volume attachment and detachment for stateful workloads.
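
As a rough illustration of the kubelet entry above, the following sketch models how a restart decision might depend on a Pod's restartPolicy (Always, OnFailure, or Never) and how the delay between restarts grows. The function names are hypothetical, and the 10-second base and 300-second cap are assumptions taken from the commonly documented default back-off; the real kubelet logic is more involved.

```python
def should_restart(restart_policy: str, exit_code: int) -> bool:
    """Decide whether a terminated container would be restarted
    under the given restartPolicy (simplified model)."""
    if restart_policy == "Always":
        return True                 # restart regardless of exit code
    if restart_policy == "OnFailure":
        return exit_code != 0       # restart only after a non-zero exit
    if restart_policy == "Never":
        return False
    raise ValueError(f"unknown restartPolicy: {restart_policy!r}")


def backoff_seconds(restart_count: int, base: int = 10, cap: int = 300) -> int:
    """Exponential back-off between restarts: base, 2*base, 4*base, ...
    limited to cap. base and cap are assumed defaults."""
    return min(base * 2 ** restart_count, cap)


for attempt in range(6):
    print(attempt, backoff_seconds(attempt))
```

A container that keeps failing is therefore restarted less and less often, which is why a persistent application bug shows up as a crash loop instead of being fixed by restarts.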

Considerations

Storage Failures: If a persistent volume becomes unavailable, recovery steps may be required.
Application Errors: Kubernetes can restart containers, but underlying application issues must be addressed separately.

What's next

Read more about Pods
Learn about Kubernetes Controllers
Explore PersistentVolumes
Read about node autoscaling. Node autoscaling also provides automatic healing when nodes in your cluster fail.

Last modified April 15, 2025 at 5:54 PM PST: Move where landing page self-healing refers to (350a056f25)
