Concepts

Edit This Page

Kubernetes Scheduler

In Kubernetes, scheduling refers to making sure that PodsThe smallest and simplest Kubernetes object. A Pod represents a set of running containers on your cluster. are matched to NodesA node is a worker machine in Kubernetes. so that KubeletAn agent that runs on each node in the cluster. It makes sure that containers are running in a pod. can run them.

Scheduling overview

A scheduler watches for newly created Pods that have no Node assigned. For every Pod that the scheduler discovers, the scheduler becomes responsible for finding the best Node for that Pod to run on. The scheduler reaches this placement decision taking into account the scheduling principles described below.

If you want to understand why Pods are placed onto a particular Node, or if you’re planning to implement a custom scheduler yourself, this page will help you learn about scheduling.

kube-scheduler

kube-scheduler is the default scheduler for Kubernetes and runs as part of the control planeThe container orchestration layer that exposes the API and interfaces to define, deploy, and manage the lifecycle of containers. . kube-scheduler is designed so that, if you want and need to, you can write your own scheduling component and use that instead.

For every newly created pods or other unscheduled pods, kube-scheduler selects a optimal node for them to run on. However, every container in pods has different requirements for resources and every pod also has different requirements. Therefore, existing nodes need to be filtered according to the specific scheduling requirements.

In a cluster, Nodes that meet the scheduling requirements for a Pod are called feasible nodes. If none of the nodes are suitable, the pod remains unscheduled until the scheduler is able to place it.

The scheduler finds feasible Nodes for a Pod and then runs a set of functions to score the feasible Nodes and picks a Node with the highest score among the feasible ones to run the Pod. The scheduler then notifies the API server about this decision in a process called binding.

Factors that need taken into account for scheduling decisions include individual and collective resource requirements, hardware / software / policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, and so on.

Scheduling with kube-scheduler

kube-scheduler selects a node for the pod in a 2-step operation:

  1. Filtering

  2. Scoring

The filtering step finds the set of Nodes where it’s feasible to schedule the Pod. For example, the PodFitsResources filter checks whether a candidate Node has enough available resource to meet a Pod’s specific resource requests. After this step, the node list contains any suitable Nodes; often, there will be more than one. If the list is empty, that Pod isn’t (yet) schedulable.

In the scoring step, the scheduler ranks the remaining nodes to choose the most suitable Pod placement. The scheduler assigns a score to each Node that survived filtering, basing this score on the active scoring rules.

Finally, kube-scheduler assigns the Pod to the Node with the highest ranking. If there is more than one node with equal scores, kube-scheduler selects one of these at random.

Default policies

kube-scheduler has a default set of scheduling policies.

Filtering

Scoring

  • SelectorSpreadPriority: Spreads Pods across hosts, considering Pods that belonging to the same ServiceA way to expose an application running on a set of Pods as a network service. , StatefulSetManages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods. or ReplicaSetReplicaSet ensures that a specified number of Pod replicas are running at one time .

  • InterPodAffinityPriority: Computes a sum by iterating through the elements of weightedPodAffinityTerm and adding “weight” to the sum if the corresponding PodAffinityTerm is satisfied for that node; the node(s) with the highest sum are the most preferred.

  • LeastRequestedPriority: Favors nodes with fewer requested resources. In other words, the more Pods that are placed on a Node, and the more resources those Pods use, the lower the ranking this policy will give.

  • MostRequestedPriority: Favors nodes with most requested resources. This policy will fit the scheduled Pods onto the smallest number of Nodes needed to run your overall set of workloads.

  • RequestedToCapacityRatioPriority: Creates a requestedToCapacity based ResourceAllocationPriority using default resource scoring function shape.

  • BalancedResourceAllocation: Favors nodes with balanced resource usage.

  • NodePreferAvoidPodsPriority: Priorities nodes according to the node annotation scheduler.alpha.kubernetes.io/preferAvoidPods. You can use this to hint that two different Pods shouldn’t run on the same Node.

  • NodeAffinityPriority: Prioritizes nodes according to node affinity scheduling preferences indicated in PreferredDuringSchedulingIgnoredDuringExecution. You can read more about this in Assigning Pods to Nodes

  • TaintTolerationPriority: Prepares the priority list for all the nodes, based on the number of intolerable taints on the node. This policy adjusts a node’s rank taking that list into account.

  • ImageLocalityPriority: Favors nodes that already have the container imagesStored instance of a container that holds a set of software needed to run an application. for that Pod cached locally.

  • ServiceSpreadingPriority: For a given Service, this policy aims to make sure that the Pods for the Service run on different nodes. It favouring scheduling onto nodes that don’t have Pods for the service already assigned there. The overall outcome is that the Service becomes more resilient to a single Node failure.

  • CalculateAntiAffinityPriorityMap: This policy helps implement pod anti-affinity.

  • EqualPriorityMap: Gives an equal weight of one to all nodes.

What's next

Feedback