Configure Multiple Schedulers
Kubernetes ships with a default scheduler that is described here. If the default scheduler does not suit your needs you can implement your own scheduler. Moreover, you can even run multiple schedulers simultaneously alongside the default scheduler and instruct Kubernetes what scheduler to use for each of your pods. Let's learn how to run multiple schedulers in Kubernetes with an example.
A detailed description of how to implement a scheduler is outside the scope of this document. Please refer to the kube-scheduler implementation in pkg/scheduler in the Kubernetes source directory for a canonical example.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
Package the scheduler
Package your scheduler binary into a container image. For the purposes of this example, you can use the default scheduler (kube-scheduler) as your second scheduler. Clone the Kubernetes source code from GitHub and build the source.
git clone https://github.com/kubernetes/kubernetes.git cd kubernetes make
Create a container image containing the kube-scheduler binary. Here is the
to build the image:
FROM busybox ADD ./_output/local/bin/linux/amd64/kube-scheduler /usr/local/bin/kube-scheduler
Save the file as
Dockerfile, build the image and push it to a registry. This example
pushes the image to
Google Container Registry (GCR).
For more details, please read the GCR
docker build -t gcr.io/my-gcp-project/my-kube-scheduler:1.0 . gcloud docker -- push gcr.io/my-gcp-project/my-kube-scheduler:1.0
Define a Kubernetes Deployment for the scheduler
Now that you have your scheduler in a container image, create a pod
configuration for it and run it in your Kubernetes cluster. But instead of creating a pod
directly in the cluster, you can use a Deployment
for this example. A Deployment manages a
Replica Set which in turn manages the pods,
thereby making the scheduler resilient to failures. Here is the deployment
config. Save it as
apiVersion: v1 kind: ServiceAccount metadata: name: my-scheduler namespace: kube-system --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: my-scheduler-as-kube-scheduler subjects: - kind: ServiceAccount name: my-scheduler namespace: kube-system roleRef: kind: ClusterRole name: system:kube-scheduler apiGroup: rbac.authorization.k8s.io --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: my-scheduler-as-volume-scheduler subjects: - kind: ServiceAccount name: my-scheduler namespace: kube-system roleRef: kind: ClusterRole name: system:volume-scheduler apiGroup: rbac.authorization.k8s.io --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: my-scheduler-extension-apiserver-authentication-reader namespace: kube-system roleRef: kind: Role name: extension-apiserver-authentication-reader apiGroup: rbac.authorization.k8s.io subjects: - kind: ServiceAccount name: my-scheduler namespace: kube-system --- apiVersion: v1 kind: ConfigMap metadata: name: my-scheduler-config namespace: kube-system data: my-scheduler-config.yaml: | apiVersion: kubescheduler.config.k8s.io/v1beta2 kind: KubeSchedulerConfiguration profiles: - schedulerName: my-scheduler leaderElection: leaderElect: false --- apiVersion: apps/v1 kind: Deployment metadata: labels: component: scheduler tier: control-plane name: my-scheduler namespace: kube-system spec: selector: matchLabels: component: scheduler tier: control-plane replicas: 1 template: metadata: labels: component: scheduler tier: control-plane version: second spec: serviceAccountName: my-scheduler containers: - command: - /usr/local/bin/kube-scheduler - --config=/etc/kubernetes/my-scheduler/my-scheduler-config.yaml image: gcr.io/my-gcp-project/my-kube-scheduler:1.0 livenessProbe: httpGet: path: /healthz port: 10259 scheme: HTTPS initialDelaySeconds: 15 name: kube-second-scheduler readinessProbe: httpGet: path: /healthz port: 10259 scheme: HTTPS resources: requests: cpu: '0.1' securityContext: privileged: false volumeMounts: - name: config-volume mountPath: /etc/kubernetes/my-scheduler hostNetwork: false hostPID: false volumes: - name: config-volume configMap: name: my-scheduler-config
In the above manifest, you use a KubeSchedulerConfiguration
to customize the behavior of your scheduler implementation. This configuration has been passed to
kube-scheduler during initialization with the
--config option. The
my-scheduler-config ConfigMap stores the configuration file. The Pod of the
my-scheduler Deployment mounts the
my-scheduler-config ConfigMap as a volume.
In the aforementioned Scheduler Configuration, your scheduler implementation is represented via a KubeSchedulerProfile.
spec.schedulerNamefield in a PodTemplate or Pod manifest must match the
schedulerNamefield of the
KubeSchedulerProfile. All schedulers running in the cluster must have unique names.
Also, note that you create a dedicated service account
my-scheduler and bind the ClusterRole
system:kube-scheduler to it so that it can acquire the same privileges as
Please see the
kube-scheduler documentation for
detailed description of other command line arguments and
Scheduler Configuration reference for
detailed description of other customizable
Run the second scheduler in the cluster
In order to run your scheduler in a Kubernetes cluster, create the deployment specified in the config above in a Kubernetes cluster:
kubectl create -f my-scheduler.yaml
Verify that the scheduler pod is running:
kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE .... my-scheduler-lnf4s-4744f 1/1 Running 0 2m ...
You should see a "Running" my-scheduler pod, in addition to the default kube-scheduler pod in this list.
Enable leader election
To run multiple-scheduler with leader election enabled, you must do the following:
Update the following fields for the KubeSchedulerConfiguration in the
my-scheduler-config ConfigMap in your YAML file:
If RBAC is enabled on your cluster, you must update the
system:kube-scheduler cluster role.
Add your scheduler name to the resourceNames of the rule applied for
leases resources, as in the following example:
kubectl edit clusterrole system:kube-scheduler
apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: annotations: rbac.authorization.kubernetes.io/autoupdate: "true" labels: kubernetes.io/bootstrapping: rbac-defaults name: system:kube-scheduler rules: - apiGroups: - coordination.k8s.io resources: - leases verbs: - create - apiGroups: - coordination.k8s.io resourceNames: - kube-scheduler - my-scheduler resources: - leases verbs: - get - update - apiGroups: - "" resourceNames: - kube-scheduler - my-scheduler resources: - endpoints verbs: - delete - get - patch - update
Specify schedulers for pods
Now that your second scheduler is running, create some pods, and direct them to be scheduled by either the default scheduler or the one you deployed. In order to schedule a given pod using a specific scheduler, specify the name of the scheduler in that pod spec. Let's look at three examples.
Pod spec without any scheduler name
apiVersion: v1 kind: Pod metadata: name: no-annotation labels: name: multischeduler-example spec: containers: - name: pod-with-no-annotation-container image: registry.k8s.io/pause:2.0
When no scheduler name is supplied, the pod is automatically scheduled using the default-scheduler.
Save this file as
pod1.yamland submit it to the Kubernetes cluster.
kubectl create -f pod1.yaml
Pod spec with
apiVersion: v1 kind: Pod metadata: name: annotation-default-scheduler labels: name: multischeduler-example spec: schedulerName: default-scheduler containers: - name: pod-with-default-annotation-container image: registry.k8s.io/pause:2.0
A scheduler is specified by supplying the scheduler name as a value to
spec.schedulerName. In this case, we supply the name of the default scheduler which is
Save this file as
pod2.yamland submit it to the Kubernetes cluster.
kubectl create -f pod2.yaml
Pod spec with
apiVersion: v1 kind: Pod metadata: name: annotation-second-scheduler labels: name: multischeduler-example spec: schedulerName: my-scheduler containers: - name: pod-with-second-annotation-container image: registry.k8s.io/pause:2.0
In this case, we specify that this pod should be scheduled using the scheduler that we deployed -
my-scheduler. Note that the value of
spec.schedulerNameshould match the name supplied for the scheduler in the
schedulerNamefield of the mapping
Save this file as
pod3.yamland submit it to the Kubernetes cluster.
kubectl create -f pod3.yaml
Verify that all three pods are running.
kubectl get pods
Verifying that the pods were scheduled using the desired schedulers
In order to make it easier to work through these examples, we did not verify that the
pods were actually scheduled using the desired schedulers. We can verify that by
changing the order of pod and deployment config submissions above. If we submit all the
pod configs to a Kubernetes cluster before submitting the scheduler deployment config,
we see that the pod
annotation-second-scheduler remains in "Pending" state forever
while the other two pods get scheduled. Once we submit the scheduler deployment config
and our new scheduler starts running, the
annotation-second-scheduler pod gets
scheduled as well.
Alternatively, you can look at the "Scheduled" entries in the event logs to verify that the pods were scheduled by the desired schedulers.
kubectl get events
You can also use a custom scheduler configuration or a custom container image for the cluster's main scheduler by modifying its static pod manifest on the relevant control plane nodes.