Kubernetes Blog

Kubernetes StatefulSets & DaemonSets Updates

September 27 2017

Editor’s note: today’s post is by Janet Kuo and Kenneth Owens, Software Engineers at Google.

This post talks about recent updates to the DaemonSet and StatefulSet API objects for Kubernetes. We explore these features using Apache ZooKeeper and Apache Kafka StatefulSets and a Prometheus node exporter DaemonSet.

In Kubernetes 1.6, we added the RollingUpdate update strategy to the DaemonSet API Object. Configuring your DaemonSets with the RollingUpdate strategy causes the DaemonSet controller to perform automated rolling updates to the Pods in your DaemonSets when their spec.template are updated.

In Kubernetes 1.7, we enhanced the DaemonSet controller to track a history of revisions to the PodTemplateSpecs of DaemonSets. This allows the DaemonSet controller to roll back an update. We also added the RollingUpdate strategy to the StatefulSet API Object, and implemented revision history tracking for the StatefulSet controller. Additionally, we added the Parallel pod management policy to support stateful applications that require Pods with unique identities but not ordered Pod creation and termination.

StatefulSet rolling update and Pod management policy

First, we’re going to demonstrate how to use StatefulSet rolling updates and Pod management policies by deploying a ZooKeeper ensemble and a Kafka cluster.


To follow along, you’ll need to set up a Kubernetes 1.7 cluster with at least 3 schedulable nodes. Each node needs 1 CPU and 2 GiB of memory available. You will also need either a dynamic provisioner to allow the StatefulSet controller to provision 6 persistent volumes (PVs) with 10 GiB each, or you will need to manually provision the PVs prior to deploying the ZooKeeper ensemble or deploying the Kafka cluster.

Deploying a ZooKeeper ensemble

Apache ZooKeeper is a strongly consistent, distributed system used by other distributed systems for cluster coordination and configuration management.

Note: You can create a ZooKeeper ensemble using this zookeeper_mini.yaml manifest. You can learn more about running a ZooKeeper ensemble on Kubernetes here, as well as a more in-depth explanation of the manifest and its contents.

When you apply the manifest, you will see output like the following.

$ kubectl apply -f zookeeper\_mini.yaml

service "zk-hs" created

service "zk-cs" created

poddisruptionbudget "zk-pdb" created

statefulset "zk" created

The manifest creates an ensemble of three ZooKeeper servers using a StatefulSet, zk; a Headless Service, zk-hs, to control the domain of the ensemble; a Service, zk-cs, that clients can use to connect to the ready ZooKeeper instances; and a PodDisruptionBugdet, zk-pdb, that allows for one planned disruption. (Note that while this ensemble is suitable for demonstration purposes, it isn’t sized correctly for production use.)

If you use kubectl get to watch Pod creation in another terminal you will see that, in contrast to the OrderedReady strategy (the default policy that implements the full version of the StatefulSet guarantees), all of the Pods in the zk StatefulSet are created in parallel.

$ kubectl get po -lapp=zk -w

NAME           READY         STATUS        RESTARTS     AGE

zk-0           0/1             Pending      0                   0s

zk-0           0/1             Pending     0                  0s

zk-1           0/1             Pending     0                  0s

zk-1           0/1             Pending     0                  0s

zk-0           0/1             ContainerCreating      0                  0s

zk-2           0/1             Pending      0                  0s

zk-1           0/1             ContainerCreating     0                  0s

zk-2           0/1             Pending      0                  0s

zk-2           0/1             ContainerCreating      0                  0s

zk-0           0/1             Running     0                  10s

zk-2           0/1             Running     0                  11s

zk-1           0/1             Running      0                  19s

zk-0           1/1             Running      0                  20s

zk-1           1/1             Running      0                  30s

zk-2           1/1             Running      0                  30s

This is because the zookeeper_mini.yaml manifest sets the podManagementPolicy of the StatefulSet to Parallel.

apiVersion: apps/v1beta1  
kind: StatefulSet  
   name: zk  

   serviceName: zk-hs  

   replicas: 3  


       type: RollingUpdate  

   podManagementPolicy: Parallel  


Many distributed systems, like ZooKeeper, do not require ordered creation and termination for their processes. You can use the Parallel Pod management policy to accelerate the creation and deletion of StatefulSets that manage these systems. Note that, when Parallel Pod management is used, the StatefulSet controller will not block when it fails to create a Pod. Ordered, sequential Pod creation and termination is performed when a StatefulSet’s podManagementPolicy is set to OrderedReady.

Deploying a Kafka Cluster

Apache Kafka is a popular distributed streaming platform. Kafka producers write data to partitioned topics which are stored, with a configurable replication factor, on a cluster of brokers. Consumers consume the produced data from the partitions stored on the brokers.

Note: Details of the manifests contents can be found here. You can learn more about running a Kafka cluster on Kubernetes here.

To create a cluster, you only need to download and apply the kafka_mini.yaml manifest. When you apply the manifest, you will see output like the following:

$ kubectl apply -f kafka\_mini.yaml

service "kafka-hs" created

poddisruptionbudget "kafka-pdb" created

statefulset "kafka" created

The manifest creates a three broker cluster using the kafka StatefulSet, a Headless Service, kafka-hs, to control the domain of the brokers; and a PodDisruptionBudget, kafka-pdb, that allows for one planned disruption. The brokers are configured to use the ZooKeeper ensemble we created above by connecting through the zk-cs Service. As with the ZooKeeper ensemble deployed above, this Kafka cluster is fine for demonstration purposes, but it’s probably not sized correctly for production use.

If you watch Pod creation, you will notice that, like the ZooKeeper ensemble created above, the Kafka cluster uses the Parallel podManagementPolicy.

$ kubectl get po -lapp=kafka -w

NAME           READY         STATUS        RESTARTS     AGE

kafka-0     0/1             Pending      0                   0s

kafka-0     0/1             Pending      0                  0s

kafka-1     0/1             Pending      0                  0s

kafka-1     0/1             Pending      0                  0s

kafka-2     0/1             Pending      0                  0s

kafka-0     0/1             ContainerCreating     0                  0s

kafka-2     0/1             Pending      0                  0s

kafka-1     0/1             ContainerCreating     0                  0s

kafka-1     0/1             Running     0                  11s

kafka-0     0/1             Running     0                  19s

kafka-1     1/1             Running     0                  23s

kafka-0     1/1             Running     0                  32s

Producing and consuming data

You can use kubectl run to execute the kafka-topics.sh script to create a topic named test.

$ kubectl run -ti --image=gcr.io/google\_containers/kubernetes-kafka:1.0-10.2.1 createtopic --restart=Never --rm -- kafka-topics.sh --create \

\> --topic test \

\> --zookeeper zk-cs.default.svc.cluster.local:2181 \

\> --partitions 1 \

\> --replication-factor 3

Now you can use kubectl run to execute the kafka-console-consumer.sh command to listen for messages.

$ kubectl run -ti --image=gcr.io/google\_containers/kubnetes-kafka:1.0-10.2.1 consume --restart=Never --rm -- kafka-console-consumer.sh --topic test --bootstrap-server kafka-0.kafka-hs.default.svc.cluster.local:9093

In another terminal, you can run the kafka-console-producer.sh command.

$kubectl run -ti --image=gcr.io/google\_containers/kubernetes-kafka:1.0-10.2.1 produce --restart=Never --rm \

\>   -- kafka-console-producer.sh --topic test --broker-list kafka-0.kafka-hs.default.svc.cluster.local:9093,kafka-1.kafka-hs.default.svc.cluster.local:9093,kafka-2.kafka-hs.default.svc.cluster.local:9093

Output from the second terminal appears in the first terminal. If you continue to produce and consume messages while updating the cluster, you will notice that no messages are lost. You may see error messages as the leader for the partition changes when individual brokers are updated, but the client retries until the message is committed. This is due to the ordered, sequential nature of StatefulSet rolling updates which we will explore further in the next section.

Updating the Kafka cluster

StatefulSet updates are like DaemonSet updates in that they are both configured by setting the spec.updateStrategy of the corresponding API object. When the update strategy is set to OnDelete, the respective controllers will only create new Pods when a Pod in the StatefulSet or DaemonSet has been deleted. When the update strategy is set to RollingUpdate, the controllers will delete and recreate Pods when a modification is made to the spec.template field of a DaemonSet or StatefulSet. You can use rolling updates to change the configuration (via environment variables or command line parameters), resource requests, resource limits, container images, labels, and/or annotations of the Pods in a StatefulSet or DaemonSet. Note that all updates are destructive, always requiring that each Pod in the DaemonSet or StatefulSet be destroyed and recreated. StatefulSet rolling updates differ from DaemonSet rolling updates in that Pod termination and creation is ordered and sequential.

You can patch the kafka StatefulSet to reduce the CPU resource request to 250m.

$ kubectl patch sts kafka --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value":"250m"}]'

statefulset "kafka" patched

If you watch the status of the Pods in the StatefulSet, you will see that each Pod is deleted and recreated in reverse ordinal order (starting with the Pod with the largest ordinal and progressing to the smallest). The controller waits for each updated Pod to be running and ready before updating the subsequent Pod.

$kubectl get po -lapp=kafka -w

NAME           READY         STATUS       RESTARTS     AGE

kafka-0     1/1             Running     0                   13m

kafka-1     1/1             Running     0                   13m

kafka-2     1/1             Running     0                   13m

kafka-2     1/1             Terminating     0                 14m

kafka-2     0/1             Terminating     0                 14m

kafka-2     0/1             Terminating     0                 14m

kafka-2     0/1             Terminating     0                 14m

kafka-2     0/1             Pending     0                 0s

kafka-2     0/1             Pending     0                 0s

kafka-2     0/1             ContainerCreating     0                 0s

kafka-2     0/1             Running     0                 10s

kafka-2     1/1             Running     0                 21s

kafka-1     1/1             Terminating     0                 14m

kafka-1     0/1             Terminating     0                 14m

kafka-1     0/1             Terminating     0                 14m

kafka-1     0/1             Terminating     0                 14m

kafka-1     0/1             Pending     0                 0s

kafka-1     0/1             Pending     0                 0s

kafka-1     0/1             ContainerCreating     0                 0s

kafka-1     0/1             Running     0                 11s

kafka-1     1/1             Running     0                 21s

kafka-0     1/1             Terminating     0                 14m

kafka-0     0/1             Terminating     0                 14m

kafka-0     0/1             Terminating     0                 14m

kafka-0     0/1             Terminating     0                 14m

kafka-0     0/1             Pending     0                 0s

kafka-0     0/1             Pending     0                 0s

kafka-0     0/1             ContainerCreating     0                 0s

kafka-0     0/1             Running     0                 10s

kafka-0     1/1             Running     0                 22s

Note that unplanned disruptions will not lead to unintentional updates during the update process. That is, the StatefulSet controller will always recreate the Pod at the correct version to ensure the ordering of the update is preserved. If a Pod is deleted, and if it has already been updated, it will be created from the updated version of the StatefulSet’s spec.template. If the Pod has not already been updated, it will be created from the previous version of the StatefulSet’s spec.template. We will explore this further in the following sections.

Staging an update

Depending on how your organization handles deployments and configuration modifications, you may want or need to stage updates to a StatefulSet prior to allowing the roll out to progress. You can accomplish this by setting a partition for the RollingUpdate. When the StatefulSet controller detects a partition in the updateStrategy of a StatefulSet, it will only apply the updated version of the StatefulSet’s spec.template to Pods whose ordinal is greater than or equal to the value of the partition.

You can patch the kafka StatefulSet to add a partition to the RollingUpdate update strategy. If you set the partition to a number greater than or equal to the StatefulSet’s spec.replicas (as below), any subsequent updates you perform to the StatefulSet’s spec.template will be staged for roll out, but the StatefulSet controller will not start a rolling update.

$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":3}}}}'

statefulset "kafka" patched

If you patch the StatefulSet to set the requested CPU to 0.3, you will notice that none of the Pods are updated.

$ kubectl patch sts kafka --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value":"0.3"}]'

statefulset "kafka" patched

Even if you delete a Pod and wait for the StatefulSet controller to recreate it, you will notice that the Pod is recreated with current CPU request.

$   kubectl delete po kafka-1

pod "kafka-1" deleted

$ kubectl get po kafka-1 -w

NAME           READY         STATUS                           RESTARTS     AGE

kafka-1     0/1             ContainerCreating     0                   10s

kafka-1     0/1             Running     0                 19s

kafka-1     1/1             Running     0                 21s

$ kubectl get po kafka-1 -o yaml

apiVersion: v1

kind: Pod





               cpu: 250m

               memory: 1Gi

Rolling out a canary

Often, we want to verify an image update or configuration change on a single instance of an application before rolling it out globally. If you modify the partition created above to be 2, the StatefulSet controller will roll out a canary that can be used to verify that the update is working as intended.

$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'

statefulset "kafka" patched

You can watch the StatefulSet controller update the kafka-2 Pod and pause after the update is complete.

$   kubectl get po -lapp=kafka -w

NAME           READY         STATUS       RESTARTS     AGE

kafka-0     1/1             Running     0                   50m

kafka-1     1/1             Running     0                   10m

kafka-2     1/1             Running     0                   29s

kafka-2     1/1             Terminating     0                 34s

kafka-2     0/1             Terminating     0                 38s

kafka-2     0/1             Terminating     0                 39s

kafka-2     0/1             Terminating     0                 39s

kafka-2     0/1             Pending     0                 0s

kafka-2     0/1             Pending     0                 0s

kafka-2     0/1             Terminating     0                 20s

kafka-2     0/1             Terminating     0                 20s

kafka-2     0/1             Pending     0                 0s

kafka-2     0/1             Pending     0                 0s

kafka-2     0/1             ContainerCreating     0                 0s

kafka-2     0/1             Running     0                 19s

kafka-2     1/1             Running     0                 22s

Phased roll outs

Similar to rolling out a canary, you can roll out updates based on a phased progression (e.g. linear, geometric, or exponential roll outs).

If you patch the kafka StatefulSet to set the partition to 1, the StatefulSet controller updates one more broker.

$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":1}}}}'

statefulset "kafka" patched

If you set it to 0, the StatefulSet controller updates the final broker and completes the update.

$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'

statefulset "kafka" patched

Note that you don’t have to decrement the partition by one. For a larger StatefulSet–for example, one with 100 replicas–you might use a progression more like 100, 99, 90, 50, 0. In this case, you would stage your update, deploy a canary, roll out to 10 instances, update fifty percent of the Pods, and then complete the update.

Cleaning up

To delete the API Objects created above, you can use kubectl delete on the two manifests you used to create the ZooKeeper ensemble and the Kafka cluster.

$ kubectl delete -f kafka\_mini.yaml

service "kafka-hs" deleted

poddisruptionbudget "kafka-pdb" deleted

Statefulset “kafka” deleted

$ kubectl delete -f zookeeper\_mini.yaml

service "zk-hs" deleted

service "zk-cs" deleted

poddisruptionbudget "zk-pdb" deleted

statefulset "zk" deleted

By design, the StatefulSet controller does not delete any persistent volume claims (PVCs): the PVCs created for the ZooKeeper ensemble and the Kafka cluster must be manually deleted. Depending on the storage reclamation policy of your cluster, you many also need to manually delete the backing PVs.

DaemonSet rolling update, history, and rollback

In this section, we’re going to show you how to perform a rolling update on a DaemonSet, look at its history, and then perform a rollback after a bad rollout. We will use a DaemonSet to deploy a Prometheus node exporter on each Kubernetes node in the cluster. These node exporters export node metrics to the Prometheus monitoring system. For the sake of simplicity, we’ve omitted the installation of the Prometheus server and the service for communication with DaemonSet pods from this blogpost.


To follow along with this section of the blog, you need a working Kubernetes 1.7 cluster and kubectl version 1.7 or later. If you followed along with the first section, you can use the same cluster.

DaemonSet rolling update: Prometheus node exporters

First, prepare the node exporter DaemonSet manifest to run a v0.13 Prometheus node exporter on every node in the cluster:

$ cat \>\> node-exporter-v0.13.yaml \<\<EOF

apiVersion: extensions/v1beta1  
kind: DaemonSet  
   name: node-exporter  


       type: RollingUpdate  




               app: node-exporter  

           name: node-exporter  



           - image: prom/node-exporter:v0.13.0  

               name: node-exporter  


               - containerPort: 9100  

                   hostPort: 9100  

                   name: scrape  

           hostNetwork: true  

           hostPID: true


Note that you need to enable the DaemonSet rolling update feature by explicitly setting DaemonSet .spec.updateStrategy.type to RollingUpdate.

Apply the manifest to create the node exporter DaemonSet:

$ kubectl apply -f node-exporter-v0.13.yaml --record

daemonset "node-exporter" created

Wait for the first DaemonSet rollout to complete:

$ kubectl rollout status ds node-exporter  
daemon set "node-exporter" successfully rolled out

You should see each of your node runs one copy of the node exporter pod:

$ kubectl get pods -l app=node-exporter -o wide

To perform a rolling update on the node exporter DaemonSet, prepare a manifest that includes the v0.14 Prometheus node exporter:

$ cat node-exporter-v0.13.yaml ```  sed "s/v0.13.0/v0.14.0/g" \> node-exporter-v0.14.yaml

Then apply the v0.14 node exporter DaemonSet:

$ kubectl apply -f node-exporter-v0.14.yaml --record

daemonset "node-exporter" configured

Wait for the DaemonSet rolling update to complete:

$ kubectl rollout status ds node-exporter


Waiting for rollout to finish: 3 out of 4 new pods have been updated...  
Waiting for rollout to finish: 3 of 4 updated pods are available...  
daemon set "node-exporter" successfully rolled out

We just triggered a DaemonSet rolling update by updating the DaemonSet template. By default, one old DaemonSet pod will be killed and one new DaemonSet pod will be created at a time.

Now we’ll cause a rollout to fail by updating the image to an invalid value:

$ cat node-exporter-v0.13.yaml | sed "s/v0.13.0/bad/g" \> node-exporter-bad.yaml

$ kubectl apply -f node-exporter-bad.yaml --record

daemonset "node-exporter" configured

Notice that the rollout never finishes:

$ kubectl rollout status ds node-exporter   
Waiting for rollout to finish: 0 out of 4 new pods have been updated...  
Waiting for rollout to finish: 1 out of 4 new pods have been updated…

# Use ^C to exit

This behavior is expected. We mentioned earlier that a DaemonSet rolling update kills and creates one pod at a time. Because the new pod never becomes available, the rollout is halted, preventing the invalid specification from propagating to more than one node. StatefulSet rolling updates implement the same behavior with respect to failed deployments. Unsuccessful updates are blocked until it corrected via roll back or by rolling forward with a specification.

$ kubectl get pods -l app=node-exporter

NAME                                   READY         STATUS                 RESTARTS     AGE

node-exporter-f2n14     0/1             ErrImagePull     0                   3m


# N = number of nodes

$ kubectl get ds node-exporter  
NAME                       DESIRED     CURRENT     READY         UP-TO-DATE     AVAILABLE     NODE SELECTOR     AGE  

node-exporter     N                 N                 N-1             1                       N                     \<none\>                   46m

DaemonSet history, rollbacks, and rolling forward

Next, perform a rollback. Take a look at the node exporter DaemonSet rollout history:

$ kubectl rollout history ds node-exporter   
daemonsets "node-exporter"  

1                             kubectl apply --filename=node-exporter-v0.13.yaml --record=true  

2                             kubectl apply --filename=node-exporter-v0.14.yaml --record=true

3                             kubectl apply --filename=node-exporter-bad.yaml --record=true

Check the details of the revision you want to roll back to:

$ kubectl rollout history ds node-exporter --revision=2  
daemonsets "node-exporter" with revision #2  
Pod Template:  
   Labels:             app=node-exporter  



       Image:           prom/node-exporter:v0.14.0  

       Port:             9100/TCP  

       Environment:               \<none\>  

       Mounts:         \<none\>  

   Volumes:           \<none\>

You can quickly roll back to any DaemonSet revision you found through kubectl rollout history:

# Roll back to the last revision

$ kubectl rollout undo ds node-exporter   
daemonset "node-exporter" rolled back

# Or use --to-revision to roll back to a specific revision

$ kubectl rollout undo ds node-exporter --to-revision=2  
daemonset "node-exporter" rolled back

A DaemonSet rollback is done by rolling forward. Therefore, after the rollback, DaemonSet revision 2 becomes revision 4 (current revision):

$ kubectl rollout history ds node-exporter   
daemonsets "node-exporter"  

1                             kubectl apply --filename=node-exporter-v0.13.yaml --record=true  

3                             kubectl apply --filename=node-exporter-bad.yaml --record=true  

4                             kubectl apply --filename=node-exporter-v0.14.yaml --record=true

The node exporter DaemonSet is now healthy again:

$ kubectl rollout status ds node-exporter  
daemon set "node-exporter" successfully rolled out

# N = number of nodes

$ kubectl get ds node-exporter

NAME                       DESIRED     CURRENT     READY         UP-TO-DATE     AVAILABLE     NODE SELECTOR     AGE  

node-exporter     N                 N                 N                 N                       N                     \<none\>                   46m

If current DaemonSet revision is specified while performing a rollback, the rollback is skipped:

$ kubectl rollout undo ds node-exporter --to-revision=4  
daemonset "node-exporter" skipped rollback (current template already matches revision 4)

You will see this complaint from kubectl if the DaemonSet revision is not found:

$ kubectl rollout undo ds node-exporter --to-revision=10  
error: unable to find specified revision 10 in history

Note that kubectl rollout history and kubectl rollout status support StatefulSets, too!

Cleaning up

$ kubectl delete ds node-exporter

What’s next for DaemonSet and StatefulSet

Rolling updates and roll backs close an important feature gap for DaemonSets and StatefulSets. As we plan for Kubernetes 1.8, we want to continue to focus on advancing the core controllers to GA. This likely means that some advanced feature requests (e.g. automatic roll back, infant mortality detection) will be deferred in favor of ensuring the consistency, usability, and stability of the core controllers. We welcome feedback and contributions, so please feel free to reach out on Slack, to ask questions on Stack Overflow, or open issues or pull requests on GitHub.

- Post questions (or answer questions) on Stack Overflow - Join the community portal for advocates on K8sPort - Follow us on Twitter @Kubernetesio for latest updates - Connect with the community on Slack - Get involved with the Kubernetes project on GitHub

Introducing the Resource Management Working Group

September 21 2017

Editor’s note: today’s post is by Jeremy Eder, Senior Principal Software Engineer at Red Hat, on the formation of the Resource Management Working Group

Why are we here?

Kubernetes has evolved to support diverse and increasingly complex classes of applications. We can onboard and scale out modern, cloud-native web applications based on microservices, batch jobs, and stateful applications with persistent storage requirements.

However, there are still opportunities to improve Kubernetes; for example, the ability to run workloads that require specialized hardware or those that perform measurably better when hardware topology is taken into account. These conflicts can make it difficult for application classes (particularly in established verticals) to adopt Kubernetes.

We see an unprecedented opportunity here, with a high cost if it’s missed. The Kubernetes ecosystem must create a consumable path forward to the next generation of system architectures by catering to needs of as-yet unserviced workloads in meaningful ways. The Resource Management Working Group, along with other SIGs, must demonstrate the vision customers want to see, while enabling solutions to run well in a fully integrated, thoughtfully planned end-to-end stack.
Kubernetes Working Groups are created when a particular challenge requires cross-SIG collaboration. The Resource Management Working Group, for example, works primarily with sig-node and sig-scheduling to drive support for additional resource management capabilities in Kubernetes. We make sure that key contributors from across SIGs are frequently consulted because working groups are not meant to make system-level decisions on behalf of any SIG.
An example and key benefit of this is the working group’s relationship with sig-node.  We were able to ensure completion of several releases of node reliability work (complete in 1.6) before contemplating feature design on top. Those designs are use-case driven: research into technical requirements for a variety of workloads, then sorting based on measurable impact to the largest cross-section.

Target Workloads and Use-cases

One of the working group’s key design tenets is that user experience must remain clean and portable, while still surfacing infrastructure capabilities that are required by businesses and applications.
While not representing any commitment, we hope in the fullness of time that Kubernetes can optimally run financial services workloads, machine learning/training, grid schedulers, map-reduce, animation workloads, and more. As a use-case driven group, we account for potential application integration that can also facilitate an ecosystem of complementary independent software vendors to flourish on top of Kubernetes.


Why do this?

Kubernetes covers generic web hosting capabilities very well, so why go through the effort of expanding workload coverage for Kubernetes at all? The fact is that workloads elegantly covered by Kubernetes today, only represent a fraction of the world’s compute usage. We have a tremendous opportunity to safely and methodically expand upon the set of workloads that can run optimally on Kubernetes.

To date, there’s demonstrable progress in the areas of expanded workload coverage:

  • Stateful applications such as Zookeeper, etcd, MySQL, Cassandra, ElasticSearch
  • Jobs, such as timed events to process the day’s logs or any other batch processing
  • Machine Learning and compute-bound workload acceleration through Alpha GPU support Collectively, the folks working on Kubernetes are hearing from their customers that we need to go further. Following the tremendous popularity of containers in 2014, industry rhetoric circled around a more modern, container-based, datacenter-level workload orchestrator as folks looked to plan their next architectures.

As a consequence, we began advocating for increasing the scope of workloads covered by Kubernetes, from overall concepts to specific features. Our aim is to put control and choice in users hands, helping them move with confidence towards whatever infrastructure strategy they choose. In this advocacy, we quickly found a large group of like-minded companies interested in broadening the types of workloads that Kubernetes can orchestrate. And thus the working group was born.

Genesis of the Resource Management Working Group

After extensive development/feature discussions during the Kubernetes Developer Summit 2016 after CloudNativeCon | KubeCon Seattle, we decided to formalize our loosely organized group. In January 2017, the Kubernetes Resource Management Working Group was formed. This group (led by Derek Carr from Red Hat and Vishnu Kannan from Google) was originally cast as a temporary initiative to provide guidance back to sig-node and sig-scheduling (primarily). However, due to the cross-cutting nature of the goals within the working group, and the depth of roadmap quickly uncovered, the Resource Management Working Group became its own entity within the first few months.

Recently, Brian Grant from Google (@bgrant0607) posted the following image on his Twitter feed. This image helps to explain the role of each SIG, and shows where the Resource Management Working Group fits into the overall project organization.


To help bootstrap this effort, the Resource Management Working Group had its first face-to-face kickoff meeting in May 2017. Thanks to Google for hosting!


Folks from Intel, NVIDIA, Google, IBM, Red Hat. and Microsoft (among others) participated. 
You can read the outcomes of that 3-day meeting here.

The group’s prioritized list of features for increasing workload coverage on Kubernetes enumerated in the charter of the Resource Management Working group includes:

  • Support for performance sensitive workloads (exclusive cores, cpu pinning strategies, NUMA)
  • Integrating new hardware devices (GPUs, FPGAs, Infiniband, etc.)
  • Improving resource isolation (local storage, hugepages, caches, etc.)
  • Improving Quality of Service (performance SLOs)
  • Performance benchmarking
  • APIs and extensions related to the features mentioned above The discussions made it clear that there was tremendous overlap between needs for various workloads, and that we ought to de-duplicate requirements, and plumb generically.

Workload Characteristics

The set of initially targeted use-cases share one or more of the following characteristics:

  • Deterministic performance (address long tail latencies)
  • Isolation within a single node, as well as within groups of nodes sharing a control plane
  • Requirements on advanced hardware and/or software capabilities
  • Predictable, reproducible placement: applications need granular guarantees around placement  The Resource Management Working Group is spearheading the feature design and development in support of these workload requirements. Our goal is to provide best practices and patterns for these scenarios.

Initial Scope

In the months leading up to our recent face-to-face, we had discussed how to safely abstract resources in a way that retains portability and clean user experience, while still meeting application requirements. The working group came away with a multi-release roadmap that included 4 short- to mid-term targets with great overlap between target workloads:

  • Device Manager (Plugin) Proposal

    • Kubernetes should provide access to hardware devices such as NICs, GPUs, FPGA, Infiniband and so on.
  • CPU Manager

    • Kubernetes should provide a way for users to request static CPU assignment via the Guaranteed QoS tier. No support for NUMA in this phase.
  • HugePages support in Kubernetes

    • Kubernetes should provide a way for users to consume huge pages of any size.
  • Resource Class proposal

    • Kubernetes should implement an abstraction layer (analogous to StorageClasses) for devices other than CPU and memory that allows a user to consume a resource in a portable way. For example, how can a pod request a GPU that has a minimum amount of memory?

Getting Involved & Summary

Our charter document includes a Contact Us section with links to our mailing list, Slack channel, and Zoom meetings. Recordings of previous meetings are uploaded to Youtube. We plan to discuss these topics and more at the 2017 Kubernetes Developer Summit at CloudNativeCon | KubeCon in Austin. Please come and join one of our meetings (users, customers, software and hardware vendors are all welcome) and contribute to the working group!

Windows Networking at Parity with Linux for Kubernetes

September 08 2017

Editor’s note: today’s post is by Jason Messer, Principal PM Manager at Microsoft, on improvements to the Windows network stack to support the Kubernetes CNI model.

Since I last blogged about Kubernetes Networking for Windows four months ago, the Windows Core Networking team has made tremendous progress in both the platform and open source Kubernetes projects. With the updates, Windows is now on par with Linux in terms of networking. Customers can now deploy mixed-OS, Kubernetes clusters in any environment including Azure, on-premises, and on 3rd-party cloud stacks with the same network primitives and topologies supported on Linux without any workarounds, “hacks”, or 3rd-party switch extensions.

“So what?”, you may ask. There are multiple application and infrastructure-related reasons why these platform improvements make a substantial difference in the lives of developers and operations teams wanting to run Kubernetes. Read on to learn more!

Tightly-Coupled Communication

These improvements enable tightly-coupled communication between multiple Windows Server containers (without Hyper-V isolation) within a single “Pod”. Think of Pods as the scheduling unit for the Kubernetes cluster, inside of which, one or more application containers are co-located and able to share storage and networking resources. All containers within a Pod shared the same IP address and port range and are able to communicate with each other using localhost. This enables applications to easily leverage “helper” programs for tasks such as monitoring, configuration updates, log management, and proxies. Another way to think of a Pod is as a compute host with the app containers representing processes.

Simplified Network Topology

We also simplified the network topology on Windows nodes in a Kubernetes cluster by reducing the number of endpoints required per container (or more generally, per pod) to one. Previously, Windows containers (pods) running in a Kubernetes cluster required two endpoints - one for external (internet) communication and a second for intra-cluster communication between between other nodes or pods in the cluster. This was due to the fact that external communication from containers attached to a host network with local scope (i.e. not publicly routable) required a NAT operation which could only be provided through the Windows NAT (WinNAT) component on the host. Intra-cluster communication required containers to be attached to a separate network with “global” (cluster-level) scope through a second endpoint. Recent platform improvements now enable NAT’‘ing to occur directly on a container endpoint which is implemented with the Microsoft Virtual Filtering Platform (VFP) Hyper-V switch extension. Now, both external and intra-cluster traffic can flow through a single endpoint.

Load-Balancing using VFP in Windows kernel

Kubernetes worker nodes rely on the kube-proxy to load-balance ingress network traffic to Service IPs between pods in a cluster. Previous versions of Windows implemented the Kube-proxy’s load-balancing through a user-space proxy. We recently added support for “Proxy mode: iptables” which is implemented using VFP in the Windows kernel so that any IP traffic can be load-balanced more efficiently by the Windows OS kernel. Users can also configure an external load balancer by specifying the externalIP parameter in a service definition. In addition to the aforementioned improvements, we have also added platform support for the following:

  • Support for DNS search suffixes per container / Pod (Docker improvement - removes additional work previously done by kube-proxy to append DNS suffixes) 
  • [Platform Support] 5-tuple rules for creating ACLs (Looking for help from community to integrate this with support for K8s Network Policy)

Now that Windows Server has joined the Windows Insider Program, customers and partners can take advantage of these new platform features today which accrue value to eagerly anticipated, new feature release later this year and new build after six months. The latest Windows Server insider build now includes support for all of these platform improvements.

In addition to the platform improvements for Windows, the team submitted code (PRs) for CNI, kubelet, and kube-proxy with the goal of mainlining Windows support into the Kubernetes v1.8 release. These PRs remove previous work-arounds required on Windows for items such as user-mode proxy for internal load balancing, appending additional DNS suffixes to each Kube-DNS request, and a separate container endpoint for external (internet) connectivity.

These new platform features and work on kubelet and kube-proxy align with the CNI network model used by Kubernetes on Linux and simplify the deployment of a K8s cluster without additional configuration or custom (Azure) resource templates. To this end, we completed work on CNI network and IPAM plugins to create/remove endpoints and manage IP addresses. The CNI plugin works through kubelet to target the Windows Host Networking Service (HNS) APIs to create an ‘l2bridge’ network (analogous to macvlan on Linux) which is enforced by the VFP switch extension.

The ‘l2bridge’ network driver re-writes the MAC address of container network traffic on ingress and egress to use the container host’s MAC address. This obviates the need for multiple MAC addresses (one per container running on the host) to be “learned” by the upstream network switch port to which the container host is connected. This preserves memory space in physical switch TCAM tables and relies on the Hyper-V virtual switch to do MAC address translation in the host to forward traffic to the correct container. IP addresses are managed by a default, Windows IPAM plug-in which requires that POD CIDR IPs be taken from the container host’s network IP space.

The team demoed (link to video) these new platform features and open-source updates to the SIG-Windows group on 8/8. We are working with the community to merge the kubelet and kube-proxy PRs to mainline these changes in time for the Kubernetes v1.8 release due out this September. These capabilities can then be used on current Windows Server insider builds and the Windows Server, version 1709.

Soon after RTM, we will also introduce these improvements into the Azure Container Service (ACS) so that Windows worker nodes and the containers hosted are first-class, Azure VNet citizens. An Azure IPAM plugin for Windows CNI will enable these endpoints to directly attach to Azure VNets with network policies for Windows containers enforced the same way as VMs.

Feature Windows Server 2016 (In-Market) Next Windows Server Feature Release, Semi-Annual Channel Linux
Multiple Containers per Pod with shared network namespace (Compartment) One Container per Pod
Single (Shared) Endpoint per Pod Two endpoints: WinNAT (External) + Transparent (Intra-Cluster)
User-Mode, Load Balancing
Kernel-Mode, Load Balancing Not Supported
Support for DNS search suffixes per Pod (Docker update) Kube-Proxy  added multiple DNS suffixes to each request
CNI Plugin Support Not Supported

The Kubernetes SIG Windows group meets bi-weekly on Tuesdays at 12:30 PM ET. To join or view notes from previous meetings, check out this document.

Kubernetes Meets High-Performance Computing

August 22 2017

Editor’s note: today’s post is by Robert Lalonde, general manager at Univa, on supporting mixed HPC and containerized applications  

Anyone who has worked with Docker can appreciate the enormous gains in efficiency achievable with containers. While Kubernetes excels at orchestrating containers, high-performance computing (HPC) applications can be tricky to deploy on Kubernetes.

In this post, I discuss some of the challenges of running HPC workloads with Kubernetes, explain how organizations approach these challenges today, and suggest an approach for supporting mixed workloads on a shared Kubernetes cluster. We will also provide information and links to a case study on a customer, IHME, showing how Kubernetes is extended to service their HPC workloads seamlessly while retaining scalability and interfaces familiar to HPC users.

HPC workloads unique challenges

In Kubernetes, the base unit of scheduling is a Pod: one or more Docker containers scheduled to a cluster host. Kubernetes assumes that workloads are containers. While Kubernetes has the notion of Cron Jobs and Jobs that run to completion, applications deployed on Kubernetes are typically long-running services, like web servers, load balancers or data stores and while they are highly dynamic with pods coming and going, they differ greatly from HPC application patterns.

Traditional HPC applications often exhibit different characteristics:

- In financial or engineering simulations, a job may be comprised of tens of thousands of short-running tasks, demanding low-latency and high-throughput scheduling to complete a simulation in an acceptable amount of time. - A computational fluid dynamics (CFD) problem may execute in parallel across many hundred or even thousands of nodes using a message passing library to synchronize state. This requires specialized scheduling and job management features to allocate and launch such jobs and then to checkpoint, suspend/resume or backfill them. - Other HPC workloads may require specialized resources like GPUs or require access to limited software licenses. Organizations may enforce policies around what types of resources can be used by whom to ensure projects are adequately resourced and deadlines are met. HPC workload schedulers have evolved to support exactly these kinds of workloads. Examples include Univa Grid Engine, IBM Spectrum LSF and Altair’s PBS Professional. Sites managing HPC workloads have come to rely on capabilities like array jobs, configurable pre-emption, user, group or project based quotas and a variety of other features.

Blurring the lines between containers and HPC

HPC users believe containers are valuable for the same reasons as other organizations. Packaging logic in a container to make it portable, insulated from environmental dependencies, and easily exchanged with other containers clearly has value. However, making the switch to containers can be difficult.

HPC workloads are often integrated at the command line level. Rather than requiring coding, jobs are submitted to queues via the command line as binaries or simple shell scripts that act as wrappers. There are literally hundreds of engineering, scientific and analytic applications used by HPC sites that take this approach and have mature and certified integrations with popular workload schedulers.

While the notion of packaging a workload into a Docker container, publishing it to a registry, and submitting a YAML description of the workload is second nature to users of Kubernetes, this is foreign to most HPC users. An analyst running models in R, MATLAB or Stata simply wants to submit their simulation quickly, monitor their execution, and get a result as quickly as possible.

Existing approaches

To deal with the challenges of migrating to containers, organizations running container and HPC workloads have several options:

- Maintain separate infrastructures

For sites with sunk investments in HPC, this may be a preferred approach. Rather than disrupt existing environments, it may be easier to deploy new containerized applications on a separate cluster and leave the HPC environment alone. The challenge is that this comes at the cost of siloed clusters, increasing infrastructure and management cost.

- Run containerized workloads under an existing HPC workload manager

For sites running traditional HPC workloads, another approach is to use existing job submission mechanisms to launch jobs that in turn instantiate Docker containers on one or more target hosts. Sites using this approach can introduce containerized workloads with minimal disruption to their environment. Leading HPC workload managers such as Univa Grid Engine Container Edition and IBM Spectrum LSF are adding native support for Docker containers. Shifter and Singularity are important open source tools supporting this type of deployment also. While this is a good solution for sites with simple requirements that want to stick with their HPC scheduler, they will not have access to native Kubernetes features, and this may constrain flexibility in managing long-running services where Kubernetes excels.

- Use native job scheduling features in Kubernetes

Sites less invested in existing HPC applications can use existing scheduling facilities in Kubernetes for jobs that run to completion. While this is an option, it may be impractical for many HPC users. HPC applications are often either optimized towards massive throughput or large scale parallelism. In both cases startup and teardown latencies have a discriminating impact. Latencies that appear to be acceptable for containerized microservices today would render such applications unable to scale to the required levels.

All of these solutions involve tradeoffs. The first option doesn’t allow resources to be shared (increasing costs) and the second and third options require customers to pick a single scheduler, constraining future flexibility.

Mixed workloads on Kubernetes

A better approach is to support HPC and container workloads natively in the same shared environment. Ideally, users should see the environment appropriate to their workload or workflow type.

One approach to supporting mixed workloads is to allow Kubernetes and the HPC workload manager to co-exist on the same cluster, throttling resources to avoid conflicts. While simple, this means that neither workload manager can fully utilize the cluster.

Another approach is to use a peer scheduler that coordinates with the Kubernetes scheduler. Navops Command by Univa is a solution that takes this third approach, augmenting the functionality of the Kubernetes scheduler. Navops Command provides its own web interface and CLI and allows additional scheduling policies to be enabled on Kubernetes without impacting the operation of the Kubernetes scheduler and existing containerized applications. Navops Command plugs into the Kubernetes architecture via the ‘schedulerName’ attribute in the pod spec as a peer scheduler that workloads can choose to use instead of the Kubernetes stock scheduler as shown below.

Screen Shot 2017-08-15 at 9.15.45 AM.png

With this approach, Kubernetes acts as a resource manager, making resources available to a separate HPC scheduler. Cluster administrators can use a visual interface to allocate resources based on policy or simply drag sliders via a web UI to allocate different proportions of the Kubernetes environment to non-container (HPC) workloads, and native Kubernetes applications and services.

From a client perspective, the HPC scheduler runs as a service deployed in Kubernetes pods, operating just as it would on a bare metal cluster. Navops Command provides additional scheduling features including things like resource reservation, run-time quotas, workload preemption and more. This environment works equally well for on-premise, cloud-based or hybrid deployments.

Deploying mixed workloads at IHME

One client having success with mixed workloads is the Institute for Health Metrics & Evaluation (IHME), an independent health research center at the University of Washington. In support of their globally recognized Global Health Data Exchange (GHDx), IHME operates a significantly sized environment comprised of 500 nodes and 20,000 cores running a mix of analytic, HPC, and container-based applications on Kubernetes. This case study describes IHME’s success hosting existing HPC workloads on a shared Kubernetes cluster using Navops Command.

For sites deploying new clusters that want access to the rich capabilities in Kubernetes but need the flexibility to run non-containerized workloads, this approach is worth a look. It offers the opportunity for sites to share infrastructure between Kubernetes and HPC workloads without disrupting existing applications and businesses processes. It also allows them to migrate their HPC workloads to use Docker containers at their own pace.

High Performance Networking with EC2 Virtual Private Clouds

August 11 2017

One of the most popular platforms for running Kubernetes is Amazon Web Services’ Elastic Compute Cloud (AWS EC2). With more than a decade of experience delivering IaaS, and expanding over time to include a rich set of services with easy to consume APIs, EC2 has captured developer mindshare and loyalty worldwide.

When it comes to networking, however, EC2 has some limits that hinder performance and make deploying Kubernetes clusters to production unnecessarily complex. The preview release of Romana v2.0, a network and security automation solution for Cloud Native applications, includes features that address some well known network issues when running Kubernetes in EC2.

Traditional VPC Networking Performance Roadblocks

A Kubernetes pod network is separate from an Amazon Virtual Private Cloud (VPC) instance network; consequently, off-instance pod traffic needs a route to the destination pods. Fortunately, VPCs support setting these routes. When building a cluster network with the kubenet plugin, whenever new nodes are added, the AWS cloud provider will automatically add a VPC route to the pods running on that node.

Using kubenet to set routes provides native VPC network performance and visibility. However, since kubenet does not support more advanced network functions like network policy for pod traffic isolation, many users choose to run a Container Network Interface (CNI) provider on the back end.

Before Romana v2.0, all CNI network providers required an overlay when used across Availability Zones (AZs), leaving CNI users who want to deploy HA clusters unable to get the performance of native VPC networking.

Even users who don’t need advanced networking encounter restriction, since the VPC route tables support a maximum of 50 entries, which limits the size of a cluster to 50 nodes (or less, if some VPC routes are needed for other purposes). Until Romana v2.0, users also needed to run an overlay network to get around this limit.

Whether you were interested in advanced networking for traffic isolation or running large production HA clusters (or both), you were unable to get the performance and visibility of native VPC networking.

Kubernetes on Multi-Segment Networks

The way to avoid running out of VPC routes is to use them sparingly by making them forward pod traffic for multiple instances. From a networking perspective, what that means is that the VPC route needs to forward to a router, which can then forward traffic on to the final destination instance.

Romana is a CNI network provider that configures routes on the host to forward pod network traffic without an overlay. Since inter-node routes are installed on hosts, no VPC routes are necessary at all. However, when the VPC is split into subnets for an HA deployment across zones, VPC routes are necessary.

Fortunately, inter-node routes on hosts allows them to act as a network router and forward traffic inbound from another zone just as it would for traffic from local pods. This makes any Kubernetes node configured by Romana able to accept inbound pod traffic from other zones and forward it to the proper destination node on the subnet.

Because of this local routing function, top-level routes to pods on other instances on the subnet can be aggregated, collapsing the total number of routes necessary to as few as one per subnet. To avoid using a single instance to forward all traffic, more routes can be used to spread traffic across multiple instances, up to the maximum number of available routes (i.e. equivalent to kubenet).

The net result is that you can now build clusters of any size across AZs without an overlay. Romana clusters also support network policies for better security through network isolation.

Making it All Work

While the combination of aggregated routes and node forwarding on a subnet eliminates overlays and avoids the VPC 50 route limitation, it imposes certain requirements on the CNI provider. For example, hosts should be configured with inter-node routes only to other nodes in the same zone on the local subnet. Traffic to all other hosts must use the default route off host, then use the (aggregated) VPC route to forward traffic out of the zone. Also: when adding a new host, in order to maintain aggregated VPC routes, the CNI plugin needs to use IP addresses for pods that are reachable on the new host.

The latest release of Romana also addresses questions about how VPC routes are installed; what happens when a node that is forwarding traffic fails; how forwarding node failures are detected; and how routes get updated and the cluster recovers.

Romana v2.0 includes a new AWS route configuration function to set VPC routes. This is part of a new set of network advertising features that automate route configuration in L3 networks. Romana v2.0 includes topology-aware IP address management (IPAM) that enables VPC route aggregation to stay within the 50 route limit as described here, as well as new health checks to update VPC routes when a routing instance fails. For smaller clusters, Romana configures VPC routes as kubenet does, with a route to each instance, taking advantage of every available VPC route.

Native VPC Networking Everywhere

When using Romana v2.0, native VPC networking is now available for clusters of any size, with or without network policies and for HA production deployment split across multiple zones.

The preview release of Romana v2.0 is available here. We welcome comments and feedback so we can make EC2 deployments of Kubernetes as fast and reliable as possible.

Juergen Brendel and Chris Marino, co-founders of Pani Networks, sponsor of the Romana project

Kompose Helps Developers Move Docker Compose Files to Kubernetes

August 10 2017

Editor’s note: today’s post is by Charlie Drage, Software Engineer at Red Hat giving an update about the Kubernetes project Kompose.

I’m pleased to announce that Kompose, a conversion tool for developers to transition Docker Compose applications to Kubernetes, has graduated from the Kubernetes Incubator to become an official part of the project.

Since our first commit on June 27, 2016, Kompose has achieved 13 releases over 851 commits, gaining 21 contributors since the inception of the project. Our work started at Skippbox (now part of Bitnami) and grew through contributions from Google and Red Hat.

The Kubernetes Incubator allowed contributors to get to know each other across companies, as well as collaborate effectively under guidance from Kubernetes contributors and maintainers. Our incubation led to the development and release of a new and useful tool for the Kubernetes ecosystem.

We’ve created a reliable, scalable Kubernetes environment from an initial Docker Compose file. We worked hard to convert as many keys as possible to their Kubernetes equivalent. Running a single command gets you up and running on Kubernetes: kompose up.

We couldn’t have done it without feedback and contributions from the community!

If you haven’t yet tried Kompose on GitHub check it out!

Kubernetes guestbook

The go-to example for Kubernetes is the famous guestbook, which we use as a base for conversion.

Here is an example from the official kompose.io site, starting with a simple Docker Compose file).

First, we’ll retrieve the file:

$ wget https://raw.githubusercontent.com/kubernetes/kompose/master/examples/docker-compose.yaml

You can test it out by first deploying to Docker Compose:

$ docker-compose up -d

Creating network "examples\_default" with the default driver

Creating examples\_redis-slave\_1

Creating examples\_frontend\_1

Creating examples\_redis-master\_1

And when you’re ready to deploy to Kubernetes:

$ kompose up

We are going to create Kubernetes Deployments, Services and PersistentVolumeClaims for your Dockerized application.

If you need different kind of resources, use the kompose convert and kubectl create -f commands instead.

INFO Successfully created Service: redis          

INFO Successfully created Service: web            

INFO Successfully created Deployment: redis       

INFO Successfully created Deployment: web         

Your application has been deployed to Kubernetes. You can run kubectl get deployment,svc,pods,pvc for details

Check out other examples of what Kompose can do.

Converting to alternative Kubernetes controllers

Kompose can also convert to specific Kubernetes controllers with the use of flags:

$ kompose convert --help  


  kompose convert [file] [flags]

Kubernetes Flags:

      --daemon-set               Generate a Kubernetes daemonset object

  -d, --deployment               Generate a Kubernetes deployment object

  -c, --chart                    Create a Helm chart for converted objects

      --replication-controller   Generate a Kubernetes replication controller object


For example, let’s convert our guestbook example to a DaemonSet:

$ kompose convert --daemon-set

INFO Kubernetes file "frontend-service.yaml" created

INFO Kubernetes file "redis-master-service.yaml" created

INFO Kubernetes file "redis-slave-service.yaml" created

INFO Kubernetes file "frontend-daemonset.yaml" created

INFO Kubernetes file "redis-master-daemonset.yaml" created

INFO Kubernetes file "redis-slave-daemonset.yaml" created

Key Kompose 1.0 features

With our graduation, comes the release of Kompose 1.0.0, here’s what’s new:

- Docker Compose Version 3: Kompose now supports Docker Compose Version 3. New keys such as ‘deploy’ now convert to their Kubernetes equivalent. - Docker Push and Build Support: When you supply a ‘build’ key within your docker-compose.yaml file, Kompose will automatically build and push the image to the respective Docker repository for Kubernetes to consume. - New Keys: With the addition of version 3 support, new keys such as pid and deploy are supported. For full details on what Kompose supports, view our conversion document. - Bug Fixes: In every release we fix any bugs related to edge-cases when converting. This release fixes issues relating to converting volumes with ‘./’ in the target name.

What’s ahead?

As we continue development, we will strive to convert as many Docker Compose keys as possible for all future and current Docker Compose releases, converting each one to their Kubernetes equivalent. All future releases will be backwards-compatible.

- Install Kompose - Kompose Quick Start Guide - Kompose Web Site - Kompose Documentation

–Charlie Drage, Software Engineer, Red Hat

- Post questions (or answer questions) onStack Overflow - Join the community portal for advocates onK8sPort - Follow us on Twitter@Kubernetesio for latest updates - Connect with the community onSlack - Get involved with the Kubernetes project onGitHub

Happy Second Birthday: A Kubernetes Retrospective

July 28 2017

As we do every July, we’re excited to celebrate Kubernetes 2nd birthday! In the two years since GA 1.0 launched as an open source project, Kubernetes (abbreviated as K8s) has grown to become the highest velocity cloud-related project. With more than 2,611 diverse contributors, from independents to leading global companies, the project has had 50,685 commits in the last 12 months. Of the 54 million projects on GitHub, Kubernetes is in the top 5 for number of unique developers contributing code. It also has more pull requests and issue comments than any other project on GitHub.  

Screen Shot 2017-07-18 at 9.39.42 AM.png

Figure 1: Kubernetes Rankings

At the center of the community are Special Interest Groups with members from different companies and organizations, all with a common interest in a specific topic. Given how fast Kubernetes is growing, SIGs help nurture and distribute leadership, while advancing new proposals, designs and release updates. Here’s a look at the SIG building blocks supporting Kubernetes:

Screen Shot 2017-07-18 at 11.32.07 AM.png

Kubernetes has also earned the trust of many Fortune 500 companies with deployments at Box, Comcast, Pearson, GolfNow, eBay, Ancestry.com and contributions from CoreOS, Fujitsu, Google, Huawei, Mirantis, Red Hat, Weaveworks and ZTE Company and others. Today, on the second anniversary of the Kubernetes 1.0 launch, we take a look back at some of the major accomplishments of the last year:

July 2016

- Kubernauts celebrated its first anniversary of the Kubernetes 1.0 launch with 20 #k8sbday parties hosted worldwide - Kubernetes v1.3 release

September 2016

- Kubernetes v1.4 release - Launch of kubeadm, a tool that makes Kubernetes dramatically easier to install - Pokemon Go - one of the largest installs of Kubernetes ever

October 2016

- Introduced Kubernetes service partners program and a redesigned partners page

November 2016

- CloudNativeCon/KubeCon Seattle - Cloud Native Computing Foundation partners with The Linux Foundation to launch a new Kubernetes certification, training and managed service provider program

December 2016

- Kubernetes v1.5 release

January 2017

- Survey from CloudNativeCon + KubeCon Seattle showcases the maturation of Kubernetes deployment

March 2017

- CloudNativeCon/KubeCon Europe - Kubernetesv1.6 release

April 2017

- The Battery Open Source Software (BOSS) Index lists Kubernetes as #33 in the top 100 popular open-source software projects

May 2017

- Four Kubernetes projects accepted to The Google Summer of Code (GSOC) 2017 program - Stutterstock and Kubernetes appear in The Wall Street Journal: “On average we [Shutterstock] deploy 45 different releases into production a day using that framework. We use Docker, Kubernetes and Jenkins [to build and run containers and automate development,” said CTO Marty Brodbeck on the company’s IT overhaul and adoption of containerization.

June 2017

- Kubernetes v1.7 release - Survey from CloudNativeCon + KubeCon Europe shows Kubernetes leading as the orchestration platform of choice - Kubernetes ranked #4 in the 30 highest velocity open source projects

Figure 2: The 30 highest velocity open source projects. Source: https://github.com/cncf/velocity

July 2017

- Kubernauts celebrate the second anniversary of the Kubernetes 1.0 launch with #k8sbday parties worldwide!

At the one year anniversary of the Kubernetes 1.0 launch, there were 130 Kubernetes-related Meetup groups. Today, there are more than 322 Meetup groups with 104,195 members. Local Meetups around the world joined the #k8sbday celebration! Take a look at some of the pictures from their celebrations. We hope you’ll join us at CloudNativeCon + KubeCon, December 6- 8 in Austin, TX.

Celebrating at the K8s birthday party in San Francisco


Celebrating in RTP, NC with a presentation from Jason McGee, VP and CTO, IBM Cloud Platform. Photo courtesy of @FranklyBriana


The Kubernetes Singapore meetup celebrating with an intro to GKE. Photo courtesy of @hunternield

New York celebrated with mini k8s cupcakes and a presentation on the history of cloud native from CNCF Executive Director, Dan Kohn. Photo courtesy of @arieljatib and @coreos

Quebec City had custom k8s cupcakes too! Photo courtesy of @zig_max

Beijing celebrated with custom k8s lollipops. Photo courtesy of @maxwell9215

– Sarah Novotny, Program Manager, Kubernetes Community 

@Kubernetesio View on Github #kubernetes-users Stack Overflow Download Kubernetes