Kubernetes Blog

Updates to Performance and Scalability in Kubernetes 1.3 -- 2,000 node 60,000 pod clusters

July 07 2016

We are proud to announce that with the release of version 1.3, Kubernetes now supports 2000-node clusters with even better end-to-end pod startup time. The latency of our API calls is within our one-second Service Level Objective (SLO), and most of them are even an order of magnitude better than that. It is possible to run deployments larger than a 2000-node cluster, but performance may be degraded and may not meet our strict SLO.

In this blog post we discuss the detailed performance results from Kubernetes 1.3 and what changes we made from version 1.2 to achieve these results. We also describe Kubemark, a performance testing tool that we’ve integrated into our continuous testing framework to detect performance and scalability regressions.

Evaluation Methodology

We have described our test scenarios in a previous blog post. The biggest change since the 1.2 release is that in our API responsiveness tests we now create and use multiple namespaces. In particular, for the 2000-node/60000-pod cluster tests we create 8 namespaces. We made this change because we believe that users of such very large clusters are likely to use many namespaces, certainly at least 8 in the cluster in total.

Metrics from Kubernetes 1.3

So, what is the performance of Kubernetes version 1.3? The following graph shows the end-to-end pod startup latency with a 2000 and 1000 node cluster. For comparison we show the same metric from Kubernetes 1.2 with a 1000-node cluster.

The next graphs show API response latency for a v1.3 2000-node cluster.
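To make the one-second SLO concrete, here is a minimal sketch of how a set of measured API call latencies could be checked against it. The choice of the 99th percentile and the function names are assumptions made for illustration; the post itself only states the one-second target.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers; pct is an integer in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer form of ceil(pct * n / 100), which avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_slo(latencies_s, slo_s=1.0, pct=99):
    """True if the chosen percentile of the latencies (in seconds) is within the SLO."""
    # Assumption: the SLO is evaluated at the 99th percentile.
    return percentile(latencies_s, pct) <= slo_s


if __name__ == "__main__":
    measured = [0.05] * 98 + [0.4, 2.0]  # made-up latencies, in seconds
    print(percentile(measured, 99), meets_slo(measured))
```

A single slow outlier does not break the objective here, because only the chosen percentile is compared with the threshold.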

How did we achieve these improvements?

The biggest change that we made for scalability in Kubernetes 1.3 was adding an efficient Protocol Buffer-based serialization format to the API as an alternative to JSON. It is primarily intended for communication between Kubernetes control plane components, but all API server clients can use this format. All Kubernetes control plane components now use it for their communication, but the system continues to support JSON for backward compatibility.

We didn’t change the format in which we store cluster state in etcd to Protocol Buffers yet, as we’re still working on the upgrade mechanism. But we’re very close to having this ready, and we expect to switch the storage format to Protocol Buffers in Kubernetes 1.4. Our experiments show that this should reduce pod startup end-to-end latency by another 30%.
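As a rough illustration of how a client opts into the Protocol Buffer format described above, the sketch below builds the HTTP headers for an API request. The media type `application/vnd.kubernetes.protobuf` is the one the API server uses for content negotiation; the helper name `api_headers` is hypothetical and not part of any client library.

```python
PROTOBUF = "application/vnd.kubernetes.protobuf"
JSON = "application/json"


def api_headers(prefer_protobuf=True):
    """Build request headers for a Kubernetes API call (hypothetical helper)."""
    if prefer_protobuf:
        # Ask for protobuf first and list JSON as a fallback for resources
        # that cannot be served in the binary format.
        return {"Accept": f"{PROTOBUF}, {JSON}", "Content-Type": PROTOBUF}
    # JSON remains supported for backward compatibility.
    return {"Accept": JSON, "Content-Type": JSON}


if __name__ == "__main__":
    print(api_headers())
    print(api_headers(prefer_protobuf=False))
```

Because the format is negotiated per request, existing JSON clients keep working unchanged.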

How do we test Kubernetes at scale?

Spawning clusters with 2000 nodes is expensive and time-consuming. While we need to do this at least once for each release to collect real-world performance and scalability data, we also need a lighter-weight mechanism that allows us to quickly evaluate our ideas for different performance improvements, and that we can run continuously to detect performance regressions. To address this need we created a tool called “Kubemark.”

What is “Kubemark”?

Kubemark is a performance testing tool which allows users to run experiments on emulated clusters. We use it for measuring performance in large clusters.

A Kubemark cluster consists of two parts: a real master node running the normal master components, and a set of “hollow” nodes. The prefix “hollow” means an implementation/instantiation of a component with some “moving parts” mocked out. The best example is hollow-kubelet, which pretends to be an ordinary Kubelet, but doesn’t start any containers or mount any volumes. It just claims it does, so from master components’ perspective it behaves like a real Kubelet.

Since we want a Kubemark cluster to be as similar to a real cluster as possible, we use the real Kubelet code with an injected fake Docker client. Similarly, hollow-proxy (the KubeProxy equivalent) reuses the real KubeProxy code with an injected no-op Proxier interface (to avoid mutating iptables).

Thanks to those changes

  • many hollow-nodes can run on a single machine, because they do not modify the environment in which they are running
  • because no real containers run and no container runtime (e.g. Docker) is needed, we can run up to 14 hollow-nodes on a 1-core machine
  • yet hollow-nodes generate roughly the same load on the API server as their “whole” counterparts, so they provide a realistic load for performance testing [the only fundamental difference is that we are not simulating any errors that can happen in reality (e.g. failing containers); adding support for this is a potential extension to the framework in the future]
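The hollow-node idea described above is essentially dependency injection: the real component logic runs, but the part that touches the machine is swapped for a fake. The sketch below illustrates the pattern only; the class and method names are hypothetical and are not the actual Kubelet or Docker client interfaces.

```python
class FakeRuntime:
    """Stand-in for a container runtime client: records requests, starts nothing."""

    def __init__(self):
        self.containers = {}

    def start_container(self, pod, image):
        # Only remember what was asked for; no process is launched.
        self.containers[pod] = image
        return "Running"


class Kubelet:
    """Toy node agent whose runtime client is injected by the caller."""

    def __init__(self, node_name, runtime):
        self.node_name = node_name
        self.runtime = runtime

    def sync_pod(self, pod, image):
        # The same code path runs whether the runtime is real or fake.
        phase = self.runtime.start_container(pod, image)
        return {"node": self.node_name, "pod": pod, "phase": phase}


def make_hollow_kubelet(node_name):
    """Build a 'hollow' agent: real logic, fake runtime."""
    return Kubelet(node_name, FakeRuntime())


if __name__ == "__main__":
    print(make_hollow_kubelet("hollow-node-0").sync_pod("web-1", "nginx"))
```

From the caller's point of view the reported status looks like that of a real node, which is what lets many hollow nodes share one machine.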

How do we set up Kubemark clusters?

To create a Kubemark cluster we use the power that Kubernetes itself gives us: we run Kubemark clusters on Kubernetes. Let’s describe this in detail.

In order to create an N-node Kubemark cluster, we:

  • create a regular Kubernetes cluster where we can run N hollow-nodes [e.g. to create a 2000-node Kubemark cluster, we create a regular Kubernetes cluster with 22 8-core nodes]
  • create a dedicated VM, where we start all master components for our Kubemark cluster (etcd, apiserver, controllers, scheduler, …)
  • schedule N “hollow-node” pods on the base Kubernetes cluster. Those hollow-nodes are configured to talk to the Kubemark API server running on the dedicated VM
  • finally, create addon pods (currently just Heapster) by scheduling them on the base cluster and configuring them to talk to the Kubemark API server

Once this is done, you have a usable Kubemark cluster that you can run your (performance) tests on. We have scripts for doing all of this on Google Compute Engine (GCE). For more details, take a look at our guide.
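The sizing of the base cluster can be estimated from the figures quoted in this post (up to 14 hollow-nodes per core, 8-core machines). The helper below is a back-of-the-envelope sketch, not one of the Kubemark scripts; its name and defaults are chosen for illustration.

```python
import math


def base_cluster_size(hollow_nodes, hollow_per_core=14, cores_per_machine=8):
    """Return (cores, machines) needed at the maximum quoted packing density."""
    cores = math.ceil(hollow_nodes / hollow_per_core)
    machines = math.ceil(cores / cores_per_machine)
    return cores, machines


if __name__ == "__main__":
    print(base_cluster_size(2000))
```

For 2000 hollow-nodes this gives 143 cores, or 18 eight-core machines, as a lower bound. The 22 machines used in the example above sit comfortably over that bound.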

One thing worth mentioning here is that while running Kubemark, underneath we’re also testing Kubernetes correctness. Obviously your Kubemark cluster will not work correctly if the base Kubernetes cluster under it doesn’t work. 

Performance measured in real clusters vs Kubemark

Crucially, the performance of Kubemark clusters is mostly similar to the performance of real clusters. For the pod startup end-to-end latency, as shown in the graph below, the difference is negligible:

For API responsiveness, the differences are larger, though generally less than 2x. However, the trends are exactly the same: an improvement/regression in a real cluster is visible as a similar percentage drop/increase in metrics in Kubemark.

Conclusion

We continue to improve the performance and scalability of Kubernetes. In this blog post we:

  • showed that the 1.3 release scales to 2000 nodes while meeting our responsiveness SLOs,
  • explained the major change we made to improve scalability from the 1.2 release, and
  • described Kubemark, our emulation framework that allows us to quickly evaluate the performance impact of code changes, both when experimenting with performance improvement ideas and to detect regressions as part of our continuous testing infrastructure.

Please join our community and help us build the future of Kubernetes! If you’re particularly interested in scalability, participate by:

For more information about the Kubernetes project, visit kubernetes.io and follow us on Twitter @Kubernetesio

– Wojciech Tyczynski, Software Engineer, Google

Kubernetes 1.3: Bridging Cloud Native and Enterprise Workloads

July 06 2016

Nearly two years ago, when we officially kicked off the Kubernetes project, we wanted to simplify distributed systems management and provide the core technology required to everyone. The community’s response to this effort has blown us away. Today, thousands of customers, partners and developers are running clusters in production using Kubernetes and have joined the cloud native revolution. 

Thanks to the help of over 800 contributors, we are pleased to announce today the availability of Kubernetes 1.3, our most robust and feature-rich release to date.

As our users scale their production deployments we’ve heard a clear desire to deploy services across cluster, zone and cloud boundaries. We’ve also heard a desire to run more workloads in containers, including stateful services. In this release, we’ve worked hard to address these two problems, while making it easier for new developers and enterprises to use Kubernetes to manage distributed systems at scale.

Product highlights in Kubernetes 1.3 include the ability to bridge services across multiple clouds (including on-prem), support for multiple node types, integrated support for stateful services (such as key-value stores and databases), and greatly simplified cluster setup and deployment on your laptop. Now, developers at organizations of all sizes can build production scale apps more easily than ever before.

What’s new:

  • Increased scale and automation - Customers want to scale their services up and down automatically in response to application demand. In 1.3 we have made it easier to autoscale clusters up and down while doubling the maximum number of nodes per cluster. Customers no longer need to think about cluster size, and can allow the underlying cluster to respond to demand.

  • Cross-cluster federated services - Customers want their services to span one or more (possibly remote) clusters, and for them to be reachable in a consistent manner from both within and outside their clusters. Services that span clusters have higher availability, provide geographic distribution and enable hybrid and multi-cloud scenarios. Kubernetes 1.3 introduces cross-cluster service discovery so that containers and external clients can consistently resolve services, irrespective of whether they are running partially or completely in other clusters.

  • Stateful applications - Customers looking to use containers for stateful workloads (such as databases or key-value stores) will find a new ‘PetSet’ object with a raft of alpha features, including:

    • Permanent hostnames that persist across restarts
    • Automatically provisioned persistent disks per container that live beyond the life of a container
    • Unique identities in a group to allow for clustering and leader election
    • Initialization containers which are critical for starting up clustered applications
  • Ease of use for local development - Developers want an easy way to learn to use Kubernetes. In Kubernetes 1.3 we are introducing Minikube, where with one command a developer can start a local Kubernetes cluster on their laptop that is API compatible with a full Kubernetes cluster. This enables developers to test locally, and push to their Kubernetes clusters when they are ready.
  • Support for rkt and container standards OCI & CNI - Kubernetes is an extensible and modular orchestration platform. Part of what has made Kubernetes successful is our commitment to giving customers access to the latest container technologies that best suit their environment. In Kubernetes 1.3 we support emerging standards such as the Container Network Interface (CNI) natively, and have already taken steps toward supporting the Open Container Initiative (OCI), which is still being ratified. We are also introducing rkt as an alternative container runtime on Kubernetes nodes, with a first-class integration between rkt and the kubelet. This allows Kubernetes users to take advantage of some of rkt’s unique features.
  • Updated Kubernetes dashboard UI - Customers can now use the Kubernetes open source dashboard for the majority of interactions with their clusters, rather than having to use the CLI. The updated UI lets users control, edit and create all workload resources (including Deployments and PetSets).
  • And many more. For a complete list of updates, see the release notes on GitHub.

Community

We could not have achieved this milestone without the tireless effort of countless people that are part of the Kubernetes community. We have 19 different Special Interest Groups, and over 100 meetups around the world. Kubernetes is a community project, built in the open, and it truly would not be possible without the over 233 person-years of effort the community has put in to date. Woot!

Availability

Kubernetes 1.3 is available for download at get.k8s.io and via the open source repository hosted on GitHub. To get started with Kubernetes try our Hello World app.

To learn the latest about the project, we encourage everyone to join the weekly community meeting or watch a recorded hangout.

Connect

We’d love to hear from you and see you participate in this growing community:

  • Get involved with the Kubernetes project on GitHub 
  • Post questions (or answer questions) on Stackoverflow 
  • Connect with the community on Slack
  • Follow us on Twitter @Kubernetesio for latest updates

Thank you for your support! 

– Aparna Sinha, Product Manager, Google

Container Design Patterns

June 21 2016

Kubernetes automates deployment, operations, and scaling of applications, but our goals in the Kubernetes project extend beyond system management – we want Kubernetes to help developers, too. Kubernetes should make it easy for them to write the distributed applications and services that run in cloud and datacenter environments. To enable this, Kubernetes defines not only an API for administrators to perform management actions, but also an API for containerized applications to interact with the management platform.

Our work on the latter is just beginning, but you can already see it manifested in a few features of Kubernetes. For example:

  • The “graceful termination” mechanism provides a callback into the container a configurable amount of time before it is killed (due to a rolling update, node drain for maintenance, etc.). This allows the application to cleanly shut down, e.g. persist in-memory state and cleanly conclude open connections.
  • Liveness and readiness probes check a configurable application HTTP endpoint (other probe types are supported as well) to determine if the container is alive and/or ready to receive traffic. The response determines whether Kubernetes will restart the container, include it in the load-balancing pool for its Service, etc.
  • ConfigMap allows applications to read their configuration from a Kubernetes resource rather than using command-line flags.
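From the application's side, the first two mechanisms above might look like the following minimal sketch. Kubernetes signals termination with SIGTERM before killing the container; the probe paths `/healthz` and `/readyz` and the `AppState` class are hypothetical names chosen for illustration.

```python
import signal


class AppState:
    """Tracks whether the application should keep receiving traffic."""

    def __init__(self):
        self.shutting_down = False
        self.flushed = False

    def handle_sigterm(self, signum=None, frame=None):
        # Graceful termination: stop accepting work, then clean up.
        self.shutting_down = True
        self.flushed = True  # placeholder for persisting state and closing connections

    def probe(self, path):
        """Return the HTTP status code a probe endpoint would answer with."""
        if path == "/healthz":  # liveness: the process is still alive
            return 200
        if path == "/readyz":   # readiness: fail once shutdown has begun
            return 503 if self.shutting_down else 200
        return 404


def install_handlers(state):
    """Register the SIGTERM callback; call this from the main thread at startup."""
    signal.signal(signal.SIGTERM, state.handle_sigterm)
```

Failing the readiness probe while still passing the liveness probe lets the Service stop routing new requests to the container without triggering a restart during cleanup.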

More generally, we see Kubernetes enabling a new generation of design patterns, similar to object oriented design patterns, but this time for containerized applications. That design patterns would emerge from containerized architectures is not surprising – containers provide many of the same benefits as software objects, in terms of modularity/packaging, abstraction, and reuse. Even better, because containers generally interact with each other via HTTP and widely available data formats like JSON, the benefits can be provided in a language-independent way.

This week Kubernetes co-founder Brendan Burns is presenting a paper outlining our thoughts on this topic at the 8th Usenix Workshop on Hot Topics in Cloud Computing (HotCloud ‘16), a venue where academic researchers and industry practitioners come together to discuss ideas at the forefront of research in private and public cloud technology. The paper describes three classes of patterns: management patterns (such as those described above), patterns involving multiple cooperating containers running on the same node, and patterns involving containers running across multiple nodes. We don’t want to spoil the fun of reading the paper, but we will say that you’ll see that the Pod abstraction is a key enabler for the last two types of patterns.

As the Kubernetes project continues to bring our decade of experience with Borg to the open source community, we aim not only to make application deployment and operations at scale simple and reliable, but also to make it easy to create “cloud-native” applications in the first place. Our work on documenting our ideas around design patterns for container-based services, and Kubernetes’s enabling of such patterns, is a first step in this direction. We look forward to working with the academic and practitioner communities to identify and codify additional patterns, with the aim of helping containers fulfill the promise of bringing increased simplicity and reliability to the entire software lifecycle, from development, to deployment, to operations.

To learn more about the Kubernetes project visit kubernetes.io or chat with us on Slack at slack.kubernetes.io.

– Brendan Burns and David Oppenheimer, Software Engineers, Google

The Illustrated Children's Guide to Kubernetes

June 09 2016

Kubernetes is an open source project with a growing community. We love seeing the ways that our community innovates inside and on top of Kubernetes. Deis is an excellent example of a company that understands the strategic impact of strong container orchestration. They contribute directly to the project; in associated subprojects; and, delightfully, with a creative endeavor to help our user community understand more about what Kubernetes is. Want to contribute to Kubernetes? One way is to get involved here and help us with code. But please don’t consider that the only way to contribute. This little adventure that Deis takes us on is an example of how open source isn’t only code.

Have your own Kubernetes story you’d like to tell? Let us know!
– @sarahnovotny Community Wonk, Kubernetes project.

Guest post is by Beau Vrolyk, CEO of Deis, the open source Kubernetes-native PaaS.

Over at Deis, we’ve been busy building open source tools for Kubernetes. We’re just about to finish up moving our easy-to-use application platform to Kubernetes and couldn’t be happier with the results. In the Kubernetes project we’ve found not only a growing and vibrant community but also a well-architected system, informed by years of experience running containers at scale. 

But that’s not all! As we’ve decomposed, ported, and reborn our PaaS as a Kubernetes citizen, we found a need for tools to help manage all of the ephemera that come along with building and running Kubernetes-native applications. The result has been open sourced as Helm, and we’re excited to see increasing adoption and growing excitement around the project.

There’s fun in the Deis offices too – we like to add some character to our architecture diagrams and pull requests. This time, literally. Meet Phippy, the intrepid little PHP app, and her journey to Kubernetes. What better way to talk to your parents, friends, and co-workers about this Kubernetes thing you keep going on about than a little story time? We give to you The Illustrated Children’s Guide to Kubernetes, conceived of and narrated by our own Matt Butcher and lovingly illustrated by Bailey Beougher. Join the fun on YouTube and tweet @opendeis to win your own copy of the book or a squishy little Phippy of your own.

Bringing End-to-End Kubernetes Testing to Azure (Part 1)

June 06 2016

Today’s guest post is by Travis Newhouse, Chief Architect at AppFormix, writing about their experiences bringing Kubernetes to Azure.

At AppFormix, continuous integration testing is part of our culture. We see many benefits to running end-to-end tests regularly, including minimizing regressions and ensuring our software works together as a whole. To ensure a high quality experience for our customers, we require the ability to run end-to-end testing not just for our application, but for the entire orchestration stack. Our customers are adopting Kubernetes as their container orchestration technology of choice, and they demand choice when it comes to where their containers execute, from private infrastructure to public providers, including Azure. After several weeks of work, we are pleased to announce we are contributing a nightly, continuous integration job that executes e2e tests on the Azure platform. After running the e2e tests each night for only a few weeks, we have already found and fixed two issues in Kubernetes. We hope our contribution of an e2e job will help the community maintain support for the Azure platform as Kubernetes evolves.

In this blog post, we describe the journey we took to implement deployment scripts for the Azure platform. The deployment scripts are a prerequisite to the e2e test job we are contributing, as the scripts make it possible for our e2e test job to test the latest commits to the Kubernetes master branch. In a subsequent blog post, we will describe details of the e2e tests that will help maintain support for the Azure platform, and how to contribute federated e2e test results to the Kubernetes project.

BACKGROUND

While Kubernetes is designed to operate on any IaaS, and solution guides exist for many platforms including Google Compute Engine, AWS, Azure, and Rackspace, the Kubernetes project refers to these as “versioned distros,” as they are only tested against a particular binary release of Kubernetes. On the other hand, “development distros” are used daily by automated, e2e tests for the latest Kubernetes source code, and serve as gating checks to code submission.

When we first surveyed existing support for Kubernetes on Azure, we found documentation for running Kubernetes on Azure using CoreOS and Weave. The documentation includes scripts for deployment, but the scripts do not conform to the cluster/kube-up.sh framework for automated cluster creation required by a “development distro.” Further, there did not exist a continuous integration job that utilized the scripts to validate Kubernetes using the end-to-end test scenarios (those found in test/e2e in the Kubernetes repository).

With some additional investigation into the project history (side note: git log --all --grep='azure' --oneline was quite helpful), we discovered that there previously existed a set of scripts that integrated with the cluster/kube-up.sh framework. These scripts were discarded on October 16, 2015 (commit 8e8437d) because the scripts hadn’t worked since before Kubernetes version 1.0. With these commits as a starting point, we set out to bring the scripts up to date, and create a supported continuous integration job that will aid continued maintenance.

CLUSTER DEPLOYMENT SCRIPTS

To set up a Kubernetes cluster with Ubuntu VMs on Azure, we followed the groundwork laid by the previously abandoned commit, and tried to leverage the existing code as much as possible. The solution uses SaltStack for deployment and OpenVPN for networking between the master and the minions. SaltStack is also used for configuration management by several other solutions, such as AWS, GCE, Vagrant, and vSphere. Resurrecting the discarded commit was a starting point, but we soon realized several key elements needed attention:

  • Install Docker and Kubernetes on the nodes using SaltStack
  • Configure authentication for services
  • Configure networking

The cluster setup scripts ensure Docker is installed, copy the Kubernetes Docker images to the master and minion nodes, and load the images. On the master node, SaltStack launches kubelet, which in turn launches the following Kubernetes services running in containers: kube-apiserver, kube-scheduler, and kube-controller-manager. On each of the minion nodes, SaltStack launches kubelet, which starts kube-proxy.

Kubernetes services must authenticate when communicating with each other. For example, minions register with the kube-api service on the master. On the master node, scripts generate a self-signed certificate and key that kube-api uses for TLS. Minions are configured to skip verification of the kube-api’s (self-signed) TLS certificate. We configure the services to use username and password credentials. The username and password are generated by the cluster setup scripts, and stored in the kubeconfig file on each node.
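The shape of such a kubeconfig can be sketched as follows. This is an illustration only and not the actual setup scripts: the helper name, the cluster and context names, and the server address are hypothetical, while `insecure-skip-tls-verify`, `username`, and `password` are standard kubeconfig fields.

```python
import json
import secrets


def make_kubeconfig(server, username):
    """Build a kubeconfig dict with basic-auth credentials (illustrative sketch)."""
    password = secrets.token_urlsafe(16)  # random generated credential
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": "azure",
            # Skip verification of the API server's self-signed certificate.
            "cluster": {"server": server, "insecure-skip-tls-verify": True},
        }],
        "users": [{
            "name": username,
            "user": {"username": username, "password": password},
        }],
        "contexts": [{
            "name": "default",
            "context": {"cluster": "azure", "user": username},
        }],
        "current-context": "default",
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this output is usable as a kubeconfig file.
    print(json.dumps(make_kubeconfig("https://10.8.0.1:443", "kubelet"), indent=2))
```

Skipping certificate verification keeps the bootstrap simple, at the cost of the minions not authenticating the master's identity.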

Finally, we implemented the networking configuration. To keep the scripts parameterized and minimize assumptions about the target environment, the scripts create a new Linux bridge device (cbr0), and ensure that all containers use that interface to access the network. To configure networking, we use OpenVPN to establish tunnels between master and minion nodes. For each minion, we reserve a /24 subnet to use for its pods. Azure assigns each node its own IP address. We also added the necessary routing table entries for this bridge to use OpenVPN interfaces. This is required to ensure pods on different hosts can communicate with each other. The routes on the master and minions are the following:

master
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.8.0.0        10.8.0.2        255.255.255.0   UG    0      0        0 tun0
10.8.0.2        0.0.0.0         255.255.255.255 UH    0      0        0 tun0
10.244.1.0      10.8.0.2        255.255.255.0   UG    0      0        0 tun0
10.244.2.0      10.8.0.2        255.255.255.0   UG    0      0        0 tun0
172.18.0.0      0.0.0.0         255.255.0.0     U     0      0        0 cbr0

minion-1
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.8.0.0        10.8.0.5        255.255.255.0   UG    0      0        0 tun0
10.8.0.5        0.0.0.0         255.255.255.255 UH    0      0        0 tun0
10.244.1.0      0.0.0.0         255.255.255.0   U     0      0        0 cbr0
10.244.2.0      10.8.0.5        255.255.255.0   UG    0      0        0 tun0

minion-2
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.8.0.0        10.8.0.9        255.255.255.0   UG    0      0        0 tun0
10.8.0.9        0.0.0.0         255.255.255.255 UH    0      0        0 tun0
10.244.1.0      10.8.0.9        255.255.255.0   UG    0      0        0 tun0
10.244.2.0      0.0.0.0         255.255.255.0   U     0      0        0 cbr0

Figure 1 - OpenVPN network configuration
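The pattern in the pod-subnet rows of these tables can be reproduced with a short sketch: each minion reaches its own /24 directly over the bridge and every other minion's /24 through the OpenVPN tunnel. The 10.244.0.0/16 supernet is inferred from the tables, and the function names are hypothetical.

```python
import ipaddress

# Assumption: pod subnets are carved out of 10.244.0.0/16, as the tables suggest.
POD_SUPERNET = ipaddress.ip_network("10.244.0.0/16")


def pod_subnet(minion_index):
    """Return the /24 reserved for the pods of the given minion (1-based index)."""
    return list(POD_SUPERNET.subnets(new_prefix=24))[minion_index]


def minion_routes(minion_index, minion_count):
    """List (destination, interface) pairs for the pod subnets seen from one minion."""
    routes = []
    for i in range(1, minion_count + 1):
        # Local pods sit behind the bridge; remote pods are reached via the tunnel.
        iface = "cbr0" if i == minion_index else "tun0"
        routes.append((str(pod_subnet(i)), iface))
    return routes


if __name__ == "__main__":
    for minion in (1, 2):
        print(f"minion-{minion}", minion_routes(minion, 2))
```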

FUTURE WORK

With the deployment scripts implemented, a subset of e2e test cases is passing on the Azure platform. Nightly results are published to the Kubernetes test history dashboard. Weixu Zhuang made a pull request on Kubernetes GitHub, and we are actively working with the Kubernetes community to merge the Azure cluster deployment scripts necessary for a nightly e2e test job. The deployment scripts provide a minimal working environment for Kubernetes on Azure. There are several next steps to continue the work, and we hope the community will get involved to achieve them.

  • Only a subset of the e2e scenarios are passing because some cloud provider interfaces are not yet implemented for Azure, such as load balancer and instance information. To this end, we seek community input and help to define an Azure implementation of the cloudprovider interface (pkg/cloudprovider/). These interfaces will enable features such as Kubernetes pods being exposed to the external network and cluster DNS.
  • Azure has new APIs for interacting with the service. The submitted scripts currently use the Azure Service Management APIs, which are deprecated. The Azure Resource Manager APIs should be used in the deployment scripts.

The team at AppFormix is pleased to contribute support for Azure to the Kubernetes community. We look forward to feedback about how we can work together to improve Kubernetes on Azure.

Editor’s Note: Want to contribute to Kubernetes? Get involved here. Have your own Kubernetes story you’d like to tell? Let us know!

Part II is available here.

Hypernetes: Bringing Security and Multi-tenancy to Kubernetes

May 24 2016

Today’s guest post is written by Harry Zhang and Pengfei Ni, engineers at HyperHQ, describing a new hypervisor-based container called HyperContainer.

While many developers and security professionals are comfortable with Linux containers as an effective boundary, many users need a stronger degree of isolation, particularly those running in a multi-tenant environment. Sadly, today, those users are forced to run their containers inside virtual machines, even one VM per container.

Unfortunately, this results in the loss of many of the benefits of a cloud-native deployment: VMs are slow to start, every container pays a memory tax, and low utilization wastes resources.

In this post, we will introduce HyperContainer, a hypervisor-based container, and see how it naturally fits into the Kubernetes design and enables users to serve their customers directly with virtualized containers, instead of wrapping them inside full-blown VMs.

HyperContainer

HyperContainer is a hypervisor-based container, which allows you to launch Docker images with standard hypervisors (KVM, Xen, etc.). As an open-source project, HyperContainer consists of an OCI-compatible runtime implementation, named runV, and a management daemon named hyperd. The idea behind HyperContainer is quite straightforward: to combine the best of both virtualization and containers.

We can consider containers as two parts (as Kubernetes does). The first part is the container runtime, where HyperContainer uses virtualization to achieve execution isolation and resource limitation instead of namespaces and cgroups. The second part is the application data, where HyperContainer leverages Docker images. So in HyperContainer, virtualization technology makes it possible to build a fully isolated sandbox with an independent guest kernel (so things like top and /proc all work), but from a developer’s view, it’s portable and behaves like a standard container.

HyperContainer as Pod

The interesting part of HyperContainer is not only that it is secure enough for multi-tenant environments (such as a public cloud), but also how well it fits into the Kubernetes philosophy.

One of the most important concepts in Kubernetes is Pods. The design of Pods is a lesson learned (Borg paper section 8.1) from real world workloads, where in many cases people want an atomic scheduling unit composed of multiple containers (please check this example for further information). In the context of Linux containers, a Pod wraps and encapsulates several containers into a logical group. But in HyperContainer, the hypervisor serves as a natural boundary, and Pods are introduced as first-class objects:

HyperContainer wraps a Pod of light-weight application containers and exposes the container interface at Pod level. Inside the Pod, a minimalist Linux kernel called HyperKernel is booted. This HyperKernel is built with a tiny init service called HyperStart. It acts as the PID 1 process, creates the Pod, sets up the mount namespace, and launches apps from the loaded images.

This model works nicely with Kubernetes. The integration of HyperContainer with Kubernetes, as we indicated in the title, is what makes up the Hypernetes project.

Hypernetes

One of the best parts of Kubernetes is that it is designed to support multiple container runtimes, meaning users are not locked-in to a single vendor. We are very pleased to announce that we have already begun working with the Kubernetes team to integrate HyperContainer into Kubernetes upstream. This integration involves:

  1. container runtime optimizing and refactoring
  2. new client-server mode runtime interface
  3. containerd integration to support runV

The OCI standard and kubelet’s multiple runtime architecture make this integration much easier, even though HyperContainer is not based on the Linux container technology stack.

On the other hand, in order to run HyperContainers in a multi-tenant environment, we also created a new network plugin and modified an existing volume plugin. Since Hypernetes runs Pods as their own VMs, it can make use of your existing IaaS layer technologies for multi-tenant network and persistent volumes. The current Hypernetes implementation uses standard OpenStack components.

Below we go into further detail about how each of these is implemented.

Identity and Authentication

In Hypernetes we chose Keystone to manage different tenants and perform identification and authentication for tenants during any administrative operation. Since Keystone comes from the OpenStack ecosystem, it works seamlessly with the network and storage plugins we used in Hypernetes.

Multi-tenant Network Model

For a multi-tenant container cluster, each tenant needs to have strong network isolation from every other tenant. In Hypernetes, each tenant has its own Network. Instead of configuring a new network using OpenStack, which is complex, with Hypernetes you just create a Network object like the one below.

apiVersion: v1  
kind: Network  
metadata:  
  name: net1  
spec:  
  tenantID: 065f210a2ca9442aad898ab129426350  
  subnets:  
    subnet1:  
      cidr: 192.168.0.0/24  
      gateway: 192.168.0.1

Note that the tenantID is supplied by Keystone. This YAML will automatically create a new Neutron network with a default router and the subnet 192.168.0.0/24.
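Before submitting a Network object like the one above, it can help to confirm that the gateway actually falls inside the subnet's CIDR. The following is a minimal sketch using only Python's standard `ipaddress` module. The function name `check_subnet` is hypothetical and is not part of Hypernetes, and the exact validation Hypernetes or Neutron performs is not described in this post.

```python
import ipaddress


def check_subnet(cidr: str, gateway: str) -> bool:
    """Return True if `gateway` is a usable address inside `cidr`.

    Hypothetical helper for illustration only; not a Hypernetes API.
    """
    # strict=True rejects CIDRs with host bits set, e.g. 192.168.0.1/24
    network = ipaddress.ip_network(cidr, strict=True)
    address = ipaddress.ip_address(gateway)
    # The network and broadcast addresses cannot serve as a gateway
    reserved = (network.network_address, network.broadcast_address)
    return address in network and address not in reserved


# Values taken from the Network example above
print(check_subnet("192.168.0.0/24", "192.168.0.1"))
```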

A Network controller is responsible for the life-cycle management of any Network instance created by the user. A Network can be assigned to one or more Namespaces, and any Pods belonging to the same Network can reach each other directly by IP address.

apiVersion: v1  
kind: Namespace  
metadata:  
  name: ns1  
spec:  
  network: net1

If a Namespace does not have a Network spec, it will use the default Kubernetes network model instead, including the default kube-proxy. If a user creates a Pod in a Namespace with an associated Network, Hypernetes will follow the Kubernetes Network Plugin Model to set up a Neutron network for this Pod. Here is a high-level example:

[Figure: A Hypernetes Network Workflow]
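The fallback rule described above (Namespaces without a Network spec use the default Kubernetes network model, Namespaces with one get a Neutron network) can be sketched as a small decision function. This is only an illustration of the rule as stated in this post. The function name `network_model_for` and the string labels are assumptions, and the real logic lives in the Hypernetes network plugin rather than in Python.

```python
# Labels are illustrative assumptions, not identifiers used by Hypernetes.
DEFAULT_MODEL = "kubernetes-default"  # default model, including kube-proxy
NEUTRON_MODEL = "neutron"             # per-tenant Neutron network


def network_model_for(namespace: dict) -> tuple:
    """Return (model, network_name) for a Namespace object given as a dict."""
    # A missing or empty spec means no Network is associated
    spec = namespace.get("spec") or {}
    network = spec.get("network")
    if not network:
        return (DEFAULT_MODEL, None)
    return (NEUTRON_MODEL, network)


# Namespace from the example above, plus one without a Network spec
ns1 = {"kind": "Namespace", "metadata": {"name": "ns1"}, "spec": {"network": "net1"}}
plain = {"kind": "Namespace", "metadata": {"name": "default"}}

print(network_model_for(ns1))
print(network_model_for(plain))
```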

Hypernetes uses a standalone gRPC handler named kubestack to translate the Kubernetes Pod request into the Neutron network API. Moreover, kubestack is also responsible for handling another important networking feature: a multi-tenant Service proxy.

In a multi-tenant environment, the default iptables-based kube-proxy cannot reach the individual Pods, because they are isolated into different networks. Instead, Hypernetes uses a built-in HAProxy in every HyperContainer as the portal. This HAProxy proxies all the Service instances in the namespace of that Pod. Kube-proxy is responsible for updating these backend servers by following the standard OnServiceUpdate and OnEndpointsUpdate processes, so users will not notice any difference. A downside of this method is that HAProxy has to listen on some specific ports, which may conflict with the user’s containers. That’s why we are planning to use LVS to replace this proxy in the next release.
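To make the per-Pod proxy idea and its port-conflict downside more concrete, here is a rough sketch that renders HAProxy `listen` sections from a list of Services and their endpoints, and reports which Service ports clash with ports used by the Pod's own containers. Everything here is an assumption made for illustration: the function names, the input dictionary shape, the bind address, and the section layout are hypothetical and do not reflect the configuration Hypernetes actually generates.

```python
def render_haproxy(services: list) -> str:
    """Render one HAProxy `listen` section per Service (illustrative layout)."""
    lines = []
    for svc in sorted(services, key=lambda s: s["name"]):
        lines.append(f"listen {svc['name']}")
        # Assumed bind address: the Service IP and port
        lines.append(f"    bind {svc['cluster_ip']}:{svc['port']}")
        lines.append("    mode tcp")
        # One backend server line per endpoint ("ip:port")
        for index, endpoint in enumerate(svc["endpoints"]):
            lines.append(f"    server {svc['name']}-{index} {endpoint}")
    return "\n".join(lines)


def conflicting_ports(services: list, container_ports: list) -> list:
    """Return Service ports that are also used by the Pod's containers."""
    return sorted({svc["port"] for svc in services} & set(container_ports))


# Example data; all addresses are made up for the sketch
services = [
    {
        "name": "web",
        "cluster_ip": "10.0.0.10",
        "port": 80,
        "endpoints": ["192.168.0.5:8080", "192.168.0.6:8080"],
    }
]

print(render_haproxy(services))
print(conflicting_ports(services, [80, 443]))
```

In this sketch, an endpoints update would simply mean calling `render_haproxy` again with the new endpoint list and reloading the proxy.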

With the help of the Neutron-based network plugin, the Hypernetes Service is able to provide an OpenStack load balancer, just as the “external” load balancer does on GCE. When a user creates a Service with external IPs, an OpenStack load balancer will be created and endpoints will be automatically updated through the kubestack workflow above.

Persistent Storage

For storage, we are building a tenant-aware persistent volume in Kubernetes. We decided not to use the existing Cinder volume plugin of Kubernetes because its model does not work in the virtualization case. Specifically, the existing workflow is:

  1. The Cinder volume plugin requires OpenStack as the Kubernetes provider.
  2. The OpenStack provider finds the VM on which the target Pod is running.
  3. The Cinder volume plugin mounts a Cinder volume to a path inside the host VM of Kubernetes.
  4. The kubelet bind mounts this path as a volume into the containers of the target Pod.

But in Hypernetes, things become much simpler. Thanks to the physical boundary of Pods, HyperContainer can mount Cinder volumes directly as block devices into Pods, just like a normal VM. This mechanism eliminates the extra time spent querying Nova to find the VM of the target Pod in the existing Cinder volume workflow listed above.

The current implementation of the Cinder plugin in Hypernetes is based on the Ceph RBD backend, and it works the same as all other Kubernetes volume plugins. One just needs to remember to create the Cinder volume (referenced by volumeID below) beforehand.

apiVersion: v1  
kind: Pod  
metadata:  
  name: nginx  
  labels:  
    app: nginx  
spec:  
  containers:  
  - name: nginx  
    image: nginx  
    ports:  
    - containerPort: 80  
    volumeMounts:  
    - name: nginx-persistent-storage  
      mountPath: /var/lib/nginx  
  volumes:  
  - name: nginx-persistent-storage  
    cinder:  
      volumeID: 651b2a7b-683e-47e1-bdd6-e3c62e8f91c0  
      fsType: ext4

So when the user provides a Pod YAML with a Cinder volume, Hypernetes will check whether the kubelet is using the Hyper container runtime. If so, the Cinder volume can be mounted directly to the Pod without any extra path mapping. The volume metadata is then passed to the kubelet RunPod process as part of the HyperContainer spec. Done!
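The runtime check described above amounts to choosing between two mount strategies. The sketch below illustrates that choice only. The function name `plan_cinder_mount`, the runtime label `"hyper"`, the result fields, and the placeholder host path are all hypothetical, and the real decision is made inside the kubelet and the Hypernetes volume plugin.

```python
def plan_cinder_mount(runtime: str, volume: dict) -> dict:
    """Describe how a Cinder volume would reach the Pod for a given runtime."""
    plan = {"volume_id": volume["volumeID"], "fs_type": volume["fsType"]}
    if runtime == "hyper":
        # Hyper runtime: attach the volume to the Pod as a block device,
        # so no path on the host needs to be mapped
        plan.update(mode="block-device", host_path=None)
    else:
        # Other runtimes: mount on the host first, then bind mount into
        # the containers. The path below is a made-up placeholder.
        plan.update(
            mode="host-bind-mount",
            host_path="/mnt/cinder/" + volume["volumeID"],
        )
    return plan


# Volume section from the Pod example above
cinder = {"volumeID": "651b2a7b-683e-47e1-bdd6-e3c62e8f91c0", "fsType": "ext4"}

print(plan_cinder_mount("hyper", cinder))
print(plan_cinder_mount("docker", cinder))
```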

Thanks to the Kubernetes network and volume plugin models, we could easily build the solutions above for HyperContainer, even though it is fundamentally different from a traditional Linux container. We also plan to propose these solutions to Kubernetes upstream, following the CNI model and the volume plugin standard, after the runtime integration is completed.

We believe all of these open source projects are important components of the container ecosystem, and their growth depends greatly on the open source spirit and technical vision of the Kubernetes team.

Conclusion

This post introduces some of the technical details about HyperContainer and the Hypernetes project. We hope that people will be interested in this new category of secure container and its integration with Kubernetes. If you are looking to try out Hypernetes and HyperContainer, we have just announced the public beta of our new secure container cloud service (Hyper_), which is built on these technologies. But even if you are running on-premises, we believe that Hypernetes and HyperContainer will let you run Kubernetes in a more secure way.

~Harry Zhang and Pengfei Ni, engineers at HyperHQ

CoreOS Fest 2016: CoreOS and Kubernetes Community meet in Berlin (& San Francisco)

May 03 2016

CoreOS Fest 2016 will bring together the container and open source distributed systems community, including many thought leaders in the Kubernetes space. It is the second annual CoreOS community conference, held for the first time in Berlin on May 9th and 10th. CoreOS believes Kubernetes is the container orchestration component to deliver GIFEE (Google’s Infrastructure for Everyone Else).

At this year’s CoreOS Fest, there are tracks dedicated to Kubernetes where you’ll hear about topics including Kubernetes performance and scalability, continuous delivery with Kubernetes, rktnetes, stackanetes and more. In addition, there will be a variety of talks, from introductory workshops to deep-dives into all things containers and related software.

Don’t miss these great speaker sessions at the conference in Berlin:

If you can’t make it to Berlin, Kubernetes is also a focal point at the CoreOS Fest San Francisco satellite event, a one-day event dedicated to CoreOS and Kubernetes. In fact, Tim Hockin, senior staff engineer at Google and one of the creators of Kubernetes, will be kicking off the day with a keynote dedicated to Kubernetes updates.

San Francisco sessions dedicated to Kubernetes include:

  • Keynote address by Tim Hockin, Senior Staff Engineer at Google
  • When rkt meets Kubernetes: a troubleshooting tale by Loris Degioanni, CEO of Sysdig
  • rktnetes: what’s new with container runtimes and Kubernetes by Derek Gonyeo, Software Engineer at CoreOS
  • Magical Security Sprinkles: Secure, Resilient Microservices on CoreOS and Kubernetes by Oliver Gould, CTO of Buoyant

Kubernetes Workshop in SF: Getting Started with Kubernetes, hosted at the Google San Francisco office (345 Spear St - 7th floor) by Google Developer Program Engineers Carter Morgan and Bill Prin on Tuesday, May 10th from 9:00am to 1:00pm. Lunch will be served afterwards. Seats are limited, so please RSVP for free here.

Get your tickets:

Learn more at: coreos.com/fest/ and on Twitter @CoreOSFest #CoreOSFest

– Sarah Novotny, Kubernetes Community Manager
