Kubernetes Blog

Running MongoDB on Kubernetes with StatefulSets

January 30 2017

Editor’s note: Today’s post is by Sandeep Dinesh, Developer Advocate, Google Cloud Platform, showing how to run a database in a container.

Conventional wisdom says you can’t run a database in a container. “Containers are stateless!” they say, and “databases are pointless without state!”

Of course, this is not true at all. At Google, everything runs in a container, including databases. You just need the right tools. Kubernetes 1.5 includes the new StatefulSet API object (in previous versions, StatefulSet was known as PetSet). With StatefulSets, Kubernetes makes it much easier to run stateful workloads such as databases.

If you’ve followed my previous posts, you know how to create a MEAN Stack app with Docker, then migrate it to Kubernetes to provide easier management and reliability, and create a MongoDB replica set to provide redundancy and high availability.

While the replica set in my previous blog post worked, there were some annoying steps that you needed to follow. You had to manually create a disk, a ReplicationController, and a service for each replica. Scaling the set up and down meant managing all of these resources manually, which is an opportunity for error, and would put your stateful application at risk In the previous example, we created a Makefile to ease the management of of these resources, but it would have been great if Kubernetes could just take care of all of this for us.

With StatefulSets, these headaches finally go away. You can create and manage your MongoDB replica set natively in Kubernetes, without the need for scripts and Makefiles. Let’s take a look how.

Note: StatefulSets are currently a beta resource. The sidecar container used for auto-configuration is also unsupported.

Prerequisites and Setup

Before we get started, you’ll need a Kubernetes 1.5+ and the Kubernetes command line tool. If you want to follow along with this tutorial and use Google Cloud Platform, you also need the Google Cloud SDK.

Once you have a Google Cloud project created and have your Google Cloud SDK setup (hint: gcloud init), we can create our cluster.

To create a Kubernetes 1.5 cluster, run the following command:

gcloud container clusters create "test-cluster"

This will make a three node Kubernetes cluster. Feel free to customize the command as you see fit.

Then, authenticate into the cluster:

gcloud container clusters get-credentials test-cluster

Setting up the MongoDB replica set

To set up the MongoDB replica set, you need three things: A StorageClass, a Headless Service, and a StatefulSet.

I’ve created the configuration files for these already, and you can clone the example from GitHub:

git clone https://github.com/thesandlord/mongo-k8s-sidecar.git

cd /mongo-k8s-sidecar/example/StatefulSet/

To create the MongoDB replica set, run these two commands:

kubectl apply -f googlecloud\_ssd.yaml

kubectl apply -f mongo-statefulset.yaml

That’s it! With these two commands, you have launched all the components required to run an highly available and redundant MongoDB replica set.

At an high level, it looks something like this:

Let’s examine each piece in more detail.

StorageClass

The storage class tells Kubernetes what kind of storage to use for the database nodes. You can set up many different types of StorageClasses in a ton of different environments. For example, if you run Kubernetes in your own datacenter, you can use GlusterFS. On GCP, your storage choices are SSDs and hard disks. There are currently drivers for AWS, Azure, Google Cloud, GlusterFS, OpenStack Cinder, vSphere, Ceph RBD, and Quobyte.

The configuration for the StorageClass looks like this:

kind: StorageClass  
apiVersion: storage.k8s.io/v1beta1  
metadata:  
 name: fast  
provisioner: kubernetes.io/gce-pd  
parameters:  
 type: pd-ssd

This configuration creates a new StorageClass called “fast” that is backed by SSD volumes. The StatefulSet can now request a volume, and the StorageClass will automatically create it!

Deploy this StorageClass:

kubectl apply -f googlecloud\_ssd.yaml

Headless Service

Now you have created the Storage Class, you need to make a Headless Service. These are just like normal Kubernetes Services, except they don’t do any load balancing for you. When combined with StatefulSets, they can give you unique DNS addresses that let you directly access the pods! This is perfect for creating MongoDB replica sets, because our app needs to connect to all of the MongoDB nodes individually.

The configuration for the Headless Service looks like this:

apiVersion: v1

kind: Service

metadata:

 name: mongo

 labels:

 name: mongo

spec:

 ports:

 - port: 27017

 targetPort: 27017

 clusterIP: None

 selector:

 role: mongo

You can tell this is a Headless Service because the clusterIP is set to “None.” Other than that, it looks exactly the same as any normal Kubernetes Service.

StatefulSet

The pièce de résistance. The StatefulSet actually runs MongoDB and orchestrates everything together. StatefulSets differ from Kubernetes ReplicaSets (not to be confused with MongoDB replica sets!) in certain ways that makes them more suited for stateful applications. Unlike Kubernetes ReplicaSets, pods created under a StatefulSet have a few unique attributes. The name of the pod is not random, instead each pod gets an ordinal name. Combined with the Headless Service, this allows pods to have stable identification. In addition, pods are created one at a time instead of all at once, which can help when bootstrapping a stateful system. You can read more about StatefulSets in the documentation.

Just like before, this “sidecar” container will configure the MongoDB replica set automatically. A “sidecar” is a helper container which helps the main container do its work.

The configuration for the StatefulSet looks like this:

apiVersion: apps/v1beta1

kind: StatefulSet

metadata:

 name: mongo

spec:

 serviceName: "mongo"

 replicas: 3

 template:

 metadata:

 labels:

   role: mongo

   environment: test

 spec:

 terminationGracePeriodSeconds: 10

 containers:

   - name: mongo

     image: mongo

     command:

       - mongod

       - "--replSet"

       - rs0

       - "--smallfiles"

       - "--noprealloc"

     ports:

       - containerPort: 27017

     volumeMounts:

       - name: mongo-persistent-storage

         mountPath: /data/db

   - name: mongo-sidecar

     image: cvallance/mongo-k8s-sidecar

     env:

       - name: MONGO\_SIDECAR\_POD\_LABELS

         value: "role=mongo,environment=test"

 volumeClaimTemplates:

 - metadata:

 name: mongo-persistent-storage

 annotations:

   volume.beta.kubernetes.io/storage-class: "fast"

 spec:

 accessModes: ["ReadWriteOnce"]

 resources:

   requests:

     storage: 100Gi

It’s a little long, but fairly straightforward.

The first second describes the StatefulSet object. Then, we move into the Metadata section, where you can specify labels and the number of replicas.

Next comes the pod spec. The terminationGracePeriodSeconds is used to gracefully shutdown the pod when you scale down the number of replicas, which is important for databases! Then the configurations for the two containers is shown. The first one runs MongoDB with command line flags that configure the replica set name. It also mounts the persistent storage volume to /data/db, the location where MongoDB saves its data. The second container runs the sidecar.

Finally, there is the volumeClaimTemplates. This is what talks to the StorageClass we created before to provision the volume. It will provision a 100 GB disk for each MongoDB replica.

Using the MongoDB replica set

At this point, you should have three pods created in your cluster. These correspond to the three nodes in your MongoDB replica set. You can see them with this command:

kubectl get pods

NAME   READY STATUS RESTARTS AGE

mongo-0 2/2  Running 0     3m

mongo-1 2/2  Running 0     3m

mongo-2 2/2  Running 0     3m

Each pod in a StatefulSet backed by a Headless Service will have a stable DNS name. The template follows this format: <pod-name>.<service-name>

This means the DNS names for the MongoDB replica set are:

mongo-0.mongo

mongo-1.mongo

mongo-2.mongo

You can use these names directly in the connection string URI of your app.

In this case, the connection string URI would be:

“mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/dbname\_?”

That’s it!

Scaling the MongoDB replica set

A huge advantage of StatefulSets is that you can scale them just like Kubernetes ReplicaSets. If you want 5 MongoDB Nodes instead of 3, just run the scale command:

kubectl scale --replicas=5 statefulset mongo

The sidecar container will automatically configure the new MongoDB nodes to join the replica set.

Include the two new nodes (mongo-3.mongo & mongo-4.mongo) in your connection string URI and you are good to go. Too easy!

Cleaning Up

To clean up the deployed resources, delete the StatefulSet, Headless Service, and the provisioned volumes.

Delete the StatefulSet:

kubectl delete statefulset mongo

Delete the Service:

kubectl delete svc mongo

Delete the Volumes:

kubectl delete pvc -l role=mongo

Finally, you can delete the test cluster:

gcloud container clusters delete "test-cluster"

Happy Hacking!

For more cool Kubernetes and Container blog posts, follow me on Twitter and Medium.

–Sandeep Dinesh, Developer Advocate, Google Cloud Platform.

Fission: Serverless Functions as a Service for Kubernetes

January 30 2017

Editor’s note: Today’s post is by Soam Vasani, Software Engineer at Platform9 Systems, talking about a new open source Serverless Function (FaaS) framework for Kubernetes. 

Fission is a Functions as a Service (FaaS) / Serverless function framework built on Kubernetes.

Fission allows you to easily create HTTP services on Kubernetes from functions. It works at the source level and abstracts away container images (in most cases). It also simplifies the Kubernetes learning curve, by enabling you to make useful services without knowing much about Kubernetes.

To use Fission, you simply create functions and add them with a CLI. You can associate functions with HTTP routes, Kubernetes events, or other triggers. Fission supports NodeJS and Python today.

Functions are invoked when their trigger fires, and they only consume CPU and memory when they’re running. Idle functions don’t consume any resources except storage.

Why make a FaaS framework on Kubernetes?

We think there’s a need for a FaaS framework that can be run on diverse infrastructure, both in public clouds and on-premise infrastructure. Next, we had to decide whether to build it from scratch, or on top of an existing orchestration system. It was quickly clear that we shouldn’t build it from scratch – we would just end up having to re-invent cluster management, scheduling, network management, and lots more.

Kubernetes offered a powerful and flexible orchestration system with a comprehensive API backed by a strong and growing community. Building on it meant Fission could leave container orchestration functionality to Kubernetes, and focus on FaaS features.

The other reason we don’t want a separate FaaS cluster is that FaaS works best in combination with other infrastructure. For example, it may be the right fit for a small REST API, but it needs to work with other services to store state. FaaS also works great as a mechanism for event handlers to handle notifications from storage, databases, and from Kubernetes itself. Kubernetes is a great platform for all these services to interoperate on top of.

Deploying and Using Fission

Fission can be installed with a kubectl create command: see the project README for instructions.

Here’s how you’d write a “hello world” HTTP service:

$ cat \> hello.py

def main(context):

 &nbsp;&nbsp;&nbsp;print "Hello, world!"


$ fission function create --name hello --env python --code hello.py --route /hello


$ curl http://\<fission router\>/hello

Hello, world!

Fission takes care of loading the function into a container, routing the request to it, and so on. We go into more details in the next section.

How Fission Is Implemented on Kubernetes

At its core, a FaaS framework must (1) turn functions into services and (2) manage the lifecycle of these services.

There are a number of ways to achieve these goals, and each comes with different tradeoffs. Should the framework operate at the source-level, or at the level of Docker images (or something in-between, like “buildpacks”)? What’s an acceptable amount of overhead the first time a function runs? Choices made here affect platform flexibility, ease of use, resource usage and costs, and of course, performance. 

Packaging, source code, and images

One of our goals is to make Fission very easy to use for new users. We chose to operate
at the source level, so that users can avoid having to deal with container image building, pushing images to registries, managing registry credentials, image versioning, and so on.

However, container images are the most flexible way to package apps. A purely source-level interface wouldn’t allow users to package binary dependencies, for example.

So, Fission goes with a hybrid approach – container images that contain a dynamic loader for functions. This approach allows most users to use Fission purely at the source level, but enables them to customize the container image when needed.

These images, called “environment images” in Fission, contain the runtime for the language (such as NodeJS or Python), a set of commonly used dependencies and a dynamic loader for functions. If these dependencies are sufficient for the function the user is writing, no image rebuilding is needed. Otherwise, the list of dependencies can be modified, and the image rebuilt.

These environment images are the only language-specific parts of Fission. They present a uniform interface to the rest of the framework. This design allows Fission to be easily extended to more languages.

Cold start performance

One of the goals of the serverless functions is that functions use CPU/memory resources only when running. This optimizes the resource cost of functions, but it comes at the cost of some performance overhead when starting from idle (the “cold start” overhead).

Cold start overhead is important in many use cases. In particular, with functions used in an interactive use case – like a web or mobile app, where a user is waiting for the action to complete – several-second cold start latencies would be unacceptable.

To optimize cold start overheads, Fission keeps a running pool of containers for each environment. When a request for a function comes in, Fission doesn’t have to deploy a new container – it just chooses one that’s already running, copies the function into the container, loads it dynamically, and routes the request to that instance. The overhead of this process takes on the order of 100msec for NodeJS and Python functions.

How Fission works on Kubernetes

fission-arch.png Fission is designed as a set of microservices. A Controller keeps track of functions, HTTP
routes, event triggers, and environment images. A Pool Manager manages pools of idle environment containers, the loading of functions into these containers, and the killing of function instances when they’re idle. A Router receives HTTP requests and routes them to function instances, requesting an instance from the Pool Manager if necessary.

The controller serves the fission API. All the other components watch the controller for updates. The router is exposed as a Kubernetes Service of the LoadBalancer or NodePort type, depending on where the Kubernetes cluster is hosted.

When the router gets a request, it looks up a cache to see if this request already has a service it should be routed to. If not, it looks up the function to map the request to, and requests the poolmgr for an instance. The poolmgr has a pool of idle pods; it picks one, loads the function into it (by sending a request into a sidecar container in the pod), and returns the address of the pod to the router. The router  proxies over the request to this pod. The pod is cached for subsequent requests, and if it’s been idle for a few minutes, it is killed.

(For now, Fission maps one function to one container; autoscaling to multiple instances is planned for a later release. Re-using function pods to host multiple functions is also planned, for cases where isolation isn’t a requirement.)

Use Cases for Fission

Bots, Webhooks, REST APIs 
Fission is a good framework to make small REST APIs, implement webhooks, and write chatbots for Slack or other services.

As an example of a simple REST API, we’ve made a small guestbook app that uses functions for reading and writing to guestbook, and works with a redis instance to keep track of state. You can find the app in the Fission GitHub repo.

The app contains two end points – the GET endpoint lists out guestbook entries from redis and renders them into HTML. The POST endpoint adds a new entry to the guestbook list in redis. That’s all there is – there’s no Dockerfile to manage, and updating the app is as simple as calling fission function update. 

Handling Kubernetes Events
Fission also supports triggering functions based on Kubernetes watches. For example, you can setup a function to watch for all pods in a certain namespace matching a certain label. The function gets the serialized object and the watch event type (added/removed/updated) in its context.

These event handler functions could be used for simple monitoring – for example, you could send a slack message whenever a new service is added to the cluster. There are also more complex use cases, such as writing a custom controller by watching Kubernetes’ Third Party Resources.

Status and Roadmap

Fission is in early alpha for now (Jan 2017). It’s not ready for production use just yet. We’re looking for early adopters and feedback.

What’s ahead for Fission? We’re working on making FaaS on Kubernetes more convenient, easy to use and easier to integrate with. In the coming months we’re working on adding support for unit testing, integration with Git, function monitoring and log aggregation. We’re also working on integration with other sources of events.

Creating more language environments is also in the works. NodeJS and Python are supported today. Preliminary support for C# .NET has been contributed by Klavs Madsen.

You can find our current roadmap on our GitHub issues and projects.

Get Involved

Fission is open source and developed in the open by Platform9 Systems. Check us out on GitHub, and join our slack channel if you’d like to chat with us. We’re also on twitter at @fissionio.

–Soam Vasani, Software Engineer, Platform9 Systems

How we run Kubernetes in Kubernetes aka Kubeception

January 20 2017

Editor’s note: Today’s post is by the team at Giant Swarm, showing how they run Kubernetes in Kubernetes.

Giant Swarm’s container infrastructure started out with the goal to be an easy way for developers to deploy containerized microservices. Our first generation was extensively using fleet as a base layer for our infrastructure components as well as for scheduling user containers.

In order to give our users a more powerful way to manage their containers we introduced Kubernetes into our stack in early 2016. However, as we needed a quick way to flexibly spin up and manage different users’ Kubernetes clusters resiliently we kept the underlying fleet layer.

As we insist on running all our underlying infrastructure components in containers, fleet gave us the flexibility of using systemd unit files to define our infrastructure components declaratively. Our self-developed deployment tooling allowed us to deploy and manage the infrastructure without the need for imperative configuration management tools.

However, fleet is just a distributed init and not a complete scheduling and orchestration system. Next to a lot of work on our tooling, it required significant improvements in terms of communication between peers, its reconciliation loop, and stability that we had to work on. Also the uptake in Kubernetes usage would ensure that issues are found and fixed faster.

As we had made good experience with introducing Kubernetes on the user side and with recent developments like rktnetes and stackanetes it felt like time for us to also move our base layer to Kubernetes.

Why Kubernetes in Kubernetes

Now, you could ask, why would anyone want to run multiple Kubernetes clusters inside of a Kubernetes cluster? Are we crazy? The answer is advanced multi-tenancy use cases as well as operability and automation thereof.

Kubernetes comes with its own growing feature set for multi-tenancy use cases. However, we had the goal of offering our users a fully-managed Kubernetes without any limitations to the functionality they would get using any vanilla Kubernetes environment, including privileged access to the nodes. Further, in bigger enterprise scenarios a single Kubernetes cluster with its inbuilt isolation mechanisms is often not sufficient to satisfy compliance and security requirements. More advanced (firewalled) zoning or layered security concepts are tough to reproduce with a single installation. With namespace isolation both privileged access as well as firewalled zones can hardly be implemented without sidestepping security measures.

Now you could go and set up multiple completely separate (and federated) installations of Kubernetes. However, automating the deployment and management of these clusters would need additional tooling and complex monitoring setups. Further, we wanted to be able to spin clusters up and down on demand, scale them, update them, keep track of which clusters are available, and be able to assign them to organizations and teams flexibly. In fact this setup can be combined with a federation control plane to federate deployments to the clusters over one API endpoint.

And wouldn’t it be nice to have an API and frontend for that?

Enter Giantnetes

Based on the above requirements we set out to build what we call Giantnetes - or if you’re into movies, Kubeception. At the most basic abstraction it is an outer Kubernetes cluster (the actual Giantnetes), which is used to run and manage multiple completely isolated user Kubernetes clusters.

The physical machines are bootstrapped by using our CoreOS Container Linux bootstrapping tool, Mayu. The Giantnetes components themselves are self-hosted, i.e. a kubelet is in charge of automatically bootstrapping the components that reside in a manifests folder. You could call this the first level of Kubeception.

Once the Giantnetes cluster is running we use it to schedule the user Kubernetes clusters as well as our tooling for managing and securing them.

We chose Calico as the Giantnetes network plugin to ensure security, isolation, and the right performance for all the applications running on top of Giantnetes.

Then, to create the inner Kubernetes clusters, we initiate a few pods, which configure the network bridge, create certificates and tokens, and launch virtual machines for the future cluster. To do so, we use lightweight technologies such as KVM and qemu to provision CoreOS Container Linux VMs that become the nodes of an inner Kubernetes cluster. You could call this the second level of Kubeception. 

Currently this means we are starting Pods with Docker containers that in turn start VMs with KVM and qemu. However, we are looking into doing this with rkt qemu-kvm, which would result in using a rktnetes setup for our Giantnetes.

The networking solution for the inner Kubernetes clusters has two levels. It on a combination of flannel’s server/client architecture model and Calico BGP. While a flannel client is used to create the network bridge between the VMs of each virtualized inner Kubernetes cluster, Calico is running inside the virtual machines to connect the different Kubernetes nodes and create a single network for the inner Kubernetes. By using Calico, we mimic the Giantnetes networking solution inside of each Kubernetes cluster and provide the primitives to secure and isolate workloads through the Kubernetes network policy API.

Regarding security, we aim for separating privileges as much as possible and making things auditable. Currently this means we use certificates to secure access to the clusters and encrypt communication between all the components that form a cluster is (i.e. VM to VM, Kubernetes components to each other, etcd master to Calico workers, etc). For this we create a PKI backend per cluster and then issue certificates per service in Vault on-demand. Every component uses a different certificate, thus, avoiding to expose the whole cluster if any of the components or nodes gets compromised. We further rotate the certificates on a regular basis.

For ensuring access to the API and to services of each inner Kubernetes cluster from the outside we run a multi-level HAproxy ingress controller setup in the Giantnetes that connects the Kubernetes VMs to hardware load balancers.

Looking into Giantnetes with kubectl

Let’s have a look at a minimal sample deployment of Giantnetes.

Screen Shot 2016-11-14 at 12.08.40 PM.png

In the above example you see a user Kubernetes cluster customera running in VM-containers on top of Giantnetes. We currently use Jobs for the network and certificate setups.

Peeking inside the user cluster, you see the DNS pods and a helloworld running.

Screen Shot 2016-11-14 at 12.07.28 PM.png

Each one of these user clusters can be scheduled and used independently. They can be spun up and down on-demand.

Conclusion

To sum up, we could show how Kubernetes is able to easily not only self-host but also flexibly schedule a multitude of inner Kubernetes clusters while ensuring higher isolation and security aspects. A highlight in this setup is the composability and automation of the installation and the robust coordination between the Kubernetes components. This allows us to easily create, destroy, and reschedule clusters on-demand without affecting users or compromising the security of the infrastructure. It further allows us to spin up clusters with varying sizes and configurations or even versions by just changing some arguments at cluster creation. 

This setup is still in its early days and our roadmap is planning for improvements in many areas such as transparent upgrades, dynamic reconfiguration and scaling of clusters, performance improvements, and (even more) security. Furthermore, we are looking forward to improve on our setup by making use of the ever advancing state of Kubernetes operations tooling and upcoming features, such as Init Containers, Scheduled Jobs, Pod and Node affinity and anti-affinity, etc.

Most importantly, we are working on making the inner Kubernetes clusters a third party resource that can then be managed by a custom controller. The result would be much like the Operator concept by CoreOS. And to ensure that the community at large can benefit from this project we will be open sourcing this in the near future.

– Hector Fernandez, Software Engineer & Puja Abbassi, Developer Advocate, Giant Swarm

Scaling Kubernetes deployments with Policy-Based Networking

January 19 2017

Editor’s note: Today’s post is by Harmeet Sahni, Director of Product Management, at Nuage Networks, writing about their contributions to Kubernetes and insights on policy-based networking.  

Although it’s just been eighteen-months since Kubernetes 1.0 was released, we’ve seen Kubernetes emerge as the leading container orchestration platform for deploying distributed applications. One of the biggest reasons for this is the vibrant open source community that has developed around it. The large number of Kubernetes contributors come from diverse backgrounds means we, and the community of users, are assured that we are investing in an open platform. Companies like Google (Container Engine), Red Hat (OpenShift), and CoreOS (Tectonic) are developing their own commercial offerings based on Kubernetes. This is a good thing since it will lead to more standardization and offer choice to the users. 

Networking requirements for Kubernetes applications

For companies deploying applications on Kubernetes, one of biggest questions is how to deploy and orchestrate containers at scale. They’re aware that the underlying infrastructure, including networking and storage, needs to support distributed applications. Software-defined networking (SDN) is a great fit for such applications because the flexibility and agility of the networking infrastructure can match that of the applications themselves. The networking requirements of such applications include:

  • Network automation 
  • Distributed load balancing and service discovery
  • Distributed security with fine-grained policies
  • QoS Policies
  • Scalable Real-time Monitoring
  • Hybrid application environments with Services spread across Containers, VMs and Bare Metal Servers
  • Service Insertion (e.g. firewalls)
  • Support for Private and Public Cloud deployments

Kubernetes Networking

Kubernetes provides a core set of platform services exposed through APIs. The platform can be extended in several ways through the extensions API, plugins and labels. This has allowed a wide variety integrations and tools to be developed for Kubernetes. Kubernetes recognizes that the network in each deployment is going to be unique. Instead of trying to make the core system try to handle all those use cases, Kubernetes chose to make the network pluggable.

With Nuage Networks we provide a scalable policy-based SDN platform. The platform is managed by a Network Policy Engine that abstracts away the complexity associated with configuring the system. There is a separate SDN Controller that comes with a very rich routing feature set and is designed to scale horizontally. Nuage uses the open source Open vSwitch (OVS) for the data plane with some enhancements in the OVS user space. Just like Kubernetes, Nuage has embraced openness as a core tenet for its platform. Nuage provides open APIs that allow users to orchestrate their networks and integrate network services such as firewalls, load balancers, IPAM tools etc. Nuage is supported in a wide variety of cloud platforms like OpenStack and VMware as well as container platforms like Kubernetes and others.

The Nuage platform implements a Kubernetes network plugin that creates VXLAN overlays to provide seamless policy-based networking between Kubernetes Pods and non-Kubernetes environments (VMs and bare metal servers). Each Pod is given an IP address from a network that belongs to a Namespace and is not tied to the Kubernetes node.

As cloud applications are built using microservices, the ability to control traffic among these microservices is a fundamental requirement. It is important to point out that these network policies also need to control traffic that is going to/coming from external networks and services. Nuage’s policy abstraction model makes it easy to declare fine-grained ingress/egress policies for applications. Kubernetes has a beta Network Policy API implemented using the Kubernetes Extensions API. Nuage implements this Network Policy API to address a wide variety of policy use cases such as:

  • Kubernetes Namespace isolation
  • Inter-Namespace policies
  • Policies between groups of Pods (Policy Groups) for Pods in same or different Namespaces
  • Policies between Kubernetes Pods/Namespaces and external Networks/Services

A key question for users to consider is the scalability of the policy implementation. Some networking setups require creating access control list (ACL) entries telling Pods how they can interact with one another. In most cases, this eventually leads to an n-squared pileup of ACL entries. The Nuage platform avoids this problem and can quickly assign a policy that applies to a whole group of Pods. The Nuage platform implements these policies using a fully distributed stateful firewall based on OVS.

Being able to monitor the traffic flowing between Kubernetes Pods is very useful to both development and operations teams. The Nuage platform’s real-time analytics engine enables visibility and security monitoring for Kubernetes applications. Users can get a visual representation of the traffic flows between groups of Pods, making it easy to see how the network policies are taking effect. Users can also get a rich set of traffic and policy statistics. Further, users can set alerts to be triggered based on policy event thresholds.

Conclusion

Even though we started working on our integration with Kubernetes over a year ago, it feels we are just getting started. We have always felt that this is a truly open community and we want to be an integral part of it. You can find out more about our Kubernetes integration on our GitHub page.

–Harmeet Sahni, Director of Product Management, Nuage Networks

A Stronger Foundation for Creating and Managing Kubernetes Clusters

January 12 2017

Editor’s note: Today’s post is by Lucas Käldström an independent Kubernetes maintainer and SIG-Cluster-Lifecycle member, sharing what the group has been building and what’s upcoming. 

Last time you heard from us was in September, when we announced kubeadm. The work on making kubeadm a first-class citizen in the Kubernetes ecosystem has continued and evolved. Some of us also met before KubeCon and had a very productive meeting where we talked about what the scopes for our SIG, kubeadm, and kops are. 

Continuing to Define SIG-Cluster-Lifecycle

What is the scope for kubeadm?
We want kubeadm to be a common set of building blocks for all Kubernetes deployments; the piece that provides secure and recommended ways to bootstrap Kubernetes. Since there is no one true way to setup Kubernetes, kubeadm will support more than one method for each phase. We want to identify the phases every deployment of Kubernetes has in common and make configurable and easy-to-use kubeadm commands for those phases. If your organization, for example, requires that you distribute the certificates in the cluster manually or in a custom way, skip using kubeadm just for that phase. We aim to keep kubeadm useable for all other phases in that case. We want you to be able to pick which things you want kubeadm to do and let you do the rest yourself.

Therefore, the scope for kubeadm is to be easily extendable, modular and very easy to use. Right now, with this v1.5 release we have, kubeadm can only do the “full meal deal” for you. In future versions that will change as kubeadm becomes more componentized, while still leaving the opportunity to do everything for you. But kubeadm will still only handle the bootstrapping of Kubernetes; it won’t ever handle provisioning of machines for you since that can be done in many more ways. In addition, we want kubeadm to work everywhere, even on multiple architectures, therefore we built in multi-architecture support from the beginning.

What is the scope for kops?
The scope for kops is to automate full cluster operations: installation, reconfiguration of your cluster, upgrading kubernetes, and eventual cluster deletion. kops has a rich configuration model based on the Kubernetes API Machinery, so you can easily customize some parameters to your needs. kops (unlike kubeadm) handles provisioning of resources for you. kops aims to be the ultimate out-of-the-box experience on AWS (and perhaps other providers in the future). In the future kops will be adopting more and more of kubeadm for the bootstrapping phases that exist. This will move some of the complexity inside kops to a central place in the form of kubeadm.

What is the scope for SIG-Cluster-Lifecycle?
The SIG-Cluster-Lifecycle actively tries to simplify the Kubernetes installation and management story. This is accomplished by modifying Kubernetes itself in many cases, and factoring out common tasks. We are also trying to address common problems in the cluster lifecycle (like the name says!). We maintain and are responsible for kubeadm and kops. We discuss problems with the current way to bootstrap clusters on AWS (and beyond) and try to make it easier. We hangout on Slack in the #sig-cluster-lifecycle and #kubeadm channels. We meet and discuss current topics once a week on Zoom. Feel free to come and say hi! Also, don’t be shy to contribute; we’d love your comments and insight!

Looking forward to v1.6

Our goals for v1.6 are centered around refactoring, stabilization and security. 

First and foremost, we want to get kubeadm and its composable configuration experience to beta. We will refactor kubeadm so each phase in the bootstrap process is invokable separately. We want to bring the TLS Bootstrap API, the Certificates API and the ComponentConfig API to beta, and to get kops (and other tools) using them. 

We will also graduate the token discovery we’re using now (aka. the gcr.io/google_containers/kube-discovery:1.0 image) to beta by adding a new controller to the controller manager: the BootstrapSigner. Using tokens managed as Secrets, that controller will sign the contents (a kubeconfig file) of a well known ConfigMap in a new kube-public namespace. This object will be available to unauthenticated users in order to enable a secure bootstrap with a simple and short shared token.You can read the full proposal here.

In addition to making it possible to invoke phases separately, we will also add a new phase for bringing up the control plane in a self-hosted mode (as opposed to the current static pod technique). The self-hosted technique was developed by CoreOS in the form of bootkube, and will now be incorporated as an alternative into an official Kubernetes product. Thanks to CoreOS for pushing that paradigm forward! This will be done by first setting up a temporary control plane with static pods, injecting the Deployments, ConfigMaps and DaemonSets as necessary, and lastly turning down the temporary control plane. For now, etcd will still be in a static pod by default. 

We are supporting self hosting, initially, because we want to support doing patch release upgrades with kubeadm. It should be easy to upgrade from v1.6.2 to v1.6.4 for instance. We consider the built-in upgrade support a critical capability for a real cluster lifecycle tool. It will still be possible to upgrade without self-hosting but it will require more manual work.

On the stabilization front, we want to start running kubeadm e2e tests. In this v1.5 timeframe, we added unit tests and we will continue to increase that coverage. We want to expand this to per-PR e2e tests as well that spin up a cluster with kubeadm init and kubeadm join; runs some kubeadm-specific tests and optionally the Conformance test suite.

Finally, on the security front, we also want to kubeadm to be as secure as possible by default. We look to enable RBAC for v1.6, lock down what kubelet and built-in services like kube-dns and kube-proxy can do, and maybe create specific user accounts that have different permissions.

Regarding releasing, we want to have the official kubeadm v1.6 binary in the kubernetes v1.6 tarball. This means syncing our release with the official one. More details on what we’ve done so far can be found here. As it becomes possible, we aim to move the kubeadm code out to the kubernetes/kubeadm repo (This is blocked on some Kubernetes code-specific infrastructure issues that may take some time to resolve.)

Nice-to-haves for v1.6 would include an official CoreOS Container Linux installer container that does what the debs/rpms are doing for Ubuntu/CentOS. In general, it would be nice to extend the distro support. We also want to adopt Kubelet Dynamic Settings so configuration passed to kubeadm init flows down to nodes automatically (it requires manual configuration currently). We want it to be possible to test Kubernetes from HEAD by using kubeadm.

Through 2017 and beyond

Apart from everything mentioned above, we want kubeadm to simply be a production grade (GA) tool you can use for bootstrapping a Kubernetes cluster. We want HA/multi-master to be much easier to achieve generally than it is now across platforms (though kops makes this easy on AWS today!). We want cloud providers to be out-of-tree and installable separately. kubectl apply -f my-cloud-provider-here.yaml should just work. The documentation should be more robust and should go deeper. Container Runtime Interface (CRI) and Federation should work well with kubeadm. Outdated getting started guides should be removed so new users aren’t mislead.

Refactoring the cloud provider integration plugins
Right now, the cloud provider integrations are built into the controller-manager, the kubelet and the API Server. This combined with the ever-growing interest for Kubernetes makes it unmaintainable to have the cloud provider integrations compiled into the core. Features that are clearly vendor-specific should not be a part of the core Kubernetes project, rather available as an addon from third party vendors. Everything cloud-specific should be moved into one controller, or a few if there’s need. This controller will be maintained by a third-party (usually the company behind the integration) and will implement cloud-specific features. This migration from in-core to out-of-core is disruptive yes, but it has very good side effects: leaner core, making it possible for more than the seven existing clouds to be integrated with Kubernetes and much easier installation. For example, you could run the cloud controller binary in a Deployment and install it with kubectl apply easily.

The plan for v1.6 is to make it possible to:

  • Create and run out-of-core cloud provider integration controllers
  • Ship a new and temporary binary in the Kubernetes release: the cloud-controller-manager. This binary will include the seven existing cloud providers and will serve as a way of validating, testing and migrating to the new flow. In a future release (v1.9 is proposed), the --cloud-provider flag will stop working, and the temporary cloud-controller-manager binary won’t be shipped anymore. Instead, a repository called something like kubernetes/cloud-providers will serve as a place for officially-validated cloud providers to evolve and exist, but all providers there will be independent to each other. (issue #2770; proposal #128; code #3473.)

Changelogs from v1.4 to v1.5

kubeadm  

v1.5 is a stabilization release for kubeadm. We’ve worked on making kubeadm more user-friendly, transparent and stable. Some new features have been added making it more configurable.

Here’s a very short extract of what’s changed:

  • Made the console output of kubeadm cleaner and more user-friendly #37568
  • Implemented kubeadm reset and to drain and cleanup a node #34807 and #37831
  • Preflight checks implementation that fails fast if the environment is invalid #34341 and #36334
  • kubectl logs and kubectl exec can now be used with kubeadm #37568
  • and a lot of other improvements, please read the full changelog.

kops

Here’s a short extract of what’s changed:

  • Support for CNI network plugins (Weave, Calico, Kope.io)
  • Fully private deployments, where nodes and masters do not have public IPs
  • Improved rolling update of clusters, in particular of HA clusters
  • OS support for CentOS / RHEL / Ubuntu along with Debian, and support for sysdig & perf tools

Go and check out the kops releases page in order to get information about the latest and greatest kops release.

Summary

In short, we’re excited on the roadmap ahead in bringing a lot of these improvements to you in the coming releases. Which we hope will make the experience to start much easier and lead to increased adoption of Kubernetes.

Thank you for all the feedback and contributions. I hope this has given you some insight in what we’re doing and encouraged you to join us at our meetings to say hi!

Lucas Käldström, Independent Kubernetes maintainer and SIG-Cluster-Lifecycle member

Kubernetes UX Survey Infographic

January 09 2017

Editor’s note: Today’s post is by Dan Romlein, UX Designer at Apprenda and member of the SIG-UI, sharing UX survey results from the Kubernetes community. 

The following infographic summarizes the findings of a survey that the team behind Dashboard, the official web UI for Kubernetes, sent during KubeCon in November 2016. Following the KubeCon launch of the survey, it was promoted on Twitter and various Slack channels over a two week period and generated over 100 responses. We’re delighted with the data it provides us to now make feature and roadmap decisions more in-line with the needs of you, our users.

Satisfaction with Dashboard

Less than a year old, Dashboard is still very early in its development and we realize it has a long way to go, but it was encouraging to hear it’s tracking on the axis of MVP and even with its basic feature set is adding value for people. Respondents indicated that they like how quickly the Dashboard project is moving forward and the activity level of its contributors. Specific appreciation was given for the value Dashboard brings to first-time Kubernetes users and encouraging exploration. Frustration voiced around Dashboard centered on its limited capabilities: notably, the lack of RBAC and limited visualization of cluster objects and their relationships.

Respondent Demographics

Kubernetes Usage

People are using Dashboard in production, which is fantastic; it’s that setting that the team is focused on optimizing for.

Feature Priority

In building Dashboard, we want to continually make alignments between the needs of Kubernetes users and our product. Feature areas have intentionally been kept as high-level as possible, so that UX designers on the Dashboard team can creatively transform those use cases into specific features. While there’s nothing wrong with “faster horses”, we want to make sure we’re creating an environment for the best possible innovation to flourish.

Troubleshooting & Debugging as a strong frontrunner in requested feature area is consistent with the previous KubeCon survey, and this is now our top area of investment. Currently in-progress is the ability to be able to exec into a Pod, and next up will be providing aggregated logs views across objects. One of a UI’s strengths over a CLI is its ability to show things, and the troubleshooting and debugging feature area is a prime application of this capability.

In addition to a continued ongoing investment in troubleshooting and debugging functionality, the other focus of the Dashboard team’s efforts currently is RBAC / IAM within Dashboard. Though #4 on the ranking of feature areas, In various conversations at KubeCon and the days following, this emerged as a top-requested feature of Dashboard, and the one people were most passionate about. This is a deal-breaker for many companies, and we’re confident its enablement will open many doors for Dashboard’s use in production.

Conclusion

It’s invaluable to have data from Kubernetes users on how they’re putting Dashboard to use and how well it’s serving their needs. If you missed the survey response window but still have something you’d like to share, we’d love to connect with you and hear feedback or answer questions: 

Kubernetes supports OpenAPI

December 23 2016

Editor’s note: this post is part of a series of in-depth articles on what’s new in Kubernetes 1.5

OpenAPI allows API providers to define their operations and models, and enables developers to automate their tools and generate their favorite language’s client to talk to that API server. Kubernetes has supported swagger 1.2 (older version of OpenAPI spec) for a while, but the spec was incomplete and invalid, making it hard to generate tools/clients based on it.

In Kubernetes 1.4, we introduced alpha support for the OpenAPI spec (formerly known as swagger 2.0 before it was donated to the Open API Initiative) by upgrading the current models and operations. Beginning in Kubernetes 1.5, the support for the OpenAPI spec has been completed by auto-generating the spec directly from Kubernetes source, which will keep the spec–and documentation–completely in sync with future changes in operations/models.

The new spec enables us to have better API documentation and we have even introduced a supported python client.

The spec is modular, divided by GroupVersion: this is future-proof, since we intend to allow separate GroupVersions to be served out of separate API servers.

The structure of spec is explained in detail in OpenAPI spec definition. We used operation’s tags to separate each GroupVersion and filled as much information as we can about paths/operations and models. For a specific operation, all parameters, method of call, and responses are documented.

For example, OpenAPI spec for reading a pod information is:

{

...  
  "paths": {

"/api/v1/namespaces/{namespace}/pods/{name}": {  
    "get": {  
     "description": "read the specified Pod",  
     "consumes": [  
      "\*/\*"  
     ],  
     "produces": [  
      "application/json",  
      "application/yaml",  
      "application/vnd.kubernetes.protobuf"  
     ],  
     "schemes": [  
      "https"  
     ],  
     "tags": [  
      "core\_v1"  
     ],  
     "operationId": "readCoreV1NamespacedPod",  
     "parameters": [  
      {  
       "uniqueItems": true,  
       "type": "boolean",  
       "description": "Should the export be exact.  Exact export maintains cluster-specific fields like 'Namespace'.",  
       "name": "exact",  
       "in": "query"  
      },  
      {  
       "uniqueItems": true,  
       "type": "boolean",  
       "description": "Should this value be exported.  Export strips fields that a user can not specify.",  
       "name": "export",  
       "in": "query"  
      }  
     ],  
     "responses": {  
      "200": {  
       "description": "OK",  
       "schema": {  
        "$ref": "#/definitions/v1.Pod"  
       }  
      },  
      "401": {  
       "description": "Unauthorized"  
      }  
     }  
    },

…

}

…

Using this information and the URL of kube-apiserver, one should be able to make the call to the given url (/api/v1/namespaces/{namespace}/pods/{name}) with parameters such as name, exact, export, etc. to get pod’s information. Client libraries generators would also use this information to create an API function call for reading pod’s information. For example, python client makes it easy to call this operation like this:

from kubernetes import client

ret = client.CoreV1Api().read\_namespaced\_pod(name="pods\_name", namespace="default")

A simplified version of generated read_namespaced_pod, can be found here.

Swagger-codegen document generator would also be able to create documentation using the same information:

GET /api/v1/namespaces/{namespace}/pods/{name}

(readCoreV1NamespacedPod)

read the specified Pod

Path parameters

name (required)

Path Parameter — name of the Pod

namespace (required)

Path Parameter — object name and auth scope, such as for teams and projects

Consumes

This API call consumes the following media types via the Content-Type request header:

-
\*/\*


Query parameters

pretty (optional)

Query Parameter — If 'true', then the output is pretty printed.

exact (optional)

Query Parameter — Should the export be exact. Exact export maintains cluster-specific fields like 'Namespace'.

export (optional)

Query Parameter — Should this value be exported. Export strips fields that a user can not specify.

Return type

v1.Pod


Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

-
application/json
-
application/yaml
-
application/vnd.kubernetes.protobuf

Responses

200

OK v1.Pod

401

Unauthorized

There are two ways to access OpenAPI spec:

  • From kuber-apiserver/swagger.json. This file will have all enabled GroupVersions routes and models and would be most up-to-date file with an specific kube-apiserver.
  • From Kubernetes GitHub repository with all core GroupVersions enabled. You can access it on master or an specific release (for example 1.5 release).

There are numerous tools that works with this spec. For example, you can use the swagger editor to open the spec file and render documentation, as well as generate clients; or you can directly use swagger codegen to generate documentation and clients. The clients this generates will mostly work out of the box–but you will need some support for authorization and some Kubernetes specific utilities. Use python client as a template to create your own client.

If you want to get involved in development of OpenAPI support, client libraries, or report a bug, you can get in touch with developers at SIG-API-Machinery.

–Mehdy Bohlool, Software Engineer, Google

@Kubernetesio View on Github #kubernetes-users Stack Overflow Download Kubernetes