Kubernetes Blog

How Qbox Saved 50% per Month on AWS Bills Using Kubernetes and Supergiant

September 27 2016

Editor’s Note: Today’s post is by the team at Qbox, a hosted Elasticsearch provider sharing their experience with Kubernetes and how it helped save them fifty-percent off their cloud bill. 

A little over a year ago, we at Qbox faced an existential problem. Just about all of the major IaaS providers either launched or acquired services that competed directly with our Hosted Elasticsearch service, and many of them started offering it for free. The race to zero was afoot unless we could re-engineer our infrastructure to be more performant, more stable, and less expensive than the VM approach we had had before, and the one that is in use by our IaaS brethren. With the help of Kubernetes, Docker, and Supergiant (our own hand-rolled layer for managing distributed and stateful data), we were able to deliver 50% savings, a mid-five figure sum. At the same time, support tickets plummeted. We were so pleased with the results that we decided to open source Supergiant as its own standalone product. This post will demonstrate how we accomplished it.

Back in 2013, when not many were even familiar with Elasticsearch, we launched our as-a-service offering with a dedicated, direct VM model. We hand-selected certain instance types optimized for Elasticsearch, and users configured single-tenant, multi-node clusters running on isolated virtual machines in any region. We added a markup on the per-compute-hour price for the DevOps support and monitoring, and all was right with the world for a while as Elasticsearch became the global phenomenon that it is today.

As we grew to thousands of clusters, and many more thousands of nodes, it wasn’t just our AWS bill getting out of hand. We had 4 engineers replacing dead nodes and answering support tickets all hours of the day, every day. What made matters worse was the volume of resources allocated compared to the usage. We had thousands of servers with a collective CPU utilization under 5%. We were spending too much on processors that were doing absolutely nothing. 

How we got there was no great mystery. VM’s are a finite resource, and with a very compute-intensive, burstable application like Elasticsearch, we would be juggling the users that would either undersize their clusters to save money or those that would over-provision and overspend. When the aforementioned competitive pressures forced our hand, we had to re-evaluate everything.

Adopting Docker and Kubernetes
Our team avoided Docker for a while, probably on the vague assumption that the network and disk performance we had with VMs wouldn’t be possible with containers. That assumption turned out to be entirely wrong.

To run performance tests, we had to find a system that could manage networked containers and volumes. That’s when we discovered Kubernetes. It was alien to us at first, but by the time we had familiarized ourselves and built a performance testing tool, we were sold. It was not just as good as before, it was better.

The performance improvement we observed was due to the number of containers we could “pack” on a single machine. Ironically, we began the Docker experiment wanting to avoid “noisy neighbor,” which we assumed was inevitable when several containers shared the same VM. However, that isolation also acted as a bottleneck, both in performance and cost. To use a real-world example, If a machine has 2 cores and you need 3 cores, you have a problem. It’s rare to come across a public-cloud VM with 3 cores, so the typical solution is to buy 4 cores and not utilize them fully.

This is where Kubernetes really starts to shine. It has the concept of requests and limits, which provides granular control over resource sharing. Multiple containers can share an underlying host VM without the fear of “noisy neighbors”. They can request exclusive control over an amount of RAM, for example, and they can define a limit in anticipation of overflow. It’s practical, performant, and cost-effective multi-tenancy. We were able to deliver the best of both the single-tenant and multi-tenant worlds.

Kubernetes + Supergiant
We built Supergiant originally for our own Elasticsearch customers. Supergiant solves Kubernetes complications by allowing pre-packaged and re-deployable application topologies. In more specific terms, Supergiant lets you use Components, which are somewhat similar to a microservice. Components represent an almost-uniform set of Instances of software (e.g., Elasticsearch, MongoDB, your web application, etc.). They roll up all the various Kubernetes and cloud operations needed to deploy a complex topology into a compact entity that is easy to manage.

For Qbox, we went from needing 1:1 nodes to approximately 1:11 nodes. Sure, the nodes were larger, but the utilization made a substantial difference. As in the picture below, we could cram a whole bunch of little instances onto one big instance and not lose any performance. Smaller users would get the added benefit of higher network throughput by virtue of being on bigger resources, and they would also get greater CPU and RAM bursting.


Adding Up the Cost Savings
The packing algorithm in Supergiant, with its increased utilization, resulted in an immediate 25% drop in our infrastructure footprint. Remember, this came with better performance and fewer support tickets. We could dial up the packing algorithm and probably save even more money. Meanwhile, because our nodes were larger and far more predictable, we could much more fully leverage the economic goodness that is AWS Reserved Instances. We went with 1-year partial RI’s, which cut the remaining costs by 40%, give or take. Our customers still had the flexibility to spin up, down, and out their Elasticsearch nodes, without forcing us to constantly juggle, combine, split, and recombine our reservations. At the end of the day, we saved 50%. That is $600k per year that can go towards engineering salaries instead of enriching our IaaS provider. 

Kubernetes 1.4: Making it easy to run on Kubernetes anywhere

September 26 2016

Today we’re happy to announce the release of Kubernetes 1.4.

Since the release to general availability just over 15 months ago, Kubernetes has continued to grow and achieve broad adoption across the industry. From brand new startups to large-scale businesses, users have described how big a difference Kubernetes has made in building, deploying and managing distributed applications. However, one of our top user requests has been making Kubernetes itself easier to install and use. We’ve taken that feedback to heart, and 1.4 has several major improvements.

These setup and usability enhancements are the result of concerted, coordinated work across the community - more than 20 contributors from SIG-Cluster-Lifecycle came together to greatly simplify the Kubernetes user experience, covering improvements to installation, startup, certificate generation, discovery, networking, and application deployment.

Additional product highlights in this release include simplified cluster deployment on any cloud, easy installation of stateful apps, and greatly expanded Cluster Federation capabilities, enabling a straightforward deployment across multiple clusters, and multiple clouds.

What’s new:

Cluster creation with two commands - To get started with Kubernetes a user must provision nodes, install Kubernetes and bootstrap the cluster. A common request from users is to have an easy, portable way to do this on any cloud (public, private, or bare metal).

  • Kubernetes 1.4 introduces ‘kubeadm’ which reduces bootstrapping to two commands, with no complex scripts involved. Once kubernetes is installed, kubeadm init starts the master while kubeadm join joins the nodes to the cluster.
  • Installation is also streamlined by packaging Kubernetes with its dependencies, for most major Linux distributions including Red Hat and Ubuntu Xenial. This means users can now install Kubernetes using familiar tools such as apt-get and yum.
  • Add-on deployments, such as for an overlay network, can be reduced to one command by using a DaemonSet.
  • Enabling this simplicity is a new certificates API and its use for kubelet TLS bootstrap, as well as a new discovery API.

Expanded stateful application support - While cloud-native applications are built to run in containers, many existing applications need additional features to make it easy to adopt containers. Most commonly, these include stateful applications such as batch processing, databases and key-value stores. In Kubernetes 1.4, we have introduced a number of features simplifying the deployment of such applications, including: 

  • ScheduledJob is introduced as Alpha so users can run batch jobs at regular intervals.
  • Init-containers are Beta, addressing the need to run one or more containers before starting the main application, for example to sequence dependencies when starting a database or multi-tier app.
  • Dynamic PVC Provisioning moved to Beta. This feature now enables cluster administrators to expose multiple storage provisioners and allows users to select them using a new Storage Class API object.  
  • Curated and pre-tested Helm charts for common stateful applications such as MariaDB, MySQL and Jenkins will be available for one-command launches using version 2 of the Helm Package Manager.

Cluster federation API additions - One of the most requested capabilities from our global customers has been the ability to build applications with clusters that span regions and clouds. 

  • Federated Replica Sets Beta - replicas can now span some or all clusters enabling cross region or cross cloud replication. The total federated replica count and relative cluster weights / replica counts are continually reconciled by a federated replica-set controller to ensure you have the pods you need in each region / cloud.
  • Federated Services are now Beta, and secrets, events and namespaces have also been added to the federation API.
  • Federated Ingress Alpha - starting with Google Cloud Platform (GCP), users can create a single L7 globally load balanced VIP that spans services deployed across a federation of clusters within GCP. With Federated Ingress in GCP, external clients point to a single IP address and are sent to the closest cluster with usable capacity in any region or zone of the federation in GCP.

Container security support - Administrators of multi-tenant clusters require the ability to provide varying sets of permissions among tenants, infrastructure components, and end users of the system.

  • Pod Security Policy is a new object that enables cluster administrators to control the creation and validation of security contexts for pods/containers. Admins can associate service accounts, groups, and users with a set of constraints to define a security context.
  • AppArmor support is added, enabling admins to run a more secure deployment, and provide better auditing and monitoring of their systems. Users can configure a container to run in an AppArmor profile by setting a single field.

Infrastructure enhancements -  We continue adding to the scheduler, storage and client capabilities in Kubernetes based on user and ecosystem needs.

  • Scheduler - introducing inter-pod affinity and anti-affinity Alpha for users who want to customize how Kubernetes co-locates or spreads their pods. Also priority scheduling capability for cluster add-ons such as DNS, Heapster, and the Kube Dashboard.
  • Disruption SLOs - Pod Disruption Budget is introduced to limit impact of pods deleted by cluster management operations (such as node upgrade) at any one time.
  • Storage - New volume plugins for Quobyte and Azure Data Disk have been added.
  • Clients - Swagger 2.0 support is added, enabling non-Go clients.

Kubernetes Dashboard UI - lastly, a great looking Kubernetes Dashboard UI with 90% CLI parity for at-a-glance management.

For a complete list of updates see the release notes on GitHub. Apart from features the most impressive aspect of Kubernetes development is the community of contributors. This is particularly true of the 1.4 release, the full breadth of which will unfold in upcoming weeks.

Kubernetes 1.4 is available for download at get.k8s.io and via the open source repository hosted on GitHub. To get started with Kubernetes try the Hello World app.

To get involved with the project, join the weekly community meeting or start contributing to the project here (marked help). 

Users and Case Studies
Over the past fifteen months since the Kubernetes 1.0 GA release, the adoption and enthusiasm for this project has surpassed everyone’s imagination. Kubernetes runs in production at hundreds of organization and thousands more are in development. Here are a few unique highlights of companies running Kubernetes: 

  • Box accelerated their time to delivery from six months to launch a service to less than a week. Read more on how Box runs mission critical production services on Kubernetes.
  • Pearson minimized complexity and increased their engineer productivity. Read how Pearson is using Kubernetes to reinvent the world’s largest educational company. 
  • OpenAI a non-profit artificial intelligence research company, built infrastructure for deep learning with Kubernetes to maximize productivity for researchers allowing them to focus on the science.

We’re very grateful to our community of over 900 contributors who contributed more than 5,000 commits to make this release possible. To get a closer look on how the community is using Kubernetes, join us at the user conference KubeCon to hear directly from users and contributors.


Thank you for your support! 

– Aparna Sinha, Product Manager, Google

High performance network policies in Kubernetes clusters

September 21 2016

Editor’s note: today’s post is by Juergen Brendel, Pritesh Kothari and Chris Marino co-founders of Pani Networks, the sponsor of the Romana project, the network policy software used for these benchmark tests.

Network Policies

Since the release of Kubernetes 1.3 back in July, users have been able to define and enforce network policies in their clusters. These policies are firewall rules that specify permissible types of traffic to, from and between pods. If requested, Kubernetes blocks all traffic that is not explicitly allowed. Policies are applied to groups of pods identified by common labels. Labels can then be used to mimic traditional segmented networks often used to isolate layers in a multi-tier application: You might identify your front-end and back-end pods by a specific “segment” label, for example. Policies control traffic between those segments and even traffic to or from external sources.

Segmenting traffic

What does this mean for the application developer? At last, Kubernetes has gained the necessary capabilities to provide “defence in depth”. Traffic can be segmented and different parts of your application can be secured independently. For example, you can very easily protect each of your services via specific network policies: All the pods identified by a Replication Controller behind a service are already identified by a specific label. Therefore, you can use this same label to apply a policy to those pods.

Defense in depth has long been recommended as best practice. This kind of isolation between different parts or layers of an application is easily achieved on AWS and OpenStack by applying security groups to VMs.

However, prior to network policies, this kind of isolation for containers was not possible. VXLAN overlays can provide simple network isolation, but application developers need more fine grained control over the traffic accessing pods. As you can see in this simple example, Kubernetes network policies can manage traffic based on source and origin, protocol and port.

apiVersion: extensions/v1beta1  
kind: NetworkPolicy  
 name: pol1  
     role: backend  
 - from:  
   - podSelector:  
       role: frontend  
   - protocol: tcp  
     port: 80

Not all network backends support policies

Network policies are an exciting feature, which the Kubernetes community has worked on for a long time. However, it requires a networking backend that is capable of applying the policies. By themselves, simple routed networks or the commonly used flannel network driver, for example, cannot apply network policy.

There are only a few policy-capable networking backends available for Kubernetes today: Romana, Calico, and Canal; with Weave indicating support in the near future. Red Hat’s OpenShift includes network policy features as well.

We chose Romana as the back-end for these tests because it configures pods to use natively routable IP addresses in a full L3 configuration. Network policies, therefore, can be applied directly by the host in the Linux kernel using iptables rules. This results is a high performance, easy to manage network.

Testing performance impact of network policies

After network policies have been applied, network packets need to be checked against those policies to verify that this type of traffic is permissible. But what is the performance penalty for applying a network policy to every packet? Can we use all the great policy features without impacting application performance? We decided to find out by running some tests.

Before we dive deeper into these tests, it is worth mentioning that ‘performance’ is a tricky thing to measure, network performance especially so.

Throughput (i.e. data transfer speed measured in Gpbs) and latency (time to complete a request) are common measures of network performance. The performance impact of running an overlay network on throughput and latency has been examined previously here and here. What we learned from these tests is that Kubernetes networks are generally pretty fast, and servers have no trouble saturating a 1G link, with or without an overlay. It’s only when you have 10G networks that you need to start thinking about the overhead of encapsulation.

This is because during a typical network performance benchmark, there’s no application logic for the host CPU to perform, leaving it available for whatever network processing is required. For this reason we ran our tests in an operating range that did not saturate the link, or the CPU. This has the effect of isolating the impact of processing network policy rules on the host. For these tests we decided to measure latency as measured by the average time required to complete an HTTP request across a range of response sizes.

Test setup

  • Hardware: Two servers with Intel Core i5-5250U CPUs (2 core, 2 threads per core) running at 1.60GHz, 16GB RAM and 512GB SSD. NIC: Intel Ethernet Connection I218-V (rev 03)
  • Ubuntu 14.04.5
  • Kubernetes 1.3 for data collection (verified samples on v1.4.0-beta.5)
  • Romana v0.9.3.1
  • Client and server load test software

For the tests we had a client pod send 2,000 HTTP requests to a server pod. HTTP requests were sent by the client pod at a rate that ensured that neither the server nor network ever saturated. We also made sure each request started a new TCP session by disabling persistent connections (i.e. HTTP keep-alive). We ran each test with different response sizes and measured the average request duration time (how long does it take to complete a request of that size). Finally, we repeated each set of measurements with different policy configurations.

Romana detects Kubernetes network policies when they’re created, translates them to Romana’s own policy format, and then applies them on all hosts. Currently, Kubernetes network policies only apply to ingress traffic. This means that outgoing traffic is not affected.

First, we conducted the test without any policies to establish a baseline. We then ran the test again, increasing numbers of policies for the test’s network segment. The policies were of the common “allow traffic for a given protocol and port” format. To ensure packets had to traverse all the policies, we created a number of policies that did not match the packet, and finally a policy that would result in acceptance of the packet.

The table below shows the results, measured in milliseconds for different request sizes and numbers of policies:

Response Size

Policies .5k 1k 10k 100k 1M
0 0.732 0.738 1.077 2.532 10.487
10 0.744 0.742 1.084 2.570 10.556
50 0.745 0.755 1.086 2.580 10.566
100 0.762 0.770 1.104 2.640 10.597
200 0.783 0.783 1.147 2.652 10.677

What we see here is that, as the number of policies increases, processing network policies introduces a very small delay, never more than 0.2ms, even after applying 200 policies. For all practical purposes, no meaningful delay is introduced when network policy is applied. Also worth noting is that doubling the response size from 0.5k to 1.0k had virtually no effect. This is because for very small responses, the fixed overhead of creating a new connection dominates the overall response time (i.e. the same number of packets are transferred).

Note: .5k and 1k lines overlap at ~.8ms in the chart above

Even as a percentage of baseline performance, the impact is still very small. The table below shows that for the smallest response sizes, the worst case delay remains at 7%, or less, up to 200 policies. For the larger response sizes the delay drops to about 1%.

Response Size

Policies .5k 1k 10k 100k 1M
0 0.0% 0.0% 0.0% 0.0% 0.0%
10 -1.6% -0.5% -0.6% -1.5% -0.7%
50 -1.8% -2.3% -0.8% -1.9% -0.8%
100 -4.1% -4.3% -2.5% -4.3% -1.0%
200 -7.0% -6.1% -6.5% -4.7% -1.8%

What is also interesting in these results is that as the number of policies increases, we notice that larger requests experience a smaller relative (i.e. percentage) performance degradation.

This is because when Romana installs iptables rules, it ensures that packets belonging to established connection are evaluated first. The full list of policies only needs to be traversed for the first packets of a connection. After that, the connection is considered ‘established’ and the connection’s state is stored in a fast lookup table. For larger requests, therefore, most packets of the connection are processed with a quick lookup in the ‘established’ table, rather than a full traversal of all rules. This iptables optimization results in performance that is largely independent of the number of network policies.

Such ‘flow tables’ are common optimizations in network equipment and it seems that iptables uses the same technique quite effectively.

Its also worth noting that in practise, a reasonably complex application may configure a few dozen rules per segment. It is also true that common network optimization techniques like Websockets and persistent connections will improve the performance of network policies even further (especially for small request sizes), since connections are held open longer and therefore can benefit from the established connection optimization.

These tests were performed using Romana as the backend policy provider and other network policy implementations may yield different results. However, what these tests show is that for almost every application deployment scenario, network policies can be applied using Romana as a network back end without any negative impact on performance.

If you wish to try it for yourself, we invite you to check out Romana. In our GitHub repo you can find an easy to use installer, which works with AWS, Vagrant VMs or any other servers. You can use it to quickly get you started with a Romana powered Kubernetes or OpenStack cluster.

Creating a PostgreSQL Cluster using Helm

September 09 2016

Editor’s note: Today’s guest post is by Jeff McCormick, a developer at Crunchy Data, showing how to deploy a PostgreSQL cluster using Helm, a Kubernetes package manager.

Crunchy Data supplies a set of open source PostgreSQL and PostgreSQL related containers. The Crunchy PostgreSQL Container Suite includes containers that deploy, monitor, and administer the open source PostgreSQL database, for more details view this GitHub repository.

In this post we’ll show you how to deploy a PostgreSQL cluster using Helm, a Kubernetes package manager. For reference, the Crunchy Helm Chart examples used within this post are located here, and the pre-built containers can be found on DockerHub at this location.

This example will create the following in your Kubernetes cluster:

  • postgres master service
  • postgres replica service
  • postgres 9.5 master database (pod)
  • postgres 9.5 replica database (replication controller)


This example creates a simple Postgres streaming replication deployment with a master (read-write), and a single asynchronous replica (read-only). You can scale up the number of replicas dynamically.


The example is made up of various Chart files as follows:

values.yaml This file contains values which you can reference within the database templates allowing you to specify in one place values like database passwords
templates/master-pod.yaml The postgres master database pod definition. This file causes a single postgres master pod to be created.
templates/master-service.yaml The postgres master database has a service created to act as a proxy. This file causes a single service to be created to proxy calls to the master database.
templates/replica-rc.yaml The postgres replica database is defined by this file. This file causes a replication controller to be created which allows the postgres replica containers to be scaled up on-demand.
templates/replica-service.yaml This file causes the service proxy for the replica database container(s) to be created.


Install Helm according to their GitHub documentation and then install the examples as follows:

helm init

cd crunchy-containers/examples/kubehelm

helm install ./crunchy-postgres


After installing the Helm chart, you will see the following services:

kubectl get services  
crunchy-master   \<none\>        5432/TCP   1h  
crunchy-replica    \<none\>        5432/TCP   1h  
kubernetes     \<none\>        443/TCP    1h

It takes about a minute for the replica to begin replicating with the master. To test out replication, see if replication is underway with this command, enter password for the password when prompted:

psql -h crunchy-master -U postgres postgres -c 'table pg\_stat\_replication'

If you see a line returned from that query it means the master is replicating to the slave. Try creating some data on the master:

psql -h crunchy-master -U postgres postgres -c 'create table foo (id int)'

psql -h crunchy-master -U postgres postgres -c 'insert into foo values (1)'

Then verify that the data is replicated to the slave:

psql -h crunchy-replica -U postgres postgres -c 'table foo'

You can scale up the number of read-only replicas by running the following kubernetes command:

kubectl scale rc crunchy-replica --replicas=2

It takes 60 seconds for the replica to start and begin replicating from the master.

The Kubernetes Helm and Charts projects provide a streamlined way to package up complex applications and deploy them on a Kubernetes cluster. Deploying PostgreSQL clusters can sometimes prove challenging, but the task is greatly simplified using Helm and Charts.

–Jeff McCormick, Developer, Crunchy Data

Deploying to Multiple Kubernetes Clusters with kit

September 06 2016

Editor’s note: today’s guest post is by Chesley Brown, Full-Stack Engineer, at InVision, talking about how they build and open sourced kit to help them to continuously deploy updates to multiple clusters.

Our Docker journey at InVision may sound familiar. We started with Docker in our development environments, trying to get consistency there first. We wrangled our legacy monolith application into Docker images and streamlined our Dockerfiles to minimize size and amp the efficiency. Things were looking good. Did we learn a lot along the way? For sure. But at the end of it all, we had our entire engineering team working with Docker locally for their development environments. Mission accomplished! Well, not quite. Development was one thing, but moving to production was a whole other ballgame.

Along Came Kubernetes

Kubernetes came into our lives during our evaluation of orchestrators and schedulers last December. AWS ECS was still fresh and Docker had just released 1.9 (networking overlay release). We spent the month evaluating our choices, narrowing it down to native Docker tooling (Machine, Swarm, Compose), ECS and Kubernetes. Well, needless to say, Kubernetes was our clear winner and we started the new year moving headlong to leverage Kubernetes to get us to production. But it wasn’t long when we ran into a tiny complication…

Automated Deployments With A Catch

Here at InVision, we have a unique challenge. We just don’t have a single production environment running Kubernetes, but several, all needing automated updates via our CI/CD process. And although the code running on these environments was similar, the configurations were not. Things needed to work smoothly, automatically, as we couldn’t afford to add friction to the deploy process or encumber our engineering teams.

Having several near duplicate clusters could easily turn into a Kubernetes manifest nightmare. Anti-patterns galore, as we copy and paste 95% of the manifests to get a new cluster. Scalable? No. Headache? Yes. Keeping those manifests up-to-date and accurate would be a herculean (and error-prone) task. We needed something easier, something that allows reuse, keeping the maintenance low, and that we could incorporate into our CI/CD system.

So after looking for a project or tooling that could fit our needs, we came up empty. At InVision, we love to create tools to help us solve problems, and figuring we may not be the only team in this situation we decided to roll up our sleeves and created something of our own. The result is our open-source tool, kit! (short for Kubernetes + git)

Hello kit!

kit is a suite of components that, when plugged into your CI/CD system and source control, allows you to continuously deploy updates (or entirely new services!) to as many clusters as needed, all leveraging webhooks and without having to host an external service.

Using kit’s templating format, you can define your service files once and have them reused across multiple clusters. It works by building on top of your usual Kubernetes manifest files allowing them to be defined once and then reused across clusters by only defining the unique configuration needed for that specific cluster. This allows you to easily build the orchestration for your application and deploy it to as many clusters as needed. It also allows the ability to group variations of your application so you could have clusters that run the “development” version of your application while others run the “production” version and so on.

Developers simply commit code to their branches as normal and kit deploys to all clusters running that service. Kit then manages updating the image and tag that is used for a given service directly to the repository containing all your kit manifest templates. This means any and all changes to your clusters, from environment variables, or configurations to image updates are all tracked under source control history providing you with an audit trail for every cluster you have.

We made all of this Open Source so you can check out the kit repo!

Is kit Right For Us?

If you are running Kubernetes across several clusters (or namespaces) all needing to continuously deploy, you bet! Because using kit doesn’t require hosting any external server, your team can leverage the webhooks you probably already have with github and your CI/CD system to get started. From there you create a repo to host your Kubernetes manifest files which tells what services are deployed to which clusters. Complexity of these files is greatly simplified thanks to kit’s templating engine.The kit-image-deployer component is incorporated into the CI/CD process and whenever a developer commits code to master and the build passes, it’s automatically deployed to all configured clusters.

So What Are The Components?

kit is comprised of several components each building on the next. The general flow is a developer commits code to their repository, an image is built and then kit-image-deployer commits the new image and tag to your manifests repository. From there the kit-deploymentizer runs, parsing all your manifest templates to generate the raw Kubernetes manifest files. Finally the kit-deployer runs and takes all the built manifest files and deploys them to all the appropriate clusters. Here is a summary of the components and the flow:

A service that can be used to update given yaml files within a git repository with a new Docker image path. This can be used in collaboration with kit-deploymentizer and kit-deployer to automatically update the images used for a service across multiple clusters.

This service intelligently builds deployment files as to allow reusability of environment variables and other forms of configuration. It also supports aggregating these deployments for multiple clusters. In the end, it generates a list of clusters and a list of deployment files for each of these clusters. Best used in collaboration with kit-deployer and kit-image-deployer to achieve a continuous deployment workflow.

Use this service to deploy files to multiple Kubernetes clusters. Just organize your manifest files into directories that match the names of your clusters (the name defined in your kubeconfig files). Then you provide a directory of kubeconfig files and the kit-deployer will asynchronously send all manifests up to their corresponding clusters.

So What’s Next?

In the near future, we want to make deployments even smarter so as to handle updating things like mongo replicasets. We also want to add in smart monitoring to further improve on the self-healing nature of Kubernetes. We’re also working on adding additional integrations (such as Slack) and notification methods. And most importantly we’re working towards shifting more control to the individual developers of each service by allowing the kit manifest templates to exist in each individual service repository instead of a single master manifest repository. This will allow them to manage their service completely from development straight to production across all clusters.

We hope you take a closer look at kit and tell us what you think! Check out our InVision Engineering blog for more posts about the cool things we are up to at InVision. If you want to work on kit or other interesting things like this, click through to our jobs page. We’d love to hear from you!

–Chesley Brown, Full-Stack Engineer, at InVision.

Cloud Native Application Interfaces

September 01 2016

Standard Interfaces (or, the Thirteenth Factor)

–by Brian Grant and Craig Mcluckie, Google

When you say we need ‘software standards’ in erudite company, you get some interesting looks. Most concede that software standards have been central to the success of the boldest and most successful projects out there (like the Internet). Most are also skeptical about how they apply to the innovative world we live in today. Our projects are executed in week increments, not years. Getting bogged down behind mega-software-corporation-driven standards practices would be the death knell in this fluid, highly competitive world.

This isn’t about ‘those’ standards. The ones that emerge after years of deep consideration and negotiation that are eventually published by a body with a four-letter acronym for a name. This is about a different approach: finding what is working in the real world, and acting as a community to embrace it.

Let’s go back to first principles. To describe Cloud Native in one word, we’d choose “automatable”.

Most existing applications are not. 

Applications have many interfaces with their environment, whether with management infrastructure, shared services, or other applications. For us to remove the operator from patching, scaling, migrating an app from one environment to another, changing out dependencies, and handling of failure conditions, a set of well structured common interfaces is essential. It goes without saying that these interfaces must be designed for machines, not just humans. Machine-friendly interfaces allow automation systems to understand the systems under management, and create the loose coupling needed for applications to live in automated environments. 

As containerized infrastructure gets built there are a set of critical interfaces available to applications that go far beyond what is available to a single node today. The adoption of ‘serverless patterns’ (meaning ephemeral, event driven function execution) will further compound the need to make sense of running code in an environment that is completely decoupled from the node. The services needed will start with application configuration and extend to monitoring, logging, autoscaling and beyond. The set of capabilities will only grow as applications continue to adapt to be fuller citizens in a “cloud native” world.

Exploring one example a little further, a number of service-discovery solutions have been developed but are often tied to a particular storage implementation, a particular programming language, a non-standard protocol, and/or are opinionated in some other way (e.g., dictating application naming structure). This makes them unsuitable for general-purpose use. While DNS has limitations (that will eventually need to be addressed), it’s at least a standard protocol with room for innovation in its implementation. This is demonstrated by CoreDNS and other cloud-native DNS implementations. 

When we look inside the systems at Google, we have been able to achieve very high levels of automation without formal interface definitions thanks to a largely homogeneous software and hardware environment. Adjacent systems can safely make assumptions about interfaces, and by providing a set of universally used libraries we can skirt the issue. A good example of this is our log format doesn’t need to be formally specified because the libraries that generate logs are maintained by the teams that maintain the logs processing systems. This means that we have been able to get by to date without something like fluentd (which is solving the problem in the community of interfacing with logging systems).

Even though Google has managed to get by this way, it hurts us. One way is when we acquire a company. Porting their technology to run in our automation systems requires a spectacular amount of work. Doing that work while continuing to innovate is particularly tough. Even more significant though, there’s a lot of innovation happening in the open source world that isn’t easy for us to tap into. When new technology emerges, we would like to be able to experiment with it, adopt it piecemeal, and perhaps contribute back to it. When you run a vertically integrated, bespoke stack, that is a hard thing to do.

The lack of standard interfaces leaves customers with three choices: 

  • Live with high operations cost (the status quo), and accept that your developers in many cases will spend the majority of their time dealing with the care and feeding of applications.
  • Sign-up to be like Google (build your own everything, down to the concrete in the floor). 
  • Rely on a single, or a small collection of vendors to provide a complete solution and accept some degree of lock-in. Few in companies of any size (from enterprise to startup) find this appealing. It is our belief that an open community is more powerful and that customers benefit when there is competition at every layer of the stack. It should be possible to pull together a stack with best-of-breed capabilities at every level – logging, monitoring, orchestration, container runtime environment, block and file-system storage, SDN technology, etc. 

Standardizing interfaces (at least by convention) between the management system and applications is critical. One might consider the use of common conventions for interfaces as a thirteenth factor (expanding on the 12-factor methodology) in creating modern systems that work well in the cloud and at scale.

Kubernetes and Cloud Native Computing Foundation (CNCF) represent a great opportunity to support the emergence of standard interfaces, and to support the emergence of a fully automated software world. We’d love to see this community embrace the ideal of promoting standard interfaces from working technology. The obvious first step is to identify the immediate set of critical interfaces, and establish working groups in CNCF to start assess what exists in this area as candidates, and to sponsor work to start developing standard interfaces that work across container formats, orchestrators, developer tools and the myriad other systems that are needed to deliver on the Cloud Native vision.

–Brian Grant and Craig Mcluckie, Google

Security Best Practices for Kubernetes Deployment

August 31 2016

Editor’s note: today’s post is by Amir Jerbi and Michael Cherny of Aqua Security, describing security best practices for Kubernetes deployments, based on data they’ve collected from various use-cases seen in both on-premises and cloud deployments.

Kubernetes provides many controls that can greatly improve your application security. Configuring them requires intimate knowledge with Kubernetes and the deployment’s security requirements. The best practices we highlight here are aligned to the container lifecycle: build, ship and run, and are specifically tailored to Kubernetes deployments. We adopted these best practices in our own SaaS deployment that runs Kubernetes on Google Cloud Platform.

The following are our recommendations for deploying a secured Kubernetes application:

**Ensure That Images Are Free of Vulnerabilities **
Having running containers with vulnerabilities opens your environment to the risk of being easily compromised. Many of the attacks can be mitigated simply by making sure that there are no software components that have known vulnerabilities.

  • Implement Continuous Security Vulnerability Scanning – Containers might include outdated packages with known vulnerabilities (CVEs). This cannot be a ‘one off’ process, as new vulnerabilities are published every day. An ongoing process, where images are continuously assessed, is crucial to insure a required security posture.

  • Regularly Apply Security Updates to Your Environment – Once vulnerabilities are found in running containers, you should always update the source image and redeploy the containers. Try to avoid direct updates (e.g. ‘apt-update’) to the running containers, as this can break the image-container relationship. Upgrading containers is extremely easy with the Kubernetes rolling updates feature - this allows gradually updating a running application by upgrading its images to the latest version.

Ensure That Only Authorized Images are Used in Your Environment
Without a process that ensures that only images adhering to the organization’s policy are allowed to run, the organization is open to risk of running vulnerable or even malicious containers. Downloading and running images from unknown sources is dangerous. It is equivalent to running software from an unknown vendor on a production server. Don’t do that.

Use private registries to store your approved images - make sure you only push approved images to these registries. This alone already narrows the playing field, reducing the number of potential images that enter your pipeline to a fraction of the hundreds of thousands of publicly available images. Build a CI pipeline that integrates security assessment (like vulnerability scanning), making it part of the build process.

The CI pipeline should ensure that only vetted code (approved for production) is used for building the images. Once an image is built, it should be scanned for security vulnerabilities, and only if no issues are found then the image would be pushed to a private registry, from which deployment to production is done. A failure in the security assessment should create a failure in the pipeline, preventing images with bad security quality from being pushed to the image registry.

There is work in progress being done in Kubernetes for image authorization plugins (expected in Kubernetes 1.4), which will allow preventing the shipping of unauthorized images. For more info see this pull request.

Limit Direct Access to Kubernetes Nodes
You should limit SSH access to Kubernetes nodes, reducing the risk for unauthorized access to host resource. Instead you should ask users to use “kubectl exec”, which will provide direct access to the container environment without the ability to access the host.

You can use Kubernetes Authorization Plugins to further control user access to resources. This allows defining fine-grained-access control rules for specific namespace, containers and operations.

Create Administrative Boundaries between Resources
Limiting the scope of user permissions can reduce the impact of mistakes or malicious activities. A Kubernetes namespace allows you to partition created resources into logically named groups. Resources created in one namespace can be hidden from other namespaces. By default, each resource created by a user in Kubernetes cluster runs in a default namespace, called default. You can create additional namespaces and attach resources and users to them. You can use Kubernetes Authorization plugins to create policies that segregate access to namespace resources between different users.

For example: the following policy will allow ‘alice’ to read pods from namespace ‘fronto’.


  "apiVersion": "abac.authorization.kubernetes.io/v1beta1",

  "kind": "Policy",

  "spec": {

    "user": "alice",

    "namespace": "fronto",

    "resource": "pods",

    "readonly": true



Define Resource Quota
An option of running resource-unbound containers puts your system in risk of DoS or “noisy neighbor” scenarios. To prevent and minimize those risks you should define resource quotas. By default, all resources in Kubernetes cluster are created with unbounded CPU and memory requests/limits. You can create resource quota policies, attached to Kubernetes namespace, in order to limit the CPU and memory a pod is allowed to consume.

The following is an example for namespace resource quota definition that will limit number of pods in the namespace to 4, limiting their CPU requests between 1 and 2 and memory requests between 1GB to 2GB.


apiVersion: v1  
kind: ResourceQuota  
  name: compute-resources  
    pods: "4"  
    requests.cpu: "1"  
    requests.memory: 1Gi  
    limits.cpu: "2"  
    limits.memory: 2Gi

Assign a resource quota to namespace:

kubectl create -f ./compute-resources.yaml --namespace=myspace

Implement Network Segmentation

Running different applications on the same Kubernetes cluster creates a risk of one compromised application attacking a neighboring application. Network segmentation is important to ensure that containers can communicate only with those they are supposed to.

One of the challenges in Kubernetes deployments is creating network segmentation between pods, services and containers. This is a challenge due to the “dynamic” nature of container network identities (IPs), along with the fact that containers can communicate both inside the same node or between nodes.

Users of Google Cloud Platform can benefit from automatic firewall rules, preventing cross-cluster communication. A similar implementation can be deployed on-premises using network firewalls or SDN solutions. There is work being done in this area by the Kubernetes Network SIG, which will greatly improve the pod-to-pod communication policies. A new network policy API should address the need to create firewall rules around pods, limiting the network access that a containerized can have.

The following is an example of a network policy that controls the network for “backend” pods, only allowing inbound network access from “frontend” pods:

POST /apis/net.alpha.kubernetes.io/v1alpha1/namespaces/tenant-a/networkpolicys  
  "kind": "NetworkPolicy",

  "metadata": {

    "name": "pol1"


  "spec": {

    "allowIncoming": {

      "from": [{

        "pods": { "segment": "frontend" }


      "toPorts": [{

        "port": 80,

        "protocol": "TCP"



    "podSelector": {

      "segment": "backend"




Read more about Network policies here.

Apply Security Context to Your Pods and Containers

When designing your containers and pods, make sure that you configure the security context for your pods, containers and volumes. A security context is a property defined in the deployment yaml. It controls the security parameters that will be assigned to the pod/container/volume. Some of the important parameters are:

Security Context Setting Description
SecurityContext->runAsNonRoot Indicates that containers should run as non-root user
SecurityContext->Capabilities Controls the Linux capabilities assigned to the container.
SecurityContext->readOnlyRootFilesystem Controls whether a container will be able to write into the root filesystem.
PodSecurityContext->runAsNonRoot Prevents running a container with ‘root’ user as part of the pod

The following is an example for pod definition with security context parameters:

apiVersion: v1  
kind: Pod  
  name: hello-world  
  # specification of the pod’s containers  
  # ...  
    readOnlyRootFilesystem: true  
    runAsNonRoot: true

Reference here.

In case you are running containers with elevated privileges (–privileged) you should consider using the “DenyEscalatingExec” admission control. This control denies exec and attach commands to pods that run with escalated privileges that allow host access. This includes pods that run as privileged, have access to the host IPC namespace, and have access to the host PID namespace. For more details on admission controls, see the Kubernetes documentation.

Log Everything

Kubernetes supplies cluster-based logging, allowing to log container activity into a central log hub. When a cluster is created, the standard output and standard error output of each container can be ingested using a Fluentd agent running on each node into either Google Stackdriver Logging or into Elasticsearch and viewed with Kibana.


Kubernetes supplies many options to create a secured deployment. There is no one-size-fit-all solution that can be used everywhere, so a certain degree of familiarity with these options is required, as well as an understanding of how they can enhance your application’s security.

We recommend implementing the best practices that were highlighted in this blog, and use Kubernetes flexible configuration capabilities to incorporate security processes into the continuous integration pipeline, automating the entire process with security seamlessly “baked in”.

–Michael Cherny, Head of Security Research, and Amir Jerbi, CTO and co-founder Aqua Security

@Kubernetesio View on Github #kubernetes-users Stack Overflow Download Kubernetes