Kubernetes Blog

Kubernetes Performance Measurements and Roadmap

September 10 2015

No matter how flexible and reliable your container orchestration system is, ultimately, you have some work to be done, and you want it completed quickly. For big problems, a common answer is to just throw more machines at the problem. After all, more compute = faster, right?

Interestingly, adding more nodes is a little like the tyranny of the rocket equation - in some systems, adding more machines can actually make your processing slower. However, unlike the rocket equation, we can do better. Kubernetes in v1.0 version supports clusters with up to 100 nodes. However, we have a goal to 10x the number of nodes we will support by the end of 2015. This blog post will cover where we are and how we intend to achieve the next level of performance.

What do we measure?

The first question we need to answer is: “what does it mean that Kubernetes can manage an N-node cluster?” Users expect that it will handle all operations “reasonably quickly,” but we need a precise definition of that. We decided to define performance and scalability goals based on the following two metrics:

  1. 1.“API-responsiveness”: 99% of all our API calls return in less than 1 second

  2. 2.“Pod startup time”: 99% of pods (with pre-pulled images) start within 5 seconds

Note that for “pod startup time” we explicitly assume that all images necessary to run a pod are already pre-pulled on the machine where it will be running. In our experiments, there is a high degree of variability (network throughput, size of image, etc) between images, and these variations have little to do with Kubernetes’ overall performance.

The decision to choose those metrics was made based on our experience spinning up 2 billion containers a week at Google. We explicitly want to measure the latency of user-facing flows since that’s what customers will actually care about.

How do we measure?

To monitor performance improvements and detect regressions we set up a continuous testing infrastructure. Every 2-3 hours we create a 100-node cluster from HEAD and run our scalability tests on it. We use a GCE n1-standard-4 (4 cores, 15GB of RAM) machine as a master and n1-standard-1 (1 core, 3.75GB of RAM) machines for nodes.

In scalability tests, we explicitly focus only on the full-cluster case (full N-node cluster is a cluster with 30 * N pods running in it) which is the most demanding scenario from a performance point of view. To reproduce what a customer might actually do, we run through the following steps:

  • Populate pods and replication controllers to fill the cluster

  • Generate some load (create/delete additional pods and/or replication controllers, scale the existing ones, etc.) and record performance metrics

  • Stop all running pods and replication controllers

  • Scrape the metrics and check whether they match our expectations

It is worth emphasizing that the main parts of the test are done on full clusters (30 pods per node, 100 nodes) - starting a pod in an empty cluster, even if it has 100 nodes will be much faster.

To measure pod startup latency we are using very simple pods with just a single container running the “gcr.io/google_containers/pause:go” image, which starts and then sleeps forever. The container is guaranteed to be already pre-pulled on nodes (we use it as the so-called pod-infra-container).

Performance data

The following table contains percentiles (50th, 90th and 99th) of pod startup time in 100-node clusters which are 10%, 25%, 50% and 100% full.

  10%-full 25%-full 50%-full 100%-full
50th percentile .90s 1.08s 1.33s 1.94s
90th percentile 1.29s 1.49s 1.72s 2.50s
99th percentile 1.59s 1.86s 2.56s 4.32s

As for api-responsiveness, the following graphs present 50th, 90th and 99th percentiles of latencies of API calls grouped by kind of operation and resource type. However, note that this also includes internal system API calls, not just those issued by users (in this case issued by the test itself).

get.pngput.png

delete.pngpost.png

list.png

Some resources only appear on certain graphs, based on what was running during that operation (e.g. no namespace was put at that time).

As you can see in the results, we are ahead of target for our 100-node cluster with pod startup time even in a fully-packed cluster occurring 14% faster in the 99th percentile than 5 seconds. It’s interesting to point out that LISTing pods is significantly slower than any other operation. This makes sense: in a full cluster there are 3000 pods and each of pod is roughly few kilobytes of data, meaning megabytes of data that need to processed for each LIST.

#####Work done and some future plans

The initial performance work to make 100-node clusters stable enough to run any tests on them involved a lot of small fixes and tuning, including increasing the limit for file descriptors in the apiserver and reusing tcp connections between different requests to etcd.

However, building a stable performance test was just step one to increasing the number of nodes our cluster supports by tenfold. As a result of this work, we have already taken on significant effort to remove future bottlenecks, including:

  • Rewriting controllers to be watch-based: Previously they were relisting objects of a given type every few seconds, which generated a huge load on the apiserver.

  • Using code generators to produce conversions and deep-copy functions: Although the default implementation using Go reflections are very convenient, they proved to be extremely slow, as much as 10X in comparison to the generated code.

  • Adding a cache to apiserver to avoid deserialization of the same data read from etcd multiple times

  • Reducing frequency of updating statuses: Given the slow changing nature of statutes, it only makes sense to update pod status only on change and node status only every 10 seconds.

  • Implemented watch at the apiserver instead of redirecting the requests to etcd: We would prefer to avoid watching for the same data from etcd multiple times, since, in many cases, it was filtered out in apiserver anyway.

Looking further out to our 1000-node cluster goal, proposed improvements include:

  • Moving events out from etcd: They are more like system logs and are neither part of system state nor are crucial for Kubernetes to work correctly.

  • Using better json parsers: The default parser implemented in Go is very slow as it is based on reflection.

  • Rewriting the scheduler to make it more efficient and concurrent

  • Improving efficiency of communication between apiserver and Kubelets: In particular, we plan to reduce the size of data being sent on every update of node status.

This is by no means an exhaustive list. We will be adding new elements (or removing existing ones) based on the observed bottlenecks while running the existing scalability tests and newly-created ones. If there are particular use cases or scenarios that you’d like to see us address, please join in!

  • We have weekly meetings for our Kubernetes Scale Special Interest Group on Thursdays 11am PST where we discuss ongoing issues and plans for performance tracking and improvements.
  • If you have specific performance or scalability questions before then, please join our scalability special interest group on Slack: https://kubernetes.slack.com/messages/sig-scale
  • General questions? Feel free to join our Kubernetes community on Slack: https://kubernetes.slack.com/messages/kubernetes-users/
  • Submit a pull request or file an issue! You can do this in our GitHub repository. Everyone is also enthusiastically encouraged to contribute with their own experiments (and their result) or PR contributions improving Kubernetes. - Wojciech Tyczynski, Google Software Engineer

Using Kubernetes Namespaces to Manage Environments

August 28 2015

One of the advantages that Kubernetes provides is the ability to manage various environments easier and better than traditional deployment strategies. For most nontrivial applications, you have test, staging, and production environments. You can spin up a separate cluster of resources, such as VMs, with the same configuration in staging and production, but that can be costly and managing the differences between the environments can be difficult.
Kubernetes includes a cool feature called [namespaces][4], which enable you to manage different environments within the same cluster. For example, you can have different test and staging environments in the same cluster of machines, potentially saving resources. You can also run different types of server, batch, or other jobs in the same cluster without worrying about them affecting each other.

The Default Namespace

Specifying the namespace is optional in Kubernetes because by default Kubernetes uses the “default” namespace. If you’ve just created a cluster, you can check that the default namespace exists using this command:

$ kubectl get namespaces
NAME          LABELS    STATUS
default                  Active
kube-system              Active

Here you can see that the default namespace exists and is active. The status of the namespace is used later when turning down and deleting the namespace.

Creating a New Namespace

You create a namespace in the same way you would any other resource. Create a my-namespace.yaml file and add these contents:

kind: Namespace  
apiVersion: v1  
metadata:  
 name: my-namespace  
 labels:  
   name: my-namespace  

Then you can run this command to create it:

$ kubectl create -f my-namespace.yaml

Service Names

With namespaces you can have your apps point to static service endpoints that don’t change based on the environment. For instance, your MySQL database service could be named mysql in production and staging even though it runs on the same infrastructure.

This works because each of the resources in the cluster will by default only “see” the other resources in the same namespace. This means that you can avoid naming collisions by creating pods, services, and replication controllers with the same names provided they are in separate namespaces. Within a namespace, short DNS names of services resolve to the IP of the service within that namespace. So for example, you might have an Elasticsearch service that can be accessed via the DNS name elasticsearch as long as the containers accessing it are located in the same namespace.

You can still access services in other namespaces by looking it up via the full DNS name which takes the form of SERVICE-NAME.NAMESPACE-NAME. So for example, elasticsearch.prod or elasticsearch.canary for the production and canary environments respectively.

An Example

Lets look at an example application. Let’s say you want to deploy your music store service MyTunes in Kubernetes. You can run the application production and staging environment as well as some one-off apps running in the same cluster. You can get a better idea of what’s going on by running some commands:

~$ kubectl get namespaces  
NAME                    LABELS    STATUS  
default                     Active  
mytunes-prod                Active  
mytunes-staging             Active  
my-other-app                Active  

Here you can see a few namespaces running. Next let’s list the services in staging:

~$ kubectl get services --namespace=mytunes-staging
NAME          LABELS                    SELECTOR        IP(S)             PORT(S)  
mytunes       name=mytunes,version=1    name=mytunes    10.43.250.14      80/TCP  
                                                        104.185.824.125     
mysql         name=mysql                name=mysql      10.43.250.63      3306/TCP  

Next check production:

~$ kubectl get services --namespace=mytunes-prod  
NAME          LABELS                    SELECTOR        IP(S)             PORT(S)  
mytunes       name=mytunes,version=1    name=mytunes    10.43.241.145     80/TCP  
                                                        104.199.132.213     
mysql         name=mysql                name=mysql      10.43.245.77      3306/TCP  

Notice that the IP addresses are different depending on which namespace is used even though the names of the services themselves are the same. This capability makes configuring your app extremely easy—since you only have to point your app at the service name—and has the potential to allow you to configure your app exactly the same in your staging or test environments as you do in production.

Caveats

While you can run staging and production environments in the same cluster and save resources and money by doing so, you will need to be careful to set up resource limits so that your staging environment doesn’t starve production for CPU, memory, or disk resources. Setting resource limits properly, and testing that they are working takes a lot of time and effort so unless you can measurably save money by running production in the same cluster as staging or test, you may not really want to do that.

Whether or not you run staging and production in the same cluster, namespaces are a great way to partition different apps within the same cluster. Namespaces will also serve as a level where you can apply resource limits so look for more resource management features at the namespace level in the future.

- Posted by Ian Lewis, Developer Advocate at Google

Weekly Kubernetes Community Hangout Notes - July 31 2015

August 04 2015

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who’s interested to know what’s discussed in this forum.

Here are the notes from today’s meeting:

  • Private Registry Demo - Muhammed

    • Run docker-registry as an RC/Pod/Service

    • Run a proxy on every node

    • Access as localhost:5000

    • Discussion:

      • Should we back it by GCS or S3 when possible?

      • Run real registry backed by $object_store on each node

      • DNS instead of localhost?

        • disassemble image strings?

        • more like DNS policy?

  • Running Large Clusters - Joe

    • Samsung keen to see large scale O(1000)

      • Starting on AWS
    • RH also interested - test plan needed

    • Plan for next week: discuss working-groups

    • If you are interested in joining conversation on cluster scalability send mail to [joe@0xBEDA.com][4]

  • Resource API Proposal - Clayton

    • New stuff wants more info on resources

    • Proposal for resources API - ask apiserver for info on pods

    • Send feedback to: #11951

    • Discussion on snapshot vs time-series vs aggregates

  • Containerized kubelet - Clayton

    • Open pull

    • Docker mount propagation - RH carries patches

    • Big issues around whole bootstrap of the system

      • dual: boot-docker/system-docker
    • Kube-in-docker is really nice, but maybe not critical

      • Do the small stuff to make progress

      • Keep pressure on docker

  • Web UI (preilly)

    • Where does web UI stand?

      • OK to split it back out

      • Use it as a container image

      • Build image as part of kube release process

      • Vendor it back in? Maybe, maybe not.

    • Will DNS be split out?

      • Probably more tightly integrated, instead
    • Other potential spin-outs:

      • apiserver

      • clients

The Growing Kubernetes Ecosystem

July 24 2015

Over the past year, we’ve seen fantastic momentum in the Kubernetes project, culminating with the release of Kubernetes v1 earlier this week. We’ve also witnessed the ecosystem around Kubernetes blossom, and wanted to draw attention to some of the cooler offerings we’ve seen.

CloudBees and the Jenkins community have created a Kubernetes plugin, allowing Jenkins slaves to be built as Docker images and run in Docker hosts managed by Kubernetes, either on the Google Cloud Platform or on a more local Kubernetes instance. These elastic slaves are then brought online as Jenkins schedules jobs for them and destroyed after their builds are complete, ensuring masters have steady access to clean workspaces and minimizing builds’ resource footprint.

CoreOS has created launched Tectonic, an opinionated enterprise distribution of Kubernetes, CoreOS and Docker. Tectonic includes a management console for workflows and dashboards, an integrated registry to build and share containers, and additional tools to automate deployment and customize rolling updates. At KuberCon, CoreOS launched Tectonic Preview, giving users easy access to Kubernetes 1.0, 24x7 enterprise ready support, Kubernetes guides and Kubernetes training to help enterprises begin experiencing the power of Kubernetes, CoreOS and Docker.

Hitachi Data Systems has announced that Kubernetes now joins the list of solutions validated to run on their enterprise Unified Computing Platform. With this announcement Hitachi has validated Kubernetes and VMware running side-by-side on the UCP platform, providing an enterprise solution for container-based applications and traditional virtualized workloads.

Kismatic is providing enterprise support for pure play open source Kubernetes. They have announced open source and commercially supported Kubernetes plug-ins specifically built for production-grade enterprise environments. Any Kubernetes deployment can now benefit from modular role-based access controls (RBAC), Kerberos for bedrock authentication, LDAP/AD integration, rich auditing and platform-agnostic Linux distro packages.

Meteor Development Group, creators of Meteor, a JavaScript App Platform, are using Kubernetes to build Galaxy to run Meteor apps in production. Galaxy will scale from free test apps to production-suitable high-availability hosting.

Mesosphere has incorporated Kubernetes into its Data Center Operating System (DCOS) platform as a first class citizen. Using DCOS, enterprises can deploy Kubernetes across thousands of nodes, both bare-metal and virtualized machines that can run on-premise and in the cloud. Mesosphere also launched a beta of their Kubernetes Training Bootcamp and will be offering more in the future.

Mirantis is enabling hybrid cloud applications across OpenStack and other clouds supporting Kubernetes. An OpenStack Murano app package supports full application lifecycle actions such as deploy, create cluster, create pod, add containers to pods, scale up and scale down.

OpenContrail is creating a kubernetes-contrail plugin designed to stitch the cluster management capabilities of Kubernetes with the network service automation capabilities of OpenContrail. Given the event-driven abstractions of pods and services inherent in Kubernetes, it is a simple extension to address network service enforcement by leveraging OpenContrail’s Virtual Network policy approach and programmatic API’s.

logo.png

Pachyderm is a containerized data analytics engine which provides the broad functionality of Hadoop with the ease of use of Docker. Users simply provide containers with their data analysis logic and Pachyderm will distribute that computation over the data. They have just released full deployment on Kubernetes for on premise deployments, and on Google Container Engine, eliminating all the operational overhead of running a cluster yourself.

Platalytics, Inc. and announced the release of one-touch deploy-anywhere feature for its Spark Application Platform. Based on Kubernetes, Docker, and CoreOS, it allows simple and automated deployment of Apache Hadoop, Spark, and Platalytics platform, with a single click, to all major public clouds, including Google, Amazon, Azure, Digital Ocean, and private on-premise clouds. It also enables hybrid cloud scenarios, where resources on public and private clouds can be mixed.

Rackspace has created Corekube as a simple, quick way to deploy Kubernetes on OpenStack. By using a decoupled infrastructure that is coordinated by etcd, fleet and flannel, it enables users to try Kubernetes and CoreOS without all the fuss of setting things up by hand.

Red Hat is a long time proponent of Kubernetes, and a significant contributor to the project. In their own words, “From Red Hat Enterprise Linux 7 and Red Hat Enterprise Linux Atomic Host to OpenShift Enterprise 3 and the forthcoming Red Hat Atomic Enterprise Platform, we are well-suited to bring container innovations into the enterprise, leveraging Kubernetes as the common backbone for orchestration.”

Redapt has launching a variety of turnkey, on-premises Kubernetes solutions co-engineered with other partners in the Kubernetes partner ecosystem. These include appliances built to leverage the CoreOS/Tectonic, Mirantis OpenStack, and Mesosphere platforms for management and provisioning. Redapt also offers private, public, and multi-cloud solutions that help customers accelerate their Kubernetes deployments successfully into production.

We’ve also seen a community of services partners spring up to assist in adopting Kubernetes and containers:

Screen Shot 2015-07-21 at 1.12.16 PM.png

Biarca is using Kubernetes to ease application deployment and scale on demand across available hybrid and multi-cloud clusters through strategically managed policy. A video on their website illustrates how to use Kubernetes to deploy applications in a private cloud infrastructure based on OpenStack and use a public cloud like GCE to address bursting demand for applications.

Cloud Technology Partners has developed a Container Services Offering featuring Kubernetes to assist enterprises with container best practices, adoption and implementation. This offering helps organizations understand how containers deliver competitive edge.

DoIT International is offering a Kubernetes Bootcamp which consists of a series of hands-on exercises interleaved with mini-lectures covering hands on topics such as Container Basics, Using Docker, Kubernetes and Google Container Engine.

OpenCredo provides a practical, lab style container and scheduler course in addition to consulting and solution delivery. The three-day course allows development teams to quickly ramp up and make effective use of containers in real world scenarios, covering containers in general along with Docker and Kubernetes.

Pythian focuses on helping clients design, implement, and manage systems that directly contribute to revenue and business success. They provide small, dedicated teams of highly trained and experienced data experts have the deep Kubernetes and container experience necessary to help companies solve Big Data problems with containers.

- Martin Buhr, Product Manager at Google

Weekly Kubernetes Community Hangout Notes - July 17 2015

July 23 2015

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who’s interested to know what’s discussed in this forum.

Here are the notes from today’s meeting:

- Eric Paris: replacing salt with ansible (if we want)

- In contrib, there is a provisioning tool written in ansible - The goal in the rewrite was to eliminate as much of the cloud provider stuff as possible - The salt setup does a bunch of setup in scripts and then the environment is setup with salt

- This means that things like generating certs is done differently on GCE/AWS/Vagrant   - For ansible, everything must be done within ansible   - Background on ansible

- Does not have clients
- Provisioner ssh into the machine and runs scripts on the machine
- You define what you want your cluster to look like, run the script, and it sets up everything at once
- If you make one change in a config file, ansible re-runs everything (which isn’t always desirable)
- Uses a jinja2 template   - Create machines with minimal software, then use ansible to get that machine into a runnable state

- Sets up all of the add-ons   - Eliminates the provisioner shell scripts   - Full cluster setup currently takes about 6 minutes

- CentOS with some packages
- Redeploy to the cluster takes 25 seconds   - Questions for Eric

- Where does the provider-specific configuration go?

  - The only network setup that the ansible config does is flannel; you can turn it off
- What about init vs. systemd?

  - Should be able to support in the code w/o any trouble (not yet implemented)   - Discussion

- Why not push the setup work into containers or kubernetes config?

  - To bootstrap a cluster drop a kubelet and a manifest
- Running a kubelet and configuring the network should be the only things required. We can cut a machine image that is preconfigured minus the data package (certs, etc)

  - The ansible scripts install kubelet & docker if they aren’t already installed
- Each OS (RedHat, Debian, Ubuntu) could have a different image. We could view this as part of the build process instead of the install process.
- There needs to be solution for bare metal as well.
- In favor of the overall goal -- reducing the special configuration in the salt configuration
- Everything except the kubelet should run inside a container (eventually the kubelet should as well)

  - Running in a container doesn’t cut down on the complexity that we currently have
  - But it does more clearly define the interface about what the code expects
- These tools (Chef, Puppet, Ansible) conflate binary distribution with configuration

  - Containers more clearly separate these problems
- The mesos deployment is not completely automated yet, but the mesos deployment is completely different: kubelets get put on top on an existing mesos cluster

  - The bash scripts allow the mesos devs to see what each cloud provider is doing and re-use the relevant bits
  - There was a large reverse engineering curve, but the bash is at least readable as opposed to the salt
- Openstack uses a different deployment as well
- We need a well documented list of steps (e.g. create certs) that are necessary to stand up a cluster

  - This would allow us to compare across cloud providers
  - We should reduce the number of steps as much as possible
  - Ansible has 241 steps to launch a cluster - 1.0 Code freeze

- How are we getting out of code freeze? - This is a topic for next week, but the preview is that we will move slowly rather than totally opening the firehose

- We want to clear the backlog as fast as possible while maintaining stability both on HEAD and on the 1.0 branch
- The backlog of almost 300 PRs but there are also various parallel feature branches that have been developed during the freeze   - Cutting a cherry pick release today (1.0.1) that fixes a few issues   - Next week we will discuss the cadence for patch releases

Strong, Simple SSL for Kubernetes Services

July 14 2015

Hi, I’m Evan Brown (@evandbrown) and I work on the solutions architecture team for Google Cloud Platform. I recently wrote an article and tutorial about using Jenkins on Kubernetes to automate the Docker and GCE image build process. Today I’m going to discuss how I used Kubernetes services and secrets to add SSL to the Jenkins web UI. After reading this, you’ll be able to add SSL termination (and HTTP->HTTPS redirects + basic auth) to your public HTTP Kubernetes services.

In the beginning

In the spirit of minimum viability, the first version of Jenkins-on-Kubernetes I built was very basic but functional:

  • The Jenkins leader was just a single container in one pod, but it was managed by a replication controller, so if it failed it would automatically respawn.
  • The Jenkins leader exposes two ports - TCP 8080 for the web UI and TCP 50000 for build agents to register - and those ports are made available as a Kubernetes service with a public load balancer.

Here’s a visual of that first version:

This works, but I have a few problems with it. First, authentication isn’t configured in a default Jenkins installation. The leader is sitting on the public Internet, accessible to anyone, until you connect and configure authentication. And since there’s no encryption, configuring authentication is kind of a symbolic gesture. We need SSL, and we need it now!

Do what you know

For a few milliseconds I considered trying to get SSL working directly on Jenkins. I’d never done it before, and I caught myself wondering if it would be as straightforward as working with SSL on Nginx, something I do have experience with. I’m all for learning new things, but this seemed like a great place to not invent a new wheel: SSL on Nginx is straightforward and well documented (as are its reverse-proxy capabilities), and Kubernetes is all about building functionality by orchestrating and composing containers. Let’s use Nginx, and add a few bonus features that Nginx makes simple: HTTP->HTTPS redirection, and basic access authentication.

SSL termination proxy as an nginx service

I started by putting together a Dockerfile that inherited from the standard nginx image, copied a few Nginx config files, and added a custom entrypoint (start.sh). The entrypoint script checks an environment variable (ENABLE_SSL) and activates the correct Nginx config accordingly (meaning that unencrypted HTTP reverse proxy is possible, but that defeats the purpose). The script also configures basic access authentication if it’s enabled (the ENABLE_BASIC_AUTH env var).

Finally, start.sh evaluates the SERVICE_HOST_ENV_NAME and SERVICE_PORT_ENV_NAME env vars. These variables should be set to the names of the environment variables for the Kubernetes service you want to proxy to. In this example, the service for our Jenkins leader is cleverly named jenkins, which means pods in the cluster will see an environment variable named JENKINS_SERVICE_HOST and JENKINS_SERVICE_PORT_UI (the port that 8080 is mapped to on the Jenkins leader). SERVICE_HOST_ENV_NAME and SERVICE_PORT_ENV_NAME simply reference the correct service to use for a particular scenario, allowing the image to be used generically across deployments.

Defining the Controller and Service

LIke every other pod in this example, we’ll deploy Nginx with a replication controller, allowing us to scale out or in, and recover automatically from container failures. This excerpt from acomplete descriptor in the sample app shows some relevant bits of the pod spec:

  spec:

    containers:

      -

        name: "nginx-ssl-proxy"

        image: "gcr.io/cloud-solutions-images/nginx-ssl-proxy:latest"

        env:

          -

            name: "SERVICE\_HOST\_ENV\_NAME"

            value: "JENKINS\_SERVICE\_HOST"

          -

            name: "SERVICE\_PORT\_ENV\_NAME"

            value: "JENKINS\_SERVICE\_PORT\_UI"

          -

            name: "ENABLE\_SSL"

            value: "true"

          -

            name: "ENABLE\_BASIC\_AUTH"

            value: "true"

        ports:

          -

            name: "nginx-ssl-proxy-http"

            containerPort: 80

          -

            name: "nginx-ssl-proxy-https"

            containerPort: 443

The pod will have a service exposing TCP 80 and 443 to a public load balancer. Here’s the service descriptor (also available in the sample app):

  kind: "Service"

  apiVersion: "v1"

  metadata:

    name: "nginx-ssl-proxy"

    labels:

      name: "nginx"

      role: "ssl-proxy"

  spec:

    ports:

      -

        name: "https"

        port: 443

        targetPort: "nginx-ssl-proxy-https"

        protocol: "TCP"

      -

        name: "http"

        port: 80

        targetPort: "nginx-ssl-proxy-http"

        protocol: "TCP"

    selector:

      name: "nginx"

      role: "ssl-proxy"

    type: "LoadBalancer"

And here’s an overview with the SSL termination proxy in place. Notice that Jenkins is no longer directly exposed to the public Internet:

Now, how did the Nginx pods get ahold of the super-secret SSL key/cert and htpasswd file (for basic access auth)?

Keep it secret, keep it safe

Kubernetes has an API and resource for Secrets. Secrets “are intended to hold sensitive information, such as passwords, OAuth tokens, and ssh keys. Putting this information in a secret is safer and more flexible than putting it verbatim in a pod definition or in a docker image.”

You can create secrets in your cluster in 3 simple steps:

1. Base64-encode your secret data (i.e., SSL key pair or htpasswd file)

$ cat ssl.key | base64  
   LS0tLS1CRUdJTiBDRVJUS...

1. Create a json document describing your secret, and add the base64-encoded values:

  apiVersion: "v1"

  kind: "Secret"

  metadata:

    name: "ssl-proxy-secret"

    namespace: "default"

  data:

    proxycert: "LS0tLS1CRUd..."

    proxykey: "LS0tLS1CR..."

    htpasswd: "ZXZhb..."

1. Create the secrets resource:

$ kubectl create -f secrets.json

To access the secrets from a container, specify them as a volume mount in your pod spec. Here’s the relevant excerpt from the Nginx proxy template we saw earlier:

  spec:

    containers:

      -

        name: "nginx-ssl-proxy"

        image: "gcr.io/cloud-solutions-images/nginx-ssl-proxy:latest"

        env: [...]

        ports: ...[]

        volumeMounts:

          -

            name: "secrets"

            mountPath: "/etc/secrets"

            readOnly: true

    volumes:

      -

        name: "secrets"

        secret:

          secretName: "ssl-proxy-secret"

A volume of type secret that points to the ssl-proxy-secret secret resource is defined, and then mounted into /etc/secrets in the container. The secrets spec in the earlier example defined data.proxycert, data.proxykey, and data.htpasswd, so we would see those files appear (base64-decoded) in /etc/secrets/proxycert, /etc/secrets/proxykey, and /etc/secrets/htpasswd for the Nginx process to access.

All together now

I have “containers and Kubernetes are fun and cool!” moments all the time, like probably every day. I’m beginning to have “containers and Kubernetes are extremely useful and powerful and are adding value to what I do by helping me do important things with ease” more frequently. This SSL termination proxy with Nginx example is definitely one of the latter. I didn’t have to waste time learning a new way to use SSL. I was able to solve my problem using well-known tools, in a reusable way, and quickly (from idea to working took about 2 hours).

Check out the complete Automated Image Builds with Jenkins, Packer, and Kubernetes repo to see how the SSL termination proxy is used in a real cluster, or dig into the details of the proxy image in the nginx-ssl-proxy repo (complete with a Dockerfile and Packer template so you can build the image yourself).

Weekly Kubernetes Community Hangout Notes - July 10 2015

July 13 2015

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who’s interested to know what’s discussed in this forum.

Here are the notes from today’s meeting:

  • Eric Paris: replacing salt with ansible (if we want)
    • In contrib, there is a provisioning tool written in ansible
    • The goal in the rewrite was to eliminate as much of the cloud provider stuff as possible
    • The salt setup does a bunch of setup in scripts and then the environment is setup with salt
      • This means that things like generating certs is done differently on GCE/AWS/Vagrant
    • For ansible, everything must be done within ansible
    • Background on ansible
      • Does not have clients
      • Provisioner ssh into the machine and runs scripts on the machine
      • You define what you want your cluster to look like, run the script, and it sets up everything at once
      • If you make one change in a config file, ansible re-runs everything (which isn’t always desirable)
      • Uses a jinja2 template
    • Create machines with minimal software, then use ansible to get that machine into a runnable state
      • Sets up all of the add-ons
    • Eliminates the provisioner shell scripts
    • Full cluster setup currently takes about 6 minutes
      • CentOS with some packages
      • Redeploy to the cluster takes 25 seconds
    • Questions for Eric
      • Where does the provider-specific configuration go?
        • The only network setup that the ansible config does is flannel; you can turn it off
      • What about init vs. systemd?
        • Should be able to support in the code w/o any trouble (not yet implemented)
    • Discussion
      • Why not push the setup work into containers or kubernetes config?
        • To bootstrap a cluster drop a kubelet and a manifest
      • Running a kubelet and configuring the network should be the only things required. We can cut a machine image that is preconfigured minus the data package (certs, etc)
        • The ansible scripts install kubelet & docker if they aren’t already installed
      • Each OS (RedHat, Debian, Ubuntu) could have a different image. We could view this as part of the build process instead of the install process.
      • There needs to be solution for bare metal as well.
      • In favor of the overall goal – reducing the special configuration in the salt configuration
      • Everything except the kubelet should run inside a container (eventually the kubelet should as well)
        • Running in a container doesn’t cut down on the complexity that we currently have
        • But it does more clearly define the interface about what the code expects
      • These tools (Chef, Puppet, Ansible) conflate binary distribution with configuration
        • Containers more clearly separate these problems
      • The mesos deployment is not completely automated yet, but the mesos deployment is completely different: kubelets get put on top on an existing mesos cluster
        • The bash scripts allow the mesos devs to see what each cloud provider is doing and re-use the relevant bits
        • There was a large reverse engineering curve, but the bash is at least readable as opposed to the salt
      • Openstack uses a different deployment as well
      • We need a well documented list of steps (e.g. create certs) that are necessary to stand up a cluster
        • This would allow us to compare across cloud providers
        • We should reduce the number of steps as much as possible
        • Ansible has 241 steps to launch a cluster
  • 1.0 Code freeze
    • How are we getting out of code freeze?
    • This is a topic for next week, but the preview is that we will move slowly rather than totally opening the firehose
      • We want to clear the backlog as fast as possible while maintaining stability both on HEAD and on the 1.0 branch
      • The backlog of almost 300 PRs but there are also various parallel feature branches that have been developed during the freeze
    • Cutting a cherry pick release today (1.0.1) that fixes a few issues
    • Next week we will discuss the cadence for patch releases
@Kubernetesio View on Github #kubernetes-users Stack Overflow Download Kubernetes