Kubernetes Blog

AppC Support for Kubernetes through RKT

May 04 2015

We very recently accepted a pull request to the Kubernetes project to add appc support for the Kubernetes community.  Appc is a new open container specification that was initiated by CoreOS, and is supported through CoreOS rkt container runtime.

This is an important step forward for the Kubernetes project and for the broader containers community.  It adds flexibility and choice to the container-verse and brings the promise of  compelling new security and performance capabilities to the Kubernetes developer.

Container based runtimes (like Docker or rkt) when paired with smart orchestration technologies (like Kubernetes and/or Apache Mesos) are a legitimate disruption to the way that developers build and run their applications.  While the supporting technologies are relatively nascent, they do offer the promise of some very powerful new ways to assemble, deploy, update, debug and extend solutions.  I believe that the world has not yet felt the full potential of containers and the next few years are going to be particularly exciting!  With that in mind it makes sense for several projects to emerge with different properties and different purposes. It also makes sense to be able to plug together different pieces (whether it be the container runtime or the orchestrator) based on the specific needs of a given application.

Docker has done an amazing job of democratizing container technologies and making them accessible to the outside world, and we expect Kubernetes to support Docker indefinitely. CoreOS has also started to do interesting work with rkt to create an elegant, clean, simple and open platform that offers some really interesting properties.  It looks poised deliver a secure and performant operating environment for containers.  The Kubernetes team has been working with the appc team at CoreOS for a while and in many ways they built rkt with Kubernetes in mind as a simple pluggable runtime component.  

The really nice thing is that with Kubernetes you can now pick the container runtime that works best for you based on your workloads’ needs, change runtimes without having the replace your cluster environment, or even mix together applications where different parts are running in different container runtimes in the same cluster.  Additional choices can’t help but ultimately benefit the end developer.

– Craig McLuckie
Google Product Manager and Kubernetes co-founder

Weekly Kubernetes Community Hangout Notes - April 24 2015

April 30 2015

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who’s interested to know what’s discussed in this forum.

Agenda:

  • Flocker and Kubernetes integration demo

Notes:

  • flocker and kubernetes integration demo
    • Flocker Q/A

      • Does the file still exists on node1 after migration?

      • Brendan: Any plan this to make it a volume? So we don’t need powerstrip?

        • Luke: Need to figure out interest to decide if we want to make it a first-class persistent disk provider in kube.

        • Brendan: Removing need for powerstrip would make it simple to use. Totally go for it.

        • Tim: Should take no more than 45 minutes to add it to kubernetes:)

      • Derek: Contrast this with persistent volumes and claims?

        • Luke: Not much difference, except for the novel ZFS based backend. Makes workloads really portable.

        • Tim: very different than network-based volumes. Its interesting that it is the only offering that allows upgrading media.

        • Brendan: claims, how does it look for replicated claims? eg Cassandra wants to have replicated data underneath. It would be efficient to scale up and down. Create storage on the fly based on load dynamically. Its step beyond taking snapshots - programmatically creating replicas with preallocation.

        • Tim: helps with auto-provisioning.

      • Brian: Does flocker requires any other component?

        • Kai: Flocker control service co-located with the master. (dia on blog post). Powerstrip + Powerstrip Flocker. Very interested in mpersisting state in etcd. It keeps metadata about each volume.

        • Brendan: In future, flocker can be a plugin and we’ll take care of persistence. Post v1.0.

        • Brian: Interested in adding generic plugin for services like flocker.

        • Luke: Zfs can become really valuable when scaling to lot of containers on a single node.

      • Alex: Can flocker service can be run as a pod?

        • Kai: Yes, only requirement is the flocker control service should be able to talk to zfs agent. zfs agent needs to be installed on the host and zfs binaries need to be accessible.

        • Brendan: In theory, all zfs bits can be put it into a container with devices.

        • Luke: Yes, still working through cross-container mounting issue.

        • Tim: pmorie is working through it to make kubelet work in a container. Possible re-use.

      • Kai: Cinder support is coming. Few days away.

  • Bob: What’s the process of pushing kube to GKE? Need more visibility for confidence.

Borg: The Predecessor to Kubernetes

April 23 2015

Google has been running containerized workloads in production for more than a decade. Whether it’s service jobs like web front-ends and stateful servers, infrastructure systems like Bigtable and Spanner, or batch frameworks like MapReduce and Millwheel, virtually everything at Google runs as a container. Today, we took the wraps off of Borg, Google’s long-rumored internal container-oriented cluster-management system, publishing details at the academic computer systems conference Eurosys. You can find the paper here.

Kubernetes traces its lineage directly from Borg. Many of the developers at Google working on Kubernetes were formerly developers on the Borg project. We’ve incorporated the best ideas from Borg in Kubernetes, and have tried to address some pain points that users identified with Borg over the years.

To give you a flavor, here are four Kubernetes features that came from our experiences with Borg:

1) Pods. A pod is the unit of scheduling in Kubernetes. It is a resource envelope in which one or more containers run. Containers that are part of the same pod are guaranteed to be scheduled together onto the same machine, and can share state via local volumes.

Borg has a similar abstraction, called an alloc (short for “resource allocation”). Popular uses of allocs in Borg include running a web server that generates logs alongside a lightweight log collection process that ships the log to a cluster filesystem (not unlike fluentd or logstash); running a web server that serves data from a disk directory that is populated by a process that reads data from a cluster filesystem and prepares/stages it for the web server (not unlike a Content Management System); and running user-defined processing functions alongside a storage shard. Pods not only support these use cases, but they also provide an environment similar to running multiple processes in a single VM – Kubernetes users can deploy multiple co-located, cooperating processes in a pod without having to give up the simplicity of a one-application-per-container deployment model.

2) Services. Although Borg’s primary role is to manage the lifecycles of tasks and machines, the applications that run on Borg benefit from many other cluster services, including naming and load balancing. Kubernetes supports naming and load balancing using the service abstraction: a service has a name and maps to a dynamic set of pods defined by a label selector (see next section). Any container in the cluster can connect to the service using the service name. Under the covers, Kubernetes automatically load-balances connections to the service among the pods that match the label selector, and keeps track of where the pods are running as they get rescheduled over time due to failures.

3) Labels. A container in Borg is usually one replica in a collection of identical or nearly identical containers that correspond to one tier of an Internet service (e.g. the front-ends for Google Maps) or to the workers of a batch job (e.g. a MapReduce). The collection is called a Job, and each replica is called a Task. While the Job is a very useful abstraction, it can be limiting. For example, users often want to manage their entire service (composed of many Jobs) as a single entity, or to uniformly manage several related instances of their service, for example separate canary and stable release tracks. At the other end of the spectrum, users frequently want to reason about and control subsets of tasks within a Job – the most common example is during rolling updates, when different subsets of the Job need to have different configurations.

Kubernetes supports more flexible collections than Borg by organizing pods using labels, which are arbitrary key/value pairs that users attach to pods (and in fact to any object in the system). Users can create groupings equivalent to Borg Jobs by using a “job:<jobname>” label on their pods, but they can also use additional labels to tag the service name, service instance (production, staging, test), and in general, any subset of their pods. A label query (called a “label selector”) is used to select which set of pods an operation should be applied to. Taken together, labels and replication controllers allow for very flexible update semantics, as well as for operations that span the equivalent of Borg Jobs.

4) IP-per-Pod. In Borg, all tasks on a machine use the IP address of that host, and thus share the host’s port space. While this means Borg can use a vanilla network, it imposes a number of burdens on infrastructure and application developers: Borg must schedule ports as a resource; tasks must pre-declare how many ports they need, and take as start-up arguments which ports to use; the Borglet (node agent) must enforce port isolation; and the naming and RPC systems must handle ports as well as IP addresses.

Thanks to the advent of software-defined overlay networks such as flannel or those built into public clouds, Kubernetes is able to give every pod and service its own IP address. This removes the infrastructure complexity of managing ports, and allows developers to choose any ports they want rather than requiring their software to adapt to the ones chosen by the infrastructure. The latter point is crucial for making it easy to run off-the-shelf open-source applications on Kubernetes–pods can be treated much like VMs or physical hosts, with access to the full port space, oblivious to the fact that they may be sharing the same physical machine with other pods.

With the growing popularity of container-based microservice architectures, the lessons Google has learned from running such systems internally have become of increasing interest to the external DevOps community. By revealing some of the inner workings of our cluster manager Borg, and building our next-generation cluster manager as both an open-source project (Kubernetes) and a publicly available hosted service (Google Container Engine), we hope these lessons can benefit the broader community outside of Google and advance the state-of-the-art in container scheduling and cluster management.  

Kubernetes and the Mesosphere DCOS

April 22 2015

Kubernetes and the Mesosphere DCOS

Today Mesosphere announced the addition of Kubernetes as a standard part of their DCOS offering. This is a great step forwards in bringing cloud native application management to the world, and should lay to rest many questions we hear about ‘Kubernetes or Mesos, which one should I use?’. Now you can have your cake and eat it too: use both. Today’s announcement extends the reach of Kubernetes to a new class of users, and add some exciting new capabilities for everyone.

By way of background, Kubernetes is a cluster management framework that was started by Google nine months ago, inspired by the internal system known as Borg. You can learn a little more about Borg by checking out this paper. At the heart of it Kubernetes offers what has been dubbed ‘cloud native’ application management. To us, there are three things that together make something ‘cloud native’:

  • Container oriented deployments Package up your application components with all their dependencies and deploy them using technologies like Docker or Rocket. Containers radically simplify the deployment process, making rollouts repeatable and predictable.
  • Dynamically managed Rely on modern control systems to make moment-to-moment decisions around the health management and scheduling of applications to radically improve reliability and efficiency. There are some things that just machines do better than people, and actively running applications is one of those things.
  • Micro-services oriented Tease applications apart into small semi-autonomous services that can be consumed easily so that the resulting systems are easier to understand, extend and adapt.

Kubernetes was designed from the start to make these capabilities available to everyone, and built by the same engineers that built the system internally known as Borg. For many users the promise of ‘Google style app management’ is interesting, but they want to run these new classes of applications on the same set of physical resources as their existing workloads like Hadoop, Spark, Kafka, etc. Now they will have access to commercially supported offering that brings the two worlds together.

Mesosphere, one of the earliest supporters of the Kubernetes project, has been working closely with the core Kubernetes team to create a natural experience for users looking to get the best of both worlds, adding Kubernetes to every Mesos deployment they instantiate, whether it be in the public cloud, private cloud, or in a hybrid deployment model. This is well aligned with the overall Kubernetes vision of creating ubiquitous management framework that runs anywhere a container can. It will be interesting to see how you blend together the old world and the new on a commercially supported, versatile platform.

Craig McLuckie

Product Manager, Google and Kubernetes co-founder

Weekly Kubernetes Community Hangout Notes - April 17 2015

April 17 2015

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who’s interested to know what’s discussed in this forum.

Agenda

  • Mesos Integration
  • High Availability (HA)
  • Adding performance and profiling details to e2e to track regressions
  • Versioned clients

Notes

  • Mesos integration

    • Mesos integration proposal:

    • No blockers to integration.

    • Documentation needs to be updated.

  • HA

    • Proposal should land today.

    • Etcd cluster.

    • Load-balance apiserver.

    • Cold standby for controller manager and other master components.

  • Adding performance and profiling details to e2e to track regression

    • Want red light for performance regression

    • Need a public DB to post the data

      • See
    • Justin working on multi-platform e2e dashboard

  • Versioned clients

    *

    *

    • Client library currently uses internal API objects.

    • Nobody reported that frequent changes to types.go have been painful, but we are worried about it.

    • Structured types are useful in the client. Versioned structs would be ok.

    • If start with json/yaml (kubectl), shouldn’t convert to structured types. Use swagger.

  • Security context

    *

    • Administrators can restrict who can run privileged containers or require specific unix uids

    • Kubelet will be able to get pull credentials from apiserver

    • Policy proposal coming in the next week or so

  • Discussing upstreaming of users, etc. into Kubernetes, at least as optional
  • 1.0 Roadmap

    • Focus is performance, stability, cluster upgrades

    • TJ has been making some edits to roadmap.md but hasn’t sent out a PR yet

  • Kubernetes UI

    • Dependencies broken out into third-party

    • @lavalamp is reviewer

Kubernetes Release: 0.15.0

April 16 2015

Release Notes:

  • Enables v1beta3 API and sets it to the default API version (#6098)
  • Added multi-port Services (#6182)
    • New Getting Started Guides
    • Multi-node local startup guide (#6505)
    • Mesos on Google Cloud Platform (#5442)
    • Ansible Setup instructions (#6237)
  • Added a controller framework (#5270, #5473)
  • The Kubelet now listens on a secure HTTPS port (#6380)
  • Made kubectl errors more user-friendly (#6338)
  • The apiserver now supports client cert authentication (#6190)
  • The apiserver now limits the number of concurrent requests it processes (#6207)
  • Added rate limiting to pod deleting (#6355)
  • Implement Balanced Resource Allocation algorithm as a PriorityFunction in scheduler package (#6150)
  • Enabled log collection from master (#6396)
  • Added an api endpoint to pull logs from Pods (#6497)
  • Added latency metrics to scheduler (#6368)
  • Added latency metrics to REST client (#6409)
  • etcd now runs in a pod on the master (#6221)
  • nginx now runs in a container on the master (#6334)
  • Began creating Docker images for master components (#6326)
  • Updated GCE provider to work with gcloud 0.9.54 (#6270)
  • Updated AWS provider to fix Region vs Zone semantics (#6011)
  • Record event when image GC fails (#6091)
  • Add a QPS limiter to the kubernetes client (#6203)
  • Decrease the time it takes to run make release (#6196)
  • New volume support
    • Added iscsi volume plugin (#5506)
    • Added glusterfs volume plugin (#6174)
    • AWS EBS volume support (#5138)
  • Updated to heapster version to v0.10.0 (#6331)
  • Updated to etcd 2.0.9 (#6544)
  • Updated to Kibana to v1.2 (#6426)
  • Bug Fixes
    • Kube-proxy now updates iptables rules if a service’s public IPs change (#6123)
    • Retry kube-addons creation if the initial creation fails (#6200)
    • Make kube-proxy more resiliant to running out of file descriptors (#6727)

To download, please visit https://github.com/GoogleCloudPlatform/kubernetes/releases/tag/v0.15.0

Introducing Kubernetes API Version v1beta3

April 16 2015

We’ve been hard at work on cleaning up the API over the past several months (see https://github.com/GoogleCloudPlatform/kubernetes/issues/1519 for details). The result is v1beta3, which is considered to be the release candidate for the v1 API.

We would like you to move to this new API version as soon as possible. v1beta1 and v1beta2 are deprecated, and will be removed by the end of June, shortly after we introduce the v1 API.

As of the latest release, v0.15.0, v1beta3 is the primary, default API. We have changed the default kubectl and client API versions as well as the default storage version (which means objects persisted in etcd will be converted from v1beta1 to v1beta3 as they are rewritten). 

You can take a look at v1beta3 examples such as:

https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples/guestbook/v1beta3

https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples/walkthrough/v1beta3

https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples/update-demo/v1beta3

To aid the transition, we’ve also created a conversion tool and put together a list of important different API changes.

  • The resource id is now called name.
  • namelabelsannotations, and other metadata are now nested in a map called metadata
  • desiredState is now called spec, and currentState is now called status
  • /minions has been moved to /nodes, and the resource has kind Node
  • The namespace is required (for all namespaced resources) and has moved from a URL parameter to the path:/api/v1beta3/namespaces/{namespace}/{resource_collection}/{resource_name}
  • The names of all resource collections are now lower cased - instead of replicationControllers, usereplicationcontrollers.
  • To watch for changes to a resource, open an HTTP or Websocket connection to the collection URL and provide the?watch=true URL parameter along with the desired resourceVersion parameter to watch from.
  • The container entrypoint has been renamed to command, and command has been renamed to args.
  • Container, volume, and node resources are expressed as nested maps (e.g., resources{cpu:1}) rather than as individual fields, and resource values support scaling suffixes rather than fixed scales (e.g., milli-cores).
  • Restart policy is represented simply as a string (e.g., “Always”) rather than as a nested map (“always{}”).
  • The volume source is inlined into volume rather than nested.
  • Host volumes have been changed to hostDir to hostPath  to better reflect that they can be files or directories

And the most recently generated Swagger specification of the API is here:

http://kubernetes.io/third_party/swagger-ui/#!/v1beta3

More details about our approach to API versioning and the transition can be found here:

https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/api.md

Another change we discovered is that with the change to the default API version in kubectl, commands that use “-o template” will break unless you specify “–api-version=v1beta1” or update to v1beta3 syntax. An example of such a change can be seen here:

https://github.com/GoogleCloudPlatform/kubernetes/pull/6377/files

If you use “-o template”, I recommend always explicitly specifying the API version rather than relying upon the default. We may add this setting to kubeconfig in the future.

Let us know if you have any questions. As always, we’re available on IRC (#google-containers) and github issues.

@Kubernetesio View on Github #kubernetes-users Stack Overflow Download Kubernetes