Kubernetes Blog

Scaling Stateful Applications using Kubernetes Pet Sets and FlexVolumes with Datera Elastic Data Fabric

August 29 2016

Editor’s note: today’s guest post is by Shailesh Mittal, Software Architect and Ashok Rajagopalan, Sr Director Product at Datera Inc, talking about Stateful Application provisioning with Kubernetes on Datera Elastic Data Fabric.

Introduction

Persistent volumes in Kubernetes are foundational as customers move beyond stateless workloads to run stateful applications. While Kubernetes has supported stateful applications such as MySQL, Kafka, Cassandra, and Couchbase for a while, the introduction of Pet Sets has significantly improved this support. In particular, Pet Sets sequence provisioning and startup, and give each instance a durable identity and storage association, which makes it possible to automate the scaling of “Pets” (applications that require consistent handling and durable placement).

Datera, elastic block storage for cloud deployments, has seamlessly integrated with Kubernetes through the FlexVolume framework. Based on the first principles of containers, Datera allows application resource provisioning to be decoupled from the underlying physical infrastructure. This brings clean contracts (aka, no dependency or direct knowledge of the underlying physical infrastructure), declarative formats, and eventually portability to stateful applications.

While Kubernetes allows for great flexibility to define the underlying application infrastructure through yaml configurations, Datera allows for that configuration to be passed to the storage infrastructure to provide persistence. Through the notion of Datera AppTemplates, in a Kubernetes environment, stateful applications can be automated to scale.

Deploying Persistent Storage

Persistent storage is defined using the Kubernetes PersistentVolume subsystem. PersistentVolumes are implemented as volume plugins and define volumes whose lifecycle is independent of the pod that uses them. They can be backed by NFS, iSCSI, or a cloud-provider-specific storage system. Datera has developed a volume plugin for PersistentVolumes that can provision iSCSI block storage on the Datera Data Fabric for Kubernetes pods.

The Datera volume plugin gets invoked by kubelets on minion nodes and relays the calls to the Datera Data Fabric over its REST API. Below is a sample deployment of a PersistentVolume with the Datera plugin:

  apiVersion: v1

  kind: PersistentVolume

  metadata:

    name: pv-datera-0

  spec:

    capacity:

      storage: 100Gi

    accessModes:

      - ReadWriteOnce

    persistentVolumeReclaimPolicy: Retain

    flexVolume:

      driver: "datera/iscsi"

      fsType: "xfs"

      options:

        volumeID: "kube-pv-datera-0"

        size: "100"

        replica: "3"

        backstoreServer: "tlx170.tlx.daterainc.com:7717"

This manifest defines a PersistentVolume of 100 GB to be provisioned in the Datera Data Fabric, should a pod request the persistent storage.

[root@tlx241 /]# kubectl get pv

NAME          CAPACITY   ACCESSMODES   STATUS      CLAIM     REASON    AGE

pv-datera-0   100Gi        RWO         Available                       8s

pv-datera-1   100Gi        RWO         Available                       2s

pv-datera-2   100Gi        RWO         Available                       7s

pv-datera-3   100Gi        RWO         Available                       4s

Configuration

The Datera PersistentVolume plugin is installed on all minion nodes. When a pod lands on a minion node with a valid claim bound to the persistent storage provisioned earlier, the Datera plugin forwards the request to create the volume on the Datera Data Fabric. All the options specified in the PersistentVolume manifest are sent to the plugin with the provisioning request.

Once a volume is provisioned in the Datera Data Fabric, it is presented to the minion node as an iSCSI block device, and kubelet mounts this device so that the containers in the pod can access it.

Using Persistent Storage

A pod uses a Kubernetes PersistentVolume through a PersistentVolume Claim. Once a claim is defined, it is bound to a PersistentVolume matching the claim’s specification. A typical claim for the PersistentVolume defined above looks like this:

kind: PersistentVolumeClaim

apiVersion: v1

metadata:

  name: pv-claim-test-petset-0

spec:

  accessModes:

    - ReadWriteOnce

  resources:

    requests:

      storage: 100Gi

Once this claim is defined and bound to a PersistentVolume, the storage can be referenced from a pod specification:

[root@tlx241 /]# kubectl get pv

NAME          CAPACITY   ACCESSMODES   STATUS      CLAIM                            REASON    AGE

pv-datera-0   100Gi      RWO           Bound       default/pv-claim-test-petset-0             6m

pv-datera-1   100Gi      RWO           Bound       default/pv-claim-test-petset-1             6m

pv-datera-2   100Gi      RWO           Available                                              7s

pv-datera-3   100Gi      RWO           Available                                              4s


[root@tlx241 /]# kubectl get pvc

NAME                     STATUS    VOLUME        CAPACITY   ACCESSMODES   AGE

pv-claim-test-petset-0   Bound     pv-datera-0   0                        3m

pv-claim-test-petset-1   Bound     pv-datera-1   0                        3m

A pod can use a PersistentVolume Claim like below:

apiVersion: v1

kind: Pod

metadata:

  name: kube-pv-demo

spec:

  containers:

  - name: data-pv-demo

    image: nginx

    volumeMounts:

    - name: test-kube-pv1

      mountPath: /data

    ports:

    - containerPort: 80

  volumes:

  - name: test-kube-pv1

    persistentVolumeClaim:

      claimName: pv-claim-test-petset-0

The result is a pod using a PersistentVolume Claim as a volume. This in turn sends a request to the Datera volume plugin to provision storage in the Datera Data Fabric.

[root@tlx241 /]# kubectl describe pods kube-pv-demo

Name:       kube-pv-demo

Namespace:  default

Node:       tlx243/172.19.1.243

Start Time: Sun, 14 Aug 2016 19:17:31 -0700

Labels:     <none>

Status:     Running

IP:         10.40.0.3

Controllers: <none>

Containers:

  data-pv-demo:

    Container ID: docker://ae2a50c25e03143d0dd721cafdcc6543fac85a301531110e938a8e0433f74447

    Image:   nginx

    Image ID: docker://sha256:0d409d33b27e47423b049f7f863faa08655a8c901749c2b25b93ca67d01a470d

    Port:    80/TCP

    State:   Running

      Started:  Sun, 14 Aug 2016 19:17:34 -0700

    Ready:   True

    Restart Count:  0

    Environment Variables:  <none>

Conditions:

  Type           Status

  Initialized    True

  Ready          True

  PodScheduled   True

Volumes:

  test-kube-pv1:

    Type:  PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)

    ClaimName:   pv-claim-test-petset-0

    ReadOnly:    false

  default-token-q3eva:

    Type:        Secret (a volume populated by a Secret)

    SecretName:  default-token-q3eva

    QoS Tier:  BestEffort

Events:

  FirstSeen LastSeen Count From SubobjectPath Type Reason Message

  --------- -------- ----- ---- ------------- -------- ------ -------

  43s 43s 1 {default-scheduler } Normal Scheduled Successfully assigned kube-pv-demo to tlx243

  42s 42s 1 {kubelet tlx243} spec.containers{data-pv-demo} Normal Pulling pulling image "nginx"

  40s 40s 1 {kubelet tlx243} spec.containers{data-pv-demo} Normal Pulled Successfully pulled image "nginx"

  40s 40s 1 {kubelet tlx243} spec.containers{data-pv-demo} Normal Created Created container with docker id ae2a50c25e03

  40s 40s 1 {kubelet tlx243} spec.containers{data-pv-demo} Normal Started Started container with docker id ae2a50c25e03

The persistent volume is presented as an iSCSI device on the minion node (tlx243 in this case):

[root@tlx243 ~]# lsscsi

[0:2:0:0]    disk    SMC      SMC2208          3.24  /dev/sda

[11:0:0:0]   disk    DATERA   IBLOCK           4.0   /dev/sdb


[root@tlx243 datera~iscsi]# mount | grep sdb

/dev/sdb on /var/lib/kubelet/pods/6b99bd2a-628e-11e6-8463-0cc47ab41442/volumes/datera~iscsi/pv-datera-0 type xfs (rw,relatime,attr2,inode64,noquota)

Containers running in the pod see this device mounted at /data as specified in the manifest:

[root@tlx241 /]# kubectl exec kube-pv-demo -c data-pv-demo -it bash

root@kube-pv-demo:/# mount | grep data

/dev/sdb on /data type xfs (rw,relatime,attr2,inode64,noquota)

Using Pet Sets

Typically, pods are treated as stateless units, so if one of them is unhealthy or gets superseded, Kubernetes just disposes of it. In contrast, a PetSet is a group of stateful pods that has a stronger notion of identity. The goal of a PetSet is to remove the dependency on the underlying physical infrastructure by assigning individual application instances identities that are not anchored to it.

A PetSet requires {0..n-1} Pets. Each Pet has a deterministic name, PetSetName-Ordinal, and a unique identity. Each Pet has at most one pod, and each PetSet has at most one Pet with a given identity. A PetSet ensures that a specified number of “pets” with unique identities are running at any given time. The identity of a Pet is comprised of:

  • a stable hostname, available in DNS
  • an ordinal index
  • stable storage: linked to the ordinal & hostname

A typical PetSet definition using a PersistentVolume Claim looks like below:

# A headless service to create DNS records

apiVersion: v1

kind: Service

metadata:

  name: test-service

  labels:

    app: nginx

spec:

  ports:

  - port: 80

    name: web

  clusterIP: None

  selector:

    app: nginx

---

apiVersion: apps/v1alpha1

kind: PetSet

metadata:

  name: test-petset

spec:

  serviceName: "test-service"

  replicas: 2

  template:

    metadata:

      labels:

        app: nginx

      annotations:

        pod.alpha.kubernetes.io/initialized: "true"

    spec:

      terminationGracePeriodSeconds: 0

      containers:

      - name: nginx

        image: gcr.io/google_containers/nginx-slim:0.8

        ports:

        - containerPort: 80

          name: web

        volumeMounts:

        - name: pv-claim

          mountPath: /data

  volumeClaimTemplates:

  - metadata:

      name: pv-claim

      annotations:

        volume.alpha.kubernetes.io/storage-class: anything

    spec:

      accessModes: ["ReadWriteOnce"]

      resources:

        requests:

          storage: 100Gi

We have the following PersistentVolume Claims available:

[root@tlx241 /]# kubectl get pvc

NAME                     STATUS    VOLUME        CAPACITY   ACCESSMODES   AGE

pv-claim-test-petset-0   Bound     pv-datera-0   0                        41m

pv-claim-test-petset-1   Bound     pv-datera-1   0                        41m

pv-claim-test-petset-2   Bound     pv-datera-2   0                        5s

pv-claim-test-petset-3   Bound     pv-datera-3   0                        2s

When this PetSet is provisioned, two pods get instantiated:

[root@tlx241 /]# kubectl get pods

NAMESPACE     NAME                        READY     STATUS    RESTARTS   AGE

default       test-petset-0               1/1       Running   0          7s

default       test-petset-1               1/1       Running   0          3s

Here is what the PetSet test-petset instantiated earlier looks like:

[root@tlx241 /]# kubectl describe petset test-petset

Name: test-petset

Namespace: default

Image(s): gcr.io/google_containers/nginx-slim:0.8

Selector: app=nginx

Labels: app=nginx

Replicas: 2 current / 2 desired

Annotations: <none>

CreationTimestamp: Sun, 14 Aug 2016 19:46:30 -0700

Pods Status: 2 Running / 0 Waiting / 0 Succeeded / 0 Failed

No volumes.

No events.

Once a PetSet such as test-petset is instantiated, increasing the number of replicas (i.e. the number of pods started with that PetSet) instantiates more pods and binds more PersistentVolume Claims to the new pods:

[root@tlx241 /]# kubectl patch petset test-petset -p '{"spec":{"replicas":3}}'

"test-petset" patched


[root@tlx241 /]# kubectl describe petset test-petset

Name: test-petset

Namespace: default

Image(s): gcr.io/google_containers/nginx-slim:0.8

Selector: app=nginx

Labels: app=nginx

Replicas: 3 current / 3 desired

Annotations: <none>

CreationTimestamp: Sun, 14 Aug 2016 19:46:30 -0700

Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed

No volumes.

No events.


[root@tlx241 /]# kubectl get pods

NAME                        READY     STATUS    RESTARTS   AGE

test-petset-0               1/1       Running   0          29m

test-petset-1               1/1       Running   0          28m

test-petset-2               1/1       Running   0          9s

Now the PetSet is running 3 pods after patch application.

When the above PetSet definition is patched to have one more replica, it introduces one more pod in the system. This in turn results in one more volume getting provisioned on the Datera Data Fabric. So volumes get dynamically provisioned and attached to a pod upon the PetSet scaling up.
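The claim backing the new Pet follows the naming pattern visible in the listings above: the volumeClaimTemplates name, then the PetSet name, then the ordinal. Whether the claim is generated from the template or created ahead of time, it would look roughly like the sketch below, which assumes the same size, access mode, and annotation as the template:

```yaml
# Sketch of the claim associated with the Pet test-petset-2.
# Name pattern: <volumeClaimTemplate name>-<PetSet name>-<ordinal>
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim-test-petset-2
  annotations:
    # Copied from the volumeClaimTemplates entry in the PetSet manifest
    volume.alpha.kubernetes.io/storage-class: anything
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```

Because the name is derived from the ordinal, a Pet that is rescheduled keeps the same claim and therefore the same volume.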

To support durability and consistency, if a pod moves from one minion to another, its volumes are attached (mounted) to the new minion node and detached (unmounted) from the old one to maintain persistent access to the data.

Conclusion

This demonstrates Kubernetes with Pet Sets orchestrating stateful and stateless workloads. While the Kubernetes community is working on expanding the FlexVolume framework’s capabilities, we are excited that this solution makes it possible for Kubernetes to be run more widely in data centers.

Join and contribute: Kubernetes Storage SIG.

SIG Apps: build apps for and operate them in Kubernetes

August 16 2016

Editor’s note: This post is by the Kubernetes SIG-Apps team sharing how they focus on the developer and devops experience of running applications in Kubernetes.

Kubernetes is an incredible manager for containerized applications. Because of this, numerous companies have started to run their applications in Kubernetes.

Kubernetes Special Interest Groups (SIGs) have been around to support the community of developers and operators since around the 1.0 release. People organized around networking, storage, scaling and other operational areas.

As Kubernetes took off, so did the need for tools, best practices, and discussions around building and operating cloud native applications. To fill that need the Kubernetes SIG Apps came into existence.

SIG Apps is a place where companies and individuals can:

  • see and share demos of the tools being built to enable app operators
  • learn about and discuss needs of app operators
  • organize around efforts to improve the experience

Since the inception of SIG Apps we’ve had demos of projects like KubeFuse, KPM, and StackSmith. We’ve also run a survey of those operating apps in Kubernetes.

From the survey results we’ve learned a number of things including:

  • That 81% of respondents want some form of autoscaling
  • To store secret information 47% of respondents use built-in secrets. At rest these are not currently encrypted. (If you want to help add encryption there is an issue for that.)
  • The questions with the most responses had to do with 3rd party tools and debugging
  • For 3rd party tools to manage applications there were no clear winners. There are a wide variety of practices
  • An overall complaint about a lack of useful documentation. (Help contribute to the docs here.)
  • There’s a lot of data. Many of the responses were optional so we were surprised that 935 of all questions across all candidates were filled in. If you want to look at the data yourself it’s available online.

When it comes to application operation there’s still a lot to be figured out and shared. If you’ve got opinions about running apps, tooling to make the experience better, or just want to lurk and learn about what’s going on, please come join us.

–Matt Farina, Principal Engineer, Hewlett Packard Enterprise

Kubernetes Namespaces: use cases and insights

August 16 2016

“Who’s on first, What’s on second, I Don’t Know’s on third” 

Who’s on First? by Abbott and Costello

Introduction

Kubernetes is a system with several concepts. Many of these concepts get manifested as “objects” in the RESTful API (often called “resources” or “kinds”). One of these concepts is Namespaces. In Kubernetes, Namespaces are the way to partition a single Kubernetes cluster into multiple virtual clusters. In this post we’ll highlight examples of how our customers are using Namespaces. 

But first, a metaphor: Namespaces are like human family names. A family name, e.g. Wong, identifies a family unit. Within the Wong family, one of its members, e.g. Sam Wong, is readily identified as just “Sam” by the family. Outside of the family, and to avoid “Which Sam?” problems, Sam would usually be referred to as “Sam Wong”, perhaps even “Sam Wong from San Francisco”.  

Namespaces are a logical partitioning capability that enable one Kubernetes cluster to be used by multiple users, teams of users, or a single user with multiple applications without concern for undesired interaction. Each user, team of users, or application may exist within its Namespace, isolated from every other user of the cluster and operating as if it were the sole user of the cluster. (Furthermore, Resource Quotas provide the ability to allocate a subset of a Kubernetes cluster’s resources to a Namespace.)
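As a minimal sketch of how a Namespace and a Resource Quota fit together, the manifests below create a namespace and cap what it may consume in total. The names `team-a` and `team-a-quota` and all of the limits are hypothetical values chosen for illustration:

```yaml
# Hypothetical namespace for one team.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
# Quota that limits the aggregate resources used inside team-a.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    pods: "20"               # at most 20 pods in the namespace
    requests.cpu: "8"        # sum of CPU requests across all pods
    requests.memory: 16Gi    # sum of memory requests across all pods
```

With a quota on CPU or memory requests in place, pods that do not declare those requests are typically rejected, so teams usually pair such a quota with default requests.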

For all but the most trivial uses of Kubernetes, you will benefit by using Namespaces. In this post, we’ll cover the most common ways that we’ve seen Kubernetes users on Google Cloud Platform use Namespaces, but our list is not exhaustive and we’d be interested to learn other examples from you.

Use-cases covered

  • Roles and Responsibilities in an enterprise for namespaces
  • Partitioning landscapes: dev vs. test vs. prod
  • Customer partitioning for non-multi-tenant scenarios
  • When not to use namespaces

Use-case #1: Roles and Responsibilities in an Enterprise

A typical enterprise contains multiple business/technology entities that operate independently of each other with some form of overarching layer of controls managed by the enterprise itself. Operating Kubernetes clusters in such an environment can be done effectively when roles and responsibilities pertaining to Kubernetes are defined.

Below are a few recommended roles and their responsibilities that can make managing Kubernetes clusters in a large scale organization easier.

  • Designer/Architect role: This role will define the overall namespace strategy, taking into account product/location/team/cost-center and determining how best to map these to Kubernetes Namespaces. Investing in such a role prevents namespace proliferation and “snowflake” Namespaces.
  • Admin role: This role has admin access to all Kubernetes clusters. Admins can create/delete clusters and add/remove nodes to scale the clusters. This role is responsible for patching, securing and maintaining the clusters, as well as implementing Quotas between the different entities in the organization. The Kubernetes Admin is responsible for implementing the namespace strategy defined by the Designer/Architect.

These two roles and the actual developers using the clusters will also receive support and feedback from the enterprise security and network teams on issues such as security isolation requirements and how namespaces fit this model, or assistance with networking subnets and load-balancers setup.

Anti-patterns

  1. Isolated Kubernetes usage “Islands” without centralized control: Without the initial investment in establishing a centralized control structure around Kubernetes management, there is a risk of ending up with a “mushroom farm” topology, i.e. no defined size/shape/structure of clusters within the org. The result is an environment that is difficult to manage, carries higher risk, and costs more due to underutilization of resources.
  2. Old-world IT controls choking usage and innovation: A common tendency is to try to transpose existing on-premises controls/procedures onto new dynamic frameworks. This weighs down the agile nature of these frameworks and nullifies the benefits of rapid dynamic deployments.
  3. Omni-cluster: Delaying the effort of creating the structure/mechanism for namespace management can result in one large omni-cluster that is hard to peel back into smaller usage groups. 

Use-case #2: Using Namespaces to partition development landscapes

Software development teams customarily partition their development pipelines into discrete units. These units take various forms and use various labels but will tend to result in a discrete dev environment, a testing QA environment, possibly a staging environment and finally a production environment. The resulting layouts are ideally suited to Kubernetes Namespaces. Each environment or stage in the pipeline becomes a unique namespace.

The above works well as each namespace can be templated and mirrored to the next environment in the dev cycle, e.g. dev->qa->prod. The fact that each namespace is logically discrete allows the development teams to work within an isolated “development” namespace. DevOps (the closest role at Google is called Site Reliability Engineering, “SRE”) will be responsible for migrating code through the pipelines and ensuring that appropriate teams are assigned to each environment. Ultimately, DevOps is solely responsible for the final, production environment where the solution is delivered to the end-users.

A major benefit of applying namespaces to the development cycle is that the naming of software components (e.g. micro-services/endpoints) can be maintained without collision across the different environments. This is due to the isolation of the Kubernetes namespaces: serviceX in dev would be referred to by the same name in all the other namespaces but, if necessary, could be uniquely referenced using its fully qualified name, e.g. serviceX.development.mycluster.com in the development namespace of mycluster.com.
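To make the naming concrete, here is a sketch of a Service placed in a `development` namespace. The service name, label, and port are hypothetical, and the DNS names in the comments assume the cluster DNS add-on is running with the default `cluster.local` domain rather than the illustrative mycluster.com used above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: servicex            # object names must be lowercase
  namespace: development    # the same manifest can be reused in qa or prod
spec:
  selector:
    app: servicex
  ports:
  - port: 80
# How clients would resolve it (assuming cluster DNS, domain cluster.local):
#   servicex                                   from pods in "development"
#   servicex.development                       from pods in other namespaces
#   servicex.development.svc.cluster.local     fully qualified
```

Applying the same manifest with a different `namespace` value yields an identically named service in each environment without any collision.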

Anti-patterns

  1. Abusing the namespace benefit, resulting in unnecessary environments in the development pipeline. So, if you don’t do staging deployments, don’t create a “staging” namespace.
  2. Overcrowding namespaces, e.g. having all your development projects in one huge “development” namespace. Since namespaces exist to partition, use them to partition by project as well. Since Namespaces are flat, you may want something like projectA-dev and projectA-prod as projectA’s namespaces.
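A per-project, per-environment layout along those lines could be sketched as follows. Namespace names have to be lowercase DNS labels, so the example uses `projecta-dev` rather than `projectA-dev`; the label keys `project` and `environment` are hypothetical conventions, not something Kubernetes requires:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: projecta-dev
  labels:
    project: projecta       # hypothetical label for grouping by project
    environment: dev        # hypothetical label for grouping by stage
---
apiVersion: v1
kind: Namespace
metadata:
  name: projecta-prod
  labels:
    project: projecta
    environment: prod
```

Labels recover some of the grouping that a hierarchy would otherwise provide, for example selecting every namespace that belongs to one project.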

Use-case #3: Partitioning of your Customers

If you are, for example, a consulting company that wishes to manage separate applications for each of your customers, the partitioning provided by Namespaces aligns well. You could create a separate Namespace for each customer, customer project or customer business unit to keep these distinct while not needing to worry about reusing the same names for resources across projects.

An important consideration here is that Kubernetes does not currently provide a mechanism to enforce access controls across namespaces and so we recommend that you do not expose applications developed using this approach externally.

Anti-patterns

  1. Multi-tenant applications don’t need the additional complexity of Kubernetes namespaces since the application is already enforcing this partitioning.
  2. Inconsistent mapping of customers to namespaces. For example, if you win business at a global corporation, you may initially consider one namespace for the enterprise without taking into account that this customer may prefer further partitioning, e.g. BigCorp Accounting and BigCorp Engineering. In this case, the customer’s departments may each warrant a namespace.

When Not to use Namespaces

In some circumstances Kubernetes Namespaces will not provide the isolation that you need. This may be due to geographical, billing or security factors. For all the benefits of the logical partitioning of namespaces, there is currently no ability to enforce the partitioning. Any user or resource in a Kubernetes cluster may access any other resource in the cluster regardless of namespace. So, if you need to protect or isolate resources, the ultimate namespace is a separate Kubernetes cluster against which you may apply your regular security ACL controls.

Another time when you may consider not using namespaces is when you wish to reflect a geographically distributed deployment. If you wish to deploy close to US, EU and Asia customers, a Kubernetes cluster deployed locally in each region is recommended.

When fine-grained billing is required perhaps to chargeback by cost-center or by customer, the recommendation is to leave the billing to your infrastructure provider. For example, in Google Cloud Platform (GCP), you could use a separate GCP Project or Billing Account and deploy a Kubernetes cluster to a specific-customer’s project(s).

In situations where confidentiality or compliance require complete opaqueness between customers, a Kubernetes cluster per customer/workload will provide the desired level of isolation. Once again, you should delegate the partitioning of resources to your provider.

Work is underway to provide (a) ACLs on Kubernetes Namespaces to be able to enforce security; (b) to provide Kubernetes Cluster Federation. Both mechanisms will address the reasons for the separate Kubernetes clusters in these anti-patterns. 

An easy to grasp anti-pattern for Kubernetes namespaces is versioning. You should not use Namespaces as a way to disambiguate versions of your Kubernetes resources. Support for versioning is present in the containers and container registries as well as in the Kubernetes Deployment resource. Multiple versions should coexist by utilizing the Kubernetes container model, which also provides for automatic migration between versions with deployments. Furthermore, version-scoped namespaces will cause massive proliferation of namespaces within a cluster, making it hard to manage.

Caveat Gubernator

You may wish to, but you cannot create a hierarchy of namespaces. Namespaces cannot be nested within one another. You can’t, for example, create my-team.my-org as a namespace but could perhaps have team-org.

Namespaces are easy to create and use but it’s also easy to deploy code inadvertently into the wrong namespace. Good DevOps hygiene suggests documenting and automating processes where possible and this will help. The other way to avoid using the wrong namespace is to set a kubectl context.
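A context is stored in the kubeconfig file and can pin a default namespace. The fragment below is a sketch only: the context, cluster, and user names are hypothetical and must match entries defined under `clusters` and `users` elsewhere in the same file:

```yaml
apiVersion: v1
kind: Config
contexts:
- name: dev
  context:
    cluster: mycluster        # hypothetical; must match an entry under "clusters"
    user: developer           # hypothetical; must match an entry under "users"
    namespace: development    # default namespace for kubectl in this context
current-context: dev
```

The same result can be reached without editing the file by hand, using `kubectl config set-context` followed by `kubectl config use-context`.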

As mentioned previously, Kubernetes does not (currently) provide a mechanism to enforce security across Namespaces. You should only use Namespaces within trusted domains (e.g. internal use) and not use Namespaces when you need to guarantee that a user of the Kubernetes cluster, or any one of its resources, is unable to access any other Namespace’s resources. This enhanced security functionality is being discussed in the Kubernetes Special Interest Group for Authentication and Authorization; get involved at SIG-Auth.

–Mike Altarace & Daz Wilkin, Strategic Customer Engineers, Google Cloud Platform

Create a Couchbase cluster using Kubernetes

August 15 2016

Editor’s note: today’s guest post is by Arun Gupta, Vice President Developer Relations at Couchbase, showing how to set up a Couchbase cluster with Kubernetes.

Couchbase Server is an open source, distributed NoSQL document-oriented database. It exposes a fast key-value store with managed cache for submillisecond data operations, purpose-built indexers for fast queries and a query engine for executing SQL queries. For mobile and Internet of Things (IoT) environments, Couchbase Lite runs native on-device and manages sync to Couchbase Server.

Couchbase Server 4.5 was recently announced, bringing many new features, including production certified support for Docker. Couchbase is supported on a wide variety of orchestration frameworks for Docker containers, such as Kubernetes, Docker Swarm and Mesos. For full details visit this page.

This blog post will explain how to create a Couchbase cluster using Kubernetes. This setup is tested using Kubernetes 1.3.3, Amazon Web Services, and Couchbase 4.5 Enterprise Edition.

Like all good things, this post is standing on the shoulders of giants. The design pattern used in this blog was defined in a Friday afternoon hack with @saturnism. A working version of the configuration files was contributed by @r_schmiddy.

Couchbase Cluster

A cluster of Couchbase Servers is typically deployed on commodity servers. Couchbase Server has a peer-to-peer topology where all the nodes are equal and communicate to each other on demand. There is no concept of master nodes, slave nodes, config nodes, name nodes, head nodes, etc, and all the software loaded on each node is identical. It allows the nodes to be added or removed without considering their “type”. This model works particularly well with cloud infrastructure in general. For Kubernetes, this means that we can use the exact same container image for all Couchbase nodes.

A typical Couchbase cluster creation process looks like:

  • Start Couchbase: Start n Couchbase servers
  • Create cluster: Pick any server, and add all other servers to it to create the cluster
  • Rebalance cluster: Rebalance the cluster so that data is distributed across the cluster

In order to automate using Kubernetes, the cluster creation is split into a “master” and “worker” Replication Controller (RC).

The master RC has only one replica and is also published as a Service. This provides a single reference point to start the cluster creation. By default services are visible only from inside the cluster. This service is also exposed as a load balancer. This allows the Couchbase Web Console to be accessible from outside the cluster.

The worker RC uses the exact same image as the master RC. This keeps the cluster homogeneous, which makes it easy to scale.

Configuration files used in this blog are available here. Let’s create the Kubernetes resources to create the Couchbase cluster.

Create Couchbase “master” Replication Controller

Couchbase master RC can be created using the following configuration file:

apiVersion: v1  
kind: ReplicationController  
metadata:  
  name: couchbase-master-rc  
spec:  
  replicas: 1  
  selector:  
    app: couchbase-master-pod  
  template:  
    metadata:  
      labels:  
        app: couchbase-master-pod  
    spec:  
      containers:  
      - name: couchbase-master  
        image: arungupta/couchbase:k8s  
        env:  
          - name: TYPE  
            value: MASTER  
        ports:  
        - containerPort: 8091  
---  
apiVersion: v1  
kind: Service  
metadata:   
  name: couchbase-master-service  
  labels:   
    app: couchbase-master-service  
spec:   
  ports:  
    - port: 8091  
  selector:   
    app: couchbase-master-pod  
  type: LoadBalancer

This configuration file creates a couchbase-master-rc Replication Controller. This RC has one replica of the pod created using the arungupta/couchbase:k8s image. This image is created using the Dockerfile here. This Dockerfile uses a configuration script to configure the base Couchbase Docker image. First, it uses the Couchbase REST API to set up the memory quota, the index, data and query services, and security credentials, and to load a sample data bucket. Then, it invokes the appropriate Couchbase CLI commands to add the Couchbase node to the cluster or add the node and rebalance the cluster. This is based upon three environment variables:

  • TYPE: Defines whether the joining pod is worker or master
  • AUTO_REBALANCE: Defines whether the cluster needs to be rebalanced
  • COUCHBASE_MASTER: Name of the master service

For this first configuration file, the TYPE environment variable is set to MASTER and so no additional configuration is done on the Couchbase image.
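The branching on these three variables can be sketched as follows. This is only a model of the decision logic described above, not the actual configuration script (which calls the Couchbase REST API and CLI); the function name and the action strings are hypothetical.

```python
def plan_node_actions(env):
    """Return the cluster actions implied by the three environment variables.

    Hypothetical helper: it only models the decision, it does not call Couchbase.
    """
    node_type = env.get("TYPE", "").upper()
    if node_type == "MASTER":
        # Master: no additional cluster configuration is done.
        return []
    if node_type != "WORKER":
        raise ValueError(f"unsupported TYPE: {node_type!r}")

    master = env.get("COUCHBASE_MASTER")
    if not master:
        raise ValueError("COUCHBASE_MASTER must be set for a worker")

    # Worker: always join the cluster through the master service.
    actions = [f"add node to cluster at {master}:8091"]
    # Rebalance only when explicitly requested.
    if env.get("AUTO_REBALANCE", "false").lower() == "true":
        actions.append("rebalance cluster")
    return actions


print(plan_node_actions({"TYPE": "MASTER"}))
print(plan_node_actions({
    "TYPE": "WORKER",
    "COUCHBASE_MASTER": "couchbase-master-service",
    "AUTO_REBALANCE": "false",
}))
```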

Let’s create and verify the artifacts.

Create Couchbase master RC:

kubectl create -f cluster-master.yml   
replicationcontroller "couchbase-master-rc" created  
service "couchbase-master-service" created

List all the services:

kubectl get svc  
NAME                       CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE  
couchbase-master-service   10.0.57.201                 8091/TCP   30s  
kubernetes                 10.0.0.1      <none>        443/TCP    5h

Output shows that couchbase-master-service is created.

Get all the pods:

kubectl get po  
NAME                        READY     STATUS    RESTARTS   AGE  
couchbase-master-rc-97mu5   1/1       Running   0          1m

A pod is created using the Docker image specified in the configuration file.

Check the RC:

kubectl get rc  
NAME                  DESIRED   CURRENT   AGE  
couchbase-master-rc   1         1         1m

It shows that the desired and current number of pods in the RC are matching.
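If you prefer to check this from a script rather than by eye, the table can be parsed with a few lines of Python. This is a minimal sketch that assumes the column names shown above (NAME, DESIRED, CURRENT); the helper names are made up for illustration, and the sample text is copied from the output above.

```python
def parse_rc_table(text):
    """Turn `kubectl get rc` style output into a list of row dictionaries."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def unsettled_rcs(rows):
    """Names of RCs whose CURRENT pod count differs from DESIRED."""
    return [r["NAME"] for r in rows if int(r["DESIRED"]) != int(r["CURRENT"])]


sample = """
NAME                  DESIRED   CURRENT   AGE
couchbase-master-rc   1         1         1m
"""

# An empty list means every RC has reached its desired replica count.
print(unsettled_rcs(parse_rc_table(sample)))
```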

Describe the service:

kubectl describe svc couchbase-master-service  
Name: couchbase-master-service  
Namespace: default  
Labels: app=couchbase-master-service  
Selector: app=couchbase-master-pod  
Type: LoadBalancer  
IP: 10.0.57.201  
LoadBalancer Ingress: a94f1f286590c11e68e100283628cd6c-1110696566.us-west-2.elb.amazonaws.com  
Port: <unset> 8091/TCP  
NodePort: <unset> 30019/TCP  
Endpoints: 10.244.2.3:8091  
Session Affinity: None  
Events:
  FirstSeen LastSeen Count From SubobjectPath Type Reason Message
  --------- -------- ----- ---- ------------- -------- ------ -------
  2m 2m 1 {service-controller } Normal CreatingLoadBalancer Creating load balancer
  2m 2m 1 {service-controller } Normal CreatedLoadBalancer Created load balancer

Among other details, the address shown next to LoadBalancer Ingress is relevant for us. This address is used to access the Couchbase Web Console.

Wait for ~3 mins for the load balancer to be ready to receive requests. Couchbase Web Console is accessible at <ip>:8091 and looks like:

The image used in the configuration file is configured with the username Administrator and the password password. Enter the credentials to see the console:

Click on Server Nodes to see how many Couchbase nodes are part of the cluster. As expected, it shows only one node:

Click on Data Buckets to see a sample bucket that was created as part of the image:

This shows the travel-sample bucket is created and has 31,591 JSON documents.

Create Couchbase “worker” Replication Controller

Now, let’s create a worker replication controller. It can be created using the configuration file:

apiVersion: v1  
kind: ReplicationController  
metadata:  
  name: couchbase-worker-rc  
spec:  
  replicas: 1  
  selector:  
    app: couchbase-worker-pod  
  template:  
    metadata:  
      labels:  
        app: couchbase-worker-pod  
    spec:  
      containers:  
      - name: couchbase-worker  
        image: arungupta/couchbase:k8s  
        env:  
          - name: TYPE  
            value: "WORKER"  
          - name: COUCHBASE_MASTER  
            value: "couchbase-master-service"  
          - name: AUTO_REBALANCE  
            value: "false"  
        ports:  
        - containerPort: 8091

This RC also creates a single replica of Couchbase using the same arungupta/couchbase:k8s image. The key differences here are:

  • TYPE environment variable is set to WORKER. This adds a worker Couchbase node to the cluster.
  • COUCHBASE_MASTER environment variable is passed the value of couchbase-master-service. This uses the service discovery mechanism built into Kubernetes for pods in the worker and the master to communicate.
  • AUTO_REBALANCE environment variable is set to false. This ensures that the node is only added to the cluster but the cluster itself is not rebalanced. Rebalancing is required to redistribute data across multiple nodes of the cluster. This is the recommended way as multiple nodes can be added first, and then the cluster can be manually rebalanced using the Web Console.

Let’s create a worker:

kubectl create -f cluster-worker.yml   
replicationcontroller "couchbase-worker-rc" created

Check the RC:

kubectl get rc  
NAME                  DESIRED   CURRENT   AGE  
couchbase-master-rc   1         1         6m  
couchbase-worker-rc   1         1         22s

A new couchbase-worker-rc is created where the desired and the current number of instances are matching.

Get all pods:

kubectl get po  
NAME                        READY     STATUS    RESTARTS   AGE  
couchbase-master-rc-97mu5   1/1       Running   0          6m  
couchbase-worker-rc-4ik02   1/1       Running   0          46s

An additional pod is now created. Each pod’s name is prefixed with the corresponding RC’s name. For example, a worker pod is prefixed with couchbase-worker-rc.
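The naming convention makes it easy to group pods by their RC when reading a pod list. The sketch below relies only on the name pattern seen in the output above (RC name, a dash, then a generated suffix); it is a convenience heuristic with a hypothetical helper name, and labels and selectors remain the authoritative link between a pod and its RC.

```python
from collections import defaultdict


def group_pods_by_rc(pod_names):
    """Group pod names by the prefix before the last dash."""
    groups = defaultdict(list)
    for name in pod_names:
        # "couchbase-worker-rc-4ik02" -> prefix "couchbase-worker-rc"
        rc_name, _, _suffix = name.rpartition("-")
        groups[rc_name].append(name)
    return dict(groups)


pods = ["couchbase-master-rc-97mu5", "couchbase-worker-rc-4ik02"]
for rc_name, members in group_pods_by_rc(pods).items():
    print(rc_name, len(members))
```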

Couchbase Web Console gets updated to show that a new Couchbase node is added. This is evident from the red circle with the number 1 on the Pending Rebalance tab.

Clicking on the tab shows the IP address of the node that needs to be rebalanced:

Scale Couchbase cluster

Now, let’s scale the Couchbase cluster by scaling the replicas for worker RC:

kubectl scale rc couchbase-worker-rc --replicas=3  
replicationcontroller "couchbase-worker-rc" scaled

Updated state of RC shows that 3 worker pods have been created:

kubectl get rc  
NAME                  DESIRED   CURRENT   AGE  
couchbase-master-rc   1         1         8m  
couchbase-worker-rc   3         3         2m

This can be verified again by getting the list of pods:

kubectl get po  
NAME                        READY     STATUS    RESTARTS   AGE  
couchbase-master-rc-97mu5   1/1       Running   0          8m  
couchbase-worker-rc-4ik02   1/1       Running   0          2m  
couchbase-worker-rc-jfykx   1/1       Running   0          53s  
couchbase-worker-rc-v8vdw   1/1       Running   0          53s

The Pending Rebalance tab of the Couchbase Web Console shows that 3 servers have now been added to the cluster and need to be rebalanced.

Rebalance Couchbase Cluster

Finally, click the Rebalance button to rebalance the cluster. A message window showing the current state of the rebalance is displayed:

Once all the nodes are rebalanced, the Couchbase cluster is ready to serve your requests:

In addition to creating a cluster, Couchbase Server supports a range of high availability and disaster recovery (HA/DR) strategies. Most HA/DR strategies rely on a multi-pronged approach of maximizing availability, increasing redundancy within and across data centers, and performing regular backups.

Now that your Couchbase cluster is ready, you can run your first sample application.

For further information check out the Couchbase Developer Portal and Forums, or see questions on Stack Overflow.

–Arun Gupta, Vice President Developer Relations at Couchbase

Challenges of a Remotely Managed, On-Premises, Bare-Metal Kubernetes Cluster

August 02 2016

Today’s post is written by Bich Le, chief architect at Platform9, describing how their engineering team overcame challenges in remotely managing bare-metal Kubernetes clusters. 

Introduction

The recently announced Platform9 Managed Kubernetes (PMK) is an on-premises enterprise Kubernetes solution with an unusual twist: while clusters run on a user’s internal hardware, their provisioning, monitoring, troubleshooting and overall life cycle are managed remotely from the Platform9 SaaS application. While users love the intuitive experience and ease of use of this deployment model, this approach poses interesting technical challenges. In this article, we will first describe the motivation and deployment architecture of PMK, and then present an overview of the technical challenges we faced and how our engineering team addressed them.

Multi-OS bootstrap model

Like its predecessor, Managed OpenStack, PMK aims to make it as easy as possible for an enterprise customer to deploy and operate a “private cloud”, which, in the current context, means one or more Kubernetes clusters. To accommodate customers who standardize on a specific Linux distro, our installation process uses a “bare OS” or “bring your own OS” model, which means that an administrator deploys PMK to existing Linux nodes by installing a simple RPM or Deb package on their favorite OS (Ubuntu-14, CentOS-7, or RHEL-7). The package, which the administrator downloads from their Platform9 SaaS portal, starts an agent which is preconfigured with all the information and credentials needed to securely connect to and register itself with the customer’s Platform9 SaaS controller running on the WAN.

Node management

The first challenge was configuring Kubernetes nodes in the absence of a bare-metal cloud API and SSH access into nodes. We solved it using the node pool concept and configuration management techniques. Every node running the agent automatically shows up in the SaaS portal, which allows the user to authorize the node for use with Kubernetes. A newly authorized node automatically enters a node pool, indicating that it is available but not used in any clusters. Independently, the administrator can create one or more Kubernetes clusters, which start out empty. At any later time, he or she can request one or more nodes to be attached to any cluster. PMK fulfills the request by transferring the specified number of nodes from the pool to the cluster. When a node is authorized, its agent becomes a configuration management agent, polling for instructions from a CM server running in the SaaS application and capable of downloading and configuring software.
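To make the pool-to-cluster flow concrete, here is a minimal in-memory sketch of it. It is not PMK code: the class and method names are invented for illustration, and the node names are placeholders. The cluster name clus100 is taken from the screenshot described in this post.

```python
class NodePoolModel:
    """Toy model of the node pool concept: authorize, pool, attach."""

    def __init__(self):
        self.pool = []      # authorized nodes not used by any cluster
        self.clusters = {}  # cluster name -> list of attached nodes

    def authorize(self, node):
        # A newly authorized node enters the pool automatically.
        self.pool.append(node)

    def create_cluster(self, name):
        # Clusters start out empty.
        self.clusters[name] = []

    def attach(self, cluster, count):
        # Transfer the requested number of nodes from the pool to the cluster.
        if count > len(self.pool):
            raise ValueError("not enough nodes in the pool")
        moved, self.pool = self.pool[:count], self.pool[count:]
        self.clusters[cluster].extend(moved)
        return moved


model = NodePoolModel()
for node in ("node-a", "node-b", "node-c"):
    model.authorize(node)
model.create_cluster("clus100")
model.attach("clus100", 2)
print(model.clusters, model.pool)
```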

Cluster creation and node attach/detach operations are exposed to administrators via a REST API, a CLI utility named qb, and the SaaS-based Web UI. The following screenshot shows the Web UI displaying one 3-node cluster named clus100, one empty cluster clus101, and the three nodes.

clusters_and_containervisors_view.png

Cluster initialization

The first time one or more nodes are attached to a cluster, PMK configures the nodes to form a complete Kubernetes cluster. Currently, PMK automatically decides the number and placement of Master and Worker nodes. In the future, PMK will give administrators an “advanced mode” option allowing them to override and customize those decisions. Through the CM server, PMK then sends to each node a configuration and a set of scripts to initialize each node according to the configuration. This includes installing or upgrading Docker to the required version, starting 2 Docker daemons (bootstrap and main), creating the etcd K/V store, establishing the flannel network layer, starting kubelet, and running the Kubernetes components appropriate for the node’s role (master vs. worker). The following diagram shows the component layout of a fully formed cluster.

architecture.png

Containerized kubelet?

Another hurdle we encountered resulted from our original decision to run kubelet in a container, as recommended by the Multi-node Docker Deployment Guide. We discovered that this approach introduces complexities that led to many difficult-to-troubleshoot bugs that were sensitive to the combined versions of Kubernetes, Docker, and the node OS. One example is kubelet’s need to mount directories containing secrets into containers to support the Service Accounts mechanism. It turns out that doing this from inside a container is tricky and requires a complex sequence of steps that proved fragile. After fixing a continuing stream of issues, we finally decided to run kubelet as a native program on the host OS, resulting in significantly better stability.

Overcoming networking hurdles

The Beta release of PMK currently uses flannel with the UDP back-end for the network layer. In a Kubernetes cluster, many infrastructure services need to communicate across nodes using a variety of ports (443, 4001, etc.) and protocols (TCP and UDP). Often, customer nodes intentionally or unintentionally block some or all of the traffic, or run existing services that conflict with the required ports, resulting in non-obvious failures. To address this, we try to detect configuration problems early and inform the administrator immediately. PMK runs a “preflight” check on all nodes participating in a cluster before installing the Kubernetes software. This means running small test programs on each node to verify that (1) the required ports are available for binding and listening; and (2) nodes can connect to each other using all required ports and protocols. These checks run in parallel and add less than a couple of seconds before cluster initialization.
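A preflight check of this kind can be approximated with the Python standard library. The sketch below is not PMK's test program: the function names are hypothetical, the connectivity probe covers TCP only (a UDP probe would need a cooperating receiver on the other node), and the ports in the demo are just the two mentioned above.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def port_is_free(port, proto="tcp", host="0.0.0.0"):
    """Check (1): can this node bind the given port?"""
    if proto not in ("tcp", "udp"):
        raise ValueError(f"unknown protocol: {proto!r}")
    kind = socket.SOCK_STREAM if proto == "tcp" else socket.SOCK_DGRAM
    with socket.socket(socket.AF_INET, kind) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Port already in use, or binding it is not permitted.
            return False
    return True


def can_connect(host, port, timeout=1.0):
    """Check (2), TCP only: can this node reach host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(required, host="0.0.0.0"):
    """Run the bind checks in parallel; return the (port, proto) pairs that failed."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda pp: port_is_free(pp[0], pp[1], host), required))
    return [pp for pp, ok in zip(required, results) if not ok]


if __name__ == "__main__":
    # Ports below 1024 usually cannot be bound without elevated privileges,
    # so 443 may be reported as unavailable when run as a regular user.
    print(preflight([(443, "tcp"), (4001, "tcp")]))
```

Reporting the failing pairs, rather than a single pass/fail flag, matches the goal of telling the administrator exactly which port is the problem.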

Monitoring

One of the values of a SaaS-managed private cloud is constant monitoring and early detection of problems by the SaaS team. Issues that can be addressed without intervention by the customer are handled automatically, while others trigger proactive communication with the customer via UI alerts, email, or real-time channels. Kubernetes monitoring is a huge topic worthy of its own blog post, so we’ll just briefly touch upon it. We broadly classify the problem into layers: (1) hardware & OS, (2) Kubernetes core (e.g. API server, controllers and kubelets), (3) add-ons (e.g. SkyDNS & ServiceLoadbalancer) and (4) applications. We are currently focused on layers 1-3. A major source of issues we’ve seen is add-on failures. If either DNS or the ServiceLoadbalancer reverse http proxy (soon to be upgraded to an Ingress Controller) fails, application services will start failing. One way we detect such failures is by monitoring the components using the Kubernetes API itself, which is proxied into the SaaS controller, allowing the PMK support team to monitor any cluster resource. To detect service failure, one metric we pay attention to is pod restarts. A high restart count indicates that a service is continually failing.
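The restart-count signal is straightforward to compute from a pod list in the JSON shape returned by the Kubernetes API (for example, the output of kubectl get pods -o json). The following is a minimal sketch, not our monitoring code: the function name and the threshold of 5 are arbitrary assumptions, and the sample data is made up.

```python
def pods_with_high_restarts(pod_list, threshold=5):
    """Map pod name -> total container restarts, for pods at or above threshold."""
    flagged = {}
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        # Sum restarts across all containers in the pod.
        restarts = sum(s.get("restartCount", 0) for s in statuses)
        if restarts >= threshold:
            flagged[pod["metadata"]["name"]] = restarts
    return flagged


sample = {
    "items": [
        {"metadata": {"name": "dns-pod"},
         "status": {"containerStatuses": [{"restartCount": 12}]}},
        {"metadata": {"name": "healthy-pod"},
         "status": {"containerStatuses": [{"restartCount": 0}]}},
    ]
}
print(pods_with_high_restarts(sample))
```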

Future topics

We faced complex challenges in other areas that deserve their own posts: (1) Authentication and authorization with Keystone, the identity manager used by Platform9 products; (2) Software upgrades, i.e. how to make them brief and non-disruptive to applications; and (3) Integration with customer’s external load-balancers (in the absence of good automation APIs).

Conclusion

Platform9 Managed Kubernetes uses a SaaS-managed model to try to hide the complexity of deploying, operating and maintaining bare-metal Kubernetes clusters in customers’ data centers. These requirements led to the development of a unique cluster deployment and management architecture, which in turn led to unique technical challenges. This article gave an overview of some of those challenges and how we solved them. For more information on the motivation behind PMK, feel free to view Madhura Maskasky’s blog post.

–Bich Le, Chief Architect, Platform9

Why OpenStack's embrace of Kubernetes is great for both communities

July 26 2016

Today, Mirantis, the leading contributor to OpenStack, announced that it will re-write its private cloud platform to use Kubernetes as its underlying orchestration engine. We think this is a great step forward for both the OpenStack and Kubernetes communities. With Kubernetes under the hood, OpenStack users will benefit from the tremendous efficiency, manageability and resiliency that Kubernetes brings to the table, while positioning their applications to use more cloud-native patterns. The Kubernetes community, meanwhile, can feel confident in their choice of orchestration framework, while gaining the ability to manage both container- and VM-based applications from a single platform.

The Path to Cloud Native

Google spent over ten years developing, applying and refining the principles of cloud native computing. A cloud-native application is:

  • Container-packaged. Applications are composed of hermetically sealed, reusable units across diverse environments;
  • Dynamically scheduled, for increased infrastructure efficiency and decreased operational overhead; and 
  • Microservices-based. Loosely coupled components significantly increase the overall agility, resilience and maintainability of applications.

These principles have enabled us to build the largest, most efficient, most powerful cloud infrastructure in the world, which anyone can access via Google Cloud Platform. They are the same principles responsible for the recent surge in popularity of Linux containers. Two years ago, we open-sourced Kubernetes to spur adoption of containers and scalable, microservices-based applications, and the recently released Kubernetes version 1.3 introduces a number of features to bridge enterprise and cloud native workloads. We expect that adoption of cloud-native principles will drive the same benefits within the OpenStack community, as well as smoothing the path between OpenStack and the public cloud providers that embrace them.

Making OpenStack better

We hear from enterprise customers that they want to move towards cloud-native infrastructure and application patterns. Thus, it is hardly surprising that OpenStack would also move in this direction [1], with large OpenStack users such as eBay and GoDaddy adopting Kubernetes as key components of their stack. Kubernetes and cloud-native patterns will improve OpenStack lifecycle management by enabling rolling updates, versioning, and canary deployments of new components and features. In addition, OpenStack users will benefit from self-healing infrastructure, making OpenStack easier to manage and more resilient to the failure of core services and individual compute nodes. Finally, OpenStack users will realize the developer and resource efficiencies that come with a container-based infrastructure.

OpenStack is a great tool for Kubernetes users

Conversely, incorporating Kubernetes into OpenStack will give Kubernetes users access to a robust framework for deploying and managing applications built on virtual machines. As users move to the cloud-native model, they will be faced with the challenge of managing hybrid application architectures that contain some mix of virtual machines and Linux containers. The combination of Kubernetes and OpenStack means that they can do so on the same platform using a common set of tools.

We are excited by the ever-increasing momentum of the cloud-native movement as embodied by Kubernetes and related projects, and look forward to working with Mirantis, its partner Intel, and others within the OpenStack community to bring the benefits of cloud-native to their applications and infrastructure.

–Martin Buhr, Product Manager, Strategic Initiatives, Google

[1] Check out the announcement of Kubernetes-OpenStack Special Interest Group here, and a great talk about OpenStack on Kubernetes by CoreOS CEO Alex Polvi at the most recent OpenStack summit here.

A Very Happy Birthday Kubernetes

July 21 2016

Last year at OSCON, I got to reconnect with a bunch of friends and see what they have been working on. That turned out to be the Kubernetes 1.0 launch event. Even that day, it was clear the project was supported by a broad community – a group that showed an ambitious vision for distributed computing. 

Today, on the first anniversary of the Kubernetes 1.0 launch, it’s amazing to see what a community of dedicated individuals can do. Kubernauts have collectively put in 237 person years of coding effort since launch to bring forward our most recent release 1.3. However, the community is much more than simply coding effort. It is made up of people – individuals that have given their expertise and energy to make this project flourish. With more than 830 diverse contributors, from independents to the largest companies in the world, it’s their work that makes Kubernetes stand out. Here are stories from a couple of early contributors reflecting back on the project:

The community is also more than online GitHub and Slack conversation; year one saw the launch of KubeCon, the Kubernetes user conference, which started as a grassroots effort that brought together 1,000 individuals between two events in San Francisco and London. The advocacy continues with users globally. There are more than 130 Meetup groups that mention Kubernetes, many of which are helping celebrate Kubernetes’ birthday. To join the celebration, participate at one of the 20 #k8sbday parties worldwide: Austin, Bangalore, Beijing, Boston, Cape Town, Charlotte, Cologne, Geneva, Karlsruhe, Kisumu, Montreal, Portland, Raleigh, Research Triangle, San Francisco, Seattle, Singapore, SF Bay Area, or Washington DC.

The Kubernetes community continues to work to make our project more welcoming and open to our kollaborators. This spring, Kubernetes and KubeCon moved to the Cloud Native Computing Foundation (CNCF), a Linux Foundation Project, to accelerate the collaborative vision outlined only a year ago at OSCON ... lifting a glass to another great year.

– Sarah Novotny, Kubernetes Community Wonk
