Before starting we need to talk about Kubernetes:
Kubernetes defines a set of building blocks (“primitives”), which collectively provide mechanisms that deploy, maintain and scale applications based on CPU, memory, or custom metrics. Kubernetes is loosely coupled and extensible to meet different workloads. This extensibility is provided in large part by the Kubernetes API, which is used by internal components as well as extensions and containers that run on Kubernetes. The platform exerts its control over compute and storage resources by defining resources as Objects, which can then be managed as such.
The components of Kubernetes can be divided into those that manage an individual node and those that are part of the control plane:
The Kubernetes master is the main controlling unit of the cluster, managing its workload and directing communication across the system. The Kubernetes control plane consists of various components, each its own process, that can run both on a single master node or on multiple masters supporting high-availability clusters. The various components of the Kubernetes control plane are as follows:
etcd: is a persistent, lightweight, distributed, key-value data store developed by CoreOS that reliably stores the configuration data of the cluster, representing the overall state of the cluster at any given point of time. Just like Apache ZooKeeper, etcd is a system that favors consistency over availability in the event of a network partition (see CAP theorem). This consistency is crucial for correctly scheduling and operating services. The Kubernetes API Server uses etcd’s watch API to monitor the cluster and roll out critical configuration changes or simply restore any divergences of the state of the cluster back to what was declared by the deployer. As an example, if the deployer specified that three instances of a particular pod need to be running, this fact is stored in etcd. If it is found that only two instances are running, this delta will be detected by comparison with etcd data, and Kubernetes will use this to schedule the creation of an additional instance of that pod.
API server: The API server is a key component and serves the Kubernetes API using JSON over HTTP, which provides both the internal and external interface to Kubernetes. The API server processes and validates REST requests and updates state of the API objects in etcd, thereby allowing clients to configure workloads and containers across Worker nodes.
Scheduler: The scheduler is the pluggable component that selects which node an unscheduled pod (the basic entity managed by the scheduler) runs on, based on resource availability. The scheduler tracks resource use on each node to ensure that workload is not scheduled in excess of available resources. For this purpose, the scheduler must know the resource requirements, resource availability, and other user-provided constraints and policy directives such as quality-of-service, affinity/anti-affinity requirements, data locality, and so on. In essence, the scheduler’s role is to match resource “supply” to workload “demand”.
Controller manager: A controller is a reconciliation loop that drives actual cluster state toward the desired cluster state, communicating with the API server to create, update, and delete the resources it manages (pods, service endpoints, etc.). The controller manager is a process that manages a set of core Kubernetes controllers. One kind of controller is a Replication Controller, which handles replication and scaling by running a specified number of copies of a pod across the cluster. It also handles creating replacement pods if the underlying node fails. Other controllers that are part of the core Kubernetes system include a DaemonSet Controller for running exactly one pod on every machine (or some subset of machines), and a Job Controller for running pods that run to completion, e.g. as part of a batch job. The set of pods that a controller manages is determined by label selectors that are part of the controller’s definition
A Node, also known as a Worker or a Minion, is a machine where containers (workloads) are deployed. Every node in the cluster must run a container runtime such as Docker, as well as the below-mentioned components, for communication with the primary for network configuration of these containers.
- Kubelet: Kubelet is responsible for the running state of each node, ensuring that all containers on the node are healthy. It takes care of starting, stopping, and maintaining application containers organized into pods as directed by the control plane.
Kubelet monitors the state of a pod, and if not in the desired state, the pod re-deploys to the same node. Node status is relayed every few seconds via heartbeat messages to the primary. Once the primary detects a node failure, the Replication Controller observes this state change and launches pods on other healthy nodes.
- Kube-proxy: The Kube-proxy is an implementation of a network proxy and a load balancer, and it supports the service abstraction along with other networking operation. It is responsible for routing traffic to the appropriate container based on IP and port number of the incoming request.
- Container runtime: A container resides inside a pod. The container is the lowest level of a micro-service, which holds the running application, libraries, and their dependencies. Containers can be exposed to the world through an external IP address. Kubernetes has supported Docker containers since its first version, and in July 2016 the rkt container engine was added.
The basic scheduling unit in Kubernetes is a pod. A pod is a grouping of containerized components. A pod consists of one or more containers that are guaranteed to be co-located on the same node.
Each pod in Kubernetes is assigned a unique IP address within the cluster, which allows applications to use ports without the risk of conflict. Within the pod, all containers can reference each other on localhost, but a container within one pod has no way of directly addressing another container within another pod; for that, it has to use the Pod IP Address. An application developer should never use the Pod IP Address though, to reference / invoke a capability in another pod, as Pod IP addresses are ephemeral – the specific pod that they are referencing may be assigned to another Pod IP address on restart. Instead, they should use a reference to a Service, which holds a reference to the target pod at the specific Pod IP Address.
A pod can define a volume, such as a local disk directory or a network disk, and expose it to the containers in the pod. Pods can be managed manually through the Kubernetes API, or their management can be delegated to a controller. Such volumes are also the basis for the Kubernetes features of ConfigMaps (to provide access to configuration through the filesystem visible to the container) and Secrets (to provide access to credentials needed to access remote resources securely, by providing those credentials on the filesystem visible only to authorized containers).
A ReplicaSet’s purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
The ReplicaSets can also be said to be a grouping mechanism that lets Kubernetes maintain the number of instances that have been declared for a given pod. The definition of a Replica Set uses a selector, whose evaluation will result in identifying all pods that are associated with it.
A Kubernetes service is a set of pods that work together, such as one tier of a multi-tier application. The set of pods that constitute a service are defined by a label selector. Kubernetes provides two modes of service discovery, using environmental variables or using Kubernetes DNS. Service discovery assigns a stable IP address and DNS name to the service, and load balances traffic in a round-robin manner to network connections of that IP address among the pods matching the selector (even as failures cause the pods to move from machine to machine). By default a service is exposed inside a cluster (e.g., back end pods might be grouped into a service, with requests from the front-end pods load-balanced among them), but a service can also be exposed outside a cluster (e.g., for clients to reach front-end pods).
Filesystems in the Kubernetes container provide ephemeral storage, by default. This means that a restart of the pod will wipe out any data on such containers, and therefore, this form of storage is quite limiting in anything but trivial applications. A Kubernetes Volume provides persistent storage that exists for the lifetime of the pod itself. This storage can also be used as shared disk space for containers within the pod. Volumes are mounted at specific mount points within the container, which are defined by the pod configuration, and cannot mount onto other volumes or link to other volumes. The same volume can be mounted at different points in the filesystem tree by different containers.
Kubernetes provides a partitioning of the resources it manages into non-overlapping sets called namespaces. They are intended for use in environments with many users spread across multiple teams, or projects, or even separating environments like development, test, and production.
Karbon (Container Services)
Nutanix provides the ability to leverage persistent containers on the Nutanix platform using Kubernetes (currently). It was previously possible to run Docker on Nutanix platform; however, data persistence was an issue given the ephemeral nature of containers.
Container technologies like Docker are a different approach to hardware virtualization. With traditional virtualization each VM has its own Operating System (OS) but they share the underlying hardware. Containers, which include the application and all its dependencies, run as isolated processes that share the underlying Operating System (OS) kernel.
The following table shows a simple comparison between VMs and Containers:
|Metric||Virtual Machines (VM)||Containers|
|Virtualization Type||Hardware-level virtualization||OS kernel virtualization|
|Provisioning Speed||Slower (seconds to minutes)||Real-time / fast (us to ms)|
|Performance Overhead||Limited performance||Native performance|
|Security||Fully isolated (more secure)||Process-level isolation (less secure)|
The solution is applicable to the configurations below (list may be incomplete, refer to documentation for a fully supported list):
- Docker 1.13
*As of 4.7, the solution only supports storage integration with Docker based containers. However, any other container system can run as a VM on the Nutanix platform.
Container Services Constructs
The following entities compose Karbon Container Services:
- Nutanix Docker Machine Driver: Handles Docker container host provisioning via Docker Machine and the AOS Image Service
- Nutanix Docker Volume Plugin: Responsible for interfacing with AOS Volumes to create, mount, format and attach volumes to the desired container
The following entities compose Docker (note: not all are required):
- Docker Image: The basis and image for a container
- Docker Registry: Holding space for Docker Images
- Docker Hub: Online container marketplace (public Docker Registry)
- Docker File: Text file describing how to construct the Docker image
- Docker Container: Running instantiation of a Docker Image
- Docker Engine: Creates, ships and runs Docker containers
- Docker Swarm: Docker host clustering / scheduling platform
- Docker Daemon: Handles requests from Docker Client and does heavy lifting of building, running and distributing containers
- Docker Store: Marketplace for trusted and enterprise ready containers
The Nutanix solution currently leverages Docker Engine running in VMs which are created using Docker Machine. These machines can run in conjunction with normal VMs on the platform.
Nutanix has developed a Docker Volume Plugin which will create, format and attach a volume to container(s) using the AOS Volumes feature. This allows the data to persist as a container is power cycled / moved.
Data persistence is achieved by using the Nutanix Volume Plugin which will leverage AOS Volumes to attach a volume to the host / container:
In order for Container Services to be used the following are necessary:
- Nutanix cluster must be AOS 4.7 or later
- A CentOS 7.0+ or a Rhel 7.2+ OS image with the iscsi-initiator-utils package installed must be downloaded and exist as an image in the AOS Image Service
- The Nutanix Data Services IP must be configured
- Docker Toolbox must be installed on the client machine used for configuration
- Nutanix Docker Machine Driver must be in client’s PATH
Docker Host Creation
Assuming all pre-requisites have been met the first step is to provision the Nutanix Docker Hosts using Docker Machine:
docker-machine -D create -d nutanix \
–nutanix-username <PRISM_USER> –nutanix-password <PRISM_PASSWORD> \
–nutanix-endpoint <CLUSTER_IP>:9440 –nutanix-vm-image <DOCKER_IMAGE_NAME> \
–nutanix-vm-network <NETWORK_NAME> \
–nutanix-vm-cores <NUM_CPU> –nutanix-vm-mem <MEM_MB> \
The following figure shows a high-level overview of the backend workflow:
The next step is to SSH into the newly provisioned Docker Host(s) via docker-machine ssh:
docker-machine ssh <DOCKER_HOST_NAME>
To install the Nutanix Docker Volume Plugin run:docker plugin install ntnx/nutanix_volume_plugin PRISM_IP= DATASERVICES_IP= PRISM_PASSWORD= PRISM_USERNAME= DEFAULT_CONTAINER= –alias nutanix
After that runs you should now see the plugin enabled:
[root@DOCKER-NTNX-00 ~]# docker plugin ls ID Name Description Enabled 37fba568078d nutanix:latest Nutanix volume plugin for docker true
Docker Container Creation
Once the Nutanix Docker Host(s) have been deployed and the volume plugin has been enabled, you can provision containers with persistent storage.
A volume using the AOS Volumes can be created using the typical Docker volume command structure and specifying the Nutanix volume driver. Example usage below:
docker volume create \
<VOLUME_NAME> –driver nutanix
docker volume create PGDataVol –driver nutanix
To following command structure can be used to create a container using the created volume. Example usage below:
docker run -d –name <CONTAINER_NAME> \
-p <START_PORT:END_PORT> –volume-driver nutanix \
-v <VOL_NAME:VOL_MOUNT_POINT> <DOCKER_IMAGE_NAME>
docker run -d –name postgresexample -p 5433:5433 –volume-driver nutanix -v PGDataVol:/var/lib/postgresql/data postgres:latest
The following figure shows a high-level overview of the backend workflow: