What's New with VMware PKS v1.3

Last week VMware announced the release of PKS 1.3, which brings some much-awaited features such as enhanced multi-cloud support, additional networking and security options, and easier management and operations. Here are a few of the features I want to discuss:

Microsoft Azure Support as IaaS

VMware PKS already supports VMware vSphere, Google Cloud Platform and Amazon EC2 as platforms for PKS deployment, and the new VMware PKS 1.3 release adds support for Microsoft Azure. You can now deploy production-grade Kubernetes from a single console to the IaaS of your choice. Here is the list of features supported by PKS on the different IaaS platforms.


Kubernetes 1.12 Support

If you look at the Kubernetes 1.12 release notes, around 60+ enhancements and features have been introduced, so it makes a lot of sense to upgrade to Kubernetes 1.12.4.

Backup and Recovery of Kubernetes Clusters

This release supports backup and recovery of Kubernetes clusters deployed in single-master mode. You can recover Kubernetes clusters and stateless workloads by using the BOSH Backup and Restore (BBR) toolset.

Smoke Tests

Smoke tests let you assess the impact of an upgrade before actually upgrading running clusters. The smoke tests create an ephemeral Kubernetes cluster after each upgrade of VMware PKS, but before the upgrade is applied to running Kubernetes clusters. This ensures that a test cluster can be provisioned and basic Kubernetes functionality validated with the upgraded software before the upgrade is applied to the running clusters. Once the smoke test completes successfully, the test cluster is deprovisioned to reduce resource consumption, and the upgrade then proceeds on the running clusters.
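The ordering described above can be sketched as a small gate function. This is only a model of the sequence, not PKS code: every callable passed in (`upgrade_control_plane`, `create_test_cluster`, `validate_cluster`, `delete_test_cluster`, `upgrade_cluster`) is a hypothetical stand-in for a step the PKS tile performs internally.

```python
from typing import Callable, List


def run_upgrade(
    upgrade_control_plane: Callable[[], None],
    create_test_cluster: Callable[[], str],
    validate_cluster: Callable[[str], bool],
    delete_test_cluster: Callable[[str], None],
    upgrade_cluster: Callable[[str], None],
    running_clusters: List[str],
) -> bool:
    """Hypothetical model of the PKS 1.3 smoke-test gate.

    Returns True if the running clusters were upgraded, False if the
    smoke test failed and the running clusters were left untouched.
    """
    # 1. Upgrade the PKS software itself first.
    upgrade_control_plane()

    # 2. Provision an ephemeral cluster with the upgraded software.
    test_cluster = create_test_cluster()

    # 3. Validate basic Kubernetes functionality on it.
    if not validate_cluster(test_cluster):
        # Stop here: running clusters are not touched.
        return False

    # 4. On success, deprovision the test cluster to free resources.
    delete_test_cluster(test_cluster)

    # 5. Only now roll the upgrade out to the existing clusters.
    for name in running_clusters:
        upgrade_cluster(name)
    return True
```

The key design point is that step 5 is unreachable unless step 3 succeeds, which is exactly the protection the smoke test gives running clusters.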


Support for Multiple Tier 0 and Selectable Tier 0 Routers

As you know, NSX-T Tier 0 edges connect the physical and virtual networks. A single VMware NSX-T instance can support multiple Tier 0 routers. By deploying Kubernetes clusters across multiple Tier 0 routers, service providers get better network isolation between tenants. Multiple Tier 0 routers also allow overlapping IP address ranges, giving tenants greater autonomy in choosing IP address ranges for their services.

With VMware PKS 1.3, a provider or customer can specify a Tier 0 router in the network profile when creating a cluster (pks create-cluster). The Kubernetes cluster and all networking objects created or configured as part of it, such as the load balancer, Tier 1 routers and SNAT rules, are created on this Tier 0 router. Because a single Tier 0 router can support only a finite set of such networking objects, using multiple Tier 0 routers allows much greater scale.


Support for Larger Load Balancers

In previous versions of VMware PKS, you could only specify small or medium load balancers. VMware PKS 1.3 adds support for large load balancers, which provide higher scale in areas such as the number of services, the number of backend pods per service, and throughput per service.

Routable CIDR blocks for Pod Networks

Routable IP addresses assigned to pods provide traceability of workloads making egress requests. Routable IP addresses also provide direct ingress access to pods for some specialized workloads. With VMware PKS 1.3, you can use the network profile at cluster creation time to specify whether the pods should be routable or non-routable (NAT’ed).

Specific IP Address Range and Subnet Size for Pod IP Addresses

VMware PKS 1.3 allows you to override the global pod IP address block configured for VMware PKS with a custom IP address block and a custom subnet size. This helps when your global pod IP address range is reaching capacity and you need to deploy new Kubernetes clusters, or when you need a larger or smaller subnet for each namespace created within a cluster.
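The cluster-level options above (Tier 0 router selection, load balancer size, routable pods, custom pod IP block and subnet size) are all supplied through a network profile, which is a JSON document. Below is a minimal Python sketch that builds one. The field names (`t0_router_id`, `lb_size`, `pod_routable`, `pod_ip_block_ids`, `pod_subnet_prefix`) are my assumption based on the PKS network-profile documentation and should be checked against your release; the UUIDs are placeholders.

```python
import json
from typing import List, Optional

# Load balancer sizes mentioned in the PKS 1.3 feature list above.
LB_SIZES = {"small", "medium", "large"}


def build_network_profile(
    name: str,
    t0_router_id: Optional[str] = None,
    lb_size: str = "small",
    pod_routable: bool = False,
    pod_ip_block_ids: Optional[List[str]] = None,
    pod_subnet_prefix: Optional[int] = None,
) -> str:
    """Return a network-profile JSON string (field names are assumptions)."""
    if lb_size not in LB_SIZES:
        raise ValueError(f"lb_size must be one of {sorted(LB_SIZES)}")
    params = {"lb_size": lb_size, "pod_routable": pod_routable}
    if t0_router_id:
        # Cluster objects (Tier 1 routers, LB, SNAT rules) land on this Tier 0.
        params["t0_router_id"] = t0_router_id
    if pod_ip_block_ids:
        # Overrides the global pod IP block configured for PKS.
        params["pod_ip_block_ids"] = pod_ip_block_ids
    if pod_subnet_prefix is not None:
        # Prefix length of the subnet carved out for each namespace.
        if not 1 <= pod_subnet_prefix <= 32:
            raise ValueError("pod_subnet_prefix must be between 1 and 32")
        params["pod_subnet_prefix"] = pod_subnet_prefix
    return json.dumps({"name": name, "parameters": params}, indent=2)


print(build_network_profile(
    "np-tenant-a",
    t0_router_id="T0-ROUTER-UUID",          # placeholder
    lb_size="large",
    pod_routable=True,
    pod_ip_block_ids=["POD-IP-BLOCK-UUID"],  # placeholder
    pod_subnet_prefix=26,
))
```

As far as I know, the saved file is registered with "pks create-network-profile" and referenced with the "--network-profile" option of "pks create-cluster", but confirm the exact sub-commands with "pks --help" for your version.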

Multiple VMware PKS Control Planes across a Single NSX-T Instance

With this new release, multiple instances of VMware PKS can be deployed on a single shared NSX-T instance. Each VMware PKS control plane can be deployed on a dedicated NSX-T Tier 0 router to provide complete end-to-end isolation. With this feature, users can dedicate separate VMware PKS instances to their development, staging, and production environments, or a cloud provider can offer dedicated PKS as a Service to its customers.


Harbor 1.7

Harbor is VMware’s contribution to the open source community: an open source cloud native registry that stores, signs, and scans container images for vulnerabilities. Harbor solves common challenges by delivering trust, compliance, performance, and interoperability. PKS 1.3 ships Harbor 1.7, which offers the following enhancements:

  • Support for deploying Harbor with a Helm chart, enabling high availability of Harbor services.
  • On-demand garbage collection, enabling the admin to run Docker registry garbage collection manually or automatically on a cron schedule.
  • Image retag, enabling users to tag images into different repositories and projects; this is particularly useful when images need to be retagged programmatically in a CI pipeline.
  • Image build history, which makes it easy to see the contents of a container image.
  • Improved user experience for the Helm Chart Repository:
    • Chart search included in the global search results
    • Total number of chart versions shown in the chart list
    • Labels can be added to Helm charts
    • A chart can be deleted by deleting all of its versions

Monitoring with vRealize Operations Manager

With the integration of cAdvisor, vROps can be used to monitor the entire cloud native infrastructure with the help of the vROps Management Pack for Containers.


Sink resources include both pod logs and events from the Kubernetes API. These events are combined in a shared format that provides operators with a robust set of filtering and monitoring options. PKS 1.3 now has built-in support for creating sink resources with the PKS Command Line Interface.

Scaling Workers Up and Down

With this version, a Kubernetes cluster’s worker nodes can easily be scaled up and down with a single command like:


These are some of the important features I wanted to share; for the full feature list, check the release notes here.


How to Prepare for the Certified Kubernetes Administrator (CKA) Exam

Finally, my last two months of preparation for the CKA exam paid off when I got this:


After getting certified, I received lots of messages from friends and colleagues asking how to prepare for the exam, so this post is all about that. One of my friends shared on his blog (Blog Link) that this is a manageable exam, not as tough as people say, and I totally agree with him: it is achievable with lots of practice and hard work on understanding the product.

About CKA Exam

The Certified Kubernetes Administrator (CKA) program was created by the Cloud Native Computing Foundation (CNCF), in collaboration with The Linux Foundation, to help develop the Kubernetes ecosystem. Kubernetes is one of the highest velocity open source projects and is exploding. CNCF offers a certification program that allows users to demonstrate their competence in a hands-on, command-line environment. The purpose of the Certified Kubernetes Administrator (CKA) program is to provide assurance that CKAs have the skills, knowledge, and competency to perform the responsibilities of Kubernetes administrators.

CKA is an online, proctored, performance-based test that requires solving multiple issues from a command line. This certification focuses on the skills required to be a successful Kubernetes administrator in industry today. It covers these general domains, with the following weights on the exam:

  • Application Lifecycle Management 8%
  • Installation, Configuration & Validation 12%
  • Kubernetes Core Concepts 19%
  • Kubernetes Networking 11%
  • Kubernetes Scheduling 5%
  • Kubernetes Security 12%
  • Kubernetes Cluster Maintenance 11%
  • Kubernetes Logging / Monitoring 5%
  • Kubernetes Storage 7%
  • Kubernetes Troubleshooting 10%

How to Prepare – My Way of Preparation 

First, to prepare for the exam, make sure you deploy a lab; without a lab and practice there is no way you can pass the exam! For my lab I did the following:

You need three virtual machines, so deploy a lab with three nodes; this can easily be set up on a laptop or desktop with virtualization software such as VMware Workstation or VirtualBox.

  • Deploy Kubernetes using “Kubernetes the Hard Way”; this will help you understand the communication between nodes and all the components that make up Kubernetes.
  • Understand the Kubernetes architecture components in detail.

You can follow the approach above or deploy on one of your favourite public or private cloud environments. Once the lab is ready, I would suggest kubernetes.io as the best resource for preparing for this exam: since you are allowed to open “kubernetes.io” during the exam, preparing from this website will also help you in the exam itself. I prepared using the “kubernetes.io” website. Although I would suggest that you go through each and every page of this website, here are a few links I focused on more for the exam:

Time management is the Key

Time management is very important during this exam. The CKA exam is a 3-hour exam, so it is very important to pace yourself so that you do not get stuck on one question for too long. The exam environment (i.e. the SSH session) runs in Google Chrome with a specific extension and can be laggy and slow at times. In addition, I noticed myself spending way too much time trying to select text and running into various UI issues, so be prepared and practice a lot in your labs. There is no way you can clear the exam without a lot of practice.
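Because each question shows its percentage weight, one simple way to pace yourself is to budget time in proportion to that weight: 180 minutes spread over 100% works out to 1.8 minutes per percentage point. A small sketch of that arithmetic follows; the optional reserve for a final review pass is a personal habit I am assuming, not an exam rule.

```python
EXAM_MINUTES = 180  # CKA exam length described above (3 hours)


def minutes_for(weight_percent: float, reserve_minutes: float = 0.0) -> float:
    """Time budget for a question worth `weight_percent` of the total score.

    `reserve_minutes` is held back for a final review pass (an optional
    personal habit, not an exam rule).
    """
    usable = EXAM_MINUTES - reserve_minutes
    return round(usable * weight_percent / 100, 1)


for weight in (2, 4, 7):
    print(f"{weight}% question -> about {minutes_for(weight)} min")
```

If a question is already well past its budget, that is the signal to skip it and come back later.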

The day of the exam

Here are a few tips for the exam:

  • Before the exam, the examiner will ask you to clean your desk completely.
  • The room should be quiet: even if you normally work with headphones, you will not be allowed to use them, and for the next three hours nobody will be allowed in the room.
  • The examiner will ask you to show the whole room, even under the desk.
  • The examiner will not talk to you by voice, only by chat. He/she will hear you because you will need to share your screen and microphone.
  • The exam happens in a Chrome tab: the left side shows the questions and each question’s percentage weight, and the right side is for the shell. I tried to use tmux there (I would suggest not using it), but it was pretty difficult inside a browser terminal. You can also have a popup with notes.
  • You can only open one additional tab with kubernetes.io and use its search box; no Google or anything else.
  • It’s OK to request a break, but be very careful because the clock doesn’t stop.
  • You have three hours to finish the exam; if you get blocked, it’s better to skip that question for now and come back to it later.

So, that’s it from me. If you are interested in Kubernetes and work with it, go ahead and prepare for it. Once you are certified you will be proud of yourself, because this certification really does carry the weight it implies, and the real-world, live-cluster examination is a nail-biter. I like the way CNCF measures this competency with a live lab exam rather than a multiple-choice exam. Best of luck!


VMware PKS, NSX-T & Kubernetes Networking & Security explained

Continuing from my last post on getting started with VMware PKS and NSX-T (Getting Started with VMware PKS & NSX-T), this one explains the behind-the-scenes NSX-T automation for Kubernetes performed by the VMware NSX CNI plugin and PKS.

NSX-T addresses all the K8s networking functions: load balancing, IPAM, routing and firewalling. It supports complete automation and dynamic provisioning of the network objects required for K8s and its workloads, and this is what I am going to uncover in this blog post.

Other features include support for different topology choices for pod and node networks (NAT or no-NAT), network security policies for Kubernetes clusters, namespaces and individual services, and network traceability/visibility for Kubernetes using NSX-T’s built-in operational tools.

I will cover the PKS deployment procedure later, but first I want to explain what happens on the NSX-T side when you run “pks create-cluster” on the PKS command line, and then when you create K8s namespaces and pods.

pks create-cluster

When you run pks create-cluster with its arguments, PKS goes to vCenter and deploys the Kubernetes master and worker VMs based on the specification you chose during deployment. On the NSX-T side, a new logical switch is created and connected to these VMs (in this example, one K8s master and two worker nodes have been deployed). Along with the logical switch, a Tier-1 cluster router is created and connected to your organisation’s Tier-0 router.

K8s Master and Node Logical Switches


K8s Cluster connectivity towards Tier-0


Kubernetes Namespaces and NSX-T

Once the K8s cluster is deployed successfully, Kubernetes creates three namespaces by default:

  • default – The default namespace for objects with no other namespace.
  • kube-system – The namespace for objects created by the Kubernetes system.
  • kube-public – This namespace is created automatically and is readable by all users (including those not authenticated). It is mostly reserved for cluster usage, in case some resources should be visible and readable publicly throughout the whole cluster. The public aspect of this namespace is only a convention, not a requirement.

For each default namespace, PKS automatically deploys and configures an NSX-T logical switch, and each logical switch has its own Tier-1 router connected to the Tier-0 router.


pks-infrastructure Namespace and NSX-T

In the above figure you can clearly see the “default”, “kube-public” and “kube-system” logical switches. Another logical switch, “pks-infrastructure”, is created for a PKS-specific namespace that runs PKS-related components: “pks-infrastructure” runs the NSX-T Container Plugin (NCP) that integrates NSX-T with Kubernetes.


kube-system Namespace & NSX-T

Typically, this namespace runs pods such as heapster, kube-dns, kubernetes-dashboard, the monitoring DB and the telemetry agent, plus things like ingress controllers if you deploy them.


On the NSX-T side, as explained earlier, a logical switch is created for this namespace, and PKS creates a logical port on NSX-T for each system pod.


Default Namespace & NSX-T

This is the cluster’s default namespace, which holds the default set of pods, services and deployments used by the cluster. When you deploy a pod without creating or specifying a new namespace, the “default” namespace holds it, and as explained earlier, this namespace also has its own NSX-T logical switch with an uplink port to a Tier-1 router.


Now, when you deploy a Kubernetes pod without a new namespace, the pod becomes part of the “default” namespace, so PKS creates an NSX-T logical port on the default logical switch. Let’s create a simple pod:


Let’s go back to NSX-T’s “default” namespace logical switch:


As you can see, a new logical port has been created on the default logical switch.

New Namespace & NSX-T

Kubernetes supports multiple virtual clusters backed by the same physical cluster. These virtual clusters are called namespaces. In simple terms, namespaces are like org VDCs in vCD, and Kubernetes best practice is to arrange pods in namespaces. So when we create a new namespace, what happens in NSX-T?

I have created a new namespace called “demo”.


In the images below, the left image shows the default switches and the right image shows the logical switches after the new namespace was created.

As you can see, a new logical switch has been created for the new namespace.

If you create pods in the default namespace, all of them are attached to the default logical switch. If you create a namespace (which is K8s best practice), a new logical switch is created, every pod deployed in that namespace is attached to it, and this new logical switch also has its own Tier-1 router connecting to the Tier-0 router.
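To summarise the mapping above, here is a toy Python model of the objects NCP and PKS create: one logical switch and one Tier-1 router per namespace, uplinked to the shared Tier-0, and one logical port per pod. The class names and the “ls-”, “t1-” and “lp-” prefixes are hypothetical and only illustrate the relationships; this is not the NSX-T API, and the node switch and load-balancer objects are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogicalSwitch:
    name: str
    ports: List[str] = field(default_factory=list)  # one logical port per pod


@dataclass
class Tier1Router:
    name: str
    uplink: str    # Tier-0 router this Tier-1 connects to
    downlink: str  # namespace logical switch attached behind it


@dataclass
class ClusterTopology:
    tier0: str
    switches: Dict[str, LogicalSwitch] = field(default_factory=dict)
    routers: Dict[str, Tier1Router] = field(default_factory=dict)

    def create_namespace(self, ns: str) -> None:
        """New namespace -> its own logical switch and Tier-1 router."""
        if ns in self.switches:
            return
        switch = LogicalSwitch(name=f"ls-{ns}")
        self.switches[ns] = switch
        self.routers[ns] = Tier1Router(name=f"t1-{ns}", uplink=self.tier0,
                                       downlink=switch.name)

    def create_pod(self, pod: str, ns: str = "default") -> None:
        """New pod -> a logical port on its namespace's switch."""
        if ns not in self.switches:
            raise KeyError(f"namespace {ns!r} does not exist")
        self.switches[ns].ports.append(f"lp-{pod}")


topo = ClusterTopology(tier0="t0-org")
# Namespaces present right after cluster creation, as described above.
for ns in ("default", "kube-system", "kube-public", "pks-infrastructure"):
    topo.create_namespace(ns)

topo.create_pod("web")         # no namespace given -> default switch
topo.create_namespace("demo")  # new switch plus a new Tier-1 router
print(sorted(topo.switches))
print(topo.switches["default"].ports)
```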


Expose Pods to the Outside World

In this example we deployed a pod and got internal network connectivity, but internal-only connectivity does not give the outside world access to this web server, and such access is forbidden by default in Kubernetes. So we need to expose this deployment through a load balancer on the public interface, on a specific port. Let’s do that:
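As an illustration (not necessarily the exact command used in this lab), the Service that exposes a deployment through a load balancer can be written as a manifest. The sketch below builds a core/v1 Service of type LoadBalancer in Python and prints it as JSON, a format “kubectl apply -f” accepts. The name “web”, the “app: web” selector and the port numbers are assumptions.

```python
import json


def load_balancer_service(name: str, selector: dict, port: int, target_port: int) -> dict:
    """Kubernetes Service of type LoadBalancer (core/v1).

    With PKS and NSX-T, NCP realises this as a virtual server on the
    cluster's NSX-T load balancer, with the matching pods as pool members.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",  # ask for an EXTERNAL-IP
            "selector": selector,    # pods that become pool members
            "ports": [{"protocol": "TCP", "port": port, "targetPort": target_port}],
        },
    }


print(json.dumps(load_balancer_service("web", {"app": "web"}, 80, 80), indent=2))
```

Saving the output as web-svc.json, applying it with “kubectl apply -f web-svc.json” and then running “kubectl get svc web” should show the EXTERNAL-IP next to the internal CLUSTER-IP.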



Let’s browse this app using the EXTERNAL-IP; as you might know, the CLUSTER-IP is an internal IP.


Kubernetes Cluster and NSX-T Load Balancer

When we exposed this deployment with a service, NSX-T used the cluster load balancer that was deployed automatically when the cluster was created: on this load balancer, the NSX CNI adds the pods to a virtual server under a new load balancer VIP.


If we drill down into the pool members of this VIP, we see our Kubernetes pod endpoint IPs.


Behind the scenes, deploying a cluster also creates an LB logical switch and an LB Tier-1 router, which provide logical connectivity between the load balancer and the Tier-0 router so that you can access the deployment externally.


This is what your Tier-0 router looks like: it connects to all the Tier-1 routers, and each Tier-1 router connects to a namespace logical switch.


All these logical switches, the Tier-1 routers and their connectivity to the Tier-0 router, the load balancers and so on were created automatically by the NSX-T container (CNI) plugin and PKS. I was really thrilled when I tried this for the first time; once you understand the concept, it is very simple.

Kubernetes and Micro-segmentation

The NSX-T container plugin exposes container pods as NSX-T logical switch ports, which makes it easy to implement micro-segmentation rules. Once pods are exposed to the NSX ecosystem, we can use the same approach we use with virtual machines to implement micro-segmentation and other security measures.


Alternatively, you can use security groups based on tags to achieve micro-segmentation.
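Tag-based security groups work by matching criteria against the tags attached to each pod’s logical port, so group membership follows the pod rather than its IP address. Here is a toy model of that matching; the tag scopes “ncp/project” and “app” are hypothetical stand-ins for whatever tags you see on the logical ports in your environment, and real NSX-T groups support richer criteria than this AND-of-equalities.

```python
from typing import Dict, List


def group_members(ports: Dict[str, Dict[str, str]], criteria: Dict[str, str]) -> List[str]:
    """Return the logical ports whose tags match every (scope, value) pair.

    `ports` maps a logical-port name to its tags as {scope: value};
    a port joins the group only if all criteria match (logical AND).
    """
    return sorted(
        port for port, tags in ports.items()
        if all(tags.get(scope) == value for scope, value in criteria.items())
    )


ports = {
    "lp-web-1": {"ncp/project": "demo", "app": "web"},
    "lp-web-2": {"ncp/project": "demo", "app": "web"},
    "lp-db-1": {"ncp/project": "demo", "app": "db"},
    "lp-web-x": {"ncp/project": "default", "app": "web"},
}
web = group_members(ports, {"ncp/project": "demo", "app": "web"})
db = group_members(ports, {"ncp/project": "demo", "app": "db"})
print(web, db)
```

A rule such as “allow the web group to reach the db group on the database port, deny everything else” keeps applying as pods are rescheduled, because membership is recomputed from tags rather than from fixed IPs.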



In this post I have tried to explain what happens behind the scenes in the NSX-T networking stack when you deploy and expose your applications on Kubernetes, and how we can bring in proven NSX-T based micro-segmentation.

Enjoy learning PKS, NSX-T and Kubernetes, one of the best combinations for Day-1 and Day-2 operations of Kubernetes 🙂 and feel free to share comments and suggestions.




Getting Started with VMware PKS & NSX-T

VMware Pivotal Container Service (PKS) provides a Kubernetes-based container service for deploying and operating modern applications across private and public clouds. Basically, it is managed Kubernetes for multiple Kubernetes clusters, aimed at Day-2 operations. K8s is designed with a focus on high availability and auto-scaling, and it supports rolling upgrades.

PKS integrates with VMware NSX-T for advanced container networking, including micro-segmentation, ingress control, load balancing and security policy, and by using VMware Harbor, PKS secures container images through vulnerability scanning, image signing and auditing. A PKS deployment consists of multiple VM instances classified into two categories:

PKS Management Plane

The PKS management plane consists of the following VMs:


  • PCF Ops Manager

Pivotal Operations Manager (Ops Manager) is a graphical interface for deploying and managing Pivotal BOSH, PKS Control Plane, and VMware Harbor application tiles. Ops Manager also provides a programmatic interface for performing lifecycle management of Ops Manager and application tiles.

  • VMware BOSH Director

Pivotal BOSH is an open-source tool for release engineering for the deployment and lifecycle management of large distributed systems. By using BOSH, developers can version, package, and deploy software in a consistent and reproducible manner.

BOSH is the first component that Ops Manager installs. BOSH is a primary PKS tile. BOSH was originally designed to deploy open source Cloud Foundry. Internally, BOSH has the following components:

  1. Director: the core orchestration engine, which controls the provisioning of VMs, the required software and service lifecycle events.
  2. Blobstore: The Blobstore stores the source forms of releases and the compiled images of releases. An operator uploads a release using the CLI, and the Director inserts the release into the Blobstore. When you deploy a release, BOSH orchestrates the compilation of packages and stores the result in the Blobstore.
  3. Postgres DB: the BOSH Director uses a Postgres database to store information about the desired state of deployments, including stemcells, releases and deployments. The DB is internal to the Director VM.
  • Pivotal Container Service (PKS Control Plane)

The PKS Control Plane is the self-service API for on-demand deployment and lifecycle management of K8s clusters. The API submits requests to BOSH, which automates the creation, deletion and updates of Kubernetes clusters.

  • VMware Harbor

VMware Harbor is an open-source, enterprise-class container registry service that stores and distributes container images in a private, on-premises registry. In addition to providing Role-Based Access Control (RBAC), Lightweight Directory Access Protocol (LDAP), and Active Directory (AD) support, Harbor provides container image vulnerability scanning, policy-based image replication, notary and auditing services.

PKS Data Plane

  • Kubernetes

K8s is an open-source container orchestration framework. Containers package applications and their dependencies in container images. A container image is a distributable artifact that provides portability across multiple environments, streamlining the development and deployment of software. Kubernetes orchestrates these containers to manage and automate resource use, failure handling, availability, configuration, scalability, and desired state of the application.

Integration with NSX-T

VMware NSX-T helps simplify networking and security for Kubernetes by automating the implementation of network policies, network object creation, network isolation, and micro-segmentation. NSX-T also provides flexible network topology choices and end-to-end network visibility.

PKS integrates with VMware NSX-T for production-grade container networking and security. A new capability introduced in NSX-T 2.2 allows you to perform workload SSL termination using Load Balancing services. PKS can leverage this capability to provide better security and workload protection.

The major benefit of using NSX-T with PKS and K8s is automation, that is, dynamic provisioning and association of network objects for unified VM and pod networking. The automation includes the following:

  • On-demand provisioning of routers and logical switches for each Kubernetes cluster
  • Allocation of a unique IP address segment per logical switch
  • Automatic creation of SNAT rules for external connectivity
  • Dynamic assignment of IP addresses from an IPAM IP block for each pod
  • On-demand creation of load balancers and associated virtual servers for HTTPS and HTTP
  • Automatic creation of routers and logical switches per Kubernetes namespace, which can isolate environments for production, development, and test.

PKS Networking

  • PKS Management Network

This network is used to deploy the PKS management components. It can be a dvSwitch or an NSX-T logical switch; since my lab uses the no-NAT topology with a virtual switch, I am using a dvSwitch with the network segment 192.168.110.0/24.

  • Kubernetes Node Network

This network is used for the Kubernetes nodes and is allocated to the master and worker nodes. These nodes embed a Node Agent to monitor the liveness of the cluster.

  • Kubernetes Pod Network

This network is used when an application is deployed into a new Kubernetes namespace. A /24 network is taken from the IP block and allocated to a specific Kubernetes namespace, allowing network isolation and policies to be applied between namespaces. The NSX-T Container Plugin automatically creates the NSX-T logical switch and Tier-1 router for each namespace.
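The per-namespace allocation can be illustrated with Python’s standard ipaddress module: each new namespace takes the next free /24 from the pods IP block. The 172.16.0.0/16 block is an example value, not a PKS default, and NCP’s real allocator may pick subnets differently.

```python
import ipaddress
from typing import Dict, Iterator


class PodSubnetAllocator:
    """Hand out one fixed-size subnet per namespace from an IPv4 pods IP block."""

    def __init__(self, block: str, prefix: int = 24) -> None:
        # Lazily enumerate every /prefix subnet inside the block, in order.
        self._subnets: Iterator[ipaddress.IPv4Network] = (
            ipaddress.ip_network(block).subnets(new_prefix=prefix)
        )
        self.assigned: Dict[str, ipaddress.IPv4Network] = {}

    def subnet_for(self, namespace: str) -> ipaddress.IPv4Network:
        """Return the namespace's subnet, allocating the next free one if needed."""
        if namespace not in self.assigned:
            try:
                self.assigned[namespace] = next(self._subnets)
            except StopIteration:
                raise RuntimeError("pod IP block exhausted") from None
        return self.assigned[namespace]


alloc = PodSubnetAllocator("172.16.0.0/16")
for ns in ("default", "kube-system", "demo"):
    print(ns, alloc.subnet_for(ns))
```

A /16 block holds 256 /24 subnets, so a busy environment with many namespaces can run out, which is the situation the PKS 1.3 per-cluster pod IP block and subnet-size override is meant to address.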

  • Load Balancer and NAT Subnet

This network pool, also known as the Floating IP Pool, provides IP addresses for load balancing and NAT services which are required as a part of an application deployment in Kubernetes.

PKS deployment Network Topologies – Refer Here

PKS Deployment Planning

Before you install PKS on vSphere with NSX-T integration, you must prepare your vSphere and NSX-T environment, make sure vCenter, the NSX-T components and the ESXi hosts can communicate with each other, and make sure you have adequate resources.

  • PKS Management VM Sizing

When you size the vSphere resources, consider the compute and storage requirements for each PKS management component.

VM Name            vCPU   Memory (GB)   Storage (GB)   No. of VMs
Ops Manager        1      8             160            1
BOSH               2      8             103            1
PKS Control VM     2      8             29             1
Compilation VMs    4      4             10             4
Client VM          1      2             8              1
VMware Harbor      2      8             169            1

Compilation VMs are created when the initial K8s cluster is deployed: software packages are compiled, and four additional service VMs are automatically deployed for this as a single task. These VMs are deleted once the compilation process completes. To manage and configure PKS, the PKS and Kubernetes command-line utilities are required; these utilities can be installed locally on a workstation, referred to here as the Client VM.
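Adding up the sizing table gives a rough capacity target. Because the compilation VMs only exist during the initial cluster deployment, the sketch below reports both the peak footprint and the steady-state footprint (memory and storage in GB, as in the table; hypervisor overhead and reservations are not included).

```python
# (name, vCPU, memory GB, storage GB, VM count, transient?) from the sizing table
VMS = [
    ("Ops Manager",     1, 8, 160, 1, False),
    ("BOSH",            2, 8, 103, 1, False),
    ("PKS Control VM",  2, 8,  29, 1, False),
    ("Compilation VMs", 4, 4,  10, 4, True),
    ("Client VM",       1, 2,   8, 1, False),
    ("VMware Harbor",   2, 8, 169, 1, False),
]


def totals(include_transient: bool) -> tuple:
    """Sum vCPU, memory (GB) and storage (GB) across the management VMs."""
    cpu = mem = disk = 0
    for _name, vcpu, ram, storage, count, transient in VMS:
        if transient and not include_transient:
            continue  # skip compilation VMs for the steady-state figure
        cpu += vcpu * count
        mem += ram * count
        disk += storage * count
    return cpu, mem, disk


print("peak (with compilation VMs):", totals(True))   # (24, 50, 509)
print("steady state:", totals(False))                 # (8, 34, 469)
```

In other words, plan for roughly 24 vCPU, 50 GB of memory and 509 GB of storage during the first deployment, dropping to 8 vCPU, 34 GB and 469 GB once compilation finishes.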

  • Plan your CIDR block

Before you install PKS on vSphere with NSX-T, you should plan the CIDRs and IP blocks used in your deployment, as explained above. These are the CIDR blocks that we need to plan:

  • Pods IP Block
  • Nodes IP Block

Below are the CIDR blocks that you can’t use, and why:

The Docker daemon on each Kubernetes worker node uses a subnet in the following CIDR range:


If PKS is deployed with Harbor, Harbor uses the following CIDR ranges for its internal Docker bridges:


Each Kubernetes cluster uses the following subnet for Kubernetes services; do not use this IP block for the Nodes IP Block:

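Once you have the reserved ranges from the PKS documentation for your version, a quick overlap check with Python’s ipaddress module catches conflicts with your planned Pods and Nodes IP blocks before installation. The ranges in RESERVED below are deliberately not the real PKS values: they are the RFC 5737 documentation ranges, used as placeholders that you should replace with the documented ones.

```python
import ipaddress
from typing import Dict, List

# Placeholders only: substitute the ranges the PKS docs list for the Docker
# bridge, Harbor's internal bridges and Kubernetes services.
RESERVED: Dict[str, str] = {
    "docker-bridge (placeholder)": "192.0.2.0/24",
    "harbor (placeholder)": "198.51.100.0/24",
    "k8s-services (placeholder)": "203.0.113.0/24",
}


def conflicts(candidate: str, reserved: Dict[str, str] = RESERVED) -> List[str]:
    """Names of reserved ranges that overlap the candidate CIDR block."""
    net = ipaddress.ip_network(candidate)
    return [name for name, cidr in reserved.items()
            if net.overlaps(ipaddress.ip_network(cidr))]


for block in ("10.10.0.0/16", "198.51.0.0/16"):
    print(block, conflicts(block) or "no conflict")
```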

In this blog post series I will be deploying the NO-NAT topology and will walk you step by step through the PKS deployment with NSX-T integration.

The next post in this series is VMware PKS, NSX-T & Kubernetes Networking & Security explained, which will help you understand what happens behind the scenes in the networking and security stack when PKS and NSX-T deploy Kubernetes and its networking stack.



Upgrade NSX-T 2.1 to NSX-T 2.3

I am working on a PKS deployment and will soon share my deployment procedure, but before proceeding with the PKS deployment I need to upgrade my NSX-T lab environment to support the latest PKS, as per the compatibility matrix below.

PKS Version   Compatible NSX-T Versions    Compatible Ops Manager Versions
v1.2          v2.2, v2.3                   v2.2.2+, v2.3.1+
v1.1.6        v2.1, v2.2                   v2.1.x, v2.2.x
v1.1.5        v2.1, v2.2                   v2.1.x, v2.2.x
v1.1.4        v2.1                         v2.1.x, v2.2.x
v1.1.3        v2.1                         v2.1.0 – 2.1.6
v1.1.2        v2.1                         v2.1.x, v2.2.x
v1.1.1        v2.1 (Advanced Edition)      v2.1.0 – 2.1.6
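Encoding the NSX-T column of the compatibility table as a lookup makes it easy to confirm a planned pairing before upgrading. The data below is copied from the table (NSX-T column only); the helper itself is just a convenience sketch.

```python
# PKS version -> compatible NSX-T minor versions (from the table above)
PKS_NSXT = {
    "1.2":   {"2.2", "2.3"},
    "1.1.6": {"2.1", "2.2"},
    "1.1.5": {"2.1", "2.2"},
    "1.1.4": {"2.1"},
    "1.1.3": {"2.1"},
    "1.1.2": {"2.1"},
    "1.1.1": {"2.1"},  # NSX-T 2.1 Advanced Edition
}


def nsxt_supported(pks: str, nsxt: str) -> bool:
    """True if the NSX-T version (e.g. '2.3' or 'v2.3.0') is listed for the PKS version."""
    pks = pks.lstrip("v")
    # Compare on major.minor only, e.g. '2.3.0' -> '2.3'.
    minor = ".".join(nsxt.lstrip("v").split(".")[:2])
    return minor in PKS_NSXT.get(pks, set())


print(nsxt_supported("v1.2", "2.3.0"))  # True: the target of this upgrade
print(nsxt_supported("v1.2", "2.1"))    # False: why the lab must be upgraded
```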

In this post I will cover the procedure to upgrade NSX-T 2.1 to NSX-T 2.3.

Before proceeding with the upgrade, let’s check the health of the current deployment. This is very important: if something is not working after the upgrade, we will not know whether it was working before the upgrade or not. So let’s validate the health and check the versions.

Validate Current Version Components Health

The first thing to check is the management cluster and controller connectivity; ensure they are up.

Next, validate the host deployment status and connectivity.


Check the Edge health


Let’s check the transport node health.


Upgrade Procedure

Now download the upgrade bundle.


Go to NSX Manager and browse to Upgrade.


Upload the downloaded upgrade bundle file to NSX Manager.


Since the upgrade bundle is very large, upload, extraction and verification take a long time. Once the package has been uploaded, click “BEGIN UPGRADE”.


The upgrade coordinator then checks the installation for any potential issues. In my environment there is one warning for the Edge saying that connectivity is degraded; this is because I disconnected the 4th NIC, so it is safe to ignore. When you do this in your environment, please assess all the warnings and take the necessary actions before proceeding with the upgrade.


Clicking NEXT takes you to the Hosts Upgrade page. Here you can define the order and method of upgrade for each host, and define host groups to control the order of upgrade. I went with the defaults, serial (one at a time) rather than parallel upgrades, because I have two hosts in each cluster.

Click START to begin the upgrade; the hosts are put into maintenance mode, then upgraded and rebooted if necessary. Make sure DRS is enabled and that the VMs can vMotion off the host being put into maintenance mode. Once a host has been upgraded and its Management Plane Agent has reported back to the Manager, the Upgrade Coordinator moves on to the next host in the group.


Once the hosts are upgraded, click NEXT to move to the Edge Upgrade page. Edge clusters can be upgraded in parallel if you have multiple edge clusters, but the Edges that form an Edge cluster are upgraded serially to ensure connectivity is maintained. In my lab I have a single Edge cluster with two Edge VMs, so it is upgraded one Edge at a time. Click “START” to start the Edge upgrade process.



Once the Edge cluster has been upgraded successfully, click NEXT to move to the Controller Node Upgrade page. Here you can’t change the upgrade sequence of the controllers; they are upgraded in parallel by default. (In my lab I am running a single controller because of resource constraints, but in production you will see three controllers deployed in a cluster.) Click “START” to begin the upgrade process.


Once the controller upgrade has completed, click NEXT to move to the NSX Manager upgrade page. The NSX Manager becomes unavailable for about 5 minutes after you click START, and upgrading the manager may take 15 to 20 minutes.
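The sequencing described above (hosts one at a time, edge clusters in parallel with each cluster’s edges one at a time, controllers together, NSX Manager last) can be modelled as a list of stages, where everything in one stage is upgraded at the same time. This is only a model of the order described above with default settings, not the Upgrade Coordinator’s actual logic, and the node names are hypothetical.

```python
from itertools import zip_longest
from typing import List


def upgrade_stages(
    hosts: List[str],
    edge_clusters: List[List[str]],
    controllers: List[str],
    manager: str,
) -> List[List[str]]:
    """Return upgrade stages; nodes within one stage are upgraded together."""
    stages: List[List[str]] = []
    # Hosts: serial mode, one host per stage.
    stages.extend([h] for h in hosts)
    # Edges: clusters in parallel, members of one cluster one at a time,
    # so stage i holds the i-th edge of every edge cluster.
    for wave in zip_longest(*edge_clusters):
        stages.append([e for e in wave if e is not None])
    # Controllers: all at once.
    if controllers:
        stages.append(list(controllers))
    # NSX Manager last.
    stages.append([manager])
    return stages


for stage in upgrade_stages(
    hosts=["esx-01", "esx-02"],
    edge_clusters=[["edge-01", "edge-02"]],
    controllers=["controller-01"],
    manager="nsx-manager",
):
    print(stage)
```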


Once the Manager upgrade has completed, review the upgrade cycle.


You can re-validate the installation as we did at the start of the upgrade, checking that all the lights are green and that the component versions have increased.
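Because the health checks were captured before the upgrade, the before/after comparison can be made explicit. A minimal sketch, assuming you have recorded each component’s status and version by hand (or from the NSX Manager UI) into dictionaries; the component names, the “UP” status string and the version numbers below are illustrative, not output from this lab.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[str, str]]  # component -> (status, version)


def version_key(v: str) -> Tuple[int, ...]:
    """Turn '2.3.0.0' into (2, 3, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def post_upgrade_issues(before: Snapshot, after: Snapshot, healthy: str = "UP") -> List[str]:
    """List components that are missing, unhealthy, or not on a newer version."""
    issues = []
    for name, (_status, old_version) in before.items():
        if name not in after:
            issues.append(f"{name}: missing after upgrade")
            continue
        status, new_version = after[name]
        if status != healthy:
            issues.append(f"{name}: status {status}")
        if version_key(new_version) <= version_key(old_version):
            issues.append(f"{name}: version not increased ({old_version} -> {new_version})")
    return issues


before = {"manager": ("UP", "2.1.0.0"), "edge-01": ("UP", "2.1.0.0")}
after = {"manager": ("UP", "2.3.0.0"), "edge-01": ("DOWN", "2.3.0.0")}
print(post_upgrade_issues(before, after) or "all green")
```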

vCloud Availability Cloud-to-Cloud Design and Deploy Guide

The vCloud Architecture Toolkit white paper that I have written has now been published on the cloudsolutions.vmware.com website. This design and deploy guide helps cloud providers design and deploy the vCloud Availability Cloud-to-Cloud DR solution. The guide is based on a real-life example and helps cloud providers successfully plan, design and deploy vCloud Availability Cloud-to-Cloud DR version 1.5.

White Paper Download Link

This white paper includes the following chapters to plan your deployment:

  • Introduction
  • Use Cases
  • vCloud Availability Cloud-to-Cloud DR Components
  • vCloud Availability Cloud-to-Cloud DR Node Types and Sizing
  • vCloud Availability Cloud-to-Cloud DR Deployment Requirements
  • vCloud Availability Cloud-to-Cloud DR Architecture Design
  • Physical Design
  • Certificate
  • Network Communication and Firewalls
  • Deployment
  • Replication Policy
  • Services Management Interface Addresses
  • Log Files
  • Configuration Files
  • References

I hope this helps in your planning, design and deployment of vCloud Availability Cloud-to-Cloud DR version 1.5. Please feel free to share feedback to make this white paper more effective and helpful.

What is VMware vCloud Availability Cloud-to-Cloud DR

The VMware vCloud Availability for Cloud-to-Cloud DR solution extends the existing hybrid cloud offerings of VMware Cloud Providers™ on top of VMware vCloud Director with disaster recovery and application continuity between vCloud Director Virtual Data Centers or cloud environments. vCloud Availability Cloud-to-Cloud DR brings much-needed native disaster recovery between vCloud Director instances. VMware vCloud Availability for Cloud-to-Cloud DR can help VMware Cloud Providers further monetize existing VMware vCloud Director multi-tenant cloud environments with DR services, including replication and failover capabilities for workloads at both the VM and vApp level.



  • In vCloud Availability Cloud-to-Cloud DR, each deployment can serve as both a source and a recovery site. There are no dedicated source and destination sites; the same set of appliances works as either source or destination.
  • Replication and recovery of vApps (VMs) between organization Virtual Data Centers (orgVDCs), as well as between two instances of vCloud Director, for migration, DR, and planned migration.
  • It offers complete self-service for provider and tenant administrators via a unified HTML5 portal that can be used alongside vCloud Director. Replication, migration, and failover can be managed completely by the tenant or provided as a managed service by the provider.
  • Symmetrical replication flow that can be started from either the source or the recovery vCD instance.
  • Built-in encryption, or encryption and compression, of replication traffic.
  • Enhanced control with white-listing of DR-enabled vCloud Director organizations, enforcement of a specific minimum Recovery Point Objective (RPO) at the organization (org) level, maximum snapshots per org and maximum replications per tenant.
  • Non-disruptive, on-demand disaster recovery testing.
  • Policies that allow service provider administrators to control the following system attributes for one or multiple vCloud Director organizations:
    • Limit the number of replications at the vCloud Director organization level
    • Limit the minimum Recovery Point Objective (RPO)
    • Limit the number of retained snapshots per VM replication
    • Limit the total number of VM replications
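To make these policy controls more concrete, here is a minimal Python sketch of how an org-level check against such limits could work. The names (OrgPolicy, validate_request) and the limit values are hypothetical illustrations, not the vCloud Availability API; the product enforces these limits internally.

```python
from dataclasses import dataclass


@dataclass
class OrgPolicy:
    """Hypothetical provider-defined DR limits for one vCD organization."""
    max_replications: int  # VM replications allowed in the org
    min_rpo_minutes: int   # smallest RPO a tenant may request
    max_snapshots: int     # retained snapshots per VM replication


def validate_request(policy: OrgPolicy, active_replications: int,
                     rpo_minutes: int, snapshots: int) -> list[str]:
    """Return the policy violations for one new replication (empty list = allowed)."""
    violations = []
    if active_replications >= policy.max_replications:
        violations.append("replication limit reached")
    if rpo_minutes < policy.min_rpo_minutes:
        violations.append(f"RPO below the {policy.min_rpo_minutes}-minute minimum")
    if snapshots > policy.max_snapshots:
        violations.append(f"more than {policy.max_snapshots} retained snapshots")
    return violations


if __name__ == "__main__":
    policy = OrgPolicy(max_replications=50, min_rpo_minutes=15, max_snapshots=8)
    print(validate_request(policy, active_replications=10, rpo_minutes=30, snapshots=4))
```

Returning a list of violations rather than a single boolean lets a portal show the tenant every reason a request was rejected at once.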

Use Cases:

Though the most obvious use case for VMware vCloud Availability Cloud-to-Cloud DR is disaster recovery from one cloud availability zone to another, it can handle a number of different use cases and provide significant capability and flexibility to service providers. For all use cases and situations, VMware vCloud Availability Cloud-to-Cloud DR supports non-disruptive testing of protected cloud workloads in network- and storage-isolated environments. This provides the ability to test disaster recovery, disaster avoidance, or planned migrations as frequently as desired to ensure confidence in the configuration and operation of recovery in the cloud. The use cases are as follows:


Migration:

A tenant or provider administrator can use C2C to migrate workloads from one organization VDC to another with minimal disruption from a self-service portal. The end benefit is the ability to reorganize workloads through an easy-to-use workflow.

  • Easy-to-use workflow mechanism
  • Organize workloads in different orgVDCs
  • Ability to migrate between vCD instances or within the same vCD instance

Disaster Recovery:

A service provider has multiple sites, each with a vCD-based multi-tenant environment, and customers want to perform DR from one cloud provider site to another. Disaster recovery, or a planned/unplanned failover, is what VMware vCloud Availability Cloud-to-Cloud DR was specifically designed to accomplish for cloud providers. This helps providers and customers achieve:

  • Fastest RTO
  • Recover from unexpected failure
  • Full or partial site recovery

Disaster Avoidance:

Preventive failover is another common use case for VMware vCloud Availability Cloud-to-Cloud DR. This can be anything from an oncoming storm to the threat of power issues.

VMware vCloud Availability Cloud-to-Cloud DR allows for the graceful shutdown of virtual machines at the protected site, full replication of data, and startup of virtual machines and applications at the recovery site, ensuring app consistency and zero data loss. The solution helps providers and customers with:

  • Anticipated outages
  • Preventive failover
  • Graceful shutdown ensuring no data loss

Upgrade and Patch Testing:

The VMware vCloud Availability Cloud-to-Cloud DR test environment provides a perfect location for conducting operating system and application upgrade and patch testing. Test environments are complete copies of production environments configured in an isolated network segment which ensures that testing is as realistic as possible while at the same time not impacting production workloads or replication.

This gives you a basic idea of what vCloud Availability Cloud-to-Cloud DR solves for providers.


Features of VMware Cloud on AWS

VMware Cloud on AWS enables operational consistency for customers of all sizes, whether their workloads operate on-premises or in the public cloud. Here I cover some of the features I like most, which should give you an opportunity to understand and explore the service further.

Automated Cluster Remediation:

Suppose that in our on-premises environment we have an 8-node cluster and one of the nodes goes down because of a hardware failure. That is where our struggle starts: getting the required hardware from the hardware vendor and so on. Most importantly, we lose one host in our HA cluster, and if this cluster was highly utilized, application VMs might start facing a resource crunch. In my experience this can last at least 3-4 days, until the hardware is fixed and the host is put back into the cluster.

Now see the power of VMware Cloud on AWS: failed hosts in a VMware SDDC are automatically detected by VMware and replaced with healthy hosts. The process runs as follows:

  • VMware detects a host failure or identifies a problem.
  • A new host is added to the cluster, and data from the problematic host is either rebuilt or migrated.
  • The old host is evacuated from the cluster and replaced by the new host.
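The order of those steps matters: the replacement host joins before the failed one is removed, so the cluster gets its capacity back as early as possible. Here is a minimal Python sketch of that add-then-evacuate sequence, using hypothetical host names and a simple health map rather than any VMware API.

```python
def remediate(health: dict[str, bool], spares: list[str]) -> dict[str, bool]:
    """Return a new host -> healthy map with every failed host replaced by a spare.

    Host names and the health map are hypothetical; this models the order of
    operations only, not how VMware Cloud on AWS implements remediation.
    """
    cluster = dict(health)
    failed = [host for host, ok in health.items() if not ok]  # step 1: detect
    for host in failed:
        if not spares:
            raise RuntimeError(f"no spare host available to replace {host}")
        replacement = spares.pop(0)
        cluster[replacement] = True  # step 2: add the new host first
        # In the real service, the failed host's data is rebuilt or migrated here.
        del cluster[host]            # step 3: evacuate and remove the old host
    return cluster


if __name__ == "__main__":
    print(remediate({"esx1": True, "esx2": False, "esx3": True}, ["esx9"]))
```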

Scale as per your convenience:

One of the major challenges in traditional data centers is finding the right balance between hardware and workload utilization.

VMware Cloud on AWS enables you to quickly scale up to ensure that you always have enough capacity to run your workloads during volume spikes and quickly scale down to ensure that you are not paying for hardware that is not being used. This feature provides higher availability with lower overall costs.


You have the option to add and remove clusters as well as hosts, or you can enable the Elastic Distributed Resource Scheduler (EDRS), a policy-based solution that automatically scales a vSphere cluster in VMware Cloud on AWS based on utilization. EDRS monitors CPU, memory, and storage resources for scaling operations. It monitors the vSphere cluster continuously, and every 5 minutes it runs its algorithm to determine whether a scale-out or scale-in operation is required.
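To illustrate the kind of decision EDRS makes on each 5-minute cycle, here is a minimal Python sketch. The thresholds and the minimum/maximum host bounds are hypothetical values chosen for the example, not VMware's actual EDRS policy defaults, and the real algorithm is more involved.

```python
from dataclasses import dataclass


@dataclass
class Utilization:
    """Cluster utilization as fractions between 0.0 and 1.0."""
    cpu: float
    memory: float
    storage: float


# Hypothetical thresholds, NOT the real EDRS policy values.
SCALE_OUT = {"cpu": 0.90, "memory": 0.80, "storage": 0.75}
SCALE_IN = {"cpu": 0.50, "memory": 0.50, "storage": 0.20}


def edrs_decision(u: Utilization, hosts: int, min_hosts: int, max_hosts: int) -> str:
    """Return "scale-out", "scale-in" or "no-op" for one evaluation cycle."""
    values = {"cpu": u.cpu, "memory": u.memory, "storage": u.storage}
    # Scale out if ANY resource is above its high-water mark.
    if any(values[r] > SCALE_OUT[r] for r in values) and hosts < max_hosts:
        return "scale-out"
    # Scale in only if ALL resources are below their low-water marks.
    if all(values[r] < SCALE_IN[r] for r in values) and hosts > min_hosts:
        return "scale-in"
    return "no-op"


if __name__ == "__main__":
    print(edrs_decision(Utilization(cpu=0.95, memory=0.40, storage=0.30), 4, 3, 16))
```

The asymmetry (any resource triggers a scale-out, all resources must be low for a scale-in) is a common design choice that favors availability over cost.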

vCenter Hybrid Linked Mode:

Hybrid Linked Mode allows you to link your VMware Cloud on AWS vCenter Server instance with an on-premises vCenter Single Sign-On domain. If you link your cloud vCenter Server to a domain that contains multiple vCenter Server instances linked using Enhanced Linked Mode, all of those instances are linked to your cloud SDDC.

You have two options for configuring Hybrid Linked Mode. You can use only one of these options at a time.

  • You can install the Cloud Gateway Appliance and use it to link from your on-premises data center to your cloud SDDC. In this case, Active Directory groups are mapped from your on-premises environment to the cloud.

  • Alternatively, you can link from your cloud SDDC to your on-premises data center. In this case, you must add Active Directory as an identity source to the cloud vCenter Server.

Using Hybrid Linked Mode, you can:

  • View and manage the inventories of both your on-premises and VMware Cloud on AWS data centers from a single vSphere Client interface, accessed using your on-premises credentials.

  • Migrate workloads between your on-premises data center and cloud SDDC.

  • Share tags and tag categories across vCenter Server instances.

Well-Defined Separation of Duties for VMware and Customer Teams:

Amazon, in coordination with VMware, performs the following tasks:

Hardware refresh, failed component replacement, BIOS upgrades and underlying firmware patching are done by AWS based on the VMware compatibility list. This means customers do not have to worry about this tedious exercise, compatibility issues or dedicated skilled resources.

VMware Experts perform the following maintenance tasks:

  • Backup and restore of VMware appliances and infrastructure, such as vCenter, NSX Manager, PSC etc.
  • Patching VMware Cloud on AWS components such as vSphere, ESXi drivers, vSAN, NSX, SDDC console etc. This helps customers focus on their application VMs and their business and leave virtual infrastructure maintenance to the experts.
  • Providing VMware Tools patches through vSphere so that they are available to your virtual machines; the customer is then free to upgrade VMware Tools on workload VMs.
  • Host and infrastructure VM monitoring

The customer's administrators are responsible for the following tasks:

  • Backup and restoration of workload VMs and applications.
  • Patching inside VMs, such as the guest OS and applications.
  • Upgrading VMware Tools installed on workload VMs.
  • Monitoring workload VMs and applications.
  • Keeping VM templates and content library files updated so that new VMs are deployed from up-to-date, patched master templates.
  • Managing user access, and monitoring resource utilization and the charges for any integrated AWS services consumed.

Outages, Scheduled Maintenance, and Health Service Information:

VMware hosts a separate website that displays the current status of VMware Cloud services at https://status.vmware-services.io/, where you can subscribe to updates.

Apart from the VMware Cloud on AWS service, this website also reports on the following services:

  • VMware AppDefense
  • VMware Cost Insight
  • VMware Discovery
  • VMware Kubernetes Engine
  • Log Intelligence
  • VMware Network Insight

NSX Hybrid Connect

NSX Hybrid Connect enables cloud on-boarding without retrofitting source infrastructure and supports migration from vSphere 5.1 or later to VMware Cloud on AWS without introducing application risk and complex migration assessments. NSX Hybrid Connect includes:

  • vSphere vMotion
  • bulk migration
  • high throughput network extension
  • WAN optimization
  • traffic engineering
  • load balancing
  • automated VPN with strong encryption
  • secured data center interconnectivity with built-in hybrid abstraction and hybrid interconnects.


VMware Site Recovery

VMware Site Recovery for VMware Cloud on AWS is a separately purchased item that communicates with separately licensed VMware Site Recovery Manager and VMware vSphere Replication instances. Recovery can occur from on-premises to AWS or from AWS SDDC to AWS SDDC. VMware Site Recovery can protect vCenter Server versions 6.7, 6.5, and 6.0 U3.


Consumption of AWS Native Services with VMware Cloud on AWS

The partnership between VMware and Amazon increases the catalog of solutions readily available to all VMware Cloud on AWS users. Some of the popular AWS solutions are listed below:

  • Simple Storage Service (S3): Highly available, highly durable object storage service.
  • Glacier: Highly durable, high latency archive storage used mostly for backup.
  • EC2: AWS flagship compute platform.
  • VPC: Networking for AWS solutions, both internal and external.
  • CloudWatch: Monitoring for AWS solutions.
  • IAM: Identity and Access Management solution of AWS.
  • AWS Database Services: A wide range of database services, such as Relational Database Service (RDS), DynamoDB (NoSQL database service) and Redshift (data warehouse for analytics on data from relational databases).
  • Simple Queue Service (SQS): Fully managed message queues for microservices, distributed systems, and serverless applications.
  • Route 53: DNS domain name registration and services.
  • ElastiCache: Managed, in-memory data store services.

Simple and feature-rich Web Interface for Network Services

Customers can easily consume network services with a few clicks; you do not need to be a network expert or have strong command-line experience. With just a few clicks, your IPsec VPN, L2 VPN, NAT, edge firewall rules and public IP addresses from Amazon are all ready to consume.


I have covered a few features of VMware Cloud on AWS. If you want to get your hands dirty, go ahead and log in to http://labs.hol.vmware.com. If your organisation wants to test the features and ease of consumption, there is a single-host option: by deploying a 1-node SDDC, you will be able to test out the features and functionality of VMware Cloud on AWS at a fraction of the cost. These 1-node SDDCs are fully self-service, paid for by credit card (or HPP/SPP credits), and deployed in just under two hours.

I hope this helps you understand the features of VMware Cloud on AWS better 🙂



Getting Started with VMware Cloud on AWS

What is VMware Cloud on AWS?

VMware Cloud on AWS allows the use of familiar VMware products while leveraging the benefits of a public cloud. A hybrid infrastructure can be created between an on-premises VMware vSphere software-defined data center (SDDC) and a VMware Cloud on AWS SDDC.

VMware Cloud on AWS allows you to create vSphere data centers on Amazon Web Services. These vSphere data centers include vCenter Server for managing your data center, vSAN for storage, and VMware NSX for networking. If you want to connect an on-premises data center to your cloud SDDC and manage both from a single vSphere Client interface, you can use Hybrid Linked Mode. Hybrid Linked Mode is like the existing Enhanced Linked Mode, but it additionally supports connections across SSO domains and vMotion.

VMware Cloud on AWS offers the following benefits:

  • It reduces capital and operational expenditures.
  • It reduces time to market for new applications.
  • It enhances application scalability in reduced time frames.
  • It helps achieve greater availability of applications.
  • Your applications will have a reduced recovery time objective (RTO).
  • And the most important one: it reduces the staff time spent performing maintenance operations.

VMware Cloud Foundation

VMware Cloud Foundation is the unified SDDC platform that bundles vSphere, vSAN, and VMware NSX into a natively integrated stack to deliver enterprise-ready cloud infrastructure for the private and public cloud.

The secret sauce behind Cloud Foundation is VMware SDDC Manager, which manages the initial configuration of the Cloud Foundation system, creates and manages workload domains, and performs life cycle management to ensure that the software components remain up to date. SDDC Manager also monitors the logical and physical resources of Cloud Foundation.

VMware Cloud on AWS is powered by VMware Cloud Foundation.

So, in a nutshell, VMware Cloud on AWS uses VMware Cloud Foundation and VMware Validated Design to provide a VMware SDDC and other migration solutions on AWS hardware.

All components of this solution are delivered, operated, and supported by VMware Global Support Services. VMware fully certifies and supports all hardware and software components of this service. Customers often face issues managing firmware, patches and upgrades of the underlying infrastructure; with VMware Cloud on AWS, VMware removes the burden of managing software patches, updates, and upgrades, because all of this is managed and maintained by VMware itself.

Use Cases

Data Center Extension:

  • DC extension: extend the on-premises data center to the public cloud to expand resource capacity, increase disaster avoidance and recovery options, or localize application instances in new geographic regions. For example, an organisation that is successful in one region and wants to grow its footprint into another region can, instead of arranging data center space, hardware and so on, focus on its core business, order IT infrastructure on VMware Cloud on AWS, and simply clone or migrate application VMs to this localized data center.

Data Center Consolidation:

  • Maintaining a data center is not easy: you have to take care of multi-source power, cooling, power backups, people, access management, BMS operations, real estate and more. So instead of managing a data center yourself, let VMware maintain it: consolidate on-premises data center costs by migrating applications from the on-premises data center to the public cloud to reduce data center costs, prevent costs from growing, or close data centers entirely.

Data Center Peering:

  • Peering private and public clouds to allow workloads to move between them: for example, moving applications from development or test to production or vice versa, or running CI/CD across private and public cloud.

This gives you a basic understanding of what VMware Cloud on AWS is. In the next few posts I will cover how to install and configure this service.

vCloud Director – Chargeback

There were frequent requests from VMware-based cloud providers for robust metering capabilities. In response, VMware has launched the new vRealize Operations Manager Tenant App for vCloud Director 2.0, which works in conjunction with vROps and now has built-in chargeback and metering capabilities.

Here I am going to discuss a few great features, with detailed screenshots. Go ahead and try these new features in your environment and build a robust cloud infrastructure with native chargeback.

Creation of pricing policies based on chargeback strategy: With this new release, the provider administrator can create one or more pricing policies based on how they want to charge back their consumers. Based on the vCloud Director allocation models, each pricing policy is of the type allocation, reservation, or pay-as-you-go (PAYG).


The new Tenant App for vCloud Director 2.0 provides the following ways to create pricing policies:

  • Base prices for primary resources:

    A pricing policy can be created to charge for the primary resources: CPU, memory, storage, and network.

    • CPU & Memory ->

      • Users can be charged based on GHz or vCPU, on an hourly, daily or monthly basis.
      • Charge flexibility: users can be charged based on allocation, usage, reservation, or an advanced methodology such as taking the maximum of usage and allocation. A fixed cost is also available.
    • Storage ->

      • You can create various policies based on storage tiers to charge differential pricing; these are mapped to your storage policies.
        • If you are not using policy-based storage, charge storage at a standard rate instead.
    • Network ->

      • Data transmitted/received (MB) and network transmit/receive rate (MBps) can be charged.
    • Advanced Network ->

      • Pricing configurations: a pricing policy provides the flexibility to configure advanced chargeback mechanisms for network services, apart from charging primary network resources. Using advanced network pricing, users can apply variable and fixed charges for the following network services associated with an edge: BGP Routing, DHCP, Firewall, High Availability, IP, IPv6, IPsec, Load Balancer, L2 VPN, NAT, OSPF Routing, Static Routing and SSL VPN. Base rates and fixed costs can also be applied based on Edge Gateway size.


    • Guest OS pricing ->

      • Guest OS usage can be charged separately. The charge can be applied based on VM uptime, regardless of uptime, or only if the VM was powered on at least once.

    • Tag based and vCD metadata-based chargeback mechanism -> 

      • Differential pricing can be established using tags or vCD metadata. Using vCenter tags or vCD metadata, a tag key and key value can be referenced to apply a base rate or fixed cost to VMs.
  • Apply Policy ->

    • The new Tenant App gives the service provider administrator the flexibility to map the created pricing policies to specific organization vDCs. By doing this, the service provider can holistically define how each of their customers is charged. The following vCloud Director allocation models are supported as part of the chargeback mechanism: Reservation pool, Pay-as-you-go, and Allocation pool.
  • Exhaustive set of templates ->

    • The service provider administrator can generate reports at various levels for different sets of objects, using a set of out-of-the-box (OOTB) default templates.

  • Detailed Billing for Each Tenant ->

    • Every tenant/customer of the service provider can review their bills using the vCD tenant app interface. The service provider administrator can generate bills for a tenant by selecting a specific resource and a pricing policy to apply for a defined period, and can also log in to review the bill details.
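Most of these pricing options boil down to simple arithmetic. Here is a minimal Python sketch, assuming an hourly rate per unit (vCPU, GHz or GB), of how the charge bases above (allocation, usage, reservation, or the maximum of usage and allocation) plus a fixed cost, and the three guest OS charging modes, could be computed. The names are hypothetical and this is not the Tenant App's calculation engine; treating "powered on at least once" as a full-period charge is my own assumption.

```python
from enum import Enum


class ChargeBasis(Enum):
    ALLOCATION = "allocation"
    USAGE = "usage"
    RESERVATION = "reservation"
    MAX_USAGE_ALLOCATION = "max(usage, allocation)"


class GuestOsMode(Enum):
    UPTIME = "based on uptime"
    ALWAYS = "regardless of uptime"
    POWERED_ON_ONCE = "powered on at least once"


def resource_charge(basis: ChargeBasis, allocated: float, used: float,
                    reserved: float, rate_per_unit_hour: float, hours: float,
                    fixed_cost: float = 0.0) -> float:
    """Charge for one primary resource (e.g. vCPUs or GB of RAM) over a period."""
    quantity = {
        ChargeBasis.ALLOCATION: allocated,
        ChargeBasis.USAGE: used,
        ChargeBasis.RESERVATION: reserved,
        ChargeBasis.MAX_USAGE_ALLOCATION: max(used, allocated),
    }[basis]
    return quantity * rate_per_unit_hour * hours + fixed_cost


def guest_os_charge(mode: GuestOsMode, rate_per_hour: float,
                    period_hours: float, uptime_hours: float) -> float:
    """Guest OS charge for one VM over a billing period (assumed semantics)."""
    if mode is GuestOsMode.UPTIME:
        return rate_per_hour * uptime_hours
    if mode is GuestOsMode.ALWAYS:
        return rate_per_hour * period_hours
    # POWERED_ON_ONCE: full period if the VM ran at all, otherwise nothing.
    return rate_per_hour * period_hours if uptime_hours > 0 else 0.0


if __name__ == "__main__":
    # 4 vCPUs allocated, 6 used on average, 0.5 per vCPU-hour, 10 hours
    print(resource_charge(ChargeBasis.MAX_USAGE_ALLOCATION, 4, 6, 2, 0.5, 10))
```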

This completes the feature demonstration of the vRealize Operations Manager Tenant App for vCloud Director 2.0. Go ahead, deploy it, and add native chargeback power to your cloud. 🙂

VMware vCloud Availability Installation-Part-10-Fully Automated Deployment

What I have learned during deployments is that an automated installation and configuration of the vCloud Availability components is simpler, faster and less error-prone than a manual deployment. Let's deploy it automatically with a few clicks.

For the automated installation of vCloud Availability, we need to create a registry file containing information about the infrastructure and the vCloud Availability components we are about to deploy.

The registry template file is located on the vCloud Availability Installer appliance in /root/.vcav/, and the file name is .registry.tmpl. The file is self-explanatory about which options you need to change and which you can leave as they are.

Open this file with a text editor and save it as “registry”. Here is my “registry” file for your convenience, which you can modify based on your environment.

General Options:

Disable all certificate validation and specify the NTP server and SSH_PASSWORD for the entire environment.


Cloud Provider Management vCenter Information:

  1. This identifier must remain the same because we will use it in other commands; if you change it, make sure you update it in those commands too.
  2. placement-locator – this parameter specifies the cluster on which your vCAV management VMs will be deployed, so specify it correctly.
  3. Make sure you have a network profile/pool created (I created one named “default”) and specify the IP information accordingly.


Cloud Provider Resource (Tenant ) vCenter Information:

This is your tenant vCenter, where your tenant VMs reside. In my case I have a single vCenter with a separate cluster. Note the identifier vsphere vc.0; you will reference it when deploying components. Enter the other information as suggested above.


vCloud Director Information:

  1. Note the identifier vcd vcd.0.
  2. In the amqp parameter we specify amqp.1. This means we need to create an identifier called amqp.1 in a later section, and since this message queue runs on the Docker host, we first need to create the Docker host.


Docker Host Information:

  1. Again, note the identifier docker docker.0.
  2. placement-vsphere vc.mgmt – this is your vc.mgmt identifier, which means that this Docker VM will be deployed on the management vCenter.
  3. placement-address – this is the IP address of this VM.
  4. Other options are self-explanatory.


Message queue container on Docker Host Information:

  1. Again, ensure the identifier is written and noted properly.
  2. Note placement-docker – here we specify docker.0, the Docker host identifier we created in the previous step.
  3. user – the user name that vCD will use to talk to the message queue server.
  4. password – the password that vCD will use to authenticate to the message queue server.


Cassandra container on Docker Host Information:

  1. Note the cassandra identifier.
  2. Note placement-docker – here we specify docker.0, the Docker host identifier we created earlier; this Cassandra container will be deployed on that Docker host.
  3. hcs-list – here we specify the identifier of the vSphere Replication Cloud Service appliance, which will be deployed in a later step.


vSphere Replication Manager Appliance Information:

  1. Again, make a note of the hms identifier.
  2. This appliance will be deployed in vc.mgmt.
  3. This VM will have ip address –
  4. This VM will have hostname – hms01.corp.local
  5. This HMS will be registered with the management vCenter.
  6. This HMS will be registered with the vCloud Director instance we specified in identifier vcd.0.


vSphere Replication Cloud Service Appliance Information

  1. Make a note of the hcs identifier.
  2. placement-vsphere is where this appliance will be deployed.
  3. placement-address is the IP address that will be assigned to this VM.
  4. hostname is the host name of this VM.
  5. vcd is the vCloud Director instance this appliance will be registered with.
  6. Here we specify the Cassandra server(s).
  7. The message queue server this appliance will be registered with.


vSphere Replication Server Appliance Information:

  1. Make a note of the hbr identifier.
  2. placement-vsphere is where this appliance will be deployed.
  3. placement-address is the IP address that will be assigned to this VM.
  4. hostname is the host name of this VM.
  5. vsphere specifies the vCenter Server this appliance will be registered with.
  6. vcd is the vCloud Director instance this appliance will be registered with.


vCloud Availability Portal Host Information:

  1. Make a note of the ui identifier.
  2. placement-vsphere is where this appliance will be deployed.
  3. placement-address is the IP address that will be assigned to this VM.
  4. hostname is the host name of this VM.
  5. vcd is the vCloud Director instance this appliance will be registered with.


vCloud Availability Administration Portal Host Information:

  1. Make a note of the smp identifier.
  2. placement-vsphere is where this appliance will be deployed.
  3. placement-address is the IP address that will be assigned to this VM.
  4. hostname is the host name of this VM.
  5. vcd is the vCloud Director instance this appliance will be registered with.
  6. The mongodb-database property is optional. The default value is vcav-smp; you can use a custom value if you want.
  7. The mongodb-user property is optional. The default value is vcav-smp.
  8. amqp points to the message queue server we specified under the “amqp.1” identifier.
  9. This appliance will be registered with the tenant UI that we defined in the previous step under the “ui.1” identifier.
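Most mistakes in this file are broken cross-references between identifiers, so it helps to check them before running the deployment. Here is a minimal Python sketch of such a check. It does NOT parse the real registry syntax; the dictionary is a hypothetical in-memory model of the references described above, the placement references are simplified, and identifier names not shown in this post (hms.1, hcs.1, hbr.1, smp.1, cassandra.1) are placeholders.

```python
# Hypothetical model of the registry references described above: each key is an
# identifier and each value lists the identifiers that entry refers to. This is
# NOT the registry file format; hms.1, hcs.1, hbr.1, smp.1 and cassandra.1 are
# placeholder names, and hbr.1's reference to vc.0 is an assumption.
REGISTRY: dict[str, list[str]] = {
    "vc.mgmt": [],
    "vc.0": [],
    "vcd.0": ["amqp.1"],
    "docker.0": ["vc.mgmt"],
    "amqp.1": ["docker.0"],
    "cassandra.1": ["docker.0", "hcs.1"],
    "hms.1": ["vc.mgmt", "vcd.0"],
    "hcs.1": ["vcd.0", "cassandra.1", "amqp.1"],
    "hbr.1": ["vc.0", "vcd.0"],
    "ui.1": ["vcd.0"],
    "smp.1": ["vcd.0", "amqp.1", "ui.1"],
}


def missing_references(registry: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each entry to the referenced identifiers that are not defined anywhere."""
    problems: dict[str, list[str]] = {}
    for ident, refs in registry.items():
        undefined = [ref for ref in refs if ref not in registry]
        if undefined:
            problems[ident] = undefined
    return problems


if __name__ == "__main__":
    print(missing_references(REGISTRY) or "all references resolve")
```

Note that cassandra.1 and hcs.1 refer to each other, so this checks only that references resolve; it does not try to derive a deployment order.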


Save the file, making sure it has no extension, and copy it to the /root/.vcav/ directory on the vCAV appliance. Then run the command below to validate your registry file; if the output is as shown below, your registry file has been created correctly.


If you have configured the registry file correctly and all goes well, after around 20-30 minutes the appliance returns “OK”, which means we have successfully deployed vCloud Availability.


Deploying vCAV is simpler and less time-consuming with the automated method. The only effort you need to put in is creating a proper registry file.

You can run a single task by running the vcav next command. The vCloud Availability Installer Appliance detects the first task that is not completed and runs it. You can indicate which task you want to run by adding the --task=Task-Number argument.
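If you drive the deployment from a script, the single-task option is easy to wrap. This Python sketch only builds (and optionally runs) the two command forms mentioned in this post, vcav next and vcav next --task=Task-Number; the helper names are hypothetical, and actually executing the command only makes sense on the vCloud Availability Installer Appliance itself.

```python
import shlex
import subprocess
from typing import Optional


def vcav_next_cmd(task: Optional[int] = None) -> list[str]:
    """Build the `vcav next` command line, optionally pinned to one task number."""
    cmd = ["vcav", "next"]
    if task is not None:
        cmd.append(f"--task={task}")
    return cmd


def run_next(task: Optional[int] = None, dry_run: bool = True) -> str:
    """Return the command string (dry run) or run it and return its stdout."""
    cmd = vcav_next_cmd(task)
    if dry_run:
        return shlex.join(cmd)  # show what would be executed
    # Raises CalledProcessError if the task fails, so a script can stop early.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(run_next(3))
```

Defaulting to a dry run keeps the helper safe to try anywhere; pass dry_run=False only on the installer appliance.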

Then follow my earlier post in this series,

VMware vCloud Availability Installation-Part-9-Tenant On-Boarding

for tenant on-boarding. This completes the installation of vCAV. Now you can work with your customers on DRaaS demos.

Here is my registry file for your reference.





VMware vCloud Availability Installation-Part-9-Tenant On-Boarding

Let's deploy the well-known vSphere Replication appliance. Before we get into that, ensure that the tenant/customer has vSphere and the vSphere Web Client installed. If vSphere is installed properly, then in the vSphere Web Client select the vCenter Server instance on which you are deploying vSphere Replication, click Manage > Settings > Advanced Settings, and verify that the VirtualCenter.FQDN value is set to a fully qualified domain name.

Let's on-board the tenant. Download the vSphere Replication appliance ISO, mount it, and choose the three files below when deploying the OVF from the vSphere Client. We have deployed OVFs many times, so I am not covering the entire process in detail; here are the screenshots of the installation.


There are two configurations. I am choosing the minimum one with 2 vCPUs; for production, choose based on the recommendations for your environment.


Enter the IP address and other details, and ensure that this IP address is reachable from the cloud (you can use NAT, for example).


Register your vSphere Replication appliance with vCenter SSO and, after registering, restart the services and ensure they are up and running.


Pair Sites

Log in to vCenter and click “Site Recovery”, which takes you to the screen below. On this screen, click “Configure”.


Configure opens a new window. Click “NEW SITE PAIR”.


The first site must be your current vCenter. For “Second Site”, choose “Cloud Provider”.

Cloud Provider Address – enter the IP address or URL of the vCD (for example, vcd.provider.com) without /Cloud.

Enter the organization name configured in the cloud and your org credentials, then click Next.


If you do not have any connectivity issues, you should see a certificate warning. Accept it by clicking “CONNECT”.


Select your VDC and click Next.


Configure network mapping for your VMs in the provider environment. The best part is that you can select two networks: one for DR testing and another for actual DR. (How many cloud providers offer this option?)
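The two-network mapping is what keeps DR tests away from production. As a minimal Python sketch (hypothetical names, not the vSphere Replication API), this is the choice a test failover versus a real failover makes for each protected VM's network:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkMapping:
    """Provider-side networks mapped to one on-premises (source) network."""
    recovery: str  # network used for an actual failover
    test: str      # isolated network used for DR testing


def target_network(mappings: dict[str, NetworkMapping], source: str,
                   is_test: bool) -> str:
    """Pick the provider network a recovered VM attaches to."""
    mapping = mappings[source]  # KeyError means the source network was never mapped
    return mapping.test if is_test else mapping.recovery


# Hypothetical network names for illustration only.
MAPPINGS = {"VM Network": NetworkMapping(recovery="T1-DR-Net", test="T1-Test-Net")}

if __name__ == "__main__":
    print(target_network(MAPPINGS, "VM Network", is_test=True))
```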


Configuring and enabling replication tasks.


This completes tenant on-boarding. Now the tenant can choose which VMs they want to protect to the cloud.


VMware vCloud Availability Installation-Part-8-Integration and DRaaS portal access

Now that we have completed the deployment of all the individual components of vCloud Availability, we need to configure them to register with and talk to each other to support DRaaS.

1- Configure the vSphere Replication Manager

Configure vSphere Replication Manager with vCD using the command below.


Run the command below to check whether the HMS service started successfully.

2- Configure Cassandra

First, import the vSphere Replication Cloud Service host certificates to the Cassandra host.


Next, register the Cassandra hosts with the lookup service by running the command below. You should see a success message.


3- Configure vSphere Replication Cloud Service

Next, configure the vSphere Replication Cloud Service VM. Use the command below to register the vSphere Replication Cloud Service appliance with vCD, the resource vCenter Server, and the RabbitMQ server.


Run the command below to check the status of the service. It should return “OK” if the service started successfully.


4- Configure vSphere Replication Server

This step attaches the vSphere Replication Server to vSphere Replication Manager and vCenter Server.


5- Configure vCloud Availability portal host

Use the command below to configure the vCloud Availability Portal host. If it returns “OK”, we have successfully configured the vCloud Availability Portal host.


6- Configure vCloud Availability Administration Portal

This portal runs a small embedded MongoDB database. We must configure the vCloud Availability Administration Portal host with the vCloud Director server and its embedded MongoDB server; only then will the services start.


7- Assign vSphere Replication Cloud Service Rights to the vCD Org Admin Role

Before we enable a VDC for replication, we must assign the vSphere Replication Cloud Service rights to the vCD organisation administrator role.

SSH to the vCAV appliance and run the command below:


Note that I have set the “--org” parameter to “*”, which means every organisation’s administrator gets vSphere Replication Cloud Service rights. To enable this for a particular organisation only, put the organisation name instead of “*”.
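When typing this command, keep the asterisk quoted. An unquoted * is subject to pathname expansion, so the shell may replace it with the names of files in the current directory before the command ever sees it. The snippet below only demonstrates the quoting; show_args and ORG are hypothetical names used for illustration.

```shell
# Hypothetical helper: print each argument a command would receive on its own
# line, in brackets, so you can see exactly what gets passed.
show_args() { printf '[%s]\n' "$@"; }

ORG='*'                 # all organisations; use ORG='T1' for a single organisation
show_args --org "$ORG"  # quoted: prints [--org] and [*]
show_args --org $ORG    # unquoted: * becomes the file names in the current directory, if any
```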

8- Enable org VDC for Replication

This step enables a particular VDC for replication. Run the command below to get the list of organisations; the output shows that we have 4 organisations.


With the next command, let’s find out the name of the VDC in organisation “T1” on which we need to enable DRaaS. You can also check this in the GUI.


This is the actual step that enables replication on VDC “T1-VDC” of organisation “T1”. If everything goes right, the command returns “OK”, which means our VDC is ready to use DRaaS.


This completes the configuration. Let’s log in to the DR tenant portal using the tenant portal URL; use the credentials of a tenant for which this service has been enabled.



This is the service provider portal, where you can check which orgs have been configured for DRaaS. Here you log in with your administrator credentials.


This completes the service provider end configuration. In the next post we will configure the client end and see how to enable replication from the customer data center.



VMware vCloud Availability Installation-Part-7-Create vCloud Availability Tenant and Administration Portal

The vCloud Availability Portal provides a graphical user interface to facilitate the management of vCloud Availability operations.

The vCloud Availability Portal back end (PBE) scales horizontally. You can deploy a new vCloud Availability Portal instance on demand, connected to the same load balancer as all the other vCloud Availability Portal instances. The load balancer must support sticky sessions, so that the same PBE instance processes user requests within a session. This setting ensures that all the information displayed in the vCloud Availability Portal is consistent.

vCloud Availability Portal Sizing

Deployment Type | Size and Sessions
Small | 2 CPUs, 2 GB of memory, 10 GB of disk space, and 512 MB of Java Virtual Memory. Suitable for hosting up to 150 concurrent sessions.
Medium | 2 CPUs, 4 GB of memory, 10 GB of disk space, and 1.5 GB of Java Virtual Memory. Suitable for hosting up to 400 concurrent sessions.
Large | 4 CPUs, 6 GB of memory, 10 GB of disk space, and 3 GB of Java Virtual Memory. Suitable for hosting up to 800 concurrent sessions.
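To turn the sizing figures into a decision, the sketch below picks the smallest appliance size that covers an expected number of concurrent sessions. Above 800 sessions it assumes you scale out with several of the largest instances behind the sticky-session load balancer described above. portal_size and its small/medium/large labels are my own illustration, not vCAV tooling; the same figures apply to the Administration Portal.

```shell
# Hypothetical helper: map expected concurrent sessions to an appliance size
# from the sizing figures (150 / 400 / 800 sessions per instance).
portal_size() {
  local sessions="$1"
  if   [ "$sessions" -le 150 ]; then echo "small (2 CPU, 2 GB RAM, 512 MB JVM)"
  elif [ "$sessions" -le 400 ]; then echo "medium (2 CPU, 4 GB RAM, 1.5 GB JVM)"
  elif [ "$sessions" -le 800 ]; then echo "large (4 CPU, 6 GB RAM, 3 GB JVM)"
  else
    # Scale out: ceiling division gives the number of 800-session instances.
    echo "large x $(( (sessions + 799) / 800 ))"
  fi
}

# Usage: portal_size 300
```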

Deploy the appliance using the command below.




Create a new variable with the following information:

#export UI01_ADDRESS=

We will use this variable in subsequent commands. Next, configure trust:


vCloud Availability Administration Portal

The vCloud Availability Administration Portal is a graphical user interface that helps service providers monitor and manage their DR environments. It also needs to be deployed with the appliance sizing in mind.

vCloud Availability Administration Portal Sizing

Deployment Type | Size and Sessions
Small | 2 CPUs, 2 GB of memory, 10 GB of disk space, and 512 MB of Java Virtual Memory. Suitable for hosting up to 150 concurrent sessions.
Medium | 2 CPUs, 4 GB of memory, 10 GB of disk space, and 1.5 GB of Java Virtual Memory. Suitable for hosting up to 400 concurrent sessions.
Large | 4 CPUs, 6 GB of memory, 10 GB of disk space, and 3 GB of Java Virtual Memory. Suitable for hosting up to 800 concurrent sessions.

Now let’s create the vCloud Availability Administration Portal host by running the following command.



Create a new variable with the following information:

#export UI02_ADDRESS=

If the deployment succeeds, the command returns the IP address of the deployed appliance, which indicates that the appliance was deployed successfully.


Update the truststore file with the vCloud Availability Administration Portal virtual machine credentials using the command below:

#echo 'VMware1!' > ~/.ssh/.truststore

Run the trust-ssh command to trust the certificate of the vCAV FQDN.



Now, to validate that our deployments are ready for configuration, run the commands below; they must return “OK”.


“OK” means that the components deployed so far are ready for configuration. This completes the installation of all the appliances and components for vCAV. Next, we need to integrate these components with each other and with vCD.

VMware vCloud Availability Installation-Part-6-Create vSphere Replication Cloud Service Host & Replication Server

The vSphere Replication Cloud Service is a tenancy-aware replication manager that provides the required API for managing the service and all the components. The vSphere Replication Cloud Service registers itself as a vCloud Director extension and is accessible through the vCloud Director interface.

Let’s deploy the vSphere Replication Cloud Service host using the command below.




Create a new variable with the following information:

#export HCS01_ADDRESS=

We will use this variable in subsequent commands.

Next, configure vSphere to trust the vSphere Replication Cloud Service certificate using the command below.


If the command responds with “OK”, we have successfully deployed the vSphere Replication Cloud Service host.

Create vSphere Replication Server

The vSphere Replication Server handles the replication process for each protected virtual machine. Ideally, it should be deployed one per manager instance. Run the command below to deploy HBR01.


If the deployment completes successfully, the command returns the VM’s IP address as the success message.


Next, create a variable with the IP address of the VM deployed above. You can create additional variables if you have deployed multiple HBR servers. We will use this variable in further commands. (Variables are case-sensitive.)

#export HBR01_ADDRESS=

Next, make vSphere trust the vSphere Replication certificate using the command below; it should return “OK”.


This completes deployment of vSphere Replication Server appliance for vCAV.



VMware vCloud Availability Installation-Part-5-Deploy vSphere Replication Manager

The vSphere Replication Manager manages and monitors the replication process from tenant VMs to the cloud provider environment. A vSphere Replication management service runs for each vCenter Server and tracks changes to VMs and infrastructure related to replication. These appliances can be scaled horizontally based on requirements.

In a production environment we must deploy one vSphere Replication Manager for each resource vCenter Server, but in this lab I will deploy it in my management vCenter only, as I don’t have two separate vCenter Servers (one for management and another for tenants, called the resource vCenter).

Let’s start the deployment. Again, make an SSH connection to the vCAV appliance and run the command below to deploy the replication manager.

You do not need to specify the location of the Replication Manager appliance as described in the documentation; the command picks it up automatically from the appliances stored on the vCAV appliance.

The appliances are located on the vCAV appliance in /opt/vmware/share/vCAvForVCD/latest


Run the command below to deploy HMS01 from the vCAV appliance.


I am using --debug just to understand what is happening behind the scenes, but you can omit it and monitor the progress in vCenter instead. It deploys a VM named “hms01” with the IP address specified in the --vm-address option.


Once it succeeds, the command displays the deployed virtual machine’s IP address, which means the virtual machine was deployed successfully.


Repeat the same process to deploy additional HMS instances. If you have many resource vCenter Servers, you should ideally have one per vCenter.

Next, create a variable with the IP address of the VM deployed above. You can create additional variables if you have deployed multiple HMS instances. We will use this variable in further commands.

#export HMS01_ADDRESS=

Next, make vSphere trust the vSphere Replication certificate using the command below; it should return “OK”.


This completes deployment of the vSphere Replication Manager appliance for vCAV, which will help us manage and monitor the replication process from tenant VMs to the service provider environment.

VMware vCloud Availability Installation-Part-4-vCD Configuration and IP Plan

Continuing with the deployment and configuration of vCloud Availability: so far we have deployed the vCAV appliance and prepared its dependencies. In this post we will configure vCD to be used as the DR site and plan the IP schema for the vCAV appliances that will be deployed next.

First, set up a trusted connection between the RabbitMQ host and the vCloud Availability Installer Appliance.


Register RabbitMQ host with vCloud Director by running the following command on the vCloud Availability Installer Appliance.


If the command responds with “OK”, the configuration has been applied successfully. You can also verify it in the vCD UI.

Restart the vCloud Director service after configuring the AMQP settings, using:

#service vmware-vcd restart

Check vCD Endpoints:

This step verifies that our environment is properly configured for the vCloud Availability installation by checking the vCloud Director endpoints for known problems.


If everything has been done properly, we should get “OK” as the response. This completes the pre-configuration needed before installing the vCAV replication/UI virtual machines, but before we install the appliances, we need to plan IP addresses and DNS names for them.

Here is my IP planning sheet for your reference.

Planning Sheet
Machine Type | DNS Name | IP Address
vCloud Availability Portal | vcav.corp.local |
Docker Host for Cassandra and RabbitMQ | docker01.corp.local |
HMS | hms01.corp.local |
HCS | hcs01.corp.local |
HBR | hbr01.corp.local |
UI01 | ui01.corp.local |
UI02 | ui02.corp.local |

This completes this post. In the next post we will install the appliances using the table above.


VMware vCloud Availability Installation-Part-3-Install Cassandra and RabbitMQ


RabbitMQ is an open-source AMQP server that can be used to exchange messages within a vCloud Director environment. In production environments, for high availability and scalability, you can configure the RabbitMQ servers in a cluster.


Cassandra is a free and open-source distributed NoSQL database management system that stores the metadata for replication services. For high availability you must deploy 3 clustered nodes.

Since I don’t have the resources in my lab, I am going to deploy Cassandra and RabbitMQ in a single VM using containers, which is enough for our lab deployment.

In Part 1 we deployed vCAV. Connect to the vCloud Availability appliance using SSH and run the command below to start the Docker service on the vCAV host.

#systemctl start docker

Once the command succeeds, check the status using:


Create Password Files on Your vCloud Availability Installer Appliance

  • # mkdir ~/.ssh -> creates a directory called “.ssh” in the home directory.
  • # chmod 0700 ~/.ssh -> restricts the directory permissions to the owner.
  • # echo 'VMware1!' > ~/.ssh/.root -> creates a file named “.root” containing the root password “VMware1!”.
  • # echo 'VMware1!' > ~/.ssh/.vcd -> creates a file named “.vcd” containing the vCD admin password.
  • # echo 'VMware1!' > ~/.ssh/.sso -> stores the SSO password.
  • # echo 'VMware1!' > ~/.ssh/.vsphere.mgmt -> stores the management vSphere password.
  • # echo 'VMware1!' > ~/.ssh/.cassandra.root.password -> stores the Cassandra root password.
  • # find ~/.ssh -type f -name '.*' -print0 | xargs -0 chmod 0600 -> makes all the password files readable only by their owner.
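The echo commands above leave the password in your shell history. A variant that prompts for the password once (so it never appears in the history) and writes each file with 0600 permissions could look like the sketch below. make_password_files is a hypothetical helper; the file names are the ones from the list above.

```shell
# Hypothetical helper: write the same password into each vCAV password file
# under DIR (default ~/.ssh) without putting it on the command line.
make_password_files() {
  local dir="${1:-$HOME/.ssh}" pw f
  # -s hides the typed input, -r keeps backslashes literal.
  IFS= read -rs -p "Password: " pw || return 1
  echo
  mkdir -p "$dir" && chmod 0700 "$dir" || return 1
  for f in .root .vcd .sso .vsphere.mgmt .cassandra.root.password; do
    # umask 077 makes newly created files 0600; chmod also covers files that already existed.
    ( umask 077 && printf '%s\n' "$pw" > "$dir/$f" ) || return 1
    chmod 0600 "$dir/$f" || return 1
  done
}

# Usage: make_password_files        (then type the password at the prompt)
```

Files used later in the series, such as .amqp and .truststore, can be added to the list the same way.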


This completes the creation of the password files. Now let’s create an IP pool.

Add a Network Protocol Profile

A vSphere network protocol profile contains a pool of IPv4 and IPv6 addresses, IP subnet, DNS, and HTTP proxy server settings. vCenter assigns those resources to vApps, or to virtual machines with vApp functionality, that are connected to port groups associated with the profile. Let’s create a network profile that our VMs will use during deployment.

  1. Go to the data center, click the Configure tab, click Network Protocol Profiles, and edit the Default profile.
  2. Associate the profile with the port group to which you want the deployed VMs to connect.
  3. Enter your subnet, gateway, and DNS server addresses. Don’t forget to enable the pool and specify the IP range; in my case I have assigned 20 IPs starting with .160.
  4. Specify the DNS domain name and DNS search path.

This completes the creation of the network IP pool and the settings that the vCAV component VMs will use during deployment.

Deploy a Docker Host

To deploy a Docker host on the vSphere management cluster, run the command below on the vCAV appliance.


Before running this command, note that it uses certain variables, so let’s create those variables first:

  • $MGMT_VSPHERE_ADDRESS -> export MGMT_VSPHERE_ADDRESS=vcsa-01a.corp.local
  • $MGMT_VSPHERE_USER -> export MGMT_VSPHERE_USER=administrator@vsphere.local
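Later commands in this series expand these variables (and others such as $HMS01_ADDRESS or $UI01_ADDRESS), so an empty or misspelled name only surfaces later as a confusing failure. A quick pre-check can catch it. require_vars is a hypothetical bash helper; the names you pass it are whatever the command you are about to run uses.

```shell
# Hypothetical helper: fail early if any of the named variables is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name}" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: require_vars MGMT_VSPHERE_ADDRESS MGMT_VSPHERE_USER
```

Since the names are case-sensitive, a check like this also catches a variable exported with the wrong case.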


After creating the variables, we run the above command (vcav docker create…) on the vCAV VM, which successfully creates a Docker VM in our management cluster.


Download the RabbitMQ container image on the vCAV appliance using the command below. For this step your vCAV appliance must be able to reach the internet; if you have your own registry, such as VMware Harbor, you can pull from there instead.


Download the Cassandra container image on the vCAV appliance using the command below. For this step your vCAV appliance must be able to reach the internet; if you have your own registry, such as VMware Harbor, you can pull from there instead.


Create two new variables and a password file as below:

  • export AMQP_ADDRESS=
  • export DOCKER01_ADDRESS=
  • echo 'VMware1!' > ~/.ssh/.amqp

Create RabbitMQ Container

Now let’s create the RabbitMQ container using the command below on the vCAV appliance. The command returned “OK”, which means the container was created successfully.


Trust the vCAV connection with RabbitMQ as below:


Create Cassandra Container

Now let’s create the Cassandra container using the command below on the vCAV appliance. The command returned “OK”, which means the container was created successfully.


You can check connectivity to the RabbitMQ and Cassandra servers using telnet on their respective port numbers. This completes the RabbitMQ and Cassandra container deployment; we will configure these in subsequent posts.
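If telnet is not installed on the vCAV appliance, bash’s /dev/tcp redirection can do the same quick TCP check. This is a minimal sketch: check_port is a hypothetical helper, and the port numbers in the usage lines are assumptions (the usual defaults of 5672 for AMQP and 9042 for the Cassandra native protocol), so adjust them to match how your containers were created.

```shell
# Hypothetical helper: report whether a TCP connection to HOST:PORT opens
# within 3 seconds. Uses bash's /dev/tcp redirection instead of telnet.
check_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c ": < /dev/tcp/$host/$port" 2>/dev/null; then
    echo "open: $host:$port"
  else
    echo "closed: $host:$port"
    return 1
  fi
}

# Usage (ports are assumed defaults, not taken from this setup):
# check_port "$AMQP_ADDRESS" 5672
# check_port "$DOCKER01_ADDRESS" 9042
```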



VMware vCloud Availability Installation-Part-2-Configure SAML Federation

Using the vSphere SSO service as the SAML identity provider for the vCloud Director System organisation can be a more secure alternative to LDAP or a local account. Federating vCloud Director with vCenter SSO enables you to import system administrators from vSphere, which is required for vCAV to work properly. So let’s configure it.

Log in to vCD as the system admin user, navigate to Administration > System Settings > Federation, click Metadata, and download the metadata. It will look like this:



Then go to vSphere and upload the downloaded vCD metadata.


Click “Import from File”, choose the file we downloaded, and click “Import”. This completes the metadata import from vCD to vSphere.


Now we need to download the SSO metadata file and import it into vCD. Log in to vSphere, go to “Configuration” -> SAML Service Providers -> click “Download”.


Go to vCD and log in as Administrator, then go to “Administration” -> “Federation” -> tick “Use SAML Identity Provider” -> browse to the file downloaded in the previous step -> click “Upload” and click “Finish”.



Once mutual metadata sharing is complete, on vCD go to Administration -> Users -> Import Users; you will see a new source called “SAML”.


Choose SAML, manually enter “administrator@vsphere.local”, and click OK.


The new user has been added to vCD with the System Administrator role.

Log out and log back in with vSphere SSO credentials such as “administrator@vsphere.local” and its password; the login should succeed.

There is one more important setting to make on the vCD appliances: edit /opt/vmware/vcloud-director/etc/global.properties and add extensibility.timeout=60.
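If you have several vCD cells, this edit can be scripted. The sketch below sets the property idempotently: it replaces an existing extensibility.timeout line or appends one, and keeps a .bak copy of the file first. set_property is a hypothetical helper, not a vCD tool.

```shell
# Hypothetical helper: set KEY=VALUE in a Java-style properties file.
# Note: KEY is used as a regular expression, so its dots match any character.
set_property() {
  local file="$1" key="$2" value="$3"
  cp -p "$file" "$file.bak" || return 1
  if grep -q "^${key}=" "$file"; then
    # Replace the existing value in place.
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    # Make sure the last line ends with a newline before appending.
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
      echo >> "$file"
    fi
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage on each vCD cell:
# set_property /opt/vmware/vcloud-director/etc/global.properties extensibility.timeout 60
```

Running it twice leaves a single extensibility.timeout line rather than a duplicate.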


This completes our vCD prerequisite configuration. In the next post I will deploy Cassandra and RabbitMQ.







VMware vCloud Availability Installation-Part1-Deploy Appliance

In the previous post I tried to explain what problem VMware is solving with vCAv. Now let’s get into installing the components. There are two ways to install vCAv: the fully automated way, or manually running a few commands on the vCAv appliance, which then install and configure the components automatically. We can call the latter semi-automated, and in the next few posts I will install vCAv the semi-automated way, as it gives me a better understanding of which component is being installed and what it integrates with.

So let’s get into installation mode. First, we need a few Linux VMs for:

  • Cassandra: 1 node
  • RabbitMQ: 1 node
  • Cloud Proxy: 2 nodes

Since this is a demo environment, I am not considering HA for any VMs. First, let’s create a CentOS VM with all the required prerequisites installed and then convert it to a template, which will save a considerable amount of time. The same approach can be taken in your production deployment with a customisation specification.

For this demo, I am creating a new Linux VM based on CentOS-7-x86_64-Minimal-1804.iso and installing the OS. Once the OS installation is complete, connect to the VM with SSH and first update yum:

  • #yum update yum, then reboot the guest OS.

Install the packages required by vCD.


#yum install alsa-lib bash chkconfig coreutils findutils glibc grep initscripts krb5-libs libgcc libICE libSM libstdc++ libX11 libXau libXdmcp libXext libXi libXt libXtst module-init-tools net-tools pciutils procps redhat-lsb sed tar wget which

I suggest installing NTP to keep the VM clock in sync:

  • #yum install ntp

Configure the NTP servers: using vi, change the lines beginning with server to point to your NTP servers. All components connecting to vCD should share the same NTP servers for accurate timekeeping:

  • #vi /etc/ntp.conf
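The same edit can be made non-interactively, which helps when you repeat it on the template or on other components. This sketch comments out the existing server lines and appends your own. set_ntp_servers and the server names in the usage line are hypothetical placeholders; iburst is the standard ntpd option for a faster initial sync.

```shell
# Hypothetical helper: comment out the existing "server" lines in an ntp.conf
# (keeping a .bak copy) and append one "server <name> iburst" line per argument.
set_ntp_servers() {
  local conf="$1" s
  shift
  cp -p "$conf" "$conf.bak" || return 1
  sed -i 's/^server /#server /' "$conf"
  for s in "$@"; do
    printf 'server %s iburst\n' "$s" >> "$conf"
  done
}

# Usage (placeholder names; use the same servers for every component):
# set_ntp_servers /etc/ntp.conf ntp1.corp.local ntp2.corp.local
```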

Start and enable the ntpd service:

  • #systemctl start ntpd
  • #systemctl enable ntpd

Check that ntpd is syncing to the correct NTP servers using #ntpq -p

Many features depend on DNS, so I suggest installing the BIND utilities and verifying that the VM is able to resolve DNS queries.

  • #yum install bind-utils (to install nslookup)
  • #nslookup VMNAME
  • #nslookup VMNAME.DOMAIN.COM
  • #nslookup
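To check all the names from the IP planning sheet in one pass, a loop over getent hosts works. Note that getent follows the system’s normal lookup order (including /etc/hosts), so it shows what applications will actually resolve, while nslookup queries the DNS server directly; the two complement each other. check_dns is a hypothetical helper.

```shell
# Hypothetical helper: resolve each name with the system resolver and report
# which ones fail. Returns non-zero if any name does not resolve.
check_dns() {
  local name rc=0
  for name in "$@"; do
    if getent hosts "$name" >/dev/null; then
      echo "resolves: $name"
    else
      echo "FAILED: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage with the names from the IP planning sheet in this series:
# check_dns vcav.corp.local docker01.corp.local hms01.corp.local hcs01.corp.local \
#           hbr01.corp.local ui01.corp.local ui02.corp.local
```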

For this lab environment, I turn off SELinux as well as the firewall; in a production deployment, please choose the correct configuration.

Edit the SELinux file with #vi /etc/sysconfig/selinux and change SELINUX=enforcing to SELINUX=disabled.
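A non-interactive version of this edit is sketched below. One caveat: on CentOS 7, /etc/sysconfig/selinux is a symbolic link to /etc/selinux/config, and GNU sed -i would replace the link with a regular file, so the sketch edits the target file directly. disable_selinux_config is a hypothetical helper; the change takes effect after the reboot mentioned below.

```shell
# Hypothetical helper: set SELINUX=disabled in the SELinux config, keeping a
# .bak copy. The ^SELINUX= pattern does not touch the SELINUXTYPE= line.
disable_selinux_config() {
  local conf="${1:-/etc/selinux/config}"
  cp -p "$conf" "$conf.bak" || return 1
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}

# Usage (as root): disable_selinux_config
```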


To disable firewalld, run the following command as root: #systemctl disable firewalld. To stop firewalld, run the following command as root: #systemctl stop firewalld

Install VMware Tools and reboot the VM so that all the above changes take effect. Now we are done with the OS configuration: shut down the VM, convert it to a template, and deploy 7 VMs from this template. While these external VMs are deploying, we also need to deploy the vCAv appliance. Download the appliance from here and deploy it by following the steps below:

Choose OVF


Select the appropriate cluster/host location to deploy to.

Accept the EULA.

Enter the domain name, IP address, and other settings as required.

Click Finish to deploy.

This completes template preparation and deployment of the vCAv appliance.