Plan A Didn't Work. Thankfully, We Have 25 More Letters...
If you have been following our blog series on Lenovo XClarity, we are now ready to attempt a Management Domain deployment of VMware Cloud Foundation via Cloud Builder on our awesome gear on loan from Lenovo.
When we last left off, we had recovered our hosts and installed a fresh copy of ESXi 7.0 (the custom Lenovo image) on our 4 VX servers. Next, we started on the Deployment Parameter Worksheet (downloaded from your Cloud Builder instance, and version specific). Our first thought was to keep our maiden voyage simple, since we had met the minimum requirements:
ESXi Configuration - All ESXi hosts must be configured with the following settings:
- Static IP Address assigned to the Management interface (vmk0)
- Management Network portgroup configured with correct VLAN ID
- VM Network portgroup configured with the same VLAN ID as the Management Network
- TSM-SSH Service enabled and policy set to 'Start and Stop with Host'
- NTP Service enabled, configured and policy set to 'Start and Stop with Host'
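The checklist above can be applied from the ESXi Shell (or an SSH session) with a handful of commands. This is a hedged sketch — the IP addresses, VLAN ID, and NTP server below are placeholders, not values from our lab, and the exact `esxcli system ntp` options can vary slightly between ESXi 7.0 update levels:

```shell
# Hypothetical addresses and VLAN ID -- substitute your own worksheet values.

# Static IP on the management interface (vmk0):
esxcli network ip interface ipv4 set -i vmk0 -t static -I 192.168.10.11 -N 255.255.255.0
esxcli network ip route ipv4 add --network default --gateway 192.168.10.1

# Management Network and VM Network portgroups on the same VLAN ID:
esxcli network vswitch standard portgroup set -p "Management Network" -v 1611
esxcli network vswitch standard portgroup set -p "VM Network" -v 1611

# TSM-SSH: enabling via vim-cmd also sets the policy to start/stop with host:
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh

# NTP: set a time source and enable the service:
esxcli system ntp set -s pool.ntp.org -e true
```

The same settings can of course be applied through the Host Client UI or PowerCLI; the point is that every one of them is checked later by Cloud Builder, so script it once and run it on all four hosts.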
We figured we would keep the maiden voyage fairly flat, then start building out a more realistic infrastructure once we had our feet wet. Turns out our ship was taking on water.
You may recall reading (or someone telling you) that Cloud Builder runs a pre-flight check against the parameter spreadsheet. It turns out it really does validate all of those entries, and some failures result in a hard stop. So much for Plan A.
After examining the sheer NUMBER of errors, we came to the conclusion that we needed to fall back and build out our lab infrastructure to mimic a real deployment since VCF doesn't have a "lab button". This is actually a good thing. The whole point of deploying the cloud operational model we continually preach about is creating an environment that is scalable and resilient without having to go back and change anything in the underlay. So…for that to happen, we have to have the underlay functioning properly, regardless of the kind of deployment.
A great example of this is saying "yes" to Application Virtual Networks (AVNs).
AVNs allow for the complete abstraction of hardware and decoupling of VM Networks from the underlay, providing integration with external cloud providers and supporting seamless failover. However, saying "yes" is exactly what produced our Plan B: the supporting components (routable networks, upstream BGP peers, and so on) need to exist before deployment.
Now, facing the exercise of deploying the proper supporting infrastructure, we reflected on the recent lessons learned. Lenovo's XClarity Administrator and Controller worked absolute magic in allowing us to remotely recover and re-install our 4 VX nodes. Further, it allowed us to repair a hopeless boot error by directly mounting the ESXi ISO, once again preventing what we thought was a certain road trip to the datacenter. We've become addicted to that functionality and support, and we want to introduce it not only in our lab, but in every design where Lenovo XClarity Administrator is part of the solution.
Sidebar: We must continually adapt our thinking to the Cloud Operational Model and build recoverability (especially remote) into everything we do. So, what's the best way to do that?
As with any datacenter, production or lab, there is at least a management network. This default, untagged network is what allowed us to complete all the previous activities, and it carries the default vmk0 on our ESXi hosts. We decided to leave that alone. This way, we could totally screw up anything on the "production" side and still have complete recovery. That's not only a good idea for labs, but a GREAT idea for production. Nobody wants to hear Captain America tell Iron Man, "You could have saved us!!" Lenovo XClarity Administrator already did.
Making a List, Checking It Twice
Since Cloud Builder is going to be checking for supporting infrastructure, compare the Deployment Parameter Worksheet against what already exists in your environment and determine what still needs to be configured (and where) before proceeding.
We decided to use the Lenovo RackSwitch G8272 to do all the heavy lifting. The VLAN interfaces will act as the gateway for the required networks and provide inter-VLAN routing. Now, we essentially have this topology.
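Creating those VLAN interfaces on the switch looks roughly like the sketch below. This is a Cisco-style illustration, not verbatim G8272 configuration — the G8272 runs Lenovo CNOS or ENOS, whose syntax differs somewhat — and the VLAN IDs and subnets are hypothetical (only the VCF-MGMT-2017 name comes from our build):

```
! Hypothetical VLAN IDs and subnets -- adapt to your switch OS and worksheet.
vlan 1611
 name VCF-MGMT-2017
interface vlan 1611
 ip address 172.16.11.1/24      ! gateway for the management VLAN
 no shutdown
!
vlan 1612
 name VCF-VMOTION
interface vlan 1612
 ip address 172.16.12.1/24      ! gateway for the vMotion VLAN
 no shutdown
```

With an interface like this per required network (management, vMotion, vSAN, NSX overlay, uplinks), the switch handles inter-VLAN routing and each gateway entry in the worksheet has something real to point at.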
You will still need to configure and support DNS and NTP. We simply added a second network adapter to our existing DNS server and dropped it on the VCF-MGMT-2017 network. This also changed where we deployed Cloud Builder. On the maiden voyage, we deployed on the "real" management network. Now, we will deploy Cloud Builder on the "VCF-MGMT-2017" network since host connectivity is one of the first validation checks. You will also need to configure BGP so the proper peers can be formed.
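For the BGP piece, the worksheet's ASN and peer entries ultimately correspond to neighbor statements on the upstream switch that the NSX Edge uplinks will peer with. Again a hedged, Cisco-style sketch — the ASNs and addresses are placeholders, and your switch OS syntax may differ:

```
! Hypothetical ASNs and peer addresses -- match these to your worksheet entries.
router bgp 65001
 neighbor 172.16.21.2 remote-as 65003   ! NSX Edge uplink 1
 neighbor 172.16.22.2 remote-as 65003   ! NSX Edge uplink 2
 address-family ipv4 unicast
  redistribute connected
```

If these peers can't form, the AVN portion of the bring-up will fail, so it's worth verifying the BGP sessions from the switch side before handing things to Cloud Builder.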
We hope this post made clear that there really aren't any shortcuts to deploying the VCF Management Domain with Cloud Builder. And this is a good thing. Sometimes Plan B is for the best - it makes you think about why Plan A didn't work.