Hydra 1303

Stranger Things - Scrubbing ESXi Nodes

from the Hydra High Council Oct 20th 2020

If you have been keeping up with our series on managing nodes with Lenovo XClarity Administrator, you have discovered (as we have) how powerful it is. We have been able to recover hosts and deploy a fresh ESXi install…without any root credentials. We have been able to establish a software-based console (XClarity Controller) that lets us keep connecting through the Lenovo firmware regardless of the ESXi vmk management IP address. We have even been able to recover a host that was stuck in terminal PXE boot by forcibly mounting the ESXi ISO and then resetting the storage option back to default settings.


Now on to Stranger Things. We want to deploy VMware Cloud Foundation (VCF) in our lab, which is on loan from Lenovo. This counts as a "brownfield" deployment because these hosts were in a vSAN cluster in some unknown vCenter in a former life. But wait…didn't we do a clean install? Yes, we did, but a clean install does not wipe the old vSAN disk groups. Follow along with us down the checklist to prep your re-purposed nodes for VCF. Stranger Things…indeed.


Note: This blog will focus on correcting storage errors on the hosts prior to doing the VCF deployment. We discussed networking requirements in a previous blog "Plan A Didn't Work". We will assume you read that one.


We will jump ahead here just a bit to show what the pre-flight check would look like if you did not clean up your vSAN dishes.


Here is the attempted deployment cycle:

  • Fresh install of the appropriate ESXi image on 4 hosts for Management Domain
  • Backing network configuration completed
  • Deployment of Cloud Builder OVA
  • Completion and upload of the deployment worksheet
  • Error (probably more than just this) on the deployment validation


As the two validation screenshots show, there were existing vSAN partitions on the hosts. These cannot be overwritten; they must be removed first.


On each host, you will need to either:

  • SSH to the host
  • Or use the DCUI (enable ESXi Shell, then Alt+F1 to open the shell and Alt+F2 to return to the DCUI)


Enter esxcli vsan storage list

Every VSAN Disk Group UUID in the output needs to be removed.

Enter esxcli vsan storage remove -u <VSAN Disk Group UUID>


Enter esxcli vsan storage list again to confirm there are no vSAN disk group UUIDs.
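If you have several disk groups (or several hosts) to clean, the list and remove steps above can be sketched as a small shell helper. This is a minimal sketch, not something we ran as part of this deployment: the function names are hypothetical, and it assumes the output of esxcli vsan storage list contains lines of the form "VSAN Disk Group UUID: <uuid>". It only prints the remove commands so you can review them before running anything.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of ESXi.
# Assumes `esxcli vsan storage list` prints lines like
#   "   VSAN Disk Group UUID: <uuid>"

# Read `esxcli vsan storage list` output on stdin and print each
# distinct disk group UUID once (every disk in a group repeats it).
vsan_group_uuids() {
    awk -F': *' '/VSAN Disk Group UUID:/ { print $2 }' | sort -u
}

# Read the same output on stdin and print one remove command per
# disk group. Nothing is executed; review the lines first.
vsan_remove_commands() {
    vsan_group_uuids | while read -r uuid; do
        if [ -n "$uuid" ]; then
            echo "esxcli vsan storage remove -u $uuid"
        fi
    done
}
```

On a host, after pasting the two functions into the shell, something like esxcli vsan storage list | vsan_remove_commands would print one remove line per disk group. If the output is empty, there is nothing left to scrub.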


This is considered "scrubbing the host." vSAN dirty dishes are now cleaned and put away.


You could attempt the deployment validation again at this point, but we would recommend re-installing ESXi first. If you have Lenovo XClarity Administrator…this is a breeze. Just for emphasis, here is a quick overview of what you need to do to set up your hosts for VCF deployment.


    1. Install ESXi with Lenovo XClarity Administrator. Remember to set the proper hostname and IP address on the deployment settings edit tab for each node.
    2. DCUI to each node and verify network settings. Make any necessary changes here, particularly the VLAN assignment for the management network.
    3. Browse to each node and verify network settings. You can also correct the management network VLAN assignment from here: Networking > Port Groups > VM Network > Edit to change the VLAN ID if needed.
    4. Configure and enable SSH
        1. Manage > Services > TSM-SSH, then Actions > Policy > Start and Stop with Host
        2. Services > Start TSM-SSH
    5. Configure and enable NTP
        1. System > Time & Date
        2. Edit NTP Settings > Edit Time Configuration to Use Network Time Protocol (enable NTP client)
        3. Edit policy to Start and Stop with the Host
        4. Add NTP Server
        5. Save
        6. Services > Start ntpd
    6. Confirm the date and time are correct. You will need to verify this again after Cloud Builder has been deployed.

Note: We will have a separate blog on deploying Cloud Builder and the actual Management Domain Bring Up process. That part is much easier from a greenfield perspective. But what's the fun in that?!? The primary focus of this series has been how to approach and execute a deployment or lab with nodes that have been re-purposed.


Before running the deployment validation check again, it is worth checking time synchronization. We noticed that any time drift can cause all sorts of issues. Your Cloud Builder appliance and nodes should be using the same NTP source.

    1. Check the time on Cloud Builder by launching the Web Console on the VM.
    2. The CLI credentials are the ones you entered in the deployment wizard.
    3. Type date to display the current time and date.
    4. Browse back to one of your hosts (if you closed the tab already) and compare the times. Ours are off by a few seconds because of taking the screenshots.
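If you would rather compare numbers than eyeball two clocks, the comparison above can be sketched in shell. This is a minimal sketch under our own assumptions: the function names are hypothetical, the 5-second default tolerance is an arbitrary choice and not a documented Cloud Builder limit, and it assumes you can get an epoch timestamp from each system (for example with date -u +%s, if the date command on that system supports it).

```shell
#!/bin/sh
# Sketch only: function names and the 5-second tolerance are our own
# assumptions, not values required by Cloud Builder.

# Print the absolute difference between two epoch timestamps.
drift_seconds() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# Succeed (exit 0) when the two timestamps differ by no more than
# the tolerance in seconds (third argument, default 5).
within_tolerance() {
    [ "$(drift_seconds "$1" "$2")" -le "${3:-5}" ]
}
```

With the Cloud Builder timestamp in one variable and a host timestamp in another, within_tolerance "$cb" "$host" || echo "check NTP" would flag a host whose clock has wandered. Keep in mind that collecting the two timestamps by hand adds a few seconds of its own.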

At this point, run the deployment validation again to verify the vSAN issues are resolved and hopefully you are welcomed with a host of green sponges…err checkmarks.


Stranger Things epilogue: Bear in mind that any time you want to redo all of this (as in a lab like ours) or repurpose other nodes, the same methods apply. Happy Scrubbing!!