I’m very frequently asked what the best way is to run IP Storage (Software iSCSI, Software FCoE, NFS, vSAN, etc.) in a vSphere environment. More specifically, I’m asked how one should go about designing it. This blog post is a suggestion for what a non-Network Administrator (read: vSphere Administrator) can ask/request of the Network Administrator (the underlay) to maximize the performance of the IP Storage solution.
A word of warning to the non-Network Administrator: you will need to know some networking stuff.
In the diagram below you can see some vmk ports (the usual suspects) and one big dvPortgroup to represent all the VM workloads in the host.
Illumination: You could put all of your vmk ports (except VXLAN) in the same dvPortgroup, but you would potentially create an operational nightmare.
Now add up the MAXIMUM (peak) egress bandwidth (the traffic that will leave the host) for each dvPortgroup. This is a job for the non-Network Administrator. If she doesn’t have this information, she needs to get it; we can’t effectively move forward without it.
And before I forget, the image is correct: DO NOT USE multiple vDS in one ESXi host for different traffic types, and NEVER use the standard switch (unless you have no vDS license, or for Software iSCSI where VMware forces you to).
Now that you have added up all the bandwidth, add a single VMNIC (NIC) to the host that is BIG enough to support that bandwidth and connect it to the Top of Rack (ToR) switch as shown in the diagram.
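To make the sizing exercise concrete, here is a small sketch of the math. The per-dvPortgroup peak figures below are made-up placeholders (assumptions, not recommendations); substitute the numbers you gathered from your own environment.

```python
# Hypothetical peak egress figures (Gbps) per dvPortgroup -- these values
# are illustrative only; replace them with your environment's real numbers.
peak_egress_gbps = {
    "Management": 1,
    "vMotion": 8,
    "IP Storage": 10,
    "VM Workloads": 6,
}

total = sum(peak_egress_gbps.values())

# Common NIC speeds (Gbps); pick the smallest one that covers the total.
nic_speeds = [10, 25, 40, 50, 100]
nic = next(speed for speed in nic_speeds if speed >= total)

print(f"Total peak egress: {total} Gbps -> use a {nic} Gbps VMNIC")
```

With these example numbers the peaks sum to 25 Gbps, so a single 25 Gbps VMNIC would cover the host; your totals will differ.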
If you have budgetary restrictions, then go with two VMNICs, as shown in the diagram below. But NEVER go beyond two VMNICs.
Right now we have a network solution that will meet the network requirements for your IP Storage (and everything else in that host). But we don’t have redundancy. As a matter of fact, we have multiple single points of failure (assuming a single NIC). So to provide us with redundancy, multiply your network components by 2, as shown in the diagram below.
Please notice that we now have two ToRs in each example.
We are almost done. The vDS is a Layer 2 entity, and as such it won’t efficiently load-share egress traffic across all interfaces on its own. The solution is to configure link aggregation, and I strongly recommend that you go with LACP using IP Hash, as shown in the next diagram.
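To see why IP Hash spreads traffic across the aggregated links, here is a simplified model of the idea. This is NOT vSphere’s exact hash algorithm (an assumption for illustration); the real implementation differs in detail, but the principle is the same: each source/destination IP pair maps deterministically to one physical uplink, so different flows can use different links.

```python
import ipaddress

def uplink_for_flow(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Toy IP-hash uplink selection: XOR the two addresses as integers
    and take the result modulo the number of uplinks. Deterministic per
    IP pair, so a given flow always uses the same physical link."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % num_uplinks

# Two different storage targets (hypothetical IPs) can land on
# different uplinks, spreading egress traffic across the LAG.
print(uplink_for_flow("10.0.1.21", "10.0.2.50", 2))
print(uplink_for_flow("10.0.1.21", "10.0.2.51", 2))
```

Note the corollary: a single flow (one src/dst pair) never exceeds the bandwidth of one physical uplink, which is why the per-NIC sizing above still matters.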
Network Administrators reading this (you were not supposed to read this, by the way) will notice that this configuration won’t work unless you configure MLAG on the two ToRs. And you are correct; thus the following diagram shows the final configuration.
Now that I’ve described a methodology for understanding what is needed to support IP Storage, here is the list of requirements/requests you can provide to your Network Administrator:
- After you determine the VMNIC you need, ask the Network Administrator (requirement) for the number and speed of ToR interfaces for the ESXi host NICs.
- Ask the Network Administrator (requirement) for switch (ToR) resiliency/redundancy.
- Ask the Network Administrator (requirement) for LACP. Let her know you will use IP Hash.
Note: This tells the Network Administrator that she needs to configure MLAG. It is her discretion on what LACP hash to use.
- Ask the Network Administrator (requirement) for 9000 MTU.
Note: There is NO reason why you wouldn’t configure Jumbo Frames at the highest possible MTU.
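A quick back-of-the-envelope calculation shows why Jumbo Frames help IP Storage: with a bigger MTU, a larger fraction of each frame is payload, and the host processes far fewer packets per gigabyte. This sketch assumes a plain 40-byte TCP/IPv4 header (no options) inside the MTU; real overhead varies with encapsulation.

```python
def payload_efficiency(mtu: int, overhead: int = 40) -> float:
    """Fraction of each frame carrying application data, assuming a
    40-byte TCP/IPv4 header (no options) fits inside the MTU."""
    return (mtu - overhead) / mtu

std = payload_efficiency(1500)    # ~97.3% payload
jumbo = payload_efficiency(9000)  # ~99.6% payload
print(f"1500 MTU: {std:.1%} payload, 9000 MTU: {jumbo:.1%} payload")
print(f"Frames per MB at 9000 MTU: {9000 // 1500}x fewer than at 1500")
```

The payload gain looks modest, but the 6x reduction in frames per byte also cuts per-packet CPU and interrupt overhead, which is where Jumbo Frames really pay off for storage traffic.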
- Provide the Network Administrator with the traffic types (per dvPortgroup) and ask (request) for a VLAN number, IP addresses, subnet mask, and default gateway for each.
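The per-traffic-type request above can be captured in a simple worksheet. The VLAN IDs and subnets in this sketch are entirely made up (your Network Administrator supplies the real values); it just shows the shape of the information you need back, using Python’s standard `ipaddress` module to derive the mask and a conventional first-usable-IP gateway.

```python
import ipaddress

# Hypothetical example of the per-traffic-type details to request.
# VLANs and subnets here are placeholders, not recommendations.
requests = {
    "Management": {"vlan": 10, "subnet": "192.168.10.0/24"},
    "vMotion":    {"vlan": 20, "subnet": "192.168.20.0/24"},
    "IP Storage": {"vlan": 30, "subnet": "192.168.30.0/24"},
}

for traffic, info in requests.items():
    net = ipaddress.ip_network(info["subnet"])
    gateway = next(net.hosts())  # assume first usable IP is the gateway
    print(f"{traffic}: VLAN {info['vlan']}, "
          f"mask {net.netmask}, gateway {gateway}")
```

One row per dvPortgroup/traffic type is all the Network Administrator needs to hand back, and it maps directly onto the vmk port and dvPortgroup configuration on the vDS.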
That’s it (drop the mic). If you follow these recommendations you will have an ESXi host network design that meets your IP Storage needs (and, in fact, all the host’s needs).
Illumination: If you have iSCSI, you may choose to go with MPIO instead of LACP, which would require you to have two discrete VMNICs dedicated to iSCSI. I vented a bit about this in my old blog.
Illumination: I know I left out the QoS conversation. That’s because QoS would come into play IF the ToR has either HIGH ingress bandwidth utilization on the access ports and/or a HIGH level of oversubscription on the uplinks without enough egress buffer capacity. I suppose I may write a blog post about this some day.