LACP “To Be or Not to Be” – LACP in the Virtual and Physical Worlds
Now that we have entered a new era of data center infrastructure design and are leveraging the spine-and-leaf design, design and implementation engineers quite often ask us, "Should we use LACP? I've heard it's not a good idea…"
To clarify, LACP was never bad. What we have noticed over the years is that misconfigured ports and port channels gave LACP a bad reputation.
What we need to realize today is that, regardless of the network switching vendor you are using, we have to consider the vSphere vDS configuration that will be used with LACP.
Here at Hydra1303, we have been using LACP configurations without any issues. Today, we would like to show you a couple of tricks to ensure that you have a smooth implementation, and can leverage LACP on your infrastructure, virtual and physical.
So first of all, before we get into the CLI and recommendations, here are a few considerations. You are probably wondering what the benefits of using LACP are, and what we gain by adding this configuration to the design. I would say the answer is resiliency. You build resiliency for all of your VLANs running on the dvUplinks.
So if you think of the infrastructure, today LACP supports the software-defined data center - specifically, the VLANs that will carry your "special" traffic. Think of a design using network virtualization and software-defined storage. On average, you end up with 6 to 10 VLANs. The usual suspects are a VLAN for vSAN, a VLAN to transport NSX (VXLAN), management, vMotion, provisioning, backups, etc.
This is the traffic for which you are building resiliency with LACP, adding dynamic failover. Take a look at Picture 1 below to gain better insight. This picture shows what makes a spine and leaf design great: the fact that we eliminate STP. In a spine and leaf design, there is only routing between the switches, with either OSPF or BGP. BGP tends to be easier, because no area design or LSA propagation has to be considered. Each leaf is a top-of-rack switch servicing a pod.
Picture 1: Spine and Leaf Design
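To make the "routing only between leaf and spine" point concrete, here is a minimal, illustrative sketch of a leaf's BGP peering toward two spines, written in Arista-style syntax; the AS numbers, addresses, and prefix are placeholders for this example only.

! illustrative values only - substitute your own ASNs, addresses, and prefixes
router bgp 65101
   router-id 10.0.0.11
   neighbor 198.51.100.0 remote-as 65100
   neighbor 198.51.100.2 remote-as 65100
   network 10.10.100.0/24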
Now you are probably wondering where the LACP configuration comes in, and how it is done with the vDS in vSphere to make it all work.
Let's take a look at the topology first, and then we can get into the details for configurations and troubleshooting, which will give you the tools you need in your toolbox to successfully implement LACP on your infrastructure.
Independent of your hardware vendor, the topology considered here in Picture 2 is two leaf switches with one ESXi host. The host has two 25 Gb NICs - one connected to Switch A and one to Switch B.
Switch A is in rack 1 and Switch B is in rack 2.
The LACP configuration is done on the vDS in vSphere and on Switch A and Switch B. Between Switch A and Switch B, there is another LACP port channel that gives you the resiliency we talked about.
Layer 2 and the VLANs exist only between the ESXi vDS and the top-of-rack Switch A and Switch B. Every other link from the leaf up to the spine is layer 3 only, and routed.
Picture 2- Leaf Switch and ESXi connections
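As an illustration of that switch-to-switch port channel, here is a minimal sketch of an MLAG peer link between Switch A and Switch B in Arista-style syntax (the show mlag commands later in this post assume such a setup); the VLAN, port-channel number, interface range, domain ID, and addresses are all placeholders.

! illustrative sketch - adapt the VLAN, interfaces, and addressing to your environment
vlan 4094
   trunk group mlag-peer
!
interface Port-Channel10
   description MLAG peer link to Switch B
   switchport mode trunk
   switchport trunk group mlag-peer
!
interface Ethernet47-48
   channel-group 10 mode active
!
interface Vlan4094
   ip address 10.255.255.1/30
!
mlag configuration
   domain-id pod1
   local-interface Vlan4094
   peer-address 10.255.255.2
   peer-link Port-Channel10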
To gain an understanding of what has to happen on the vSphere side of the house, let's take a look at this configuration from the point of view of ESXi.
On the vDS, we will create a LACP bundle (LAG) with the two dvUplinks, allow all traffic, and use IP hash. NIOC (Network I/O Control) is not a requirement unless you actually see that shares are needed because of possible resource contention.
Monitor your links and, if you see fit, turn on NIOC - but proceed with caution, because of the way shares work across the different types of traffic. Make sure you have assigned the appropriate number of shares to the right traffic. For example, vMotion should never have more shares than vSAN.
Picture 3 shows the point of view of the ESXi host and the vDS.
Picture 3- ESXi and vDS view with LACP
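Once the LAG is created and the host's uplinks are attached, you can confirm the negotiation from the ESXi side with a few standard esxcli commands (output fields vary slightly between vSphere releases):

# esxcli network nic list - Lists the physical vmnics backing the dvUplinks and their link state
# esxcli network vswitch dvs vmware lacp status get - Shows the LACP mode, flags, and partner information for the LAG
# esxcli network vswitch dvs vmware lacp stats get - Shows per-uplink LACPDU counters, useful when a link refuses to bundle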
Overview of the LACP configuration and implementation:
The switches are paired in a pod - Switch A and Switch B. Switch A connects to the first port on the ESXi host (vmnic1) and Switch B connects to the second port on the ESXi host (vmnic2).
If you are thinking about leveraging vSphere Auto Deploy, you will need to configure Switch A, which is connected to the first vmnic, with LACP priority.
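As a hedged illustration of that priority, most switch operating systems express it either as a global LACP system priority or as an interface-level port priority, where a lower value wins; the values below are examples only, so check your vendor's exact syntax.

! illustrative values only - lower priority values are preferred during LACP selection
lacp system-priority 1
!
interface Ethernet1
   lacp port-priority 1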
The following is the currently expected configuration on the leaf switches:
Switch A Sample Configuration
Picture 4- Sample of Port-Channel configuration on Switch A
Picture 5- Sample of interface E1 configuration on Switch A
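As a text companion to Pictures 4 and 5, here is a minimal, Arista-style sketch of the host-facing configuration on Switch A; the VLAN range, descriptions, and exact spanning-tree keywords are assumptions, so adjust them to your vendor and environment (see the Caution note at the end of this post).

! illustrative sketch - VLAN range and descriptions are placeholders
interface Port-Channel1
   description ESXi-host1 LAG (vmnic1/vmnic2)
   switchport mode trunk
   switchport trunk allowed vlan 100-110
   mlag 1
   spanning-tree portfast edge
   spanning-tree bpduguard enable
!
interface Ethernet1
   description ESXi-host1 vmnic1
   channel-group 1 mode active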
Switch B Sample Configuration
Picture 6- Sample of Port-Channel configuration on Switch B
Picture 7- Sample of interface E1 configuration on Switch B
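Switch B mirrors Switch A: in this illustrative sketch the Port-Channel1 and MLAG settings must be identical to Switch A's (same port-channel number, MLAG ID, and allowed VLANs, so the config-sanity check below passes), and only the local member interface description changes.

! illustrative sketch - must match Switch A except for the local description
interface Ethernet1
   description ESXi-host1 vmnic2
   channel-group 1 mode active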
Toolbox tips and tricks:
Here are sample commands to help you. The exact syntax will depend on your vendor, but as you know, they are very similar from vendor to vendor.
# show int status - Allows you to verify layer 1 status and connectivity on the port
# show mlag config-sanity - Allows you to verify that the configuration matches and is consistent across the MLAG peers
Other commands that are helpful for troubleshooting:
# show mlag detail
# show mlag issue
# show mlag interfaces
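Depending on the platform, a couple of additional LACP-focused commands are worth keeping in the toolbox (the exact names vary slightly by vendor, so treat these as examples):
# show lacp neighbor - Shows the partner system ID, key, and port state, confirming the ESXi LAG is actually negotiating
# show port-channel summary - Gives a quick view of which member ports are bundled into each port channel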
Note: For consistency and simplicity, in this design the port channel numbers and interfaces all match. Therefore Port-Channel 1 is Po 1 and it is bound to Ethernet 1 on both switches.
Note: The LACP mode on this MLAG port channel has to be active in order to work with the vDS on the ESXi host.
Note: The port channel has to be a trunk and allow the proper VLANs to the host.
Caution: spanning-tree portfast edge and spanning-tree bpdu-guard must be enabled, even though this is a trunk port. The reason is so the ESXi host does not time out on LAG formation while spanning tree goes through its phases of convergence. If these commands are not enabled, the host potentially will not form the MLAG and will revert to its default standard switch configuration.
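If you suspect the host did fall back, a quick check from the ESXi shell with standard esxcli commands will tell you whether the uplinks are still on the vDS LAG or have reverted to a standard switch:

# esxcli network vswitch dvs vmware list - Shows the vDS, its uplinks, and which vmnics are currently attached
# esxcli network vswitch standard list - Shows any standard vSwitches the host may have reverted to, along with their uplinks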