NSX-T Routing: Part II
In Part 2 of the NSX-T Routing blog post, we discuss more advanced topics: the Active/Active and Active/Standby High Availability modes, and BGP.
High Availability Modes
In Part 1, we talked about the Logical Router's components: the Distributed Router (DR) and the Services Router (SR). We discussed the span of the DR (all transport nodes) and the SR (NSX Edge only) in a separate dedicated post so that we could focus on High Availability in Part 2.
When we talk about High Availability (HA), we focus on one specific component of Logical Routers, which also go by the name of Gateways. It is the component that is not distributed: the SR.
For the Tier-0 Gateway, the default HA mode is Active/Active. This mode is preferred when the goal is to have more bandwidth in and out of your NSX domain (with up to 8 Edges sending and receiving traffic). This mode is Highly Available, meaning that if an Edge fails, traffic can be redirected through another available Edge. However, this mode is for routing only: stateful services cannot be configured on it, as explained below.
In the following diagram, we see a Tier-0 Gateway configured in Active/Active mode using only two Edges (out of a maximum of 8) and two uplinks (one per SR). Two SR components are instantiated on two different Edges and both are active, providing more bandwidth in and out. Keep in mind that DRS anti-affinity rules should be in place to keep those two Edges on separate ESXi hosts.
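To make the Active/Active behavior concrete, here is a minimal Python sketch of flows being spread across the active Edges and falling back to the surviving Edge after a failure. It is a toy model, not NSX-T code: the function name, the Edge names, and the CRC-based hash are all assumptions made for illustration, and the hashing that a real gateway uses is not described in this post.

```python
import zlib

MAX_ACTIVE_EDGES = 8  # Active/Active limit mentioned in this post


def pick_edge(flow, active_edges):
    """Pick one active Edge for a flow by hashing the flow tuple.

    Illustrative only: the hash function here is an assumption.
    """
    if not active_edges:
        raise RuntimeError("no active Edge available")
    if len(active_edges) > MAX_ACTIVE_EDGES:
        raise ValueError("Active/Active supports at most 8 Edges")
    key = "|".join(str(field) for field in flow).encode()
    return active_edges[zlib.crc32(key) % len(active_edges)]


# Hypothetical flows: (source IP, destination IP, destination port).
flows = [("10.0.0.%d" % i, "8.8.8.8", 443) for i in range(1, 9)]
edges = ["edge-1", "edge-2"]

# With both Edges up, each flow is pinned to one of them.
before = {flow: pick_edge(flow, edges) for flow in flows}

# Simulate edge-1 failing: every flow must now land on edge-2.
after = {flow: pick_edge(flow, ["edge-2"]) for flow in flows}

print(before)
print(after)
```

The point of the sketch is that no Edge is idle while healthy, and that losing one Edge shrinks capacity instead of cutting connectivity.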
If stateful services such as NAT, VPN, or firewalling need to be configured, then Active/Standby mode is required. NSX Manager will not let you configure stateful services unless the HA mode is set to Active/Standby. The reason is that stateful services must track each connection, and for that to work, traffic in both directions must pass through the same device that is doing the tracking. With Active/Active mode, asymmetric routing can cause return traffic to traverse a different device, one that has no record of the connection.
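The connection-tracking problem can be shown with a small Python sketch. This is a toy stateful firewall, assuming one private connection table per Edge and hypothetical IP addresses; it is not how NSX-T implements its services, only an illustration of why the return path matters.

```python
class StatefulEdge:
    """A toy stateful device: replies pass only if this Edge saw the request."""

    def __init__(self, name):
        self.name = name
        self.connections = set()  # tracked (source, destination) pairs

    def outbound(self, src, dst):
        # Record the connection so the reply can be matched later.
        self.connections.add((src, dst))

    def inbound(self, src, dst):
        # A reply is allowed only if the reverse flow was tracked here.
        return (dst, src) in self.connections


edge1, edge2 = StatefulEdge("edge-1"), StatefulEdge("edge-2")

# The request leaves through edge-1, which records it.
edge1.outbound("10.0.0.5", "203.0.113.10")

symmetric = edge1.inbound("203.0.113.10", "10.0.0.5")   # reply via the same Edge
asymmetric = edge2.inbound("203.0.113.10", "10.0.0.5")  # reply via the other Edge

print(symmetric, asymmetric)  # True False
```

The reply that comes back through edge-2 is dropped because edge-2 never saw the original request, which is exactly the situation Active/Standby avoids by keeping a single active SR.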
In the next diagram, we have a Tier-0 Gateway in Active/Standby mode with NAT configured. Only one SR component can be active at a time (SR 1 in this case). SR 2 will take over if Edge 1 goes down or if SR 1 loses its BGP peering.
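The failover rule just described can be sketched in a few lines of Python. The class and function names are hypothetical, and the model only covers the two triggers named above (Edge down, BGP peering lost). It also always prefers SR 1 whenever SR 1 is healthy, which is a simplification; preemption behavior is not covered in this post.

```python
from dataclasses import dataclass


@dataclass
class ServiceRouter:
    name: str
    edge_up: bool = True
    bgp_up: bool = True

    def healthy(self):
        # The two conditions named in the post: Edge is up and BGP peering is intact.
        return self.edge_up and self.bgp_up


def active_sr(preferred, standby):
    """Return the SR that should forward traffic, or None if both are down."""
    if preferred.healthy():
        return preferred
    if standby.healthy():
        return standby
    return None


sr1, sr2 = ServiceRouter("SR 1"), ServiceRouter("SR 2")
print(active_sr(sr1, sr2).name)  # SR 1

sr1.bgp_up = False               # SR 1 loses its BGP peering
print(active_sr(sr1, sr2).name)  # SR 2
```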
For the Tier-1 Gateway, only Active/Standby is supported. Since this Gateway has no direct connection to the outside world, there is no need for extra bandwidth to handle north/south traffic. The concept is the same as what was described for the Tier-0 Gateway in Active/Standby mode: only one SR is active, with another SR serving as standby. vSphere HA is highly recommended to recover any downed Edges.
Dynamic Routing Protocol
NSX-T only supports BGP as a Dynamic Routing Protocol, and that is just perfect. Back in the day, common practice was to use BGP only as an exterior routing protocol, taking care of Internet traffic, while running an Interior Gateway Protocol like OSPF, EIGRP, RIP, or IS-IS within the Enterprise. BGP still handles the exterior Internet traffic, but in modern data centers people are also running BGP inside the Enterprise network because of how much control and flexibility it provides.
By configuring BGP between the Tier-0 Gateway and the Top-of-Rack (ToR) switches (L3 switches), you can achieve end-to-end connectivity: new networks added to the Overlay (virtual environment) or to the Underlay (physical environment) are learned dynamically by the other side and become reachable, provided they are configured to be advertised.
In the next diagram, we see a Tier-0 Gateway with 4 uplinks (two per SR) and 4 BGP peers. With this topology, we have redundancy at the Edge, Uplink, and ToR switch levels.
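The redundancy claim can be checked with a short Python sketch. The layout below is an assumption, since the diagram is not reproduced here: each SR is taken to have one uplink to each of two ToR switches, and all component names are hypothetical. The check removes one component at a time and confirms that at least one BGP peering survives.

```python
# Each BGP peering is modeled as (edge, uplink, tor). Names are hypothetical,
# and the wiring (one uplink from each Edge to each ToR) is an assumption.
PEERINGS = [
    ("edge-1", "uplink-1", "tor-a"),
    ("edge-1", "uplink-2", "tor-b"),
    ("edge-2", "uplink-3", "tor-a"),
    ("edge-2", "uplink-4", "tor-b"),
]


def surviving(peerings, failed):
    """Return the peerings that do not depend on the failed component."""
    return [p for p in peerings if failed not in p]


# Every distinct Edge, uplink, and ToR that appears in the topology.
components = sorted({c for p in PEERINGS for c in p})

for failed in components:
    left = surviving(PEERINGS, failed)
    print(f"{failed} down -> {len(left)} peering(s) remain")

# True only if no single component failure removes every peering.
single_failure_safe = all(surviving(PEERINGS, c) for c in components)
print(single_failure_safe)  # True
```

Under these assumptions, losing an Edge or a ToR switch leaves two peerings, and losing a single uplink leaves three.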
What about routing between Tier-1 and Tier-0 Gateways? For that, no BGP is needed.
NSX-T Routing is simplified by eliminating the need for a Routing Protocol between routing tiers. Instead, this is handled by configuring route advertisements. The Tier-1 Gateway can be configured to advertise different types of routes, such as "connected routes". So, by simply connecting a Tier-1 Gateway to an existing Tier-0 Gateway and toggling the option to advertise the necessary routes, those routes will automatically propagate to the routing table of the Tier-0 Gateway and, from there, to the upstream L3 devices.
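As a rough illustration of route advertisement between tiers, the following Python sketch copies only the enabled route types from a Tier-1 table into a Tier-0 table. The class, the function, the gateway names, and the prefixes are all hypothetical, and the "static" route type is included only to show that types which are not toggled on stay local to the Tier-1.

```python
class Gateway:
    """A toy gateway holding a routing table of prefix -> route origin."""

    def __init__(self, name):
        self.name = name
        self.routes = {}

    def learn(self, prefix, route_type):
        self.routes[prefix] = route_type


def advertise(tier1, tier0, enabled_types):
    """Copy Tier-1 routes of the enabled types into the Tier-0 table."""
    for prefix, route_type in tier1.routes.items():
        if route_type in enabled_types:
            tier0.learn(prefix, f"via {tier1.name}")


tier1, tier0 = Gateway("tier1-app"), Gateway("tier0")
tier1.learn("172.16.10.0/24", "connected")
tier1.learn("172.16.99.0/24", "static")

# Only "connected" routes are toggled on, so the static route stays local.
advertise(tier1, tier0, enabled_types={"connected"})
print(tier0.routes)  # {'172.16.10.0/24': 'via tier1-app'}
```

No routing protocol session is modeled between the two tiers; the Tier-0 simply receives whatever the Tier-1 is configured to advertise, and could then pass it on to its BGP peers.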
Hopefully, this blog helps you better understand how routing works in NSX-T. Stay tuned for the next episode in the NSX-T series!!