Getting the Edge
A Lunch Meeting... Like in Olden Times
Recently, one of us had our first lunch meeting in about 6 months. The meal was outdoors, with no one sitting too close and a fan blowing in their direction for good measure. We are frequently asked about NSX-T documentation/content, but it was not until this meeting that it hit us that there seems to be confusion among customers, partners, and even inside VMware about the difference between NSX-v and NSX-T. It seems the more you know about the Edge Services Gateway (NSX-v Edge), the more confused you would be about the Edge Node. We think we can see why that could be.
More Different Than You Think
The NSX-v Edge and the Edge Node have very little in common, other than a name (Edge) and that both provide Network Services. An NSX-v Edge is closer to a traditional physical router, whereas an Edge Node is more about the disaggregation of networking via microservices. We think that to better understand how the Edge Node functions, it's best not to think of the Edge itself, but rather of the Network Services NSX-T provides.
Network entities have one primary job: to take ingress traffic, determine the egress port, and send traffic on its way. Network entities can do this using Layer 2 (switching) or Layer 3 (routing) information. If you want to get sophisticated, you can set up the Network entity to use Layer 4 information (such as NAT) or even Layer 7 information (such as Load Balancing).
You can think of Network entities as providing Network Services based on the type of information and criteria they use to do their core job. For example, if you just need a Network entity to be the default gateway for your workload, then the Network entity would provide a Default Gateway Service. A Network entity could provide as few or as many different Network Services as you want, and OEMs would charge you accordingly based on the number of Network Services you enable for the Network entity. Those services are executed sequentially, and this succession is critical to understanding the NSX-T Network architecture.
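To make the idea of sequential Network Services concrete, here is a minimal sketch in Python. It is not NSX-T code: the function names, the packet fields, and the addresses (taken from the documentation ranges) are all our own assumptions, chosen only to show one service handing its result to the next.

```python
# Illustrative only: these functions and addresses are hypothetical, not NSX-T code.

def default_gateway(packet):
    """Layer 3: route the packet, decrementing its TTL."""
    return {**packet, "ttl": packet["ttl"] - 1}

def source_nat(packet):
    """NAT: rewrite the source address (203.0.113.10 is a documentation address)."""
    return {**packet, "src": "203.0.113.10"}

def apply_services(packet, services):
    """Run the packet through each enabled Network Service, one after another."""
    for service in services:
        packet = service(packet)
    return packet

packet = {"src": "10.0.1.5", "dst": "198.51.100.7", "ttl": 64}
result = apply_services(packet, [default_gateway, source_nat])
print(result)
```

The order of the list is the order of execution, which is the point to keep in mind for the rest of this post.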
For the Network architecture of NSX, VMware opted to incorporate the Default Gateway Service, as well as the Switching Service (Logical Switch), within the ESXi hypervisor (via VIBs). The same happens with KVM for NSX-T. In a marketing move, VMware called the Default Gateway Service the Distributed Logical Router…which they now just shorten to Logical Router. In NSX-T, they also included the Default Gateway Service and Logical Switch inside another OS, namely Photon OS (Edge Node). We'll come back to that in a bit.
The Default Gateway Service and the Logical Switch Service were the only NSX-T Network Services VMware included inside ESXi (and KVM). But VMware included ALL NSX-T Network Services inside Photon OS. This design decision presents a particular challenge when you want to provide additional Network Services for workloads.
How Does It Work?
If you have Workload1 in ESXi01 and it wants to simply communicate (think of a ping) with Workload2 in ESXi01, but on a different subnet, then the Default Gateway Service running in the ESXi host can provide the Network Service required.
But if the two workloads need to get fancier than that, like using NAT or Load Balancing, then the ESXi host won't be able to provide those additional services. You would need to leverage the Photon OS, where those Network Services can be provided.
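The split described above can be summed up in a few lines of Python. This is only a sketch of the reasoning, not an NSX-T API: the service names and the two sets are hypothetical labels we made up for the services mentioned in this post.

```python
# Illustrative model only: the names below are hypothetical, not an NSX-T API.
HYPERVISOR_SERVICES = {"switching", "default_gateway"}
EDGE_ONLY_SERVICES = {"nat", "load_balancing", "vpn"}

def where_provided(service):
    """Return which entity provides the given Network Service."""
    if service in HYPERVISOR_SERVICES:
        return "hypervisor"
    if service in EDGE_ONLY_SERVICES:
        return "photon_os"
    raise ValueError(f"unknown service: {service}")

def needs_photon_os(services):
    """True if at least one requested service cannot run in the hypervisor."""
    return any(where_provided(s) == "photon_os" for s in services)

print(needs_photon_os(["switching", "default_gateway"]))  # a simple ping across subnets
print(needs_photon_os(["default_gateway", "nat"]))        # the fancier case
```

The first call covers the Workload1-to-Workload2 ping; the second is the case where the ESXi host alone is not enough.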
So how do you architect a Network platform where the Network Services are provided by different Network entities? By connecting the Network entities via a concealed Logical Switch (which VMware calls the Auto-Plumbed Transit Segment…so yeah, we'll just call it concealed Logical Switch in this blog post, represented in green in the next diagram).
When NSX-T is asked to provide a Network Service that is not available in ESXi (for example NAT), NSX-T Manager instantiates the requested Network Service in Photon OS and creates a concealed Logical Switch. It then connects the Default Gateway Service (which is present in both the Photon OS and the ESXi hosts) and the Network Service (which only resides in the Photon OS) to the concealed Logical Switch. So now traffic from one workload receives the Default Gateway Service in the ESXi host it is running on, and is then forwarded via the concealed Logical Switch to the Photon OS that provides the other Network Service. From there, the traffic is forwarded to its destination (the other workload) via the Logical Switch the other workload is connected to.
Be aware that once the traffic reaches the Photon OS, multiple Network Services can be provided before the traffic is forwarded onward.
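If it helps, the path described above can be written out as a list of hops. The function below is a sketch we wrote for this post, assuming hypothetical host and service names; it only mirrors the sequence in the text and does not reflect how NSX-T is implemented.

```python
# Illustrative only: host names and hop labels are hypothetical.

def traffic_path(src_host, photon_host, extra_services):
    """List the hops a packet takes from the source workload to its destination."""
    # The Default Gateway Service is always applied in the source hypervisor.
    path = [f"{src_host}: Default Gateway Service"]
    if extra_services:
        # Anything beyond the default gateway lives in the Photon OS,
        # reached through the concealed Logical Switch.
        path.append("concealed Logical Switch")
        for service in extra_services:
            path.append(f"{photon_host}: {service}")
    path.append("destination Logical Switch")
    return path

for hop in traffic_path("ESXi01", "EdgeNode01", ["NAT", "Load Balancing"]):
    print(hop)
```

With an empty list of extra services, the concealed Logical Switch never appears in the path, which matches the simple ping case from earlier.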
Time For Some Corrections and Clarifications
Earlier in this blog post, we stated that a Logical Router is the new name for the Distributed Logical Router. That is not completely accurate. It is probably best to think of a Logical Router as a Network entity that has a self-contained routing table (or multiple if using VRF). The reason it is best to think about it this way is that VMware decided (because it could) to create two different types of Logical Routers. One is called the Tier 0 (T0) and the other Tier 1 (T1).
The reasons behind this design decision are inconsequential for the topic of this blog post. Suffice it to say that both T0 and T1 provide Network Services as described above but NSX-T restricts the types of Network Services that a T1 Logical Router can provide. For example, a T1 router can't provide any Dynamic Routing Services (BGP).
Something that we know causes confusion, and requires clarification in NSX-T, is the use of the terms Distributed Router (we know, we know) and Services Router. The Logical Router happens to be composed of the Distributed Router and sometimes the Services Router as well. If the Network Service (not including Layer 2 Services like Switching) can reside in the hypervisor, then you have a "Network entity" within the Logical Router called the Distributed Router that provides that Network Service. If the Network Service only resides in the Photon OS, then you have a "Network entity" within the Logical Router called the Services Router. It is the Distributed Router and the Services Router that connect to the concealed Logical Switch.
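One way to picture this composition is as a small data structure. The class below is purely our own illustration, with hypothetical names and service labels; it is not how NSX-T represents a Logical Router internally.

```python
from dataclasses import dataclass, field

# Illustrative only: services that can reside in the hypervisor (hypothetical label).
HYPERVISOR_CAPABLE = {"default_gateway"}

@dataclass
class LogicalRouter:
    name: str
    services: set = field(default_factory=set)

    def components(self):
        """Return the Network entities that make up this Logical Router."""
        parts = ["Distributed Router"]
        # A Services Router only exists if some service cannot run in the hypervisor.
        if self.services - HYPERVISOR_CAPABLE:
            parts += ["concealed Logical Switch", "Services Router"]
        return parts

simple = LogicalRouter("router-a", {"default_gateway"})
with_nat = LogicalRouter("router-b", {"default_gateway", "nat"})
print(simple.components())
print(with_nat.components())
```

The second router picks up a Services Router (and the concealed Logical Switch that joins it to the Distributed Router) only because NAT was requested.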
And for a final clarification, we had mentioned that the Default Gateway Service is present in the Photon OS: that's not quite the full story…it would be more accurate to say that Photon OS instantiates the Distributed Router that provides the Default Gateway Service.
And Now to Tie It All Up with Edge Nodes
We've been saying Photon OS all along. The Photon OS can be deployed in a Virtual Machine or on a bare metal server. Which one you use mostly depends on the type of performance that is required and/or some use cases where you may want to provide NSX-T Network Services to workloads outside the NSX-T domain. But it is not really the Photon OS that is providing the Network Service; it is NSX-T code that has been embedded inside the Photon OS. An entity running the Photon OS and providing NSX-T Network Services, as we have described in this blog post, is called an Edge Node.
Also, at the risk of REALLY oversimplifying, you deploy Edge Nodes in Edge Node clusters. There are a few reasons for this, but the most important ones are that you get high availability for Network Services (which is something you want for stateful Network Services such as VPN) and that it gives NSX-T Manager the flexibility to instantiate the requested Network Service in whichever Edge Node makes the most sense at that moment in time.
We hope that this helps to bring some clarity when dealing with Edge Nodes. At Hydra 1303 we provide assistance in designing and implementing NSX-T environments that best align with your business goals. Feel free to reach out; we are here to help.