It’s been a while since I wrote my last post. Many things have happened, both in my life and at work. That’s not an excuse for my lack of posting, but I do plan to get back to blogging more consistently in 2023.
What do we have here today?
In today’s edition, I bring you a script I wrote to move T1 gateways across Edge Clusters in NSX-T. It can programmatically move hundreds of T1s in a couple of minutes.
Why would I need to move my T1 Gateways across Edge Clusters?
There are multiple scenarios that would trigger the need to move / evacuate T1 gateways to a different Edge Cluster. The most common are:
During an NSX-V to NSX-T migration, Migration Coordinator will, by default, place all T1 gateways in the same Edge Cluster that is being used for the T0s. In an architecture with a dedicated Edge Cluster for T0 ECMP / uplinks and one or more Edge Clusters for T1 stateful services (such as load balancing), this is not ideal.
A 10-node XL Edge Cluster can only host up to 400 Small load balancers. Going over this limit requires building an additional Edge Cluster. vRA can only deploy load balancers to a single Edge Cluster at any given time per network profile, so if we reach the limit, we can either create the new cluster and point the network profile at it, or migrate the current T1s to the new cluster and keep using the original one in the network profile.
Rebalancing T1 Gateways across Edge Clusters to maintain a similar number of T1s on each cluster.
How do I use this script?
The initial comments of the script explain its usage:
<################################################
Move (T1s) across edge clusters
Author: @ldelorenzi - Jan 23
Usage:
moveT1s.ps1 -nsxUrl <NSX Manager URL (with HTTPS)> -sourceClusterName <Edge Cluster Name> -destinationClusterName <Edge Cluster Name> -execute <$true/$false> -count <count of load balancers to move>
Credentials will be asked at the beginning of the run
################################################>
To dive a little bit deeper into these parameters:
nsxUrl: The NSX-T Manager this script will target, including the HTTPS prefix
sourceClusterName: The name of the Edge Cluster that hosts the T1 Gateways we want to move
destinationClusterName: The name of the Edge Cluster that will receive the T1 Gateways from the source cluster
execute: This flag defaults to false, so if no value is passed the script will not change anything. When execute is false, the script only reports which T1s were found in the source cluster and would therefore be moved to the destination cluster; set it to $true to actually perform the move.
count: If you don’t want to fully evacuate the cluster and only want to move some T1s from the source cluster to the destination cluster, set a value for the count parameter to limit the number of T1s that are moved.
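For example, a dry run against a lab NSX Manager, followed by an actual move of at most 50 T1s, would look something like this (the manager FQDN and cluster names below are just placeholders):

# Dry run: only report which T1s would move from edge-cluster-01 to edge-cluster-02
.\moveT1s.ps1 -nsxUrl https://nsx-mgr.lab.local -sourceClusterName edge-cluster-01 -destinationClusterName edge-cluster-02

# Actually move up to 50 of them
.\moveT1s.ps1 -nsxUrl https://nsx-mgr.lab.local -sourceClusterName edge-cluster-01 -destinationClusterName edge-cluster-02 -execute $true -count 50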
Interesting things about the Script
If you look at the code you will see that I built my own wrapper for Invoke-RestMethod called restCall; this function includes logging as well as retries. If you’re going to have a lot of REST API calls in your scripts, it could make sense to include something like this!
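The real function lives in the script itself; purely as an illustration, a minimal wrapper along these lines (parameter names here are simplified assumptions, not the exact code) would do the job:

function restCall {
    # Minimal sketch: logging plus simple retries around Invoke-RestMethod.
    # Assumes PowerShell 7+ for -SkipCertificateCheck (self-signed NSX certificates).
    param(
        [string]$method,
        [string]$uri,
        [hashtable]$headers,
        $body = $null,
        [int]$maxRetries = 3
    )
    for ($attempt = 1; $attempt -le $maxRetries; $attempt++) {
        try {
            Write-Host "[$(Get-Date -Format s)] $method $uri (attempt $attempt)"
            $params = @{ Method = $method; Uri = $uri; Headers = $headers; ContentType = 'application/json'; SkipCertificateCheck = $true }
            if ($body) { $params.Body = ($body | ConvertTo-Json -Depth 10) }
            return Invoke-RestMethod @params
        }
        catch {
            Write-Host "[$(Get-Date -Format s)] Call failed: $($_.Exception.Message)"
            if ($attempt -eq $maxRetries) { throw }
            Start-Sleep -Seconds 5
        }
    }
}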
The ‘movement’ of T1 Gateways actually involves patching the T1 SR object with its new edge cluster ID. The script finds the Edge Cluster IDs using the names provided at the beginning of the run, which makes it friendlier for users and admins, since they only need the name instead of having to look up the ID.
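Conceptually, each move boils down to a PATCH like the one below. This is a sketch using the NSX-T Policy API locale-services object rather than the script’s exact call, and it assumes $headers, $t1Id, $localeServicesId and $destinationClusterId were already resolved by name:

# Point the T1's locale services at the destination edge cluster
$uri = "$nsxUrl/policy/api/v1/infra/tier-1s/$t1Id/locale-services/$localeServicesId"
$patchBody = @{
    edge_cluster_path = "/infra/sites/default/enforcement-points/default/edge-clusters/$destinationClusterId"
}
restCall -method 'PATCH' -uri $uri -headers $headers -body $patchBody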
Closing Note
I hope you enjoy this post and make use of this script in your environments. If you liked this, please share it!
In today’s post, as a continuation of the previous one (in which we talked about the VCF MGMT Domain), I will show a step-by-step guide to a complete deployment of a VCF Workload Domain using VCF’s API, subject to some specific constraints from a project I was working on!
What’s this non-standard architecture like?
In this specific environment, I had to work around the following constraints:
4 hosts with 256 GB of RAM using vSAN (check the previous post for information about the MGMT domain!)
3 Hosts with 256GB of RAM, using vSAN
3 Hosts with 1.5TB of RAM, using FC SAN storage
Hosts using 4×10 NICs
NIC numbering not being consistent (some hosts had 0,1,2,3 while others had 4,5,6,7); even though this can be changed by editing files on the ESXi hosts, it is still a constraint, and it can be worked around using the API
With this information, the decision was to:
Separate the Workload Domain into two clusters, one for NSX-T Edges and the other for compute workloads; given the discrepancies in RAM and storage configuration, they could never be part of the same logical cluster.
This looks something like…
It is impossible to deploy this using the GUI, due to the following:
Can’t utilize 4 Physical NICs for a Workload Domain
Can’t change NIC numbering or NIC to DVS uplink mapping
So we have to do this deployment using the API! Let’s go!
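The first thing we need is a bearer token from SDDC Manager. A minimal sketch in PowerShell, assuming the public /v1/tokens endpoint and placeholder FQDN and credentials:

# Request an access token from SDDC Manager
$tokenBody = @{ username = 'administrator@vsphere.local'; password = 'VMware1!' } | ConvertTo-Json
$tokenResponse = Invoke-RestMethod -Method POST -Uri 'https://sddc-manager.lab.local/v1/tokens' -Body $tokenBody -ContentType 'application/json' -SkipCertificateCheck
# Reuse this header on every subsequent SDDC Manager API call
$headers = @{ Authorization = "Bearer $($tokenResponse.accessToken)" }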
Once we have the token, we can use it in other API calls until it expires, at which point we either refresh it or create a new one. All the VCF API calls made against SDDC Manager (as opposed to internal API calls) require a bearer token.
List of steps to create a workload domain
Commission all hosts from SDDC Manager and create network profiles appropriately to match the external storage selection. In this scenario, we will have one network profile for the vSAN-based hosts and another network profile for the FC SAN-based hosts. Hosts can also be commissioned via API calls (3.65 in the API reference) instead of via the GUI, but the constraints I had did not prevent me from doing it via the GUI.
Get all the IDs for the commissioned hosts – The API Call is “2.7.2 Get the Hosts” and it is a GET call to https://sddc_manager_url/v1/hosts using Bearer Token authentication (see the sketch after this list)
Create the Workload Domain with a single cluster (Compute) – The API Call is “2.9.1 Create a Domain”
Add the Secondary Cluster (Edge) to the newly-created workload domain – The API Call is “2.10.1 Create a Cluster”
Create the NSX-T Edge Cluster on top of the Edge Cluster – The API Call is “2.37.3 – Create Edge Cluster”
For each of these tasks, we should first validate our JSON body before executing the API call. We will discuss this further.
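As an example of step 2, pulling the host IDs could look like the sketch below, reusing the bearer token header from before (the FQDN is a placeholder, and filtering on the UNASSIGNED_USEABLE status is just one way to narrow the list to hosts that are free to use):

# List all hosts known to SDDC Manager and keep the unassigned ones with their IDs
$allHosts = Invoke-RestMethod -Method GET -Uri 'https://sddc-manager.lab.local/v1/hosts' -Headers $headers -SkipCertificateCheck
$allHosts.elements | Where-Object { $_.status -eq 'UNASSIGNED_USEABLE' } | Select-Object id, fqdn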
You might ask, why don’t you create a Workload Domain with two clusters instead of first creating the Workload Domain with a single cluster and then adding the second one?
This is something I hit during the implementation. If we check the clusters object in the API, we can see it is an array, so it should accept multiple cluster values.
"computeSpec": { "clusterSpecs": [
The info on the API call also points to the fact that we should be able to create multiple clusters on the “Create Domain” call.
Even worse, the validation API will happily validate an API call with multiple clusters.
However, I came to learn (after trying multiple times and contacting the VCF Engineering team) that this is not the case.
For example, if our body looked something like this (with two clusters), the validation API will work!
However, when we go ahead and try to create it, it will fail, and we will see the following error in the logs.
ERROR [vcf_dm,02a04e83325703b0,7dc4] [c.v.v.v.c.v1.DomainController,http-nio-127.0.0.1-7200-exec-6] Failed to create domain com.vmware.evo.sddc.common.services.error.SddcManagerServicesIsException: Found multiple clusters for add vi domain. at com.vmware.evo.sddc.common.services.adapters.workflow.options.WorkflowOptionsAdapterImpl.getWorkflowOptionsForAddDomainWithNsxt(WorkflowOptionsAdapterImpl.java:1222)
So, as mentioned earlier, we need to first create our domain (with a single cluster), and then add the 2nd cluster!
1: Create a Workload Domain with a Single Cluster
We will first create our Workload Domain with the compute cluster, which, in this scenario, uses external storage and the secondary distributed switch for overlay traffic.
This is my API call body, based on the API reference, to create a Workload Domain with a single cluster of 3 hosts, using two VDS, 4 physical NICs numbered 0 to 3, and external FC storage, using the host IDs I obtained in the previous step.
The DVS that is going to be used for overlay traffic must have the isUsedByNsxt flag set to true. In a 4-NIC, 2-VDS deployment such as this one, it should not carry any of the management, vMotion, or vSAN traffic.
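As a hypothetical, abbreviated illustration of the two parts of the body that matter most here (the vdsSpecs inside the cluster’s networkSpec, and the vmnic-to-VDS mapping inside each host’s hostNetworkSpec; the names are placeholders and the real body contains many more fields, so always follow the API reference):

"vdsSpecs": [
    { "name": "wld01-vds01", "isUsedByNsxt": false },
    { "name": "wld01-vds02", "isUsedByNsxt": true }
]

"hostNetworkSpec": {
    "vmNics": [
        { "id": "vmnic0", "vdsName": "wld01-vds01" },
        { "id": "vmnic1", "vdsName": "wld01-vds01" },
        { "id": "vmnic2", "vdsName": "wld01-vds02" },
        { "id": "vmnic3", "vdsName": "wld01-vds02" }
    ]
}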
With the body ready, we execute the VALIDATE and EXECUTE API calls as follows (a high-level overview, since we can use any REST API tool such as Postman, curl, Invoke-RestMethod, or any wrapper in any language that can execute REST calls):
The list of steps will be the same for all the POST API calls, changing the URL to match each specific call.
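As a sketch of that pattern in PowerShell for the domain creation step, assuming the validation endpoint follows the same /validations convention used later in this post for clusters and edge clusters, and that $domainBody holds the JSON body from above:

# 1) Validate the domain creation spec first; resultStatus indicates whether the body passed
$validation = Invoke-RestMethod -Method POST -Uri 'https://sddc-manager.lab.local/v1/domains/validations' -Headers $headers -Body $domainBody -ContentType 'application/json' -SkipCertificateCheck
$validation.resultStatus

# 2) Only once validation succeeds, execute the actual creation
Invoke-RestMethod -Method POST -Uri 'https://sddc-manager.lab.local/v1/domains' -Headers $headers -Body $domainBody -ContentType 'application/json' -SkipCertificateCheck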
We should continue editing and retrying in case of errors until the validation passes; do not attempt to execute the API call without validating it first!
The deployment will start, and after a couple of minutes we will see in the SDDC console that it was successful.
If it fails for whatever reason, we can troubleshoot the deployment by checking where it failed in the SDDC console as well as checking the logs, but as long as the validation passes, it should not be a problem with the body we’re sending.
2: Adding a 2nd Cluster to the existing workload domain
To add a cluster to an existing domain, the first thing we need is the ID of the domain. That can easily be done with a GET call to https://sddc_manager_url/v1/domains and selecting the ID of the workload domain we just created.
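In PowerShell, that lookup could be as simple as this (the domain name is a placeholder):

# Find the ID of the workload domain we just created
$domains = Invoke-RestMethod -Method GET -Uri 'https://sddc-manager.lab.local/v1/domains' -Headers $headers -SkipCertificateCheck
$domainId = ($domains.elements | Where-Object { $_.name -eq 'wld01' }).id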
Once we get the ID, this is the body (following the API reference) to add a new cluster to an existing domain.
Even though we don’t need the cluster to be prepared for NSX-T (since it will only host Edges), setting the isUsedByNsxt flag to true will make the secondary VDS the one used by the uplink port groups once we create a T0, which is what we want in this scenario; otherwise, we would not be using the 3rd and 4th NICs at all.
As discussed earlier, we should first run the POST call to validate; in this case, the URL is https://sddc_manager_fqdn/v1/clusters/validations. After the body is validated, proceed with the creation by removing the validations segment from the URL.
Last but not least, we need to create our NSX-T Edge Cluster on top of the 2nd cluster on the domain!
3: Create NSX-T Edge Cluster
The last piece of the puzzle is creating the NSX-T Edge Cluster, to allow this workload domain to leverage overlay networks and communicate with the physical world.
To create the NSX-T Edge Cluster, we first need to get the Cluster ID of the cluster we just created (how many times can you say cluster in the same sentence?)
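Same lookup pattern as before, this time against /v1/clusters (the cluster name is a placeholder):

# Find the ID of the Edge cluster we just added to the domain
$clusters = Invoke-RestMethod -Method GET -Uri 'https://sddc-manager.lab.local/v1/clusters' -Headers $headers -SkipCertificateCheck
$clusterId = ($clusters.elements | Where-Object { $_.name -eq 'wld01-cluster02' }).id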
Now that we have the ID, this is the body to create two Edge Nodes, configure the management, TEP, and uplink interfaces, configure a T0 and a T1 instance, and set up BGP peering on the T0 instance!
As mentioned before, please run the VALIDATE call first; in this scenario, that is a POST call to https://sddc_manager_fqdn/v1/edge-clusters/validations. After validation passes, proceed to execute the call without the validations segment in the URL.
After this procedure is finished, we will have our workload domain with two clusters as well as a T0 gateway completely configured and ready to go! Simple and quick, isn’t it?
Closing Note
Leveraging the VCF APIs helps us not only implement architectures or designs that cannot be deployed due to GUI restrictions, but also greatly reduce the time it takes to do so!
I hope you enjoyed this post, and if you have any concerns, or want to share your experience deploying VCF via API calls, feel free to do so!