VMware Cloud on AWS: From Zero to TKG
Gilles Chekroun
Lead VMware Cloud on AWS Solutions Architect
---
With the recent release of Tanzu Kubernetes Grid (aka TKG), the updated preview from William Lam and the excellent post from Alex Dess, I wanted to use the Terraform work I did in previous blogs here and here and automate the complete deployment: creating the SDDC, configuring the NSX-T networking and security, and deploying the TKG clusters.
I also want to give credit to Tom Schwaller for helping me around various traps in this whole process.
Terraform + Ansible = buddies
In this exercise, I will use Terraform to deploy the VMware Cloud on AWS infrastructure and Ansible to configure and deploy the TKG clusters.
Recap on TKG+ on VMC
There are many posts around TKG; the short description is that Tanzu Kubernetes Grid leverages Cluster API to bring declarative statements for the creation, configuration and management of Kubernetes clusters.
VMware Tanzu Kubernetes Grid Plus on VMware Cloud on AWS enables you to deploy your SDDC in the cloud, with all the required components needed to architect and scale Kubernetes to fit your needs.
Software Setup
My setup is the following:
- MacBook as the local host to run Terraform and Ansible playbooks.
(vmc)$ terraform version
Terraform v0.12.24
(vmc)$ ansible --version
ansible 2.9.6
(vmc)$ python --version
Python 3.6.5
Lab Setup
VMware Cloud on AWS SDDC
- Deployed using Terraform VMC provider.
Attached VPC
- Deployed using Terraform AWS provider
EC2 as TKG CLI Host
- Using the official TKG AMIs with Kubernetes already installed (available here, and also coded in variables.tf)
S3 hosting all TKG Binaries
- Simplest way to host our TKG OVAs (download from here) and use GOVC to deploy the templates in the VMC vCenter (see the upload sketch below)
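The OVAs and the TKG CLI archive just need to land in that bucket before anything runs. A minimal sketch using the AWS CLI (bucket and file names are the ones from variables.tf; the upload itself is not part of the automation):
aws s3 mb s3://set-tkg-ova
aws s3 cp photon-3-v1.17.3_vmware.2.ova s3://set-tkg-ova/
aws s3 cp photon-3-capv-haproxy-v0.6.3_vmware.1.ova s3://set-tkg-ova/
aws s3 cp tkg-linux-amd64-v1.0.0_vmware.1.gz s3://set-tkg-ova/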
Step 1
Credentials and bash script
I decided to export my credentials to my ENV variables for a few reasons:
- my AWS console is now controlled internally by VMware and is changing my AWS access keys and secret keys on a regular basis
- I will need to somehow pass these variables to my EC2 with Ansible, to sync the content of the S3 bucket to the EC2 using the AWS CLI
- Terraform can also pick up variables from the environment if they are named TF_VAR_xxxx
- it is easy to export the variables from a shell script and reuse them for both Terraform and Ansible
To make things easy, I have a deploy-lab.sh script that will prompt for all the variables needed. This script actually deploys the complete environment.
#!/usr/bin/env bash
clear
echo -e "\033[1m" #Bold ON
echo " ==========================="
echo "   TKG on VMC deployment"
echo " ==========================="
echo "===== Credentials ============="
echo -e "\033[0m" #Bold OFF
DEF_ORG_ID="xxxxx"
read -p "Enter your ORG ID (long format) [default=$DEF_ORG_ID]: " TF_VAR_my_org_id
TF_VAR_my_org_id="${TF_VAR_my_org_id:-$DEF_ORG_ID}"
echo ".....Exporting $TF_VAR_my_org_id"
export TF_VAR_my_org_id=$TF_VAR_my_org_id
echo ""
DEF_TOKEN="xxxxx"
read -p "Enter your VMC API token [default=$DEF_TOKEN]: " TF_VAR_vmc_token
TF_VAR_vmc_token="${TF_VAR_vmc_token:-$DEF_TOKEN}"
echo ".....Exporting $TF_VAR_vmc_token"
export TF_VAR_vmc_token=$TF_VAR_vmc_token
echo ""
ACCOUNT="xxxxx"
read -p "Enter your AWS Account [default=$ACCOUNT]: " TF_VAR_AWS_account
TF_VAR_AWS_account="${TF_VAR_AWS_account:-$ACCOUNT}"
echo ".....Exporting $TF_VAR_AWS_account"
export TF_VAR_AWS_account=$TF_VAR_AWS_account
echo ""
ACCESS="xxxxx"
read -p "Enter your AWS Access Key [default=$ACCESS]: " TF_VAR_access_key
TF_VAR_access_key="${TF_VAR_access_key:-$ACCESS}"
echo ".....Exporting $TF_VAR_access_key"
export TF_VAR_access_key=$TF_VAR_access_key
echo ""
SECRET="xxxxx"
read -p "Enter your AWS Secret Key [default=$SECRET]: " TF_VAR_secret_key
TF_VAR_secret_key="${TF_VAR_secret_key:-$SECRET}"
echo ".....Exporting $TF_VAR_secret_key"
export TF_VAR_secret_key=$TF_VAR_secret_key
echo ""
export ANSIBLE_HOST_KEY_CHECKING=False
echo ""
echo -e "\033[1m" #Bold ON
echo "===== PHASE 1: Creating SDDC ==========="
echo -e "\033[0m" #Bold OFF
cd ./p1/main
terraform apply
cd ../../
export TF_VAR_host=$(terraform output -state=./phase1.tfstate proxy_url)
read -p $'Press enter to continue (^C to stop)...\n'
cd ./p2/main
echo -e "\033[1m" #Bold ON
echo "===== PHASE 2: Networking and Security ==========="
echo -e "\033[0m" #Bold OFF
echo ".....Importing CGW and MGW into Terraform phase2."
if [[ ! -f ../../phase2.tfstate ]]
then
  echo "Importing . . . . ."
  terraform import -lock=false module.NSX.nsxt_policy_gateway_policy.mgw mgw/default
  terraform import -lock=false module.NSX.nsxt_policy_gateway_policy.cgw cgw/default
fi
echo ".....CGW, MGW already imported."
terraform apply
echo ""
read -p $'Press enter to continue (^C to stop)...\n'
echo -e "\033[1m" #Bold ON
echo "===== Ansible will prepare the TKG environment ==========="
echo -e "\033[0m" #Bold OFF
cd ../../ansible/playbooks
echo "====== 1) Gathering Terraform outputs ========"
ansible-playbook ./10-terraform-info.yaml
echo "====== 2) Prepare EC2 ========"
ansible-playbook ./11-open_terminal.yaml
echo "====== 3) Open Terminal window ========"
ansible-playbook ./12-copy_files_to_EC2.yaml
echo "====== 4) Deploy templates in vCenter ========"
ansible-playbook ./13-deploy_templates.yaml
echo "====== 5) Deploy TKG Clusters ========"
ansible-playbook ./14-Deploy_TKG_clusters.yaml
Terraform variables
The variables.tf file will contain important parameters to set BEFORE we can start anything.
variable "AWS_region"     {default = "eu-central-1"}
variable "TKG_net_name"   {default = "tkg-network"}
variable "TKG_photon"     {default = "photon-3-v1.17.3_vmware.2"}
variable "TKG_haproxy"    {default = "photon-3-capv-haproxy-v0.6.3_vmware.1"}
variable "TKG_EC2"        {default = "tkg-linux-amd64-v1.0.0_vmware.1"}
variable "TKG_S3_bucket"  {default = "set-tkg-ova"}
Note that the file names for photon, haproxy and EC2 have NO EXTENSIONS.
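Since Terraform reads environment variables of the form TF_VAR_<name>, any of these defaults can also be overridden from the shell without touching variables.tf; for example (region and bucket values purely illustrative):
export TF_VAR_AWS_region="eu-west-1"
export TF_VAR_TKG_S3_bucket="my-own-tkg-bucket"
terraform plan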
Step 2
Terraform Phase 1
In this lab I will use 2 phases:
- Phase 1 for:
- Deploying the AWS attached VPC with a subnet and an EC2 that will be our TKG CLI host.
- Deploying a 1 node SDDC
- Phase 2 for:
- Configuring all the NSX-T segments, groups and firewall rules needed for TKG
First we need to compile the Terraform VMC provider from the source.
- Create a tmp directory and execute:
git clone https://github.com/terraform-providers/terraform-provider-vmc/
cd terraform-provider-vmc/
go get
go build -o terraform-provider-vmc
chmod 755 terraform-provider-vmc
Place the compiled binary in the main terraform directory for phase 1 and do:
rm -rf .terraform
terraform init
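If you prefer not to copy the compiled provider into every working directory, Terraform 0.12 also picks up third-party providers from the user plugins directory; a small alternative sketch (macOS/Linux path):
mkdir -p ~/.terraform.d/plugins
cp terraform-provider-vmc ~/.terraform.d/plugins/
terraform init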
The most important part in phase 1 is the terraform output.
We will export all output parameters in JSON format and use them for our ansible playbooks as well.
The outputs appear at the end of terraform apply in the following format:
Outputs:
GOVC_vc_url = https://vcenter.sddc-3-127-179-50.vmwarevmc.com/sdk
SDDC_mgmt = 10.10.10.0/23
TKG_DNS = ec2-3-125-18-211.eu-central-1.compute.amazonaws.com
TKG_EC2 = tkg-linux-amd64-v1.0.0_vmware.1
TKG_IP = 3.125.18.211
TKG_S3_bucket = set-tkg-ova
TKG_haproxy = photon-3-capv-haproxy-v0.6.3_vmware.1
TKG_net_name = tkg-network
TKG_photon = photon-3-v1.17.3_vmware.2
cloud_password = <sensitive>
cloud_username = cloudadmin@vmc.local
key_pair = keypair
proxy_url = nsx-3-127-179-50.rp.vmwarevmc.com/vmc/reverse-proxy/api/orgs/84e84f83-bb0e-4e12-9fe0-aaf3a4efcd87/sddcs/a4565d5c-1d34-42e9-95c6-07ec52870510
vc_url = vcenter.sddc-3-127-179-50.vmwarevmc.com
They can be converted to JSON with a simple command:
terraform output -state=../../phase1.tfstate -json > outputs.json
and we get the outputs.json file in the format:
{ "GOVC_vc_url": { "sensitive": false, "type": "string", "value": "https://vcenter.sddc-3-127-179-50.vmwarevmc.com/sdk" }, "SDDC_mgmt": { "sensitive": false, "type": "string", "value": "10.10.10.0/23" }, "TKG_DNS": { "sensitive": false, "type": "string", "value": "ec2-3-122-115-96.eu-central-1.compute.amazonaws.com" }, "TKG_EC2": { "sensitive": false, "type": "string", "value": "tkg-linux-amd64-v1.0.0_vmware.1" }, etc. . .
The TKG EC2 instance
I am using the AMI provided by VMware for every AWS region and this includes Kubernetes already. I just need to add docker and we are good to go.
Since I want to use this EC2 to provision the TKG templates in my vCenter, I will also install GOVC and JQ (I like JQ).
To do that, at EC2 instance creation, I can supply a "user-data.ini" script that will be executed at first boot.
I am using a t2.medium instance with 20GB of disk.
resource "aws_network_interface" "TKG-Eth0" { subnet_id = var.Subnet10-vpc1 security_groups = [var.GC-SG-VPC1] private_ips = [cidrhost(var.Subnet10-vpc1-base, 200)] } resource "aws_instance" "TKG" { ami = var.TKG-AMI[var.AWS_region] instance_type = "t2.medium" root_block_device { volume_type = "gp2" volume_size = 20 delete_on_termination = true } network_interface { network_interface_id = aws_network_interface.TKG-Eth0.id device_index = 0 } key_name = var.key_pair[var.AWS_region] user_data = file("${path.module}/user-data.ini") tags = { Name = "GC-TKG-vpc1" } }The user-data looks like:
#!/bin/bash
sudo yum update -y
wget https://github.com/vmware/govmomi/releases/download/v0.22.1/govc_linux_amd64.gz
gunzip govc_linux_amd64.gz
mv govc_linux_amd64 govc
sudo chown root govc
sudo chmod 755 govc
sudo mv govc /usr/bin/.
sudo yum install jq -y
sudo amazon-linux-extras install docker -y
sudo service docker start
sudo groupadd docker
sudo usermod -aG docker ec2-user
sudo chmod 666 /var/run/docker.sock
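Once the instance is up, it is worth checking that the user-data script did its job before letting Ansible loose on it; a quick manual check over SSH (key pair name and public IP are the values from the Terraform outputs):
ssh -i keypair.pem ec2-user@3.125.18.211 'govc version && jq --version && docker ps'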
Terraform Phase 2
Before we can start with phase 2, we need to compile the NSX-T Terraform provider from source, as we did for the VMC provider.
- Create a tmp directory and execute:
git clone https://github.com/terraform-providers/terraform-provider-nsxt/
cd terraform-provider-nsxt/
go get
go build -o terraform-provider-nsxt
chmod 755 terraform-provider-nsxt
Place the compiled binary in the main terraform directory for phase 2 and do:
rm -rf .terraform
terraform init
Define the TKG network to be created by the NSX module. It must be DHCP-enabled with enough IP addresses for our cluster deployments:
/*================ Subnets IP ranges =================*/
variable "VMC_subnets" {
  default = {
    TKG_net      = "192.168.2.0/24"
    TKG_net_gw   = "192.168.2.1/24"
    TKG_net_dhcp = "192.168.2.3-192.168.2.254"
  }
}
Once that's done, we need to import the SDDC NSX-T components into Terraform, since VMC is a pre-built architecture.
To do that, we need to run:
terraform import -lock=false module.NSX.nsxt_policy_gateway_policy.mgw mgw/default
and
terraform import -lock=false module.NSX.nsxt_policy_gateway_policy.cgw cgw/default
This needs to happen only once, and only if we have NO Terraform state file for Phase 2 (the deploy-lab.sh script takes care of that).
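A quick way to confirm that the two default gateway policies are now under Terraform management (and that a second import is unnecessary) is to list the state from the phase 2 main directory:
terraform state list | grep nsxt_policy_gateway_policy
The two entries module.NSX.nsxt_policy_gateway_policy.mgw and module.NSX.nsxt_policy_gateway_policy.cgw should show up.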
Then, the NSX module will create the TKG segment, the Management Gateway rules and the Compute Gateway rules.
Step 3
Ansible setup
Disclaimer - I am not an expert in Ansible and I am sure there are better ways to achieve what I want but so far, I am happy with what I did ;)
Inventory
Here we have a super simple environment that consists of 2 hosts:
- My Macbook as a localhost
- The EC2 instance we want to configure.
Since the EC2 is a dynamic resource and will have a dynamic public IP, I will add it to my inventory from within my playbooks.
There are other ways to do that, like using the ec2.py dynamic inventory script that returns a list of instances, but here I only have one instance.
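Before running the playbooks, connectivity to that dynamic host can be checked with an ad-hoc command, passing the public IP as a literal comma-terminated inventory (IP and key pair are the values from outputs.json; this is just a check, not part of the automation):
ansible all -i "3.125.18.211," -u ec2-user --private-key=keypair.pem -m ping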
Playbooks
1) Get the terraform output variables
Nothing special here.
2) Open a terminal window
Cool command using Mac osascript:
osascript -e 'tell app "Terminal" to do script "ssh -oStrictHostKeyChecking=no -i '{{aws_dir}}{{key_pair.value}}.pem' ec2-user@{{ TKG_IP.value }}"'
3) Copy files to EC2
I will need the JSON outputs, some shell scripts and ini files.
4) Sync my S3 bucket and deploy the templates
This one is a bit tricky.
On all AWS EC2 Linux instances, the AWS CLI is installed but needs the AWS credentials.
Since I did not want to transfer my credentials (I could but...), I will check my local ENV variables, use the "environment:" keyword for the EC2 and look up my local ENV for the AWS access and secret keys.
The next step is "syncing" my S3 bucket and deploying my templates:
# =================================================
# Sync templates from S3 and deploy
# =================================================
- name: Get templates from S3
  hosts: TKG_EC2
  gather_facts: true
  vars_files:
    - ../credentials.yaml
    - ../outputs.json
  environment:
    AWS_ACCESS_KEY_ID: "{{ lookup('env','TF_VAR_access_key') }}"
    AWS_SECRET_ACCESS_KEY: "{{ lookup('env','TF_VAR_secret_key') }}"
  tasks:
    - name: Sync templates from S3 using the AWS CLI, deploy in vCenter and install TKG binaries
      shell: |
        aws s3 sync s3://{{ TKG_S3_bucket.value }} .
        ./deploy_templates.sh
        gunzip "{{ TKG_EC2.value }}".gz
        sudo mv "{{ TKG_EC2.value }}" /usr/bin/tkg
        sudo chmod 755 /usr/bin/tkg
        tkg get mc
        ./config.sh
In that script, I check if my TKG folder and Resource Pool are created and for the templates deployment I use GOVC.
Tom told me that it's better to deploy a VM, do a snapshot and mark it as a template. This will be faster for future cloning! (Thanks Tom)
govc import.spec ${PHOTON}.ova | jq ".Name=\"$PHOTON\"" | jq ".NetworkMapping[0].Network=\"$NETWORK\"" > ${PHOTON}.json
govc import.ova -dc="SDDC-Datacenter" -ds="WorkloadDatastore" -pool="Compute-ResourcePool" -folder="Templates" -options=${PHOTON}.json ${PHOTON}.ova
govc snapshot.create -vm ${PHOTON} root
govc vm.markastemplate ${PHOTON}
The first line imports the OVA specs and adds the proper TKG network name.
The second line imports the OVA in the "Templates" folder using the imported specs.
The third line creates a snapshot.
And the last line marks it as a template.
After the templates are deployed, I simply unzip the TKG CLI binary, place it in /usr/bin and mark it executable.
The command tkg get mc will actually create the .tkg directory but return an empty management cluster.
The last script "config.sh" will append a few variables that are specific to our environment to the config.ini file to create a config.yaml file for management cluster deployment.
The variables are:
VSPHERE_NETWORK: tkg-network
VSPHERE_SERVER: vcenter.sddc-3-127-179-50.vmwarevmc.com
VSPHERE_USERNAME: cloudadmin@vmc.local
VSPHERE_PASSWORD: <encoded:xxxxxxxxxx>
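I have not reproduced config.sh here, but a minimal sketch of the idea, assuming the generic settings live in config.ini, the TKG CLI's default ~/.tkg/config.yaml location, and vCenter values taken from the Terraform outputs (variable names are illustrative):
#!/usr/bin/env bash
# hypothetical sketch of config.sh
VC_URL=$(jq -r '.vc_url.value' outputs.json)
VC_USER=$(jq -r '.cloud_username.value' outputs.json)
VC_PASSWORD=$(jq -r '.cloud_password.value' outputs.json)
cp config.ini ~/.tkg/config.yaml
cat >> ~/.tkg/config.yaml <<EOF
VSPHERE_NETWORK: tkg-network
VSPHERE_SERVER: ${VC_URL}
VSPHERE_USERNAME: ${VC_USER}
VSPHERE_PASSWORD: ${VC_PASSWORD}
EOF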
5) Finally, deploy a TKG management cluster and a small workload cluster
tasks:
  - name: Ensure docker daemon is running
    service:
      name: docker
      state: started
    become: true
  - name: Deploy TKG clusters - this task will take 20 mins - check VMC vCenter
    shell: |
      yes | tkg init --infrastructure=vsphere
      tkg create cluster --worker-machine-count=4 --plan=dev tkg-cluster-01
The "yes | tkg" will bypass the warning: You are about to provision a Kubernetes cluster on a vSphere 7.0 cluster that has not been optimized for Kubernetes.
And the last line creates a small cluster called "tkg-cluster-01" with 4 worker nodes, 1 control plane node and 1 load balancer. The --plan=dev can be changed to prod for a more substantial environment.
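For reference, a more substantial workload cluster could be requested the same way by swapping the plan (cluster name and worker count here are purely illustrative; the flags are the ones used above):
tkg create cluster tkg-cluster-prod --plan=prod --worker-machine-count=6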
Small video for demo
Code
Complete code in my GitHub here.
Thanks for reading.