AWS Transit Gateway Introduction (VI)

In the previous post, we covered VPC Peering, which is a quick and easy way to create a connection between two VPCs. We also discussed its limitations, primarily that it is non-transitive. This means if VPC 'A' is peered with VPC 'B', and VPC 'B' is peered with VPC 'C', VPC 'A' cannot communicate with VPC 'C' through VPC 'B'. Because of this, to connect multiple VPCs together, you need to create a full mesh, where every VPC has a direct peering connection to every other VPC.

Related post: AWS VPC Peering, covering what VPC Peering is and how to configure one.

This full-mesh complexity, which grows quickly as you add more VPCs, is why in this post we will look at AWS Transit Gateway (TGW). A Transit Gateway is an incredibly important networking resource in AWS that solves these scaling challenges. You will see the TGW featured in many modern AWS architecture diagrams because of the flexibility and simplicity it provides.

As always, if you find this post helpful, press the ‘clap’ button. It means a lot to me and helps me know you enjoy this type of content. If I get enough claps for this series, I’ll make sure to write more on this specific topic.

What is AWS Transit Gateway (TGW)?

An AWS Transit Gateway (TGW) acts as a network hub that you can use to interconnect your Virtual Private Clouds (VPCs) and on-premises networks.

Suppose you have 10 VPCs and 3 on-premises locations connected via Site-to-Site VPNs (in the same region), and they all need to communicate with each other. Instead of creating a complex mesh of VPC peering and VPN connections, you can use a Transit Gateway (TGW). You simply attach all 10 VPCs and all 3 VPN connections to the central TGW, and they can then communicate with each other through this single hub. Of course, you still have full control over the routing and can define exactly which networks are allowed to talk to each other.

Just like any other networking resource, you create the Transit Gateway under the VPC service.

Transit Gateway Pricing

There is no charge for just creating the Transit Gateway itself; costs begin once you attach your networks to it. For each VPC you attach, you pay an hourly price of about $0.06 in the London region, and you also pay $0.02 for every gigabyte of data that the TGW processes.

When you attach a VPN, you pay for both the VPN connection (about $0.05 per hour) and the TGW attachment fee ($0.06 per hour).

💡
As usual, I don't cover pricing in great detail, so please make sure you read and understand the pricing properly before you deploy anything to the cloud and get surprised with a bill 🙂

Transit Gateway Concepts

Similar to how a VPC has route tables that you associate with subnets, a Transit Gateway (TGW) also has its own route table. You associate your attachments, like VPCs and VPNs, with this TGW route table, and you can also choose to propagate routes automatically (more on this later).

As you may know if you've been following the series, I have two VPCs in the same AWS account:

| VPC Name | VPC CIDR      | Private Subnet | Availability Zone |
|----------|---------------|----------------|-------------------|
| lab-vpc  | 10.200.0.0/16 | 10.200.10.0/24 | eu-west-2a        |
| dev-vpc  | 10.201.0.0/16 | 10.201.10.0/24 | eu-west-2a        |

We have one EC2 instance in each of these private subnets, and our goal is to attach both VPCs to a new Transit Gateway and enable them to ping each other.

Now, let's cover an important concept of TGW attachments. Even though our lab only uses one AZ, let's pretend we also have workloads in a second subnet in Availability Zone 'b'. When you attach a VPC to a TGW, you must choose which subnets the TGW will place its own ENIs into. For the TGW to be able to route traffic to an Availability Zone, it must have a network interface present in that AZ. Because of this, AWS recommends creating small, dedicated subnets in each AZ just for the purpose of placing the TGW's ENIs.

If you have workloads in AZ 'a' and AZ 'b' but you only choose a subnet from AZ 'a' during the VPC attachment, the TGW will have no presence in AZ 'b'. As a result, even though the VPC is attached, the TGW will not be able to forward any traffic to your resources in AZ 'b'.

So, in a nutshell, when you attach a VPC to a TGW, you are essentially telling the Transit Gateway to place one of its own network interfaces (an ENI) into your VPC for each Availability Zone where you need connectivity. To do this, you must choose a subnet within each of those AZs for the TGW to use. While you can use an existing subnet, the best practice recommended by AWS is to create a small, dedicated subnet in each AZ just for this purpose.

Creating the Transit Gateway

Let's first create the Transit Gateway. From the VPC console, you can navigate to 'Transit Gateways' and choose to create one. For this lab, we are going with all the default values for now.

The only things I'm defining are a name (tgw-01) and the ASN. If you look at the default options, you will see that 'Default route table association' and 'Default route table propagation' are ticked by default. This means that when we attach a VPC to this TGW, the attachment will automatically be associated with the TGW's default route table, and the VPC's CIDR will be automatically propagated (added as a route) to this default route table. Once you click create, wait for a few minutes for the TGW to become available.
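If you prefer scripting over the console, here is a minimal boto3 sketch of the same step. It assumes the eu-west-2 region from our lab and the default Amazon-side ASN; everything else mirrors the defaults described above.

```python
import boto3

# Region is an assumption based on the lab's eu-west-2 subnets.
ec2 = boto3.client("ec2", region_name="eu-west-2")

# Create the TGW with the same defaults as the console: default route
# table association and propagation are both enabled.
response = ec2.create_transit_gateway(
    Description="Lab transit gateway",
    Options={
        "AmazonSideAsn": 64512,  # assumption: the default Amazon-side ASN
        "DefaultRouteTableAssociation": "enable",
        "DefaultRouteTablePropagation": "enable",
    },
    TagSpecifications=[{
        "ResourceType": "transit-gateway",
        "Tags": [{"Key": "Name", "Value": "tgw-01"}],
    }],
)
print(response["TransitGateway"]["TransitGatewayId"])
```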

Transit Gateway Attachments

Once the TGW is available, we can start attaching our VPCs. But first, remember we mentioned that the TGW needs to place an ENI in each Availability Zone where we need connectivity. To accommodate this, I have created two dedicated subnets in each of our VPCs: one in AZ 'a' and one in AZ 'b'.

💡
Please note that you do not need a full /24 subnet just for the Transit Gateway attachments. You can actually divide a single /24 into smaller blocks for each AZ, or go with an even smaller CIDR like a /29, which is more than enough. For this lab, however, I'm using a /24 for each dedicated subnet just to keep things simple and not make your life harder (and mine) with subnetting calculations 😄
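If you want to create these dedicated subnets programmatically, a sketch like the following would do it. The VPC ID and the exact CIDR blocks are placeholders (the post doesn't list the real ranges), so substitute your own.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# Hypothetical CIDRs for the dedicated TGW subnets in lab-vpc; the post
# uses a /24 per AZ but doesn't list the exact ranges, so adjust these.
tgw_subnets = {
    "eu-west-2a": "10.200.250.0/24",
    "eu-west-2b": "10.200.251.0/24",
}

for az, cidr in tgw_subnets.items():
    subnet = ec2.create_subnet(
        VpcId="vpc-0123456789abcdef0",  # placeholder for lab-vpc's ID
        CidrBlock=cidr,
        AvailabilityZone=az,
    )
    print(subnet["Subnet"]["SubnetId"], az, cidr)
```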

Now we can go to 'Transit Gateway Attachments' and create the attachments for both our lab-vpc and dev-vpc. When you create the attachment, you need to select the VPC you want to attach and, most importantly, choose the subnets where the TGW will place its ENIs. Since we have workloads in two AZs (the imaginary workload in AZ-b), I will select the two dedicated TGW subnets we created. If you have workloads in all three AZs, remember to create and select subnets in all three.
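The same attachment can be created with boto3. The IDs below are placeholders; the important part is passing one dedicated TGW subnet per AZ in SubnetIds.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# Placeholder IDs; use your own TGW, VPC, and dedicated TGW subnet IDs.
attachment = ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId="tgw-0123456789abcdef0",
    VpcId="vpc-0123456789abcdef0",   # lab-vpc
    SubnetIds=[
        "subnet-0aaaaaaaaaaaaaaaa",  # dedicated TGW subnet in eu-west-2a
        "subnet-0bbbbbbbbbbbbbbbb",  # dedicated TGW subnet in eu-west-2b
    ],
    TagSpecifications=[{
        "ResourceType": "transit-gateway-attachment",
        "Tags": [{"Key": "Name", "Value": "lab-vpc-attachment"}],
    }],
)
print(attachment["TransitGatewayVpcAttachment"]["TransitGatewayAttachmentId"])
```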

You might have noticed that the eu-west-2c subnet option is greyed out. This is because the AWS console knows that we do not currently have a subnet created in that specific Availability Zone within the VPC we are attaching. In most production environments, you are likely to have resources running in all three AZs for high availability, so it is standard practice to also place TGW attachment subnets in all three AZs.

Transit Gateway ENI

To show you that the Transit Gateway places an ENI in the subnets you select, you can head over to the EC2 service console and click on 'Network Interfaces' under 'Network & Security'.

As you can see in the screenshot, there are now four new ENIs with the interface type 'transit_gateway'. Each one has been placed in one of the dedicated subnets we created across our two VPCs and has been assigned a private IP from that subnet's range. We don't have to do any of this manually; the Transit Gateway service creates these ENIs in the background when we create the attachments.
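You can also list these ENIs programmatically by filtering on the 'transit_gateway' interface type, for example with a small boto3 snippet like this (region assumed to be eu-west-2):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# Find the ENIs the Transit Gateway created in our dedicated subnets.
enis = ec2.describe_network_interfaces(
    Filters=[{"Name": "interface-type", "Values": ["transit_gateway"]}]
)

for eni in enis["NetworkInterfaces"]:
    print(eni["NetworkInterfaceId"], eni["SubnetId"],
          eni["AvailabilityZone"], eni["PrivateIpAddress"])
```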

Transit Gateway Route Tables

Next, let's look at the default route table that was created with our Transit Gateway. First, if you check the 'Associations' tab, you will see that both of our VPC attachments are already associated with this route table. This happened automatically because we left 'Default route table association' ticked when we created the TGW.

Next, if you look at the 'Propagations' tab, you will see that propagation is enabled for both VPC attachments. Again, no surprises here, as we also left 'Default route table propagation' enabled.

Finally, and most importantly, if you check the 'Routes' tab, you will see that the route table now contains routes for both VPC CIDRs (10.200.0.0/16 and 10.201.0.0/16). These routes were added automatically because propagation is enabled.
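If you want to verify the same thing from code, SearchTransitGatewayRoutes returns the routes in a given TGW route table. The route table ID below is a placeholder.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# Search the default TGW route table (placeholder ID) for active routes;
# with propagation enabled we expect both VPC CIDRs to show up.
routes = ec2.search_transit_gateway_routes(
    TransitGatewayRouteTableId="tgw-rtb-0123456789abcdef0",
    Filters=[{"Name": "state", "Values": ["active"]}],
)

for route in routes["Routes"]:
    print(route["DestinationCidrBlock"], route["Type"], route["State"])
```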

The Transit Gateway route table is an important concept, so make sure you understand it properly. Anytime traffic arrives at the TGW from one of the attachments, the TGW checks which route table that attachment is associated with and uses that table to make its routing decision. If your attachment is not associated with any route table, or if it is associated but the route table doesn't have a route to the destination, the traffic is simply dropped. Remember that if you associate an attachment but don't propagate its routes, the route table still won't have a route back to that attachment's CIDR.

VPC Subnet Route Tables

Please note that for traffic to flow, you must now also update the subnet route tables within each VPC. You need to add a new route in each VPC's route table that points to the Transit Gateway as the next hop for any traffic destined towards the other VPC's CIDR block.
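Here is what those two routes would look like in boto3. The route table and TGW IDs are placeholders; note that each VPC needs a route pointing at the other VPC's CIDR.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

TGW_ID = "tgw-0123456789abcdef0"  # placeholder

# In lab-vpc's route table, send traffic for dev-vpc's CIDR to the TGW.
ec2.create_route(
    RouteTableId="rtb-0aaaaaaaaaaaaaaaa",  # lab-vpc private route table
    DestinationCidrBlock="10.201.0.0/16",
    TransitGatewayId=TGW_ID,
)

# And the mirror route in dev-vpc's route table, back towards lab-vpc.
ec2.create_route(
    RouteTableId="rtb-0bbbbbbbbbbbbbbbb",  # dev-vpc private route table
    DestinationCidrBlock="10.200.0.0/16",
    TransitGatewayId=TGW_ID,
)
```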

Testing and Verification

Finally, it's time for testing. With the Transit Gateway created, the VPCs attached, and the route tables updated, we can now verify that our instances can communicate.

I'm going to SSH into the EC2 instance in the lab-vpc (which has the IP 10.200.10.187) and then try to ping the private IP of the instance in the dev-vpc (10.201.10.160). The traffic should flow from the source instance, through the TGW, and to the destination instance.

As you can see from the screenshot, the ping succeeds as expected. This confirms that our Transit Gateway is correctly routing traffic between the two VPCs.

TGW Custom Route Tables

Remember when we created the Transit Gateway, we went with the defaults for 'Default route table association' and 'Default route table propagation'. Let's go back and modify the TGW by unticking them to see what happens.

Before I do this, I have deleted both VPC attachments so we can start fresh. You can do this by simply going to the 'Transit Gateway Attachments' section, selecting the attachments, and deleting them.

Then, I selected the TGW and chose to modify it, unticking both the association and propagation options.

After saving this change, I created a brand new TGW route table and deleted the original default one that was created automatically with the TGW. In this new route table, the default association and propagation are set to 'No'.

With our new configuration in place, I then went ahead and re-attached both the lab-vpc and dev-vpc to the Transit Gateway, just as we did before.
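For reference, the same 'untick the defaults and create a fresh route table' flow looks roughly like this in boto3, with placeholder IDs:

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

TGW_ID = "tgw-0123456789abcdef0"  # placeholder

# Untick both defaults, as we did in the console.
ec2.modify_transit_gateway(
    TransitGatewayId=TGW_ID,
    Options={
        "DefaultRouteTableAssociation": "disable",
        "DefaultRouteTablePropagation": "disable",
    },
)

# Create the new, empty TGW route table.
rt = ec2.create_transit_gateway_route_table(
    TransitGatewayId=TGW_ID,
    TagSpecifications=[{
        "ResourceType": "transit-gateway-route-table",
        "Tags": [{"Key": "Name", "Value": "tgw-rt-custom"}],
    }],
)
print(rt["TransitGatewayRouteTable"]["TransitGatewayRouteTableId"])
```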

At this point, all we have is one empty TGW route table and two attachments that are not associated with any route table. Even if we try to ping between the EC2 instances, it's not going to work. The ping from the source EC2 instance will travel to the TGW, but because the VPC attachment is not associated with a route table, the Transit Gateway doesn't know what to do with the traffic, so it will just be blackholed.

So, let's go ahead and associate our VPC attachments with this new route table. To do this, you select the route table, go to the 'Associations' tab, and create an association for each of our two VPC attachments. You will need to wait a few minutes for the state to change to 'Associated'.
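A sketch of the same association step in boto3, using placeholder attachment and route table IDs:

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

RT_ID = "tgw-rtb-0123456789abcdef0"  # placeholder custom route table ID

# Associate both VPC attachments (placeholder IDs) with the new table.
for attachment_id in ["tgw-attach-0aaaaaaaaaaaaaaaa",
                      "tgw-attach-0bbbbbbbbbbbbbbbb"]:
    ec2.associate_transit_gateway_route_table(
        TransitGatewayRouteTableId=RT_ID,
        TransitGatewayAttachmentId=attachment_id,
    )
```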

Even now, if you try to ping, it's still not going to work. The ping from the EC2 instance will come to the TGW, and the TGW will look at the route table associated with the source attachment. However, that route table is still empty because we disabled automatic propagation and haven't added any routes manually.

Of course, you could add the routes manually, but let's enable propagation for both attachments to this route table instead. To do that, again, select the route table, go to the 'Propagations' tab, and create a propagation for each of the two VPC attachments. Once enabled, the routes will automatically appear, and the ping test will finally succeed.
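And the matching propagation step, again with placeholder IDs; once these calls complete, the propagated routes appear in the table:

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

RT_ID = "tgw-rtb-0123456789abcdef0"  # placeholder custom route table ID

# Enable propagation so each attachment's CIDR is added automatically.
for attachment_id in ["tgw-attach-0aaaaaaaaaaaaaaaa",
                      "tgw-attach-0bbbbbbbbbbbbbbbb"]:
    ec2.enable_transit_gateway_route_table_propagation(
        TransitGatewayRouteTableId=RT_ID,
        TransitGatewayAttachmentId=attachment_id,
    )
```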

TGW Attachments, Associations and Propagations

You might be wondering, 'Okay, the end result is the same, so why would we even bother managing this manually when the default route table can do it all automatically?'. That's a great question. For our simple lab, where everything needs to talk to everything else, the default setup is more than enough.

However, in a real-world environment, you will often need to create isolated routing domains, so let's look at a more realistic scenario to understand why manually managing TGW route tables is so useful. Imagine you have multiple 'Prod' VPCs, multiple 'Dev' VPCs, and a 'Shared Services' VPC that might contain tools like monitoring or authentication servers. The security requirement could be that the 'Prod' VPCs should never be able to talk to the 'Dev' VPCs, but both 'Prod' and 'Dev' need to be able to access the 'Shared Services' VPC.

To achieve this, we would first attach all of our VPCs to the Transit Gateway. Then, instead of using the default route table, we would create three new, separate TGW route tables - one for 'Prod', one for 'Dev', and one for 'Shared Services'.

For the 'Prod' route table, we would associate all of our 'Prod' VPC attachments with it. Then, for the propagations, we would only enable route propagation from the other 'Prod' VPC attachments and the 'Shared Services' VPC attachment. This means the 'Prod' route table will only learn routes to other 'Prod' VPCs and the shared services, effectively isolating it from the 'Dev' environment. We would then repeat this exact same logic for the 'Dev' route table, associating the 'Dev' VPCs and only propagating routes from other 'Dev' VPCs and the 'Shared Services' VPC.

Finally, for the 'Shared Services' route table, we would associate our 'Shared Services' VPC attachment with it. But for the propagations, we would enable route propagation from all of the 'Prod' and 'Dev' attachments. This allows the shared services VPC to learn the routes back to all the environments that need to access it, while the 'Prod' and 'Dev' environments remain completely isolated from each other. This is how you build secure, multi-tenant networks in AWS using a Transit Gateway.
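To make the pattern concrete, here is a rough boto3 sketch of the association and propagation layout for this scenario. All IDs are hypothetical, and in practice you would look them up rather than hard-code them.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# All IDs below are hypothetical; the pattern is what matters.
prod_attachments = ["tgw-attach-prod1", "tgw-attach-prod2"]
dev_attachments = ["tgw-attach-dev1", "tgw-attach-dev2"]
shared_attachment = "tgw-attach-shared"

route_tables = {
    # route table ID: (attachments associated with it,
    #                  attachments whose routes propagate into it)
    "tgw-rtb-prod":   (prod_attachments, prod_attachments + [shared_attachment]),
    "tgw-rtb-dev":    (dev_attachments, dev_attachments + [shared_attachment]),
    "tgw-rtb-shared": ([shared_attachment], prod_attachments + dev_attachments),
}

for rt_id, (associations, propagations) in route_tables.items():
    for attachment_id in associations:
        ec2.associate_transit_gateway_route_table(
            TransitGatewayRouteTableId=rt_id,
            TransitGatewayAttachmentId=attachment_id,
        )
    for attachment_id in propagations:
        ec2.enable_transit_gateway_route_table_propagation(
            TransitGatewayRouteTableId=rt_id,
            TransitGatewayAttachmentId=attachment_id,
        )
```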

💡
The key concept to remember here is that you can only associate one attachment with one single TGW route table. However, you can propagate the routes from that attachment to multiple different TGW route tables.

Closing Up

I hope you found this introduction to AWS Transit Gateway useful. We have seen how to create a TGW, attach our VPCs to it, and use both the default and custom route tables to control traffic flow.

In the next post, we will continue with this topic and cover how to connect on-premises networks using a Transit Gateway VPN attachment. We will also look at how to share a single Transit Gateway across multiple AWS accounts using Resource Access Manager (RAM).