A common situation we see in customer networks is when there are resources with overlapping IP address ranges that must communicate with each other. Frequently this occurs when companies are acquired and have used the same private (RFC1918) address ranges. However, it can also occur when a service provider with a unique IP range must provide access to two different customers that each have the same IP range.
Network overlaps can also occur unintentionally. Make sure that you check the documentation of services and applications when building your VPCs to avoid conflicts with predefined IP addresses.
This post discusses some ways in which you can overcome this obstacle for IPv4-based networks.
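Before choosing an option, it helps to confirm exactly which ranges conflict. The following is a minimal sketch, assuming you can export your networks as a name-to-CIDR mapping; the inventory names and ranges are hypothetical and not taken from any real environment. It uses Python’s standard ipaddress module to report every pair of ranges that share at least one address.

```python
import ipaddress
from itertools import combinations


def find_overlaps(networks):
    """Return pairs of names whose CIDR ranges share at least one address."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(parsed.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical inventory; names and ranges are illustrative only.
inventory = {
    "company-a-vpc": "10.0.0.0/16",
    "company-b-vpc": "10.0.128.0/17",
    "shared-services": "172.16.0.0/16",
}

overlaps = find_overlaps(inventory)
print(overlaps)
```

Running a check like this against the ranges that services and applications reserve for themselves is one way to catch unintentional overlaps before a VPC is built.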
Option 1: Renumber IP networks
This is always the first suggestion we make to customers. It won’t work in the service provider scenario above. However, if there’s an opportunity to renumber the networks, then it’s the best option. Although changing a network configuration isn’t easy, it avoids long-term pains such as:
· Increased network management costs: Most of the other solutions presented below require appliances or services which will have a charge attached to them. Renumbering a network isn’t free (after all, time and people cost money, too). But in the long term, it removes the ongoing cost of running the components required to connect overlapping networks together.
· Increased complexity: Connecting two or more overlapping networks is difficult. In the long term, it may become increasingly complex as the application landscape grows and changes or as additional networks are added.
· Complex troubleshooting: When things go wrong, figuring out what’s happening, where it’s happening, and what to do about it is complex enough without having to deal with overlapping IP addresses. Overlaps add confusion and mean that troubleshooting takes much longer than it otherwise would.
· Compatibility issues: All the following solutions utilize Network Address Translation (NAT) in some way. Some applications won’t work with NAT, and others will have limitations in how they can be used. You may not have applications today that don’t work with NAT, but they could be deployed in your environment in the future. Renumbering completely avoids this problem.
· Additional management overhead: Utilizing NAT means that firewall rules become complex, because you must keep track of and update both the original and the NAT IP addresses that the applications use.
In general, we strongly recommend renumbering overlapping networks where possible as it is cheaper and easier in the long term.
Option 2: AWS Private Link
In 2017 AWS launched Private Link. This is a Hyperplane-based service that makes it easy to publish an API or application endpoint between VPCs, including those that have overlapping IP address ranges. It’s also ideal for service providers who must deliver connectivity to multiple customers, and thus have no control over the remote IP address range. Furthermore, it provides the same benefit to customers with complex networks where IP addresses overlap. This is by far the simplest option presented here, as it requires no change to the underlying network address scheme.
In the following diagram, you can see an application that resides in the “Provider” VPC. It has a Network Load Balancer (NLB) attached to it, and by using Private Link we can share the NLB with multiple “Consumer” VPCs. Here, the consumer VPCs overlap with each other and with the provider – the worst-case scenario.
In each consumer VPC, the Private Link endpoint appears as an Elastic Network Interface with a local IP address. In the provider VPC, connections from the consumer VPC appear to come from a local IP address within the provider VPC. The underlying Hyperplane service performs a double-sided NAT operation to make Private Link work.
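To make the double-sided NAT idea concrete, here is a toy model in Python. It is only a sketch of the address rewriting described above, not a description of how Hyperplane is implemented. The class name, the port allocation scheme, and every IP address are assumptions chosen for illustration; both VPCs are assumed to use the same 10.0.0.0/16 range.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int


class ToyDoubleNat:
    """Toy double-sided NAT: rewrites source AND destination in each direction."""

    def __init__(self, endpoint_ip, provider_side_ip, nlb_ip):
        self.endpoint_ip = endpoint_ip            # local address in the consumer VPC
        self.provider_side_ip = provider_side_ip  # local address in the provider VPC
        self.nlb_ip = nlb_ip                      # load balancer in the provider VPC
        self._ports = count(1024)                 # hypothetical port allocator
        self._flows = {}                          # provider-side port -> original client

    def to_provider(self, pkt):
        if pkt.dst != self.endpoint_ip:
            raise ValueError("packet is not addressed to the endpoint")
        nat_port = next(self._ports)
        self._flows[nat_port] = (pkt.src, pkt.sport)
        # The provider sees only addresses that are local to its own VPC.
        return Packet(self.provider_side_ip, nat_port, self.nlb_ip, pkt.dport)

    def to_consumer(self, reply):
        client_ip, client_port = self._flows[reply.dport]
        # The consumer sees the reply coming from its local endpoint address.
        return Packet(self.endpoint_ip, reply.sport, client_ip, client_port)


nat = ToyDoubleNat(endpoint_ip="10.0.1.100", provider_side_ip="10.0.9.10", nlb_ip="10.0.9.20")
request = Packet("10.0.1.25", 50000, "10.0.1.100", 443)
seen_by_provider = nat.to_provider(request)
reply = Packet(seen_by_provider.dst, seen_by_provider.dport,
               seen_by_provider.src, seen_by_provider.sport)
seen_by_consumer = nat.to_consumer(reply)
print(seen_by_provider)
print(seen_by_consumer)
```

Neither side ever sees an address from the other VPC, which is why the overlap between the two ranges doesn’t matter in this model.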
There are added security benefits:
· When establishing the Private Link connection, the consumer must send the owner of the provider VPC a request, and that owner must approve it, in the same way that VPC peering works. There’s no way for a consumer to connect to a provider’s Private Link service without approval.
· Only configured TCP ports are allowed between the consumer and provider. This makes sure that the consumer only has access to specific resources in the provider VPC and nothing else.
· There’s no way for the application in the provider VPC to establish a connection to the consumer VPC.
Finally, there is a scalability benefit – an application can be published by a provider to hundreds of consumers’ VPCs.
Redundancy is built into Private Link, both in the form of the NLB, which delivers traffic to the back-end servers, and in the consumer VPC configuration, where you choose which subnets to place endpoints in. The following diagram shows a multi-subnet environment set up across multiple Availability Zones.
One common question from customers is how to achieve this connectivity with on-premises networks. In the following example, we have a provider VPC that’s connected to multiple independent consumers, who are in turn connected to AWS via VPN. Note that the consumers all have overlapping IP addresses, even with the provider VPC. The only challenge is to find an IP range that will be allocated to the VPC where the VPN service is attached that doesn’t overlap with the on-premises range. In this example, the on-premises clients will connect to an IP address allocated to the Private Link endpoint in the VPN VPC.
This solution also works with AWS Direct Connect as seen for Customer C in the diagram. Customer C also has a different IP range in the VPN VPC – perhaps because 172.16.0.0/16 was already in use in their network so the intermediate network must be different for them. This isn’t an issue, as the IP address range in that VPC only needs to not conflict with anything in the networks that Customer C uses. Therefore, there’s a huge range of flexibility in what can be chosen.
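Choosing the intermediate range can be automated. The sketch below is one possible approach, assuming you know which ranges a customer already uses: it walks the RFC 1918 blocks and returns the first block of the requested size that avoids all of them. The function name and the example ranges are hypothetical, apart from 172.16.0.0/16, which is the range described above as already in use for Customer C.

```python
import ipaddress

RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def first_free_range(in_use, prefix_len=16):
    """Return the first RFC 1918 block of the given size that avoids every range in use."""
    used = [ipaddress.ip_network(cidr) for cidr in in_use]
    for block in RFC1918:
        if prefix_len < block.prefixlen:
            continue  # the requested size is larger than this private block
        for candidate in block.subnets(new_prefix=prefix_len):
            if not any(candidate.overlaps(u) for u in used):
                return candidate
    return None  # nothing free at this size


# Hypothetical: the customer uses all of 10.0.0.0/8 plus 172.16.0.0/16.
customer_c_ranges = ["10.0.0.0/8", "172.16.0.0/16"]
intermediate = first_free_range(customer_c_ranges)
print(intermediate)
```

Because the intermediate VPC only needs to avoid that one customer’s ranges, the search usually succeeds quickly, and a smaller prefix such as /24 can be requested if address space is tight.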
This option is straightforward to set up: it requires no additional maintenance, is highly redundant, and is highly scalable. Furthermore, it provides separation between the customer networks. If you’re creating applications in a service provider environment, then consider architecting your solution so that Private Link can deliver this level of network flexibility for you.
Note that Private Link is a paid service; see the AWS pricing information for the current charges. Some applications may not work with this solution, as an application must be presented on a single TCP port. If you have an application that uses UDP, or one that has multiple TCP ports and requires clients to maintain affinity to the same back-end server, then Private Link isn’t appropriate for you.
Option 3: Use multiple IP address ranges in VPCs
You may have an application that’s broken into different tiers – a front-end that responds to users or other application requests; and then one or more “back-end” tiers comprising middleware, databases, caches, and so on. In this environment, you can choose to have a set of front-end subnets that have non-overlapping IP addresses while the back-end subnets do overlap with other applications.
The following diagram shows three application VPCs connected to Transit Gateway. Note that the VPCs have overlapping IP address ranges but different front-end subnets are advertised to Transit Gateway so that they can each be reached by end users. This requires that automatic route propagation to Transit Gateway be disabled as not all of the subnets in each VPC should be advertised.
Until recently, the biggest drawback to this architecture was that the applications couldn’t communicate with each other, as there was no way to create a more specific route in each VPC to allow connectivity to the front-end subnet in another VPC. For example, in VPC A you couldn’t create a route for 10.0.20.0/23 because it’s more specific than the VPC address range.
The launch of more specific routing in VPCs has resolved this problem. In each front-end subnet, you can modify the VPC route table so that other 10.0.x.x networks (in this example, 10.0.20.0/23 and 10.0.30.0/23) are routed to Transit Gateway.
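The effect of a more specific route can be illustrated with a small longest-prefix-match lookup. This is a simplified model of route selection, not the VPC implementation. It assumes that VPC A uses 10.0.0.0/16, that the second remote front-end range is 10.0.30.0/23, and that the target labels are just descriptive strings.

```python
import ipaddress


def lookup(route_table, destination):
    """Return the target of the most specific route that contains the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix (largest prefix length) wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Hypothetical route table for a front-end subnet in VPC A.
vpc_a_front_end_routes = {
    "10.0.0.0/16": "local",
    "10.0.20.0/23": "transit-gateway",
    "10.0.30.0/23": "transit-gateway",
}

print(lookup(vpc_a_front_end_routes, "10.0.21.7"))
print(lookup(vpc_a_front_end_routes, "10.0.5.9"))
```

Addresses inside the two /23 ranges leave the VPC through Transit Gateway, while everything else in 10.0.0.0/16 still follows the local route.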
This doesn’t solve the challenge of how to administer servers that reside in the back-end subnets. One way of doing this is to place a bastion host in the front-end subnet of each VPC. This will let administrators reach the back-end subnets by using SSH or RDP to that intermediary host.
With this option, if you must renumber only some of the overlapping networks, you can do less work (by changing only the front-end subnets) while mitigating most of the risk (by not having to run complex NAT solutions for applications and users to communicate). However, there are additional costs: bastion hosts, NAT or proxy instances, and private endpoints for AWS services. We also strongly encourage deploying and managing this infrastructure using automation to keep administration costs as low as possible.
Although this diagram shows the web server (or any other front-end component of the application) in the front-end subnet, you could easily deploy load balancers to that subnet and keep the Amazon Elastic Compute Cloud (Amazon EC2) components in another subnet using a non-reachable IP address range.
This option lets you deploy back-end workload subnets that have thousands of IP addresses without worrying about whether those overlap with other applications. Furthermore, the front-end subnets need only the minimum number of IP addresses required to make the application reachable from networks outside the VPC.
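As a rough sizing aid, the following sketch compares the usable addresses in a small front-end subnet and a large back-end subnet. It relies on the fact that AWS reserves five addresses in every subnet. The back-end CIDR is a hypothetical example, and the helper name is ours.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one in each subnet


def usable_addresses(cidr):
    """Return how many addresses in the subnet are available for workloads."""
    total = ipaddress.ip_network(cidr).num_addresses
    return max(total - AWS_RESERVED_PER_SUBNET, 0)


front_end = usable_addresses("10.0.20.0/23")  # unique, advertised range
back_end = usable_addresses("10.0.64.0/18")   # hypothetical overlapping range
print(front_end, back_end)
```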
Option 4: Hide subnets using Private NAT Gateway
We recently (in 2021 as of when this was written) launched Private NAT Gateway. In the same way that NAT Gateway lets you “hide” an entire VPC network range from the Internet (making it appear to come from a single Elastic IP address), Private NAT Gateway lets you do that when connecting from a VPC to other private networks. Instead of using an Elastic IP address and an Internet Gateway, Private NAT Gateway uses the private IP address that it’s allocated from within your VPC as the address that the VPC is “hidden” behind.
This is useful in an environment where you want to connect from a VPC to your on-premises networks or other VPCs, but don’t want those networks to connect directly to resources in the VPC.
The following diagram illustrates how Private NAT Gateways work:
Note that the VPC IP address range is 10.0.0.0/16 but two extra subnets have been added (10.31.10.0/24 and 10.31.11.0/24) which are outside of the original VPC IP address range. A Private NAT Gateway has been added in each availability zone (note that as with Internet-facing NAT Gateways, only one is required but two are recommended for redundancy) to each of the subnets with the secondary IP address ranges. The NAT Gateways will use an IP address from that subnet to translate IP addresses of the workloads from the back-end subnets.
In Transit Gateway, a route to the front-end subnets has been added so that return traffic can be sent back to the Private NAT Gateways. Within the VPCs, traffic from the back-end subnets will be routed to the Private NAT Gateways in much the same way that Internet-facing NAT Gateway route tables operate.
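The translation that a Private NAT Gateway performs can be sketched as a simple source NAT table. This is a toy model for illustration only. The class, its port allocation, the NAT address 10.31.10.5, and the remote address are assumptions; only the 10.0.0.0/16 VPC range and the 10.31.10.0/24 subnet come from the description above.

```python
import ipaddress
from itertools import count


class ToyPrivateNat:
    """Toy source NAT: hides a whole range behind one private address."""

    def __init__(self, nat_ip, hidden_cidr):
        self.nat_ip = nat_ip
        self.hidden = ipaddress.ip_network(hidden_cidr)
        self._ports = count(1024)  # hypothetical port allocator
        self._sessions = {}        # NAT port -> original (ip, port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        if ipaddress.ip_address(src_ip) not in self.hidden:
            raise ValueError("source is not in the hidden range")
        nat_port = next(self._ports)
        self._sessions[nat_port] = (src_ip, src_port)
        # The remote network sees only the NAT gateway address.
        return (self.nat_ip, nat_port, dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        if dst_ip != self.nat_ip or dst_port not in self._sessions:
            return None  # no matching session, so the packet is dropped
        orig_ip, orig_port = self._sessions[dst_port]
        return (src_ip, src_port, orig_ip, orig_port)


nat = ToyPrivateNat(nat_ip="10.31.10.5", hidden_cidr="10.0.0.0/16")
sent = nat.outbound("10.0.3.14", 40000, "192.168.50.8", 443)
returned = nat.inbound("192.168.50.8", 443, "10.31.10.5", 1024)
unsolicited = nat.inbound("192.168.50.8", 443, "10.31.10.5", 9999)
print(sent, returned, unsolicited)
```

Return traffic only needs a route to the NAT gateway’s subnet, which is why the Transit Gateway route points at the 10.31.x.x ranges rather than at the overlapping back-end subnets.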
In this case, managing instances in the back-end subnets would need to be done using SSM or bastion hosts in the front-end subnets. If application deployment was automated then there would be no need for human management of those hosts. This is a far more desirable outcome.
In this post, we’ve shown several ways of dealing with overlapping IP networks. The following table shows a comparison between the options:
Remember that renumbering the networks that conflict is by far the best option (in terms of cost, complexity, and visibility) in the long term. For service or application providers that have no control over the networks to which they connect, Private Link is designed specifically to deal with that problem.