Switch to VPC Endpoints from NAT Gateways to Reduce Bandwidth Charges

andrewguenther · on June 19, 2022

NAT Gateways also support bandwidth, availability, and performance properties that most customers just don't need. AWS networking services which charge by GB processed are always a cost disaster waiting to happen.

I built my own NAT AMI[1] that costs <$4/month and has zero bandwidth charges and is good enough for most people.

[1]: https://github.com/AndrewGuenther/fck-nat/
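For a rough sense of how per-GB processing charges dominate, here is a minimal sketch. The function names are hypothetical and every rate is a parameter you would fill in from the current pricing page; nothing below is a quoted AWS price.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def nat_gateway_monthly(hourly_rate, per_gb_rate, gb_processed, hours=HOURS_PER_MONTH):
    """Monthly cost of a gateway billed per hour plus per GB processed."""
    return hourly_rate * hours + per_gb_rate * gb_processed


def monthly_savings(hourly_rate, per_gb_rate, gb_processed, flat_monthly):
    """Difference versus a flat-priced NAT instance with no per-GB charge."""
    return nat_gateway_monthly(hourly_rate, per_gb_rate, gb_processed) - flat_monthly
```

With placeholder rates, `monthly_savings(0.05, 0.05, 1000, flat_monthly=4.0)` compares 1000 GB through a gateway against a $4/month instance. The per-GB term grows linearly with traffic while the instance cost stays flat.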

aynsof · on June 19, 2022

My first question was answered in the readme:

  "But what about AWS' NAT Instance AMI?"

  The official AWS supported NAT Instance AMI hasn't been updated since 2018, is still running Amazon Linux 1 which is now EOL, and has no ARM support, meaning it can't be deployed on EC2's most cost effective instance types. fck-nat.

I had no idea this was the case. Thank you for making this!

worldofmatthew · on June 18, 2022

Or don't use AWS as other providers provide multiple terabytes of bandwidth for a couple of dollars per month.

victor106 · on June 19, 2022

Which providers are these?

hhw · on June 19, 2022

Pretty much any colo or dedicated/bare metal provider. $0.07 to $0.12 per Mb is the going rate for most carriers at any appreciable volume, and even the higher end carriers are less than 3x that.

To be fair, big tech, despite its massive volume, pays much higher rates than small networks because the carriers charge them enough to fully cover the cost of building out their networks, while making all their profits from selling excess capacity to the little guys for pennies on the dollar.

hermitdev · on June 19, 2022

At $0.07/MB, you're talking over $70K to transfer a TB. Am I missing something here? Because I'd say $70k is quite a bit more than "a couple of dollars" the GP mentions. (No, I don't deal with cloud pricing at all, so I'm a little perplexed here)

hhw · on June 19, 2022

Network at the wholesale level is always measured in Mbps using 95th percentile, not in data transferred (average sustained, equivalent to 50th percentile because we're talking about 5 minute samples of interface counters over the course of a month). Note I used a small b in Mb. Depending on the variability of traffic patterns, that usually works out to be on average ~200GB* transferred per Mbps of 95th percentile over a period of a month. Meaning a TB would work out to about $0.35.

*A long, long time ago, I looked at about 1000 co-location customers' MRTG stats and compared their monthly 95th percentile Mbps to their average sustained data transfer in GB, and something like 90% of them were between 150GB-250GB per Mb and 98% of them were between 180GB-220GB. Many people assume 324GB which would require their traffic to be perfectly flatlined throughout the month, which obviously rarely ever happens.
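The arithmetic above can be sketched in Python. The 30-day month, the function names, and the percentile method (sort the 5-minute samples, drop the top 5%, bill on the highest remaining one) are assumptions for illustration; individual carriers may compute it slightly differently.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def flatline_gb_per_mbps(seconds=SECONDS_PER_MONTH):
    """GB transferred in a month by a link held at exactly 1 Mbps."""
    bits = 1_000_000 * seconds
    return bits / 8 / 1e9


def cost_per_tb(price_per_mbps, gb_per_mbps):
    """Effective $/TB given a per-Mbps price and observed GB moved per Mbps."""
    return price_per_mbps / gb_per_mbps * 1000


def percentile_95(samples_mbps):
    """Billable rate: highest sample left after discarding the top 5%."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 0.95 * n, converted to a zero-based index.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]
```

`flatline_gb_per_mbps()` gives the 324 GB figure for perfectly flat traffic, and `cost_per_tb(0.07, 200)` gives roughly the $0.35 per TB quoted above. Short bursts that fall in the discarded top 5% of samples do not raise the bill, which is why real traffic moves less than 324 GB per billed Mbps.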

Kudos · on June 19, 2022

They're talking about throughput not transfer. Like a 100Mbps link, not 100MB data transferred total.

datalopers · on June 19, 2022

Just allocate your web crawlers public IPs and use an internet gateway. It's such a weird self-imposed limitation they’ve envisioned; it's not clear why they think there’s no choice but NAT.

busterarm · on June 19, 2022

Why make your web crawlers reachable via the internet?

You're one misconfigured security group away from your shit being owned.

datalopers · on June 19, 2022

I mean you said it yourself.. that's why you have security groups? You're either paying a ton to use NAT gateway, or setting up a dedicated box to act as your in-house NAT, or configuring security groups. It's pretty simple: no inbound connections except from the VPC.

Also even if there's no firewall at all, how does that mean your machines are getting owned? My boxes listen on precisely one port: a heavily locked down sshd (which isn't listening on that interface anyway)

fuzzybear3965 · on June 19, 2022

I don't understand

> listen on one port ... which isn't listening on that interface anyway

Would you mind please elaborating, here? And which interface does sshd not listen on?

datalopers · on June 19, 2022

sshd is only listening on port 22 on the private IP (the VPC) not the public IP of each machine. I then connect into my VPC through a bastion host running wireguard.

terom · on June 19, 2022

That alone will not prevent connections to port 22 on the public IP: the 1:1 non-port-based NAT means that any incoming packets to the public IP will show up at your instance with the private IP as their destination address. The TCP/IP stack on your instance knows nothing about the public IPv4 address.

Bluecobra · on June 19, 2022

To elaborate more, in AWS the Internet Gateway translates the private IP to the public IP and vice versa. There are no public IPs being routed within a VPC, it’s all RFC 1918. When they mention private or public subnets, it just means whether the subnet has a route to the IGW and whether it has a public IP assigned.

This was pretty confusing to learn at first.
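As a sketch of that definition: a subnet counts as "public" when its route table sends default traffic to an internet gateway. The function name is hypothetical; the input is assumed to be a list of route dicts shaped like the `Routes` entries returned by EC2's DescribeRouteTables, and the IDs in the usage note are made up.

```python
def has_igw_default_route(routes):
    """True if any route sends 0.0.0.0/0 to an internet gateway (igw-...)."""
    return any(
        route.get("DestinationCidrBlock") == "0.0.0.0/0"
        and route.get("GatewayId", "").startswith("igw-")
        for route in routes
    )
```

A table with `{"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"}` would be public; one whose default route points at a `NatGatewayId` instead would be private. This only checks the routing half; the instance still needs a public IP assigned.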

Hallucinaut · on June 19, 2022

Not sure if there is some confusion, or I'm missing the point, but I thought GP's point was clear: the security group would have no inbound rules for 0.0.0.0/0. So the instance would never see the requests unless they originated from GP's internal VPC.

datalopers · on June 20, 2022

I don't use NAT gateway. That's why I explained the setup.

4khilles · on June 19, 2022

The starting point should be the simpler solution. What's the argument for introducing NAT? Why is the firewall (that you need anyways) insufficient?

jpgvm · on June 19, 2022

If you need a generic replacement for NAT Gateway you can build one using Transit Gateway and a pair of router VMs running GRE+BGP.

Amazon has most of an example here using quagga (but CloudFormation, ick): https://github.com/aws-samples/aws-transit-gateway-connect-s...

solatic · on June 19, 2022

Another way to reduce NAT gateway charges is to simply not need them in your network architecture: IPv6-only, immutable infrastructure (i.e. pulling updates while building the AMI in a different environment so that there is no need to pull from the Internet in production)...

miyuru · on June 19, 2022

you don't need to go IPv6 only, just adding dual-stack IPv6 with egress only gateway can reduce bandwidth costs.

sigstoat · on June 19, 2022

so i keep having people froth at the mouth about how it is more secure to use VPC endpoints to communicate to AWS services.

is there any practical mechanism by which not using the VPC endpoints can be insecure, which does not also affect VPC endpoints?

i usually get some hand waving about man in the middle attacks, but i figure if you can insert that into an AWS data center you can probably just see all the VPC internal traffic anyways. and one should be using TLS to talk to the services anyways.

Galanwe · on June 19, 2022

> is there any practical mechanism by which not using the VPC endpoints can be insecure,

Hum, yes. If you're not using VPC endpoints, basically you're routing all your AWS traffic to the open internet.

Not only is it wasteful and slow, but that also means you have to open free lunch internet egress on your instances, or implement some convoluted DNS based firewalling.

Using VPC endpoints also allows you to capture AWS service traffic and apply policy on it, such as whitelisting which buckets are accessible, thus avoiding internal data leaks.

Honestly there is just no argument for _not_ using VPC endpoints. Even from a purely architectural point of view, having your AWS traffic routed out to the internet just to get back in makes no sense.

If you are still unconvinced, the pricing argument of course still stands.

sigstoat · on June 19, 2022

> If you're not using VPC endpoints, basically you're routing all your AWS traffic to the open internet.

yes this is what i normally hear. but by “the open internet” you mean “some other spot in the AWS data center”. what’s the risk here? somebody is going to slip some bad routes into AWS over BGP? and then also fake the SSL certs? this seems like some mission impossible stuff.

> Honestly there is just no argument for _not_ using VPC endpoints.

unless something has changed in the last three months they’re not available for all services. would you advocate against using those services?

> If you are still unconvinced, the pricing argument of course still stands.

sure that and the exfiltration argument are reasonable enough.

thanks for response

Galanwe · on June 19, 2022

> some other spot in the AWS data center

Well I can only guess so much of the underlying egress internet routing of AWS. At worst, if no explicit region is specified, it will reach the global AWS endpoint through the internet, which is likely in a completely different part of the world than where you are, redirect to the local endpoint, and back.

> what’s the risk here?

Minimal, though I'm not sure the question is really relevant.

It's a bit as if you designed a house with no internal doors, and you had to get out the window and back through the front door whenever you wanted to change rooms. I guess that wouldn't make your house less safe, though it's definitely a design that smells weird.

There is a real security point to make on the fact that it forces you to open access to egress internet though, and that is not to be taken lightly. There is no reason to allow a server full egress internet, and accessing AWS through the internet basically forces you to do so, or leaves you implementing DNS based firewalling which is error prone, less secure, and overall a pain to set up.

> unless something has changed in the last three months they’re not available for all services. would you advocate against using those services?

No. Though I would (and do) strongly recommend implementing either DNS based firewalling, or a dynamic ruleset based on AWS IP ranges (they publish them as JSON).
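A minimal sketch of the dynamic-ruleset idea, assuming the published `ip-ranges.json` layout (a `prefixes` list whose entries carry `ip_prefix`, `region`, and `service`). The function names are hypothetical, and turning the result into actual firewall rules is left out.

```python
import json
import urllib.request

IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def fetch_ip_ranges(url=IP_RANGES_URL):
    """Download and parse the published range document."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def prefixes_for(doc, service, region):
    """Sorted, de-duplicated IPv4 CIDRs for one service in one region."""
    return sorted(
        {
            entry["ip_prefix"]
            for entry in doc.get("prefixes", [])
            if entry.get("service") == service and entry.get("region") == region
        }
    )
```

Something like `prefixes_for(fetch_ip_ranges(), "S3", "us-east-1")` would yield the CIDRs to allow for egress. The document changes over time, so the ruleset has to be refreshed periodically, and it still allows traffic to every customer of that service, not just your own resources.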

bscanlan · on June 19, 2022

> Well I can only guess so much of the underlying egress internet routing of AWS.

> At worst, if no explicit region is specified, it will reach the global aws endpoint through internet which is likely in a complete different part of the world than where you are, redirect to the local endpoint, and back.

There's no need to guess:

From https://aws.amazon.com/vpc/faqs/#Peering_Connections

"When using public IP addresses, all communication between instances and services hosted in AWS use AWS's private network. Packets that originate from the AWS network with a destination on the AWS network stay on the AWS global network, except traffic to or from AWS China Regions."

In practice there is not much risk from accessing AWS services using public endpoints, you just need to take AWS at their word.

spydum · on June 19, 2022

If you don't use vpc endpoints, usually you end up having to open network traffic to all of a particular aws service, not just yours.

This means as an attacker, I can also go spin up (s3/lambda/etc) and if your app was vulnerable in some way, I could exfil data to those aws services (or leverage them in RFI/CSRF/other attacks) in a way that shouldn't be possible.

sigstoat · on June 19, 2022

> you end up having to open network traffic to all of a particular aws service, not just yours

> I could exfil data to those aws services

I don't think any of the VPC endpoints have any restrictions on them; that is, once I've made an S3 endpoint in my VPC, it can be used to access any bucket. So... it seems like it could be used just as easily to exfiltrate data?

cldellow · on June 19, 2022

I haven't done it, but the docs seem to imply that some VPC endpoints, including S3, can support the usual policy documents to describe which actions/principals/resources are permitted: https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpo...

different_sort · on June 19, 2022

That is indeed the case.

Galanwe · on June 19, 2022

> once I've made an S3 endpoint in my VPC, it can be used to access any bucket. So... it seems like it could be used just as easily to exfiltrate data?

On the contrary, and that is one of the strong points of using VPC endpoints. You can whitelist the buckets that are accessible through the endpoints, which means it is in fact the only way to prevent data exfiltration.
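As a sketch of what such a whitelist can look like, the helper below builds an endpoint policy document that allows S3 actions only on named buckets. The function name and bucket names are hypothetical; the document follows the standard IAM policy JSON layout, and attaching it to an endpoint is not shown.

```python
import json


def s3_endpoint_policy(bucket_names):
    """Policy document allowing S3 access only to the listed buckets."""
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself
        resources.append(f"arn:aws:s3:::{name}/*")  # objects in the bucket
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListedBucketsOnly",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
            }
        ],
    }


def policy_json(bucket_names):
    """Serialized form, ready to paste or pass to tooling."""
    return json.dumps(s3_endpoint_policy(bucket_names), indent=2)
```

Requests through the endpoint to any bucket not in the list are not matched by the `Allow` statement and so are denied. This only constrains traffic that actually goes through the endpoint, so it works as an exfiltration control only if there is no other egress path to S3.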

barefeg · on June 19, 2022

Is it possible to do this in EKS?

calgoo · on June 19, 2022

There is an AWS doc on their page somewhere where they explain how to set up EKS on private VPC subnets and use the endpoints to access the needed AWS services.

Edit, link: https://docs.aws.amazon.com/eks/latest/userguide/private-clu...

abofh · on June 19, 2022

Strictly speaking yes, but there are limitations depending on what you do - you can't, for example, do Route 53 or ACM APIs via private link (private-CA does, but not the general purpose CA) - so things like external-dns and the ACK ALB+Certificate configuration can't work via auto discovery without NAT or being on a public instance.

But EKS by itself, you can absolutely do with private subnets and private link APIs.

x3n0ph3n3 · on June 19, 2022

VPC Endpoints are not free, and if you use enough different AWS services, the cost starts to really go up.

WatchDog · on June 19, 2022

Misleading title, VPC endpoints aren’t a replacement for NAT gateways. Traffic destined for certain AWS services can be offloaded to a VPC endpoint, when it otherwise may have been routed via your NAT gateway.

However, you still need something to do NAT if you are running an IPv4 private network and need to access the internet.

nine_k · on June 19, 2022

You may not need a NAT proper if all you expose / do is HTTP requests.

Run a proxy / LB on an instance that has a public IPv4 and a VPC network interface. Don't NAT or route between these networks, just let the proxy listen on the public interface, and contact the backend servers on the private interface.

sigstoat · on June 19, 2022

as you note they’re only available for some services, and sometimes only partially for a service.

check the docs very carefully before getting your hopes up about dropping the NAT gateways.