Just allocate your web crawlers public IPs and use an internet gateway. It's such a weird self-imposed limitation they've envisioned; it's not clear why they think there's no choice but NAT.
I mean, you said it yourself: that's why you have security groups. You're either paying a ton to use a NAT gateway, or setting up a dedicated box to act as your in-house NAT, or configuring security groups. It's pretty simple: no inbound connections except from the VPC.
Also even if there's no firewall at all, how does that mean your machines are getting owned? My boxes listen on precisely one port: a heavily locked down sshd (which isn't listening on that interface anyway)
sshd is only listening on port 22 on the private IP (inside the VPC), not on the public IP of each machine. I then connect into my VPC through a bastion host running WireGuard.
That alone will not prevent connections to port 22 on the public IP: the 1:1 non-port-based NAT means that any incoming packet to the public IP will show up at your instance with the private IP as its destination address. The TCP/IP stack on your instance knows nothing about the public IPv4 address.
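A rough way to picture it, as a toy model rather than anything AWS actually runs: the gateway rewrites only the destination address, so a listener bound to the private IP still matches traffic that was sent to the public one. The addresses, the `Packet` type, and the `igw_inbound` / `socket_accepts` helpers are all made up for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int

# Hypothetical addresses, for illustration only.
PUBLIC_IP = "203.0.113.10"
PRIVATE_IP = "10.0.1.5"

def igw_inbound(pkt, nat_table):
    """Toy 1:1 NAT: rewrite only the destination address, leave ports alone."""
    private = nat_table.get(pkt.dst)
    if private is None:
        return None  # no mapping for this public IP, nothing is delivered
    return replace(pkt, dst=private)

def socket_accepts(pkt, bind_addr, bind_port):
    """A listener matches on the destination the instance actually sees."""
    return pkt.dport == bind_port and bind_addr in ("0.0.0.0", pkt.dst)

nat_table = {PUBLIC_IP: PRIVATE_IP}

# A packet from the internet, addressed to the public IP on port 22.
from_internet = Packet(src="198.51.100.7", dst=PUBLIC_IP, dport=22)
seen_by_instance = igw_inbound(from_internet, nat_table)

# sshd bound to the private IP only still matches the translated packet.
reachable = socket_accepts(seen_by_instance, PRIVATE_IP, 22)
```

In this model `reachable` comes out true, which is the point: binding to the private address does not filter anything by itself, so something else (a security group, a host firewall) has to do the filtering.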
To elaborate, in AWS the Internet Gateway translates the private IP to the public IP and vice versa. There are no public IPs being routed within a VPC; it's all RFC 1918. When they mention private or public subnets, it just means whether the subnet has a route to the IGW and whether the instance has a public IP assigned.
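If it helps, the route half of the "public vs private" label can be sketched like this. The route tables and IDs below are invented, and the check only looks at route targets, not at whether a given instance has a public IP assigned.

```python
# Hypothetical route tables; the subnet names and IDs are made up.
route_tables = {
    "subnet-a": [
        {"destination": "10.0.0.0/16", "target": "local"},
        {"destination": "0.0.0.0/0", "target": "igw-0example"},
    ],
    "subnet-b": [
        {"destination": "10.0.0.0/16", "target": "local"},
        {"destination": "0.0.0.0/0", "target": "nat-0example"},
    ],
}

def is_public_subnet(routes):
    """Call a subnet "public" if any of its routes targets an internet gateway."""
    return any(route["target"].startswith("igw-") for route in routes)

labels = {
    name: ("public" if is_public_subnet(routes) else "private")
    for name, routes in route_tables.items()
}
```

Here `subnet-a` ends up labelled public and `subnet-b` private, purely because of where the default route points.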
Not sure if there is some confusion, or I'm missing the point, but I thought GP's point was clear: the security group would have no inbound rules for 0.0.0.0/0. So the instance would never see the requests unless they originated from GP's internal VPC.
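As a minimal sketch of that, assuming a default-deny inbound policy with allow rules only (which is how security groups behave) and a made-up VPC CIDR: the `sg_allows` helper and the rule format are hypothetical, not an AWS API.

```python
import ipaddress

# Hypothetical VPC range, for illustration only.
VPC_CIDR = "10.0.0.0/16"

# The only inbound rule: port 22 from inside the VPC. Nothing for 0.0.0.0/0.
inbound_rules = [{"port": 22, "cidr": VPC_CIDR}]

def sg_allows(src_ip, port, rules):
    """Default deny: a packet passes only if some allow rule matches it."""
    src = ipaddress.ip_address(src_ip)
    return any(
        rule["port"] == port and src in ipaddress.ip_network(rule["cidr"])
        for rule in rules
    )

from_vpc = sg_allows("10.0.2.9", 22, inbound_rules)
from_internet = sg_allows("198.51.100.7", 22, inbound_rules)
```

The 1:1 NAT rewrites the destination, not the source, so a packet from the internet still carries its internet source address when the rules are checked. With these rules `from_vpc` is true and `from_internet` is false, so sshd never sees the outside request.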