I haven't checked today, but e.g. Amazon's JSON file belies the fact that they own 3.0.0.0/8 in practice if not in fact. I'm not going to play "let's block Amazon's huge JSON list". I've got 53 rules. I'm willing to step on some fingers / toes. I can get to sites in AWS just fine.
Amazon does have 3.0.0.0/8. Granted, it is noted as two separate /9 blocks, but it's all Amazon.
Perhaps a better use of your time (because de-balkanization of the internet would be more of an academic exercise than a mass-market reality; consumers have already self-selected into social media and crappy LLM outputs, which is a far bigger matter and driver than ephemeral resources or reverse DNS) would be making your TLS work in a way that doesn't allow for easy abuse: automatic redirects, HSTS, and a standards-compliant certificate.
> consumers have already self-selected into ... crappy LLM outputs
Tangent, but last I checked, it was all the big tech giants pushing it down our throats. Unless it's your full-time job to find loopholes and workarounds, there is no reasonable way for consumers to opt out.
> I haven't checked today, but e.g. Amazon's JSON file belies the fact that they own 3.0.0.0/8 in practice if not in fact.
While it may seem more useful to aggregate the ranges from some points of view, it'd be significantly less useful from others, e.g. for those who want to whitelist any IP ranges matching a specific DC, service, availability zone, or country.
You can always aggregate the detailed list but you can't do the inverse on your own.
I don't remember where I found this, but there is also some Perl code that will do it. I wish they had added a comment so I could give them credit. I use it to build block lists for adding null routes on hobby web servers, using a few blocklists from around the web and data imported from BGP AS databases. It keeps my routing table below 300K. It's IPv4-only.
[Edit] I think this might be where I found it [1]
#!/usr/bin/perl
# Aggregate a list of IPv4 CIDR blocks into the smallest equivalent set of ranges.
use strict;
use warnings;
use Net::CIDR::Lite;

my $ipv4String = '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}';

if (defined $ARGV[0] && $ARGV[0] eq '-h') {
    print "usage: $0
This script summarizes your IP classes (if possible). Input IPs with mask one per line. End with CTRL+D. Optionally, redirect a file to stdin like so:
$0 < cidr.txt ";
    exit;
}

print "Enter IP/Mask one per line (1.2.3.0/24). End with CTRL+D.\n";

my $cidr = Net::CIDR::Lite->new;
while (<>) {
    # Collect anything that looks like a.b.c.d/nn; ignore everything else.
    if (/($ipv4String\/[0-9]{1,2})/) {
        $cidr->add($1);
    }
    else {
        print "Ignoring previous line.\n";
    }
}

my @cidr_list = $cidr->list;
print "======Aggregated IP list:======\n";
foreach my $item (@cidr_list) {
    print "$item\n";
}
It's crazy how fast people start attacking your infrastructure on the Internet these days. I recently started announcing one of my /23 subnets (512 addresses) over BGP for an anycast setup; once the route was announced and traffic was flowing to the router, tcpdump blew up with port-scan activity across all IPs of the range. Of course that doesn't have anything to do with the route itself; tons of people simply scan all IP ranges for open ports, indiscriminately and continuously (my range had been unannounced for many years before, so it wasn't on anyone's list of active servers).
I find it shocking that people still expose internal web services (e.g. Gitlab) openly to the Internet. In my opinion you should have at least one additional layer of protection, such as a VPN or similar mechanism, so that your services aren't discoverable from the public Internet.
I only expose SSH from a single bastion host, which is the only host that's publicly reachable, something that I'd like to get rid of in the future as well by adding a VPN layer on top.
Alternatively, recalibrate your expectations. A port scan is nothing, hardly even qualifies as an attack. It's Internet background radiation. You want to go to space, there's background radiation in space; you want to go on the Internet, there's background radiation on the Internet. And if you aren't running vulnerable services, this port-scanning radiation is the equivalent of eating another banana per month.
It only bothers you because you're looking directly at it in your console. If you bring an overly sensitive Geiger counter on your aeroplane ride, you might be alarmed, but if you're not aware of it, it won't hurt you at all.
I am also surprised by internal services being exposed to the internet, but that's for two reasons: (1) I don't trust most programs' authentication systems, and (2) I don't want people to know which services I'm using internally - not from a technical security standpoint, but sometimes just privacy. But things that are supposed to be on the Internet, that I trust to have a strong front door (or be suitably sandboxed) - they can be on the Internet all day.
Port scanning is just the Internet equivalent of walking around the city taking notes on whose lights are on. Stalkerish? Maybe a bit. But it's public info.
By the way: ssh -w makes a VPN tunnel interface, but it won't auto-configure the rest of the VPN like actual VPN products do.
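To sketch what that leaves to you manually (interface numbers, host name, and addresses here are made up; the server needs PermitTunnel enabled in sshd_config and both ends typically need root to create the tun devices):
# request tun0 on both ends; ssh only creates the interfaces
ssh -w 0:0 root@server.example.com
# addressing and routing are then up to you, e.g.:
ip addr add 10.99.0.1/30 dev tun0 && ip link set tun0 up    # on the client
ip addr add 10.99.0.2/30 dev tun0 && ip link set tun0 up    # on the server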
I use OpenVPN for historical reasons, but today I'd go for Wireguard: much simpler, faster, integrated in the kernel, and connectionless, so much less friction when e.g. rebooting or changing networks.
Wireguard is quite good too, and if you’re up for some complication in your life you can do full mesh quite easily with it if your online infra is a bit distributed.
multicast/mDNS is broken, and it doesn't seem that it will be fixed anytime soon. This prevents hosts discovering each other as if they were on non-virtual LAN.
Personally, I find that having to set up an OIDC provider is too much overhead for a VPN. In a corporate setting, you likely have something already, but for individuals or small teams it's too much extra work.
How could that work with their architecture? They configure your device to use a DNS server running locally in their app. That resolves their device names to their internal device IP addresses. Their device names default to hostnames, just like mDNS does.
So to give an example: if I enter http://geder in my browser, I want that to resolve to 100.100.5.10 regardless of whether I am on my home network (where geder is) or on a train.
From my perspective, half the reason to use tailscale is that it replaces what I'd want mDNS for, with fewer bugs.
That requires rewriting all software to follow tailscale's model instead of mDNS. Additionally, discovery would no longer work when devices are on the same physical network.
Pretty much all of it is open-source, and there's a self-hosted open-source alternative available for the only closed-source cloud-hosted component[0] - and that's even actively being promoted by Tailscale![1]
Yep, don't ever put up badly configured public SSH. It's gonna be pwned in literal seconds. The net increasingly feels like what's going on behind Cyberpunk's blackwall.
With extremely rare exceptions (the NSA might), attackers don't have some magic sauce that breaks SSH. Even the recent SSH vulnerability was very hard to actually exploit (but you should have updated ASAP anyway). Their strength is that they just guess passwords all day long across the whole internet. If one server has "admin", or "root", or "1234", they'll get in instantly. If one server has "alcatrazquinine" they'll get in less instantly. If one server has "XgMTaJR35a7gSpXTD2T", they won't ever get in. That's secure against everyone scanning SSH on the internet. Well, don't use that exact password I just published.
Key authentication is preferred for two reasons. One is that if you accidentally connect to the wrong server you won't transmit your password to that server. The other is that you can store your key in a file and use it automatically so that just typing "ssh myserver" gets you all the way to a shell prompt. That's very convenient.
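A minimal sketch of that setup, assuming an entry in ~/.ssh/config (the alias, host name, and key path are placeholders):
Host myserver
    HostName server.example.com
    User deploy
    IdentityFile ~/.ssh/id_ed25519
    IdentitiesOnly yes
After that, "ssh myserver" authenticates with the key and drops you straight into a shell.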
Not allowing root logins can make sense for auditing reasons (so you can see which user logged in and then used sudo), but if this is just your private server, there isn't really much reason to avoid it. If it makes you feel better, just pretend your name is "root". It also makes sense if you subscribe to the philosophy of "typing sudo in front of every command helps prevent mistakes," which I don't.
Using a port other than 22 can provide a very slight decrease in bandwidth and CPU load, and a bigger decrease in log file output, from processing failed logins by scanners. If these things actually matter to you, go ahead. I promise they don't. Doing it for security is either paranoia or security theater, depending on whether your password is "XgMTaJR35a7gSpXTD2T" or "1234".
"These days"? 25+ years ago it was pretty common to get scripts/bots checking your exposed web servers all the time (it was a pretty frequent appearance in web access logs), and turning on logging of firewall-rejected traffic gave you a permanent stream of attempts against a lot of known and unknown ports.
If you will expose something (even some deep hidden web component) make sure that it is not a door to your data and infrastructure.
What has changed a bit since then is where that traffic comes from, and how you decide whether it's right to receive or block it. Legitimate servers and services, end-user-side proxies, cloud providers, and crawlers have all increased a lot, and so on.
I often test web sites on some random port before going live, and at times the customer calls and says "nice work" because they have found their site in Google search. So one of the biggest sources for these scans is Google crawlers. I know I shouldn't do that, and I'm not complaining... Within seconds after putting a site up on a random port you get scans, mostly for Wordpress exploits. Some companies will instantly put your IP in their firewall block list if you attempt to access an uncommon port, so be careful if you still surf the web via telnet.
Yes, it's only IPv4 where it's practical to scan the whole address space in minutes, but there are methods to find IPv6 addresses [0]; certificate transparency logs are also scanned for hostnames to get AAAA records. Still, from my experience it's multiple orders of magnitude less than v4.
If one self-hosts a VPN, are there any security benefits over properly configured SSH? You still have to expose an IP address somewhere, and bad guys will be scanning that, I assume. If the key is to use a managed VPN service, what magic are they adding?
Legitimate question. I assume there might be a benefit, I'm just not sure what it is.
It depends, I think, on the complexity of the VPN.
SSH lets you do so many things and has so many config options. By contrast, Wireguard is very simple, does basically one thing (authenticate by a key, then pass packets), and is much harder to misconfigure.
(OpenVPN, on the other hand, does not have this advantage.)
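To illustrate how little there is to configure, a minimal WireGuard sketch (keys and addresses are placeholders, using the wg-quick config format):
# /etc/wireguard/wg0.conf
[Interface]
PrivateKey = <this host's private key>
Address = 10.8.0.1/24
ListenPort = 51820

[Peer]
PublicKey = <other host's public key>
AllowedIPs = 10.8.0.2/32
Bring it up with "wg-quick up wg0" and that's essentially the whole surface area: a key pair per host and a list of which addresses each peer is allowed to use.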
If true, it's interesting that it all comes down to the likelihood of misconfiguration. I can't see anything fundamental that a VPN adds either. I wonder what it would look like if there were an SSH that couldn't be configured incorrectly and also had a little "anti-hack" tech built in (e.g. disallowing more than N connection attempts per minute, or whatever is normally done).
After authnz, SSH runs a shell (or another specified remote program), while Wireguard just sets up a network interface. AFAICT it's really hard to make Wireguard run something remotely as you connect.
You can achieve a somewhat similar result by running OpenSSH as `ssh -N -D`: run nothing on the remote end and work as a SOCKS5 proxy.
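For example (port and host are arbitrary):
# no remote command (-N), dynamic SOCKS5 forwarding (-D) on local port 1080
ssh -N -D 1080 user@host.example.com
Then point a browser, or curl with --socks5-hostname 127.0.0.1:1080, at it and your traffic exits from the remote end.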
That's a shame, really. You are risking that none of your stuff will show up on search engines, not even Marginalia. You are risking that none of your stuff will be saved in the Wayback Machine. Maybe you want that, in which case you should block all the clouds and data centers, just to be sure. You might even be blocking your site from some small ISPs based on where they run their CGNAT gateway (I doubt this, but it's possible).
As far as I noticed, ping with a spoofed source address is the only actual abuse mentioned in the article. It should go without saying that you can't tell if a spoofed ping packet came from AWS, because the source address is the address the spoofer wants you to send a reply to, not the spoofer's address. And a much less invasive mitigation would be rate-limiting pings to, say, 10 per second.
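A hedged nftables sketch of that kind of limit (numbers are arbitrary; adapt the table and chain names to whatever ruleset you already run):
# accept at most ~10 echo requests per second; drop the excess
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0; policy accept; }'
nft add rule inet filter input icmp type echo-request limit rate 10/second accept
nft add rule inet filter input icmp type echo-request counter drop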
While the Internet is becoming balkanized this is mostly because of social media siloing itself to generate advertising and data revenue and to extract profit from AI training data (e.g. the Reddit/Google exclusivity deal) rather than because of providers blocking IP ranges.
I certainly don't understand the rationale behind blocking certain providers over some pings and then complaining about IP connectivity becoming balkanized. The balkanization is caused by the ones doing the blocking.
Nothing shows up on search engines anymore except huge ad-laden content farms and very large sites like wikipedia, stackoverflow, and big news/media company sites, and of course YouTube.
I have not seen individual blogs or small enthusiast sites in search results for quite some time.
If I want wikipedia or stackoverflow I can just search those sites directly. I'd like an option to exclude all the "usual suspects" and see some more long tail stuff.
I see it more as a lament that they have been "forced" to take actions that may cause balkanization as a side effect. However the overall tone is that of weaving a grand historical narrative out of a mosquito bite.
I had to read the article twice to be sure that it was a utilitarian move (however questionable it might be) rather than a grand ideological stand that the article seems to spend much time portraying.
FWIW, data center IP addresses are already being treated as second class citizens by major content/service providers, and this has become an escalating barrier to self hosting. I am honestly not sure what the author is trying to accomplish.
> "data center IP addresses are already being treated as second class citizens by major content/service providers, and this has become an escalating barrier to self hosting"
1. totally blocked by some services (especially those related to copyright, like almost all the streaming services),
2. treated as suspicious by lots of CDNs (so you would get captchas more frequently; have stricter rate control, etc.)
Hi, OP here. I did not respond since another poster had beaten me to it but here we go.
The reply above yours is mostly correct though I have to admit that “data center IP” could be a bit of a misnomer when it comes to IP reputation. There are essentially 4 categories:
- Residential landline connections are the most mundane but are also the least restricted, because this is where your average users are found. The odds of bad actors on the same network are fairly low, and most ISPs will overlook minor transgressions to avoid incurring additional customer support costs.
- Mobile data connections are often behind CG-NAT. Blocking an entire IP range tends to generate a lot of false positives, so it doesn't happen very often.
- Institutional IP ranges (such as 17.0.0.0/8 or any org that maintains its own ASN) tend to get a pass as well, because they tend to have their own IT and networking departments to take collective responsibility if something untoward happens.
- This leaves public cloud and hosting services on the lowest tier, because these networks have a very low barrier to entry for bad actors. Connections from these IP addresses are also far more likely to be bots and scrapers than a human user, so most TDS systems are all too happy to block them.
> You might even be blocking your site from some small ISPs based on where they run their CGNAT gateway (I doubt this, but it's possible).
Why wouldn’t (even small) ISPs run their CGNAT gateway in their own IP space?
Running a CGNAT gateway in the cloud would lead to a lot of problems. I think your subscribers wouldn’t be able to watch Netflix and the like, since they wouldn’t be on a residential IP. It would probably also lead to more anti-bot CAPTCHAs from Cloudflare and Google.
You frame ‘not showing up in search engines’ as a drawback, a downside. What underlies that assumption for you? Could you help me understand it? I don’t share the viewpoint, but I’m still working to understand it.
The logic seems clear to me: if you want people to be able to make use of/benefit from your published knowledge/artifacts you would want people to be able to find it when they are searching for that kind of published knowledge/artifact. People use search engines to perform the search for published knowledge/artifacts. Thus, publishing knowledge/artifacts but restricting them such that they cannot be found is a set of simultaneous contradictory actions.
I'd be curious about your opinion. I have no real love for search engines, but if I'm going to the trouble of putting things on the internet in public vs. just sending them to people directly, presumably it would be because I want people outside of my circle of friends to see it. If you're not out personally marketing/reposting your blog/site places, search engines are pretty much the only other way people would find you.
And even if I saw your site on here, say, and liked it, if I don't bookmark it immediately, I'd still go to a search engine to try to re-find it in the future if I wanted to go back. Not to say you need to be doing paid ads or trying to raise your SEO or whatever, but I've had times where I remember some unique phrase from an article I read years ago and Google can use that to find the original source.
> putting things on the internet in public vs. just sending them to people directly
A large proportion of the resources I host on-prem are just that. Stuff that's too large for email, or isn't static, or may get updated. Unless people host their own email server (like I do), it's going to e.g. Gmrgle anyway if you email it to them. Maybe it gets blocked, maybe it gets fubared.
Uploading it to an on-prem server and sending the link in an email is no more trouble and any issues are easier to debug.
Simple answer is I find search engines mostly useless. There is no investment in curation, compared to site-specific search. I run and have access to a few semi-private curated lists of resources, good enough for most of what I need and certainly for the kinds of resources I (and other like-minded people) host solely on-prem. Furthermore "search engines good" does nothing about the "not good" traffic; it's a not even wrong argument from my POV.
This wasn't some huge technical lift for me to implement. Trust me on that. I got tired of Amazon stinking up my logs and decided that since I can't discriminate based on reliable information about the services being hosted there which have a legitimate need to reach out, I just don't need their help. Really, I'm helping them by ensuring no spurious pongs or SYN/ACKs come from me. See? I'm helping the best I can.
If you think this is heavy-handed and arbitrary, take a close look at email and domain reputation providers sometime.
Need a version of nc which does multicast and is written in python? Well, you can't get that from an Amazon address anymore... unless you've mirrored it. How many people care? How many people care about precinct-level voting patterns for King County Washington from roughly 2005-2009?
It's not a "grand narrative", I've been playing with the internet since it was possible to do so legally. If explaining that history is grandiose for you, that's you. I host on-prem for my convenience and nobody else's. Enjoy the article... or not.
The author has said for a long time that they have a "no crawl" policy anyway; I don't think they care, and with the enshittification of Google they are probably in the right anyway.
One concern with doing this as a whole is you may end up blocking legit organizations from accessing your site. If you're selling something that could be a problem.
For example, the org might be self-hosting WireGuard or another VPN solution on a cloud provider and people are connecting through that so their outgoing IP address comes from a cloud provider.
Big and not-so-big enterprises these days use VPNs and similar solutions with exit nodes in the cloud, so such blocks essentially prevent access to your web site from a work computer.
(Ideally you'd make them switch to TCP by truncating UDP responses to specific clients, but that sounds like a hassle to set up, so it's understandable to skip that.)
At that point "secure" would mean "offline"... It's not like botnets, "unlocker" farms, and P2P don't originate from residential netblocks all day long.
The idea of "I just want the legitimate traffic" is a simple one, but the implementation of the idea has very little to do with "I will just block the big bad cloud!".
Securing yourself means not being vulnerable to the attacks. Who cares if you are exposed to an internet radiation banana equivalent? Why worry? You'll hurt yourself more from the worry than from the radiation.
Blocking huge IP ranges is knocking yourself half offline, and it doesn't even stop you being "attacked". I'd start blocking if and only if there is some actual problem for your server (e.g. excessive CPU or bandwidth usage), not just because big bad scary cloud.
I think it should be reciprocal, like in the real world. If someone blocks a provider, the provider should be allowed to block back. Maybe with some automation. So it is fair and each party has information about what is going on. Or use real guns instead of these children's games in the sandbox.
So if I run a web server at home and I’m constantly attacked by AWS IPs, I shouldn’t be able to block them without myself being unable to access the lion’s share of the web hosted on AWS? Doesn’t that seem sort of extreme?
> I think it should be reciprocal, like in the real world. If someone blocks a provider, the provider should be allowed to block back. Maybe with some automation. So it is fair and each party has information about what is going on. Or use real guns instead of these children's games in the sandbox.
I don't think your take makes any sense whatsoever. Beyond the puerile "I'll block you too", what exactly do you hope to achieve with this nonsense?
If blocking someone works in both directions, you won't block half the internet based on spurious reasoning, because you'll be blocked from half the internet based on your own spurious reasoning. You'll carefully consider who to block.
In considering this, my first thought was it would block bona fide desktops running in AWS, especially for service offerings like Amazon Workspaces. However, it looks like the IP space for such services is publicly documented if the need arises to specifically whitelist those IPs.
All that said, it's trivial to use proxies or VPNs to bypass any blocks.
> All that said, it's trivial to use proxies or VPNs to bypass any blocks.
Maybe for you, sure. The droves of people flocking to serverless these days suggest that even most technical people don't want anything to do with their own networking or infrastructure.
Proxy selling is a successful business (access on the order of 10 million residential IP addresses for on the order of $100 per month), and they don't say how they get access to those. Probably, the more people block access to non-residential IP addresses, the more money botnet vendors make.
At least among the "ethical" ones, my understanding is that it's freeware that has been packaged with proxy access (presumably disclosed to the end user, but that's a matter of interpretation)
I did this a long time ago after dozens of Amazon servers started slowing down our on-prem servers. I think it might have been some kind of attempted SSO stuff but never did entirely track it down. Just wrote a script to periodically download a list of their IP ranges and block 'em all, and moved on.
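A minimal sketch of that kind of script, assuming Amazon's published ip-ranges.json feed (the core HTTP::Tiny module needs IO::Socket::SSL installed for the https fetch):
#!/usr/bin/perl
# Fetch Amazon's published IP ranges and print one IPv4 CIDR per line.
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP;

my $url = 'https://ip-ranges.amazonaws.com/ip-ranges.json';
my $res = HTTP::Tiny->new->get($url);
die "fetch failed: $res->{status} $res->{reason}\n" unless $res->{success};

my $data = decode_json($res->{content});

# Keep only EC2 prefixes; drop the filter to block everything Amazon lists.
my %seen;
for my $p (@{ $data->{prefixes} }) {
    next if $p->{service} ne 'EC2';
    $seen{ $p->{ip_prefix} } = 1;
}
print "$_\n" for sort keys %seen;
Feed the output into whatever builds your blocklist (an ipset, an nft set, or null routes) and re-run it on a timer, since the ranges change.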
I understand the appeal. If I wanted to wall off large swaths of the internet from what I create, I would do the same thing. It's not much different than blocking entire countries.
The desire to limit the noise and only allow a "small circle of friends" is also appealing.
But I do that for specific services, not my domains in general.
Mumble server: only open to the 3-4 countries that my friends are in, and none of the 'cloud providers'.
Tech blog: world+dog can see it.
I am firmly in the 'We all benefit from shared knowledge' camp. So if my notes on modem init strings for my 300-baud C64 modem can help one other person; they won't go through the same pain I went through, and the world will be a better place.
I get the desire, for many reasons. That's cool. You do you.
I wrote the "Zen InterSLIP Dialing Script" and it literally went around the world.
Amazon is too large to ignore. I understand that a lot of ICMP and SYN traffic is garbage. I'd be happy to help out and block it, and I do, by default. That's part of why Amazon is a PITA: they trigger my mitigations "at scale". Amazon doesn't help sort the wheat from the chaff: "send a PCAP (for a ping issue)". I don't learn anything by sending Amazon stuff and hearing nothing. I don't need their good traffic any more than I need their bad traffic... or the spoofed traffic that is attacking them.
If they can't see fit to help me help them, I don't need any of it. I'm just keeping my life simple.
Problems? What kind of problems? Discovering word embeddings was sort of a happy accident of LLMs: back propagation "improved" them from a similarly random starting point as the NNs.
It's not a technical problem as in overwhelming any resource. It overwhelms me, for starters. Secondly, my existing mitigations suggested it. I resisted the move for some of the reasons people here are saying it's a bad idea, but I finally concluded the benefits outweighed the costs and decided to try it and see what happens.
So far, so good. Once the fire burns out it should be great.
You're overwhelmed by the fact that LLMs exist, so you want to sabotage LLMs by censoring your own site from them and half the internet? Feeding them a site filled with garbage non-facts would have a greater effect.
Why do you publish a blog, if not for people to find?
This makes me miss the Internet. It’s really hard to explain how wonderful it was pre-commercialization. The sour pleasure of having been exactly right about how that would turn out isn’t nearly satisfactory compensation for the loss.
AWS is a boy scout compared to places like DigitalOcean, OVHcloud, ColoCrossing, Scaleway, Tencent, or even Google. I think DigitalOcean, in particular, has made a terrible mistake marketing to the “cybersecurity” community.
In fact (if you read the post) I mention blocking "some small hosting providers as a proof of concept": that would be DigitalOcean, for the reason you state.
AWS has a terrible reputation as the source of absolutely massive amounts of abuse and poorly written scrapers and crawlers (it's hard to tell the difference between a bad crawler and an active attack).
From experience I've seen AWS be the source of overwhelming traffic so many times that we in some cases resorted to the same solution, blocking AWS completely.
I don't know if AWS doesn't care or is just slow to react. Maybe reporting is too difficult, I don't know.
So the obvious thing you're missing is: AWS IS a huge source of "bad" traffic, and getting a misbehaving customer shut down is too hard, while renting insane amounts of capacity is too easy for bad actors.
Edit: I've almost never seen GCP or Azure being the source of the same amount of crazy traffic.
I should probably do this for most of my stuff. I run some servers that require cloud-to-cloud networking, but the only inbound stuff I see coming from cloud services is bots, scanners, and scrapers. I've had to block off China's largest ISP because some broken scraper kept re-downloading the same image assets for no reason, and kept popping up on other subnets.
I don't think anyone will miss my stuff if they're part of the small minority of people accessing the internet through a VPN hosted in large data centres.
The biggest challenge for implementing this will probably be figuring out how to block inbound connections but keep outbound connections working. I'm sure there's a good nftables rule I can come up with eventually.
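One possible shape for that rule, as a sketch only, using conntrack so that replies to connections you initiate are still accepted (the set elements are examples only; adapt the table and chain to your existing ruleset):
table inet filter {
    set cloud_block {
        type ipv4_addr
        flags interval
        elements = { 3.0.0.0/9, 3.128.0.0/9 }    # example ranges only
    }
    chain input {
        type filter hook input priority 0; policy accept;
        ct state established,related accept      # replies to outbound connections
        ip saddr @cloud_block drop               # unsolicited inbound from the set
    }
}
Outbound connections to those ranges keep working because the reply packets match the established/related rule before the drop.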
It has really gotten terrible. Between DDoS and bots from massive tech companies, it seems like I have several events a year where thousands or tens of thousands of IPs from a single datacenter (Singapore!) are making requests to some of my infrastructure concurrently. What can we do?
I opted for CIDR aggregation and rate limiting of data center ISPs in nginx for one of my frontends. There are reasonable limits for normal IPs too. Not all of us have the capacity or desire to scale.
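Roughly what that can look like in nginx, as a sketch (ranges and rates are illustrative; the geo, map, and limit_req_zone directives live in the http block):
geo $is_dc {
    default      0;
    3.0.0.0/9    1;    # example data-center ranges
    3.128.0.0/9  1;
}
map $is_dc $dc_key {
    0  "";                       # empty key: not counted against this zone
    1  $binary_remote_addr;
}
limit_req_zone $dc_key zone=dc:10m rate=1r/s;

server {
    location / {
        limit_req zone=dc burst=5 nodelay;
    }
}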
This is the chauvinism I maybe failed to surface effectively enough. People in the cloud "just scale"; and big cuddly cloud says "ooooh, we block DoS and we won't charge you for the ingress traffic" when they whine. Whereas it really should be alleged DoS, it's not like I won't implement mitigations.
I publish telemetry outing the worst offenders. What is Amazon doing for me, as a non-customer, to support our presumably shared goal? So maybe it's not really a shared goal.
Here's an analogy which is very visceral for me: I have a chronic condition, which has never been "root caused" in my case by the school of western allopathic medicine. They put me on (several) drugs, I've taken them for a decade.
Due to the ongoing enshittification of medical care over the past several years, I ended up having to see a naturopathic physician (with prescribing privileges, and not covered by my insurance) to get the prescriptions renewed because the western docs simply had better things to do than schedule and keep an appointment. Dude does the things a western doc should do, looks at the labs, and says with all seriousness: have you considered maybe you're allergic to drug "Y"?
Holy fuck well that causes all kinds of problems identifying a replacement therapy, but that's not the point of the story here.
Point is they've never done root cause, blamed the increasingly worse side effects on the condition they never root caused, put me on more drugs for the side effects, and told me to just live with it because that's what happens when you get older: this is what happens when you get trapped in an ecosystem.
So I have identified a replacement therapy, it maybe doesn't work so well for the symptom drug Y was supposed to mitigate but it works. On the other hand, here are the side effects: constellation of symptoms of the actual condition (drug Y was intended to mitigate only one of them) has virtually disappeared, I sleep better, I have more energy, and I've lost nearly 20 pounds in the last six months.
In summary, it's not that hard to run my own server (I have the skills, knowledge and specialized bespoke tools to support doing it my way). It keeps my professional skills sharp and gives me real practical intelligence about the fight in the streets.
Yawn. Old man yells at cloud (literally). So he's taking his little netblock ball and going home because of some failed purity tests: bad or nonexistent PTRs, excessive ICMP, oh my! The gentleman's agreements that held together the early internet and web and the unwritten practices like that are long gone, get with the times, there ain't any going back to how it was. Otherwise, feel free to disconnect entirely if you don't want to deal with the new reality.
I'm going to go out on a limb and guess that all of this traffic isn't related directly to AWS, but to its customers. You can set PTRs for your allocated Elastic IPs with a request to support. But then again nobody is going to do it because... it doesn't matter. It may have mattered when you were hosting with a block that you actually, truly owned, before the ICANN times, but no more. No one cares. Everything is ephemeral, so why should the reverse matter when things get cycled through addresses multiple times per day? If you're seeing excessive anything, then it's probably time to reach out to the abuse contact published in the whois. Let me help you with that:
OrgAbuseHandle: AEA8-ARIN
OrgAbuseName: Amazon EC2 Abuse
OrgAbuseEmail: trustandsafety@support.aws.com
OrgAbuseRef: https://rdap.arin.net/registry/entity/AEA8-ARIN
Comment: All abuse reports MUST include:
Comment: * src IP
Comment: * dest IP (your IP)
Comment: * dest port
Comment: * Accurate date/timestamp and timezone of activity
Comment: * Intensity/frequency (short log extracts)
Comment: * Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time.
Use modern features built in to modern versions of common packages and products: rate limiting, redirects, filters, and on and on. If you're just blocking to block to make some sort of statement into the void, you're just hastening that balkanization.
AWS doesn't mess around with abuse reports either. If you send a report and it checks out, they're opening a case with the customer in question telling them to explain themselves.
I ended up blocking Amazon SES because I was receiving hundreds of obviously spam emails across all my inboxes every day for weeks. I dutifully sent multiple reports of this obvious spam to the listed spam/abuse contacts for Amazon SES.
The end result of my efforts was:
1) No feedback at all from my reports to Amazon - not even an acknowledgement that my report had been received
2) The spam continued unabated for weeks until I finally had enough and just blocked the entire Amazon SES service
That was a few years ago and maybe they are more responsive now. They sure as hell weren't responsive back then.
Meaningless low-value and non-human traffic wasting resources and/or training models. Their blog is for real humans per the author's writing, which is a valid stance to take.
Not only does Reader Mode work beautifully, but, colour choice notwithstanding, the article's CSS styling is sufficiently vanilla that my own enhanced Reader Mode stylesheet (drop caps and a few other fiddly bits) works beautifully. That's often ... not ... the case.
Also, if you're the sort that cares enough about such things to post a complaint, you should have tools to change or override them per your preferences (e.g. reader mode).
Firefox on mobile, using dark mode. That font actually hurts my eyes, and I straight-up closed the site. Maybe I'll write a script to scrape the text into terminal output so I can read it without feeling like I'm getting a headache. Probably just output the text via a terminal window.
From iOS Safari I see it pink on dark blue… nothing wrong with these colors; actually they're pleasant to look at. The font seems nice as well; is there anything wrong with it?
https://docs.aws.amazon.com/vpc/latest/userguide/aws-ip-rang...
https://www.microsoft.com/en-us/download/details.aspx?id=565...
https://support.google.com/a/answer/10026322?product_name=Un...
etc etc...