"Type: Permanent; SubType: General; Code: smtp; 550-5.1.1 The email account that you tried to reach does not exist. Please try 550-5.1.1 double-checking the recipient's email address for typos or 550-5.1.1 unnecessary spaces. Learn more at 550 5.1.1 https://support.google.com/mail/?p=NoSuchUser y128si147264pfg.177 - gsmtp"
This is pretty much the worst response possible. Hard bounces mean that email delivery services are going to start automatically removing, or at least stopping delivery to, entire slews of email addresses.
A lot of clean up is going to be needed as a result of this.
To add some more detail: when using a third-party email delivery service, those services will either blacklist or just outright remove email addresses when they get a hard-bounce "email address no longer exists" message back.
Some providers make re-adding an address after a hard bounce a non-trivial task, since after all, the authority on that email address just said it doesn't exist.
I really cannot believe they did not immediately hack in a new rule to their SMTP server: never return a 5xx (permanent failure); instead return a 421 (temporary failure, try again later).
That simple fix buys them 24-72 hours to solve this properly.
Yeah, it burdens servers sending mail to them because now they have to hold on to all mail (including mail that really is permanently undeliverable) for another day or so, but that's still better than what's happening right now.
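A minimal sketch of what that hack could look like on the receiving side, using the third-party aiosmtpd library; the BACKEND_OUTAGE flag and the user_exists() lookup are hypothetical stand-ins, not anything Google has said they run:

```python
from aiosmtpd.controller import Controller

BACKEND_OUTAGE = True  # hypothetical kill switch, flipped while account lookups are broken

def user_exists(address: str) -> bool:
    return True  # stub; the real lookup against the account store goes here

class OutageAwareHandler:
    async def handle_RCPT(self, server, session, envelope, address, rcpt_options):
        if BACKEND_OUTAGE:
            # Temporary failure: the sending server keeps the mail queued and retries.
            return "421 4.3.2 Service temporarily unavailable, try again later"
        if not user_exists(address):
            # Permanent failure: the sender bounces the mail and ESPs suppress the address.
            return "550 5.1.1 The email account that you tried to reach does not exist"
        envelope.rcpt_tos.append(address)
        return "250 OK"

controller = Controller(OutageAwareHandler(), hostname="0.0.0.0", port=8025)
controller.start()
```

With the flag on, well-behaved senders keep the message queued and retry, which is exactly the 24-72 hour window described above.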
A 5xx error results in the address being added to a suppression list, so future emails won't be delivered (by most ESPs), and not answering on the MX at all would probably be just as bad, or worse (or result in millions/billions of emails being re-queued due to timeouts?)
His solution would mean the exponential retry backoff baked into most sending services kicks in instead, which would buy them a few hours, result in no lost emails, and no suppression list additions.
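Roughly, the sender-side split being described looks like the sketch below; the names are illustrative, but the behavior (5xx suppresses, 4xx retries with backoff) matches what most ESPs describe:

```python
import time

suppression_list = set()   # addresses most ESPs will never mail again
MAX_ATTEMPTS = 10

def handle_delivery_result(address: str, smtp_code: int, attempt: int, retry_queue: list):
    if 500 <= smtp_code < 600:
        # Hard bounce: suppress the address and drop the message for good.
        suppression_list.add(address)
    elif 400 <= smtp_code < 500 and attempt < MAX_ATTEMPTS:
        # Soft failure: retry later with exponential backoff (1, 2, 4, ... minutes).
        retry_at = time.time() + 60 * (2 ** attempt)
        retry_queue.append((retry_at, address, attempt + 1))
    # otherwise give up quietly after too many temporary failures
```

Which is why a 421 during the outage would have cost senders nothing but a delay.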
This outage seems to have lasted for about 2.5 hours. Probably this was fixed by rolling back whatever caused it. (I don't think the rollout was finished before they resolved it; my mail server sends a lot of emails to Gmail addresses, and even at peak I was only seeing maybe about 1/3 mails be rejected.)
There is no way that putting in a hardcoded hack like that would have been faster. Making the change is, of course, fast.
But then you need to review it (and this is a super risky change, so the review can't be rubber stamped). Build a production build and run all your qualification tests. (Hope you found all the tests that depend on permanent errors being signalled properly). And then roll it out globally, which again is a risky operation, but with the additional problem that rolling restarts simply can't be done faster than a certain speed since you can only restart so many processes at once while still continuing to serve traffic.
The kind of thing you describe simply can't be done in 2.5 hours by changing the SMTP server. The best you could get is if there was some kind of abuse- or security-related articulation point in the system, with fast pushes as required by the problem domain but still with sufficient power to either prevent the requests from reaching the SMTP server at all, or intercept and change the response.
As a trivial example, something like blocking the SMTP port with a firewall rule could have been viable. Though it has the cost of degrading performance for everyone rather than just the affected requests.
My mail server logs show about 20 failures in all of the last week until yesterday 20:43 CET, then 350 failures between 20:43-00:21, then nothing after that. So fair enough, from the client side rather than the status page it looks like 3.5 hours rather than 2.5.
But still, given that resolution time, the suggested solution of changing the SMTP server is absolutely ludicrous.
Yes. I email hundreds of thousands of Gmail users each week (yes, double opt in, they all want the mails!) and we immediately delete any user for whom any Gmail error comes up at all in order to keep a solid delivery record with them. Sounds like we might have deleted 80% of our list if we'd sent today..!
So a new thing to do: quarantine addresses instead of deleting them, and if most addresses for one provider fail, give them another (maybe manually triggered) try later on.
(And if no such provider-wide failure is detected, delete the quarantined addresses.)
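A minimal sketch of that quarantine idea, with hypothetical names throughout (the threshold, the counters, and the unsubscribe() helper are all assumptions about how a list manager might be structured):

```python
from collections import defaultdict

quarantine = defaultdict(list)      # domain -> addresses that hard-bounced
sent_per_domain = defaultdict(int)  # how many mails went to each domain this run

OUTAGE_THRESHOLD = 0.5              # treat >50% hard bounces for a domain as an outage

def record_hard_bounce(address: str):
    domain = address.rsplit("@", 1)[-1]
    quarantine[domain].append(address)

def review_quarantine(domain: str):
    sent = sent_per_domain[domain]
    bounced = len(quarantine[domain])
    if sent and bounced / sent > OUTAGE_THRESHOLD:
        # Looks like a provider-wide outage: keep the addresses and retry later,
        # possibly only after a manual confirmation.
        return "retry_later"
    # Isolated hard bounces: safe to treat as genuinely dead addresses.
    for address in quarantine.pop(domain, []):
        unsubscribe(address)        # hypothetical helper in the list manager
    return "purged"
```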
My guess is that's how most email service providers handle this - they don't actually delete the address, they just set a flag on it: bounced, complained, unsubscribed. This way the list owner can run an export and see all the status codes.
Yes, we're unusual in not relying on third parties for list management. We can rollback. Or I might just comment out the 'unsub on hard bounce' code for the rest of the week..! :)
Yes, most likely! That is a common approach for 'soft bounces' in most list management systems (e.g. MailChimp).
The problem here is Gmail has been throwing out "NoSuchUser" errors which are an instant unsub in most systems because Gmail takes repeated delivery to non-existing addresses into account for deliverability purposes.
I'm extremely paranoid about email hygiene, tiny bounce rates and high delivery rates, so we aggressively unsubscribe troublesome addresses (often to the point of getting reader complaints about it) for many reasons beyond that, however.
That better describes what I was trying to say, yes. Reputation then affecting deliverability.
Over 80% of our subscribers use Gmail so to say I'm paranoid about maintaining a good record with them is an understatement ;-) Gmail is a huge weak link for us.
Logically you'd expect unsubscribe to only act after lots of bounces of this format when the address has been receiving mail fine before. It also seems reasonable not to trust such bounces for the entire domain for a while when this happens to lots of other addresses that have worked fine before. Not that I expect software currently works this way, but it does seem like a common sense thing to code in.
I mean, it's possible, but you'd need to queue up a day's worth of bounces, do the analysis, and then handle the bounces asynchronously later on to do that.
Most systems operate more immediately in isolation on individual addresses than that right now, because such analysis is generally not needed (until today, of course ;-)).
Mail agents already queue emails that bounce though; it's a matter of changing the conditions for when you retry and/or unsubscribe. I imagine you can do the analysis in real time too... just look at the bounce and see if it pertains to an email you sent to in the past, and if so, increment some rolling counter for that domain.
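A minimal sketch of that rolling counter, assuming you track whether an address has accepted mail recently; the window and threshold are made-up numbers:

```python
import time
from collections import defaultdict, deque

WINDOW = 24 * 3600                   # look at the last 24 hours
domain_bounces = defaultdict(deque)  # domain -> timestamps of recent hard bounces

def should_unsubscribe(address: str, delivered_recently: bool) -> bool:
    domain = address.rsplit("@", 1)[-1]
    now = time.time()

    bounces = domain_bounces[domain]
    bounces.append(now)
    while bounces and bounces[0] < now - WINDOW:
        bounces.popleft()

    if delivered_recently and len(bounces) > 100:
        # The address worked before and the whole domain is suddenly bouncing:
        # probably a provider outage, so keep the subscriber and retry later.
        return False
    return True   # isolated hard bounce on this address: unsubscribe as usual
```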
Mailgun sent a warning mail about increased bounces from our account. Sure, they know what's going on... but we send a 4-5 digit number of mails per hour - that's a lot of bounces.
That means I can't just resend the emails blindly, because I'm too scared to trigger some sort of automatic suspension...
(I don't do this regularly, so I'm not familiar with all the features... additional mail verification could probably help...)
They should be returning 421 for backend outages so that sending servers queue and retry the emails. 550 can be interpreted by some as deleted [1] or even banned accounts in some cases. Maybe someone here could convince them to change the logic that occurs during an outage.
Yah. Maybe there's an unexpected way that things can fail resulting in 550's. But maybe at Google's scale you should have some kind of kill switch to stop answering SMTP or to not send permanent errors at all, so that you could flip a switch and prevent the worst consequences of this rather than let it go on for a couple of hours.
A lot of people will lose transactional email messages because of this.
I'd absolutely hate to be hit by this at this time. Thankfully I made a time investment to run my own mail server years ago. A handful of times it broke down; it either went offline or started returning 4xx codes due to a misconfigured or broken milter after an update. Neither meant lost messages from normal senders that use queuing MTAs.
Same for me, mainly for privacy concerns. And I back it up daily to my local NAS. It's so easy to configure and run your own mail server, that I'm surprised we are the minority in the tech community.
> It's so easy to configure and run your own mail server
Is it? Is dealing with IP reputation, getting your emails accepted by major providers, and being on the hook for fixing everything yourself very easy? I haven't tried, so I don't have personal experience, but I've heard enough horror stories to think that it's not a good use of my time.
The sending side of the MTA can be set up manually in about an hour on a Debian server, with DMARC, DKIM, SPF, etc. Make that a day if you want to read up on and understand each of those things in more detail, if you haven't configured them before. There's really not much to play with in this direction for a typical personal mail server.
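If you want a quick sanity check of the DNS side of that setup, here's a sketch using the dnspython package; example.com and the DKIM selector "mail" are placeholders for your own domain and selector:

```python
import dns.resolver

def txt_records(name: str):
    try:
        return [r.to_text() for r in dns.resolver.resolve(name, "TXT")]
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []

domain = "example.com"
print("SPF:  ", [r for r in txt_records(domain) if "v=spf1" in r])
print("DKIM: ", txt_records(f"mail._domainkey.{domain}"))   # selector "mail" assumed
print("DMARC:", txt_records(f"_dmarc.{domain}"))
```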
The receiving side is where there's a great range of options, and many things to try and have fun with. You can have anything from a single catchall mailbox with no filtering, no GUI, and simple IMAP or POP3 access for a MUA, to a multi-account, multi-domain setup with server-side filtering, database-driven mailbox and alias management, proper TLS, web MUA access, etc. It can also be built up gradually, starting from a very simple setup and moving to something more complicated, so that you never lose track of how things work.
Mine are accepted by Gmail so I am good. Considering how dominant Gmail is, that's all that really matters.
Regarding getting a bad IP rating, normally that's due to having an insecure config, like acting as an open relay, or not having DKIM enabled. There are lots of tutorials online about this, if you know Linux it really is easy.
I had an IP reputation issue and managed to resolve it after some time.
TLDR: Before you spin up a mail server, check if your IP address is on any of the blacklists [0]-[1] as well as Proof Point's list [2]. If it is, then try and get a different IP address.
I spun up a hosted server on Digital Ocean and received an IP address. I checked several black lists from a few email testing/troubleshooting sites [0] and [1] and all was groovy; my IP address wasn't on any list.
I got a bunch of 521 bounces when I tried emailing a neighbor who had an att.net address.
So, I checked the troubleshooting websites, and my IP address was listed as clean.
My logs said I should forward the error to abuse_rbl@abuse-att.net, so I did.
Those emails were never delivered, because abuse-att.net had its own blacklist. I was getting 553 errors. In the logs, the message from their server told me to check https://ipcheck.proofpoint.com.
Proof point runs their own blacklist that some enterprises use (e.g. att and apple [3]). I checked their list, and lo and behold, my IP address from Digital Ocean was blocked [2]. Digital Ocean wasn't able to remove the IP address from their blocklist and suggested I spin up a new droplet with a different IP address.
I didn't want to do that, so I sent Proof Point an email that went unanswered; the email asked them to remove my IP address. I forgot about the issue for five or six months (this is a personal server), and ran into the issue again a few months ago. So I sent Proof Point an email again, this time with different wording emphasizing that "my clients" were having delivery issues. Within a day, they removed my IP address from their block list.
So, my main suggestion is to check if your IP address is on any of the blacklists as well as Proof Point's list before you start on your server. If it is, then try and get a different IP address.
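For the DNS-based lists, the check itself is easy to script; this sketch uses the dnspython package and the usual DNSBL convention (reverse the IP's octets, look it up under the list's zone; an answer means listed, NXDOMAIN means clean). Note that some lists rate-limit or refuse queries from public resolvers, and Proof Point's list isn't a public DNSBL at all, so that one still needs their web form:

```python
import dns.resolver

BLACKLISTS = ["zen.spamhaus.org", "bl.spamcop.net"]

def check_ip(ip: str):
    reversed_ip = ".".join(reversed(ip.split(".")))
    for zone in BLACKLISTS:
        try:
            dns.resolver.resolve(f"{reversed_ip}.{zone}", "A")
            print(f"{ip} is LISTED on {zone}")
        except dns.resolver.NXDOMAIN:
            print(f"{ip} is clean on {zone}")

check_ip("203.0.113.10")   # placeholder; use your server's address
```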
Does anyone have more "enterprise" lists, like Proof Point, to check?
I also had the same hard bounce (when emailing from a non-gmail address -- fastmail -- to a gmail address). Sent it again minutes later and then it worked.