It logged me out and told me that my credentials were incorrect; I thought my credentials had been stolen, so I'm kinda personally glad that it seems to be happening to a lot of other people too. I know that's a bit selfish, but :shrug:
There are some quite harsh comments below. You can't plan for every possible failure point; who knows which part of their system or infrastructure went down and triggered this behaviour. Some things you just can't catch or predict, especially in huge systems like theirs. I would expect people here to understand things like this and not call people names over it. We all know things seem simple and clear from the outside, but the job of debugging and fixing something like this takes quite some effort.
This is a company with one of the largest digital infrastructures in the world. An outage is understandable, inability to tell they're having an outage and inform users appropriately is not. Stop making excuses for people who are literally awash in resources.
> Stop making excuses for people who are literally awash in resources.
This is a pretty weird outlook to have. Look at any group awash with resources, whether it be governments or other companies, and you can clearly see that even with those resources, failures still happen.
You can jump up and down and pretend that this is solvable, or you can look at reality, look at all the evidence of this happening over and over to almost everyone, and conclude with some humility that these things just happen to everyone.
(Looking this reality in the face is one of the things motivating my beliefs around e.g. AI safety, climate change, etc.)
It is always better for the company's rep for the issue to have been on your end. Admitting fault comes with a potential liability. It's gaslighting written as an SLA
You can't plan for every contingency, but you can reserve potentially scary messages for situations where you know they are correct. An unexpected error state should NOT result in an "invalid credentials" error.
Pushing people to unnecessarily reset credentials increases risk. Not only does it increase acute risk, but it also decreases the value of the signal by crying wolf.
The argument here is the kind of nonsense cargo cult security that pervades the industry.
- in general, if the system is broken enough to be giving false-negatives on valid credentials, it's broken enough that there isn't much planning to be done here because the system's not supposed to break. So if they give me "Sorry, backend offline" instead of "invalid credential," they've now turned their system into an oracle for scanning it for queries-of-death. That's useful for an attacker.
- in the specifics of this situation, (a) credential reset was offline too so nobody could immediately rotate them anyway and (b) as a cohort, Facebook users could stand to rotate their credentials more often than the "never" that they tend to rotate them, so if this outage shook their faith enough that they changed their passwords after system health was restored... Good? I think "accidentally making everyone wonder if their Facebook password is secure enough" was a net-positive side-effect of this outage.
So your approach to security is to never admit that an application had an error to a user, but to instead gaslight that user with incorrect error messages that blame them?
This is security by obscurity of the worst kind, the kind that actively harms users and makes software worse.
No. My approach to security is to never admit that an application had an error to an unauthenticated user.
That information is accessible to two cohorts:
- authenticated users (sometimes; not even authenticated users get access to errors as low-level as "The app's BigTable quota was exceeded because the developers fucked up" if it's closed source cloud software)
- admins, who have an audit log somewhere of actual system errors, monitoring on system health, etc.
Unfortunately, I can't tell if the third cohort (unauthenticated users) is my customers or actively-hostile parties trying to make the operation of my system worse for my customers, so my best course of action is to refrain from providing them information they can use to hurt my customers. That means, among other things, I 403 their requests to missing resources instead of 404ing them, I intentionally obfuscate the amount of time it takes to process their credentials so they can't use timing attacks to guess whether they're on the right track, I never tell them if I couldn't auth them because I don't recognize their email address (because now I've given them an oracle to find the email addresses of customers), and if my auth engine flounders I give them the same answer as if their credentials were bad (and I fix it fast, because that's impacting my real users too).
To be clear: I say all this as a UX guy who hates all this. UX on auth systems is the worst and a constant foil to system usability. But I understand why.
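A minimal sketch of the "same answer either way" policy described above, assuming a hypothetical in-memory user store and PBKDF2 hashing (`USERS`, `login`, and the example account are made up for illustration, not anyone's real auth code). Unknown emails are hashed against a dummy record so they cost roughly the same work as known ones, and the comparison uses `hmac.compare_digest` to avoid leaking how many bytes matched. This narrows timing differences but does not claim to eliminate them.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000
GENERIC_FAILURE = "Invalid credentials"

# Hypothetical user store: email -> (salt, PBKDF2 hash of the password).
_salt = os.urandom(16)
USERS = {
    "alice@example.com": (
        _salt,
        hashlib.pbkdf2_hmac("sha256", b"correct horse", _salt, ITERATIONS),
    )
}

# Dummy record so unknown emails still pay for one full hash computation.
_DUMMY = (os.urandom(16), os.urandom(32))


def login(email: str, password: str) -> str:
    known = email in USERS
    salt, stored = USERS.get(email, _DUMMY)
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Constant-time comparison; the same message is returned for an unknown
    # email and for a wrong password.
    matches = hmac.compare_digest(candidate, stored)
    return "OK" if (known and matches) else GENERIC_FAILURE
```

Whether an auth-engine outage should also collapse into `GENERIC_FAILURE` is exactly what the rest of this thread argues about; the sketch only covers the unknown-email and wrong-password cases.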
You are absolutely correct. That would be a much better experience.
That said, getting there strikes me as pretty challenging. Automatically detecting a down state is difficult and any detection is inevitably both error-prone and only works for things people have thought of to check for. The more complex the systems in question, the greater the odds of things going haywire. At Meta's scale, that is likely to be nearly a daily event.
The obvious way to avoid those issues is a manual process. Problem there tends to be that the same service disruptions also tend to disrupt manual processes.
So you're right, but also I strongly suspect it's a much more difficult problem than it sounds like on the surface.
> That said, getting there strikes me as pretty challenging. Automatically detecting a down state is difficult and any detection is inevitably both error-prone and only works for things people have thought of to check for. The more complex the systems in question, the greater the odds of things going haywire. At Meta's scale, that is likely to be nearly a daily event.
Well, in principle, the frontend just has to distinguish between HTTP status 500 (something broken in the backend, not the fault of the user) and some HTTP status code 4xx (the user did something wrong).
The "your username/password is wrong" message came in a timely manner. So someone transformed "some unforeseen error" into a clear but wrong error message.
And this caused a lot of extra trouble on top of the incident.
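A rough sketch of that frontend distinction, assuming the backend actually returns honest status codes (the function name and the message strings are made up for illustration):

```python
def message_for_status(status: int) -> str:
    """Map an HTTP status from a login call to a user-facing message."""
    if 200 <= status < 300:
        return "Logged in."
    if status in (401, 403):
        # The backend positively rejected the credentials.
        return "Your username or password is wrong."
    if 500 <= status < 600:
        # Backend failure: not the user's fault, so don't blame the password.
        return "Something went wrong on our side. Please try again later."
    # Anything else is unexpected; stay neutral rather than guess.
    return "Unexpected response. Please try again later."
```

This only helps if nothing upstream has already rewritten a 5xx into a 4xx before the frontend sees it.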
But there's something off here. I wouldn't expect to be shown as logged out when the services are down. I'd expect calls to fail with something like a 500 and an error saying "something happened on our side". Not all the apps going haywire.
At the scale of Meta, "down" is a nuanced concept. You are very unlikely to get every piece of functionality seizing up at once. What you are likely to get is some services ceasing to function and other services doing error-handling.
For example, if the service that authenticates a user stops working but the service that shows the login form works, then you get a complex interaction. The resulting messaging - and thus user experience - depend entirely on how the login page service was coded to handle whatever failure the authentication service offered up. If that happens to be indistinguishable from a failure to authenticate due to incorrect credentials from the perspective of the login form service, well, here we are.
At Meta's scale, there's likely quite a few underlying services. Which means we could be getting something a dozen or more complex interactions away from wherever the failures are happening.
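To make the login-form example concrete, here is a small sketch of a login page service that keeps "the auth service said no" separate from "the auth service broke". Everything here is hypothetical (`LoginOutcome`, `attempt_login`, and the idea that the auth call returns a plain boolean); real systems have many more layers in between, which is the point being made above.

```python
from enum import Enum
from typing import Callable


class LoginOutcome(Enum):
    SUCCESS = "success"
    BAD_CREDENTIALS = "bad_credentials"
    SERVICE_ERROR = "service_error"


def attempt_login(
    check: Callable[[str, str], object], user: str, password: str
) -> LoginOutcome:
    """Call a (hypothetical) auth service and classify what came back."""
    try:
        result = check(user, password)
    except (TimeoutError, ConnectionError):
        # The auth service could not be reached or did not answer in time.
        return LoginOutcome.SERVICE_ERROR
    if result is True:
        return LoginOutcome.SUCCESS
    if result is False:
        return LoginOutcome.BAD_CREDENTIALS
    # A nonsense response is treated as an outage, not as a wrong password.
    return LoginOutcome.SERVICE_ERROR
```

If a layer in between swallows the exception and returns `False`, the form can no longer tell the two cases apart, which is the indistinguishable case described above.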
Isn't this just the standard problem of reporting useful error messages? Like, yes, there are academic situations where you can't distinguish between two possible error sources, but the vast majority of insufficiently informative error messages in the real world arise because low effort was applied to doing so.
Yes, with the additions of sheer scale, a vast number of services, multiple layers, and the difficulty of defining "down" added in. I think the difficulty of reporting useful error messages is proportional to the number of places an error can reasonably happen and the number of connections it can happen over, and by any metric Meta's got a lot of those.
No, in that detecting when you should be reporting a useful error message is itself a complex problem. If a service you call gives you a nonsense response, what do you surface to the user? If a service times out, what do you report? How do you do all this without confusing, intimidating, and terrifying users to whom the phrase "service timeout" is technobabble?
> If a service you call gives you a nonsense response, what do you surface to the user?
If this occurred during the authentication process, I think I would tell the user "Sorry, the authentication process isn't working. Try again later." rather than "Invalid credentials". And you could include a "[technical details]" button that the user could click if they were curious or were in the process of troubleshooting.
> If that happens to be indistinguishable from a failure to authenticate due to incorrect credentials from the perspective of the login form service, well, here we are.
If you can't distinguish those, then that is bad software design.
Come on, use a little imagination. DNS lookup for the db holding the shard with the user credentials disappears. Code isn’t expecting this, throws a generic 4xx because security instead of a generic 5xx (plenty of people writing auth code will take the stance all failures are presented the same as a bad password or non-existing username); caller interprets this as a login failure.
Same auth system used to validate logins to the bastions that have access to DNS. Voilà.
> plenty of people writing auth code will take the stance all failures are presented the same as a bad password or non-existing username
Those people would be wrong. You can take all unexpected errors and stick them behind a generic error message like "something went wrong" but you should not lie to your users with your error message.
If you have different messages for invalid username vs invalid password, you can exploit that to determine if a user has an account at a particular service.
"Invalid credentials" for either case solves this problem.
But sure, let's report infra failures differently, as "unexpected error".
Now, what happens if the unexpected error is only when checking passwords, but not usernames?
Do you report "invalid credentials" when given an invalid username, but "unexpected error" when given a valid name but invalid password?
If so, you're leaking information again and I can determine valid usernames.
So, safe approach is to report "invalid credentials" for either invalid data or partial unexpected errors.
The only time you could safely report "unexpected error" is if both the username check and the password check are failing, which is so rare that it's almost not worth handling. Especially at the risk of getting it wrong and leaking info again.
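The leak described above can be shown with a toy model. This is a sketch under made-up assumptions (a hypothetical `KNOWN_USERS` set and a flag standing in for "the password backend is down"), not a claim about how any real login system is built:

```python
KNOWN_USERS = {"alice"}  # hypothetical account list


def leaky_login(username: str, password: str, password_backend_up: bool) -> str:
    """Reports infra failures honestly, but only after the username check."""
    if username not in KNOWN_USERS:
        return "invalid credentials"
    if not password_backend_up:
        # Only reachable for usernames that exist: this is the leak.
        return "unexpected error"
    # The probe below always sends a wrong password, so this toy never succeeds.
    return "invalid credentials"


def probe(username: str) -> bool:
    """Attacker's view during a partial outage: does this username exist?"""
    response = leaky_login(username, "wrong", password_backend_up=False)
    return response == "unexpected error"
```

With the backend healthy, both usernames get the same answer; during the partial outage, `probe` separates real accounts from fake ones.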
If you really want to hide whether a username is in use, then you also have to obscure the actual duration of the authentication process, among other things. The number of hoops you need to jump through to properly hide username usage is large enough that you need to actually consider whether this is a requirement or not. Otherwise, it is just a cargo cult security practice like password character requirements or mandated password reset periods.
In this case, Facebook does not treat hiding username usage as a requirement. Their password reset mechanism not only exposes username / phone number usage, but ties it to a name and picture. So yes, Facebook returning an error that says credentials are incorrect when it has infrastructure problems is absolutely a defect.
what if, if one service doesnt respond at all or responds with something that doesnt fit an expected format that it would if working correctly, the whole thing just says "sorry, we had an error, try again later"? if it has to check both at the same time, and cant check them independently, wouldn't that solve the vulnerability? or am i missing something? totally understandable if i am, i just want to learn /gen
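One way to sketch that idea, assuming two hypothetical backend calls (`lookup_user` and `verify_password`) that raise when their service is broken. Both are always called, so a failure in either one produces the same outage message no matter which username was typed:

```python
from typing import Callable

OUTAGE = "sorry, we had an error, try again later"
REJECTED = "invalid credentials"


def login_all_or_nothing(
    username: str,
    password: str,
    lookup_user: Callable[[str], bool],
    verify_password: Callable[[str, str], bool],
) -> str:
    try:
        user_exists = lookup_user(username)
        # Always run the password check, even for unknown usernames, so a
        # password-backend failure surfaces for every username alike.
        password_ok = verify_password(username, password)
    except Exception:
        return OUTAGE
    return "ok" if (user_exists and password_ok) else REJECTED
```

In this toy model the outage message no longer depends on whether the username exists. It says nothing about timing differences or about backends that fail only for some users, so treat it as an illustration of the question rather than a settled answer.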
Yea, the wife came to me in a bit of a panic that her Facebook account got hacked. I tried logging in to FB to check if I had been unfriended, and I also got errors indicating my password was incorrect. My FB password is 96 bits from /dev/urandom in a GPG-based password manager I wrote for myself a couple decades ago. So, no my password wasn't wrong, and I'm not a big enough target for someone to put enough effort into figuring out how to snarf up my password data and crack my GPG passphrase.
Anyway, when FB thought my password was wrong I calmed way down. I thought maybe FB corrupted their password DB or something, so I just tried to reset my password, got into an odd workflow loop, and then quacked "downdetector facebook".
that's actually really cool, i hadnt considered writing my own password manager but i feel like it'd be a fun and fairly useful project, did it take you particularly long to do? i'm interested in giving it a go :D
The heavy lifting is done by GPG in a subprocess, taking information on stdin or outputting the decrypted data on stdout. The rest is just generating the passwords, organizing the encrypted files, and perhaps interacting with the clipboard.
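For anyone wanting a starting point, a rough sketch of that shape in Python. The function names, the one-file-per-secret layout, and the `recipient` key ID are assumptions for illustration, not the original tool; it only relies on standard `gpg` options (`--batch`, `--yes`, `--recipient`, `--output`, `--encrypt`, `--quiet`, `--decrypt`) and needs `gpg` installed with a usable key.

```python
import base64
import secrets
import subprocess
from pathlib import Path


def generate_password(n_bytes: int = 12) -> str:
    """12 bytes = 96 bits from the OS CSPRNG, encoded as URL-safe base64."""
    return base64.urlsafe_b64encode(secrets.token_bytes(n_bytes)).decode("ascii")


def store(path: Path, secret: str, recipient: str) -> None:
    """Encrypt `secret` to `recipient`, feeding the plaintext on gpg's stdin."""
    subprocess.run(
        ["gpg", "--batch", "--yes", "--recipient", recipient,
         "--output", str(path), "--encrypt"],
        input=secret.encode(),
        check=True,
    )


def load(path: Path) -> str:
    """Decrypt a stored secret, reading the plaintext from gpg's stdout."""
    result = subprocess.run(
        ["gpg", "--quiet", "--decrypt", str(path)],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode()
```

Clipboard handling and file organisation are left out; they depend heavily on the platform.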
Yes. My spidey sense went off and I told my work I'll be off for an hour while I redo all my passwords... might still do that but glad to know it's not necessarily me getting hacked.
I called out some comment for being racist a little earlier (yeah I know, just report and move on...) and figured they'd managed to pwn my account somehow. Good to know it's not just me.
In an "anything's possible" sense then yeah. But the fact that FB was not letting me login with the credentials I knew to be correct was directly attributed to a global outage, rather than a me-specific issue. Which I can now verify by checking the devices that are authorised to my account.
So you're saying you do own your racism. Well good for you, one of the brave racists -- now we know what kind of a person you really are. But it doesn't mean you're right, it just means your opinion is worthless and you're not worth debating because you're an intellectually dishonest bigot, even worse for believing in scientific racism.
Edit:
Your beloved scientific racism is not reality, it's a pseudoscience, as foolish and wrong as Astrology and Phrenology and Homeopathic Medicine. You're still an intellectually dishonest bigot.
If you're so intellectually honest and sure of yourself, then why don't you state right now unequivocally for the record that you're an unrepentant racist bigot?
Scientific racism, sometimes termed biological racism, is the pseudoscientific belief that the human species can be subdivided into biologically distinct taxa called "races", and that empirical evidence exists to support or justify racism (racial discrimination), racial inferiority, or racial superiority. Before the mid-20th century, scientific racism was accepted throughout the scientific community, but it is no longer considered scientific. The division of humankind into biologically separate groups, along with the assignment of particular physical and mental characteristics to these groups through constructing and applying corresponding explanatory models, is referred to as racialism, race realism, or race science by those who support these ideas. Modern scientific consensus rejects this view as being irreconcilable with modern genetic research.
Scientific racism misapplies, misconstrues, or distorts anthropology (notably physical anthropology), craniometry, evolutionary biology, and other disciplines or pseudo-disciplines through proposing anthropological typologies to classify human populations into physically discrete human races, some of which might be asserted to be superior or inferior to others. Scientific racism was common during the period from the 1600s to the end of World War II, and was particularly prominent in European and American academic writings from the mid-19th century through the early-20th century. Since the second half of the 20th century, scientific racism has been discredited and criticized as obsolete, yet has persistently been used to support or validate racist world-views based upon belief in the existence and significance of racial categories and a hierarchy of superior and inferior races.
And if your grandmother had wheels, then she'd be a bicycle.
Since you just don't get it, and you're such an intellectually dishonest unrepentant racist whose opinions are so worthless they should be dismissed, I will explain it for you:
A statement in this form is always true:
"If <something that is false>, then <anything in the world you want to make up, true or false, no matter how stupid or implausible>."
Because <something that is false> like "If the opinion is based on facts that are true but inconvenient" means that you can say anything you like after that, such as "the intellectual dishonesty is dismissing them", and the entire statement is true, because the condition is false.
I know that's going to whoosh right over your head, but in other words, it's false that your opinion is based on facts that are true but inconvenient. Your opinion is based on lies and pseudoscience, and it is false, which is inconvenient for you.
Gino D'Acampo "If my Grandmother had wheels she would have been a bike" -18th May 2010:
Same same. I went through the password reset flow (I was overdue anyways), it never sent anything to my SMS, so I did it again with email, reset the password and went to log in with the new password, "Incorrect password" error. Old password, also incorrect.
Didn't help that I had just posted a lukewarm spicy take on how linguistic prescriptivism is BS.
All the while the website felt like it was unstable, hard to describe, but it felt like it was bouncing around between URLs too much and reloading a lot.
Definitely feels like a botched update on their end.
E: Instagram is misbehaving as well, banner loads but big "Something is wrong" error on the feed.
E: now youtube has "Something went wrong" - WTF. I can't believe I'm saying this, but thank goodness for reddit and X[itter]???
E: interesting, seeing a big spike across multiple platforms on downdetector, including AWS: https://downdetector.com/status/aws-amazon-web-services/ I'm not able to log in right now, but that could be PEBCAK, I have too many saved IDs and I don't want to fail2ban myself
downdetector reports have gone down, but to me it's still bugged out; been catching a livestream on youtube all along though. Meta stocks are back up from the dip, so I take it some regions are restored to normality
I heard for a while Netflix would fail open if auth was unavailable. Like it’s just movies just let em see it.
Facebook data is more sensitive. Not so much the data people go there to see, cool memes that their friends liked, but the list of friends and interests.
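The fail-open versus fail-closed trade-off can be sketched like this. It is a generic illustration with made-up names (`authorize`, `check_token`, the `sensitive` flag), not a description of what Netflix or Facebook actually do:

```python
from typing import Callable


def authorize(check_token: Callable[[str], bool], token: str, sensitive: bool) -> bool:
    """Decide access, choosing a failure mode when the auth service is down."""
    try:
        return check_token(token)
    except ConnectionError:
        # Auth is unavailable. Fail open for low-stakes content (just movies),
        # fail closed for anything sensitive (friend lists, interests).
        return not sensitive
```

When the auth service is healthy, its answer is used as-is; the `sensitive` flag only matters during an outage.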
Other places I worked had the ability for Ops to push out a change saying the site was down for maintenance. After a while we stopped using it and just took the hit of a bunch of 5xx errors. Basically when the planned down times became shorter than the time to propagate the down setting.
Likewise, started a password reset process that won't complete, asked my wife to double check my account wasn't compromised and posting cryptocurrency crap or somesuch.
On a psychological note, I think the threat detection part of our brain doesn't always notify our conscious thought that it's actively monitoring for threats. I've often noticed that when I'm carefully handling a hot frying pan then my ringing phone is more likely to startle me than usual.
That makes sense. I've noticed too that my brain seems to have a threat pre-emption module as well as a threat reaction module. For example, I'll sometimes be walking and texting at the same time, only to stop in my tracks and suddenly realize that there's a hidden stair in front of me.