> When Facebook's down, people report problems with any site that has "Login with Facebook" as an option.
If users log into your site with Facebook, then the login functionality of your site effectively is down when "Login with Facebook" is down.
From the user's perspective, your subcontractors, including authentication subcontractors, are your problem to deal with and should never be visible to them. From your perspective, you could have architected your site so that logging in doesn't "go down" when Facebook login is down.
If the user chooses "Login with Facebook" over other authentication options available, and they don't want to use other options, educating them with a good error message might help. Or you could remove the Facebook login option, if you (totally reasonably) don't want Facebook's failures to reflect poorly on you.
> If users log into your site with Facebook, then the login functionality of your site effectively is down when "Login with Facebook" is down.
There are plenty of sites where "Login with Facebook" is a convenience but hardly the only way to log in. Reddit, for example, has "Login with Google" and "Login with Apple"; it would be highly misleading to claim "Reddit is down" if Google's OAuth flow was having an outage.
> educating them with a good error message might help
Nothing in the API or OAuth flow would make that doable in an automatic fashion with this outage. It'd have to be something you put up manually as a banner after hearing of the outage.
> Or you could remove the Facebook login option, if you (totally reasonably) don't want Facebook's failures to reflect poorly on you.
I don't particularly care; we're talking about why DownDetector isn't necessarily ideal for assessing. It can be a useful signal, in some scenarios, but I've seen plenty of spurious signals come from it.
> Nothing in the API or OAuth flow would make that doable in an automatic fashion with this outage. It'd have to be something you put up manually as a banner after hearing of the outage.
That is fair: if I choose to architect my site such that a user-critical feature goes down when a 3rd party service goes down, it behooves me to monitor the 3rd party service and do whatever is necessary to properly inform users of what's going on.
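A rough sketch of what that monitoring could look like, assuming the site records the outcome of each third-party login attempt itself. The class name, thresholds, and banner wording are all hypothetical, not part of any OAuth library or Facebook API:

```python
from collections import deque
from typing import Optional


class ProviderHealth:
    """Tracks recent outcomes of login attempts through one third-party provider.

    Hypothetical helper: the window size and thresholds are illustrative guesses.
    """

    def __init__(self, window: int = 20, failure_threshold: float = 0.5,
                 min_samples: int = 5) -> None:
        self.outcomes = deque(maxlen=window)  # True = success, False = failure
        self.failure_threshold = failure_threshold
        self.min_samples = min_samples

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    def is_degraded(self) -> bool:
        # Too few samples to judge; assume the provider is healthy.
        if len(self.outcomes) < self.min_samples:
            return False
        failures = sum(1 for ok in self.outcomes if not ok)
        return failures / len(self.outcomes) >= self.failure_threshold


def login_banner(provider_name: str, health: ProviderHealth) -> Optional[str]:
    """Return a message to show next to the login button, or None if all looks fine."""
    if health.is_degraded():
        return (f"{provider_name} login appears to be having problems. "
                "Please try another sign-in method.")
    return None
```

This only reacts to failures your own site observes, so it won't help if users never get as far as attempting the login.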
Unfortunately, I edited my post after you replied, but another option is removing the parts of your site that rely on 3rd parties, if you don't want the failures of those 3rd parties to reflect poorly on you (which they reasonably would).
>we're talking about why DownDetector isn't necessarily ideal for assessing. It can be a useful signal, in some scenarios, but I've seen plenty of spurious signals come from it.
Indeed, and if a bunch of users say that a feature of your site is down, even if it's a result of a 3rd party failure: chances are, that part of your site is down, and it's partially your fault for relying on a 3rd party for that feature. The users correctly don't care what the root cause is; they expect you to either mitigate it or not have a feature they rely upon be unreliable.
Ignore the comments on DownDetector for a moment and check out that huge spike in reports recently. Clearly something wrong happened with AWS's user experience. That's something AWS needs to resolve, in the eyes of their users.
>The chart shows a big spike this morning, but there was no AWS outage
Are you sure? If hundreds of users simultaneously reported there was some sort of outage, particularly a huge spike like we saw, chances are there was an outage.
>Again, DownDetector can be a useful "is something unusual happening right now" signal
Exactly! Specifically, "is something unusual happening right now with my site, in the eyes of my users?" Every site owner should know when that condition is true. What you think about your site's "up-ness" isn't as important as what your users think about your site's "up-ness". What you attribute your downtime to isn't as important as what your users attribute your downtime to (you).
> Clearly something is going on with AWS's user experience.
But that's not the case. It's a false positive.
Pick a DownDetector service and open the page every day for a few days. You'll see that, most of the time, it just reflects people waking up in the US time zones.
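To make the time-of-day point concrete, here is a minimal sketch of comparing a report count against the typical count for the same hour on previous days, rather than against the overnight lull. The function name, the ratio, and the sample numbers are made up for illustration; this is not how DownDetector works internally, as far as I know:

```python
from statistics import median


def is_unusual(current_count: int, same_hour_history: list, ratio: float = 3.0) -> bool:
    """Flag a report count only if it far exceeds the usual count for that hour.

    same_hour_history holds report counts from the same hour on earlier days
    (hypothetical data you would have to collect yourself).
    """
    if not same_hour_history:
        return False  # no baseline, so no basis for calling it unusual
    baseline = median(same_hour_history)
    # max(..., 1) avoids flagging tiny counts when the baseline is near zero
    return current_count > ratio * max(baseline, 1)


# A morning bump in line with previous mornings is not flagged...
print(is_unusual(120, [100, 110, 90, 105]))
# ...while a count several times the usual morning level is.
print(is_unusual(900, [100, 110, 90, 105]))
```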
Is it a false positive, though? The data shows there was an outage. We would need more evidence to conclude that the hundreds of users behind that one spike weren't actually having issues.
In other words, we have hundreds of people saying there was an outage, and 1 person saying there wasn't.
That's a problem AWS needs to resolve, regardless of what they think might be the root cause. If the users weren't experiencing any issues with AWS, I doubt they'd be reporting it.
Your comment about timing is a good point: if people are working with AWS early in the day, and AWS is giving them problems, then they will probably report problems with AWS early in the day. I wouldn't expect them to report problems while they're sleeping.
Hundreds of users, representing more users who didn't bother reporting, say they experienced issues when interacting with AWS this morning, so we'll need better evidence to the contrary to conclude otherwise.
The fact that some people accessed AWS without reporting issues does not mean that all people did. For those who had issues, AWS is responsible for dealing with those perceptions.
Indeed, it could have been a fault that affected a subset of users, for example 1 service in 1 availability zone. That's still an outage in the eyes of users, which AWS is responsible for managing. It could have been an issue with a route from 1 ISP. That's still an outage in the eyes of users, which AWS is responsible for managing.
An even better example is the DownDetector page for Facebook, with hundreds of thousands of reports. Do we really think there's no correlation between what DownDetector reports and what users experience?
tl;dr: what users think about your site is more important than both what you think about your site and the reality of your site, and you should be tracking it.