Status pages usually start as a human-updated thing because it's easy to implement.
Some time later, you might add an automated check thing that makes some synthetic requests to the service and validates what's returned. And maybe you wire that directly to the status page so issues can be shown as soon as possible.
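As a rough sketch of what that checker might do, here's a minimal version in Python. Everything specific is an assumption for illustration: the health endpoint, the expected JSON shape, and the idea that a single failed probe is enough to flag the service.

```python
import json
import urllib.request

# Hypothetical endpoint -- a stand-in for whatever the service exposes.
STATUS_ENDPOINT = "https://example.com/api/health"

def probe() -> bool:
    """Make one synthetic request and validate what comes back."""
    try:
        with urllib.request.urlopen(STATUS_ENDPOINT, timeout=5) as resp:
            body = json.load(resp)
            # Validate the content, not just that the request succeeded:
            # a 200 with the wrong payload is still a failure.
            return body.get("status") == "ok"
    except Exception:
        # Timeouts, connection errors, non-2xx responses, bad JSON:
        # all of these count as a failed check.
        return False

if __name__ == "__main__":
    if not probe():
        # Wired directly to the status page, this is where the
        # "show issues as soon as possible" behavior comes from.
        print("probe failed -- service would be marked as down")
```

The direct wiring is the important part: one failed probe is all it takes to change the page.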
Then, false alarms happen. Maybe someone forgot to rotate the credentials for the test account and it got locked out. Maybe the testing system has a bug. Maybe a change pushed to the service alters the output in a way the test considers invalid. Maybe a localized infrastructure problem is preventing the testing system from reaching the service. There are a lot of ways for false alarms to appear, some intermittent and some persistent.
So then you spread out. You add more testing systems in diverse locations. You require some N of the M tests to fail, and if that threshold is reached, the status page gets updated automatically. That protects you from a few categories of false alarms, but not all of them.
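A sketch of that N-of-M rule, again assuming each testing location just reports pass/fail (the three locations and the 2-of-3 threshold are made up for the example):

```python
def should_flag_outage(results: dict[str, bool], threshold: int) -> bool:
    """Update the status page only if at least `threshold` probes failed.

    results maps a probe location to whether its check passed.
    """
    failures = sum(1 for ok in results.values() if not ok)
    return failures >= threshold

# Three diverse locations, require 2 of 3 to fail before acting.
results = {"us-east": True, "eu-west": False, "ap-south": True}
print(should_flag_outage(results, threshold=2))  # False: one flaky
# probe isn't enough to flip the page on its own anymore.
```

That absorbs the single-probe failures, like one locked-out test account or one location's network trouble, but it does nothing about a bug shared by all of the testing systems.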
You could go further and keep whacking away at the false alarm sources, but as you do, you run into the same problem as service reliability itself: each additional "9" costs much more than the one before it. You reach a point where the cost of making your automatic status page updates fully accurate becomes prohibitive.
So you go back to having a human assess the alarm and authorize a status page update if it is legitimate.