Hacker News | kedihacker's comments

I think an audited algorithm where each type is strictly defined (like int32), added to that, would really help pin down exactly what should be input to it so it remains correct.


I don't think aggregating the whole platform into one number is fair. It's like rolling all of AWS into one number.


On the other hand when you have a reasonably complex deployment it's easy to get swamped with dashboards showing CPU, Memory, I/O, application-metrics, signups, active users/sessions, etc.

Instead it's nice to think about how you can express the state of a complete system as a single number. It might be you divide active user sessions by database-connections, and then scale by memory capacity.

But as a single number you can then get used to normal ranges, and have it always visible somewhere obvious. A single number won't show details, but when it changes you can go look at the specific metrics. It's a cute shorthand, and it can work well as a basic "are we normal" check.
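
A minimal sketch of that idea, assuming "scale by memory capacity" means dividing by capacity in GiB (it could just as well mean multiplying). The names `health_index` and `is_normal` and the normal band are made up for illustration:

```python
def health_index(active_sessions: int, db_connections: int, memory_capacity_gib: float) -> float:
    """Sessions per database connection, divided by memory capacity (an assumed scaling)."""
    if db_connections <= 0 or memory_capacity_gib <= 0:
        raise ValueError("db_connections and memory_capacity_gib must be positive")
    return (active_sessions / db_connections) / memory_capacity_gib


def is_normal(index: float, low: float, high: float) -> bool:
    """True when the index sits inside the band you have learned to expect."""
    return low <= index <= high


# Hypothetical readings: 1200 sessions, 40 connections, 64 GiB of memory.
index = health_index(1200, 40, 64.0)
print(index, is_normal(index, low=0.3, high=0.6))
```

The band itself has to come from watching the number for a while; when it leaves the band, that is the cue to go look at the detailed dashboards.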


splitting the status page like they do, to the point where it is only a bit of humorous exaggeration to say that they track broken `git push` and `git pull` separately, is a sleight of hand / accounting / SLA-fudging that we should not excuse

there is a subset of the site that pretty much everyone uses — git, issues, pull requests, actions — and if any part of that is broken then the site is broken and the status page should indicate how often this happens


> splitting the status page like they do, to the point where it is only a bit of humorous exaggeration to say that they track broken `git push` and `git pull` separately, is a sleight of hand / accounting / SLA-fudging that we should not excuse

This is a pretty ungenerous take. You could look at it the other way: if I don't use actions then it's useful for me to know that only actions are broken, and I can continue in my normal usage. If you bundle everything up then the status page is reporting an unhelpful false positive for me.


you can do both: report a number that shows how often your service as a whole is degraded, with a breakdown for individual components

example (not sponsored, i barely use codex and today's the first time i've ever had to look at this page; i don't know how much they're fudging the individual numbers or not reporting minor incidents):

https://status.openai.com/

most people who use chatgpt don't use all of the components under the "ChatGPT" heading. for codex, i don't use the vscode extension or codex web. etc
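
a rough sketch of "do both", assuming incidents are recorded as (component, start_minute, end_minute) tuples inside a fixed reporting window. the component names and the `uptime_report` helper are hypothetical, not how any real status page computes its numbers:

```python
from collections import defaultdict


def merged_length(intervals):
    """Total minutes covered by possibly overlapping (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def uptime_report(incidents, window_minutes):
    """Return (overall_uptime_pct, {component: uptime_pct})."""
    by_component = defaultdict(list)
    for component, start, end in incidents:
        by_component[component].append((start, end))

    def pct(intervals):
        return 100.0 * (window_minutes - merged_length(intervals)) / window_minutes

    breakdown = {name: pct(spans) for name, spans in by_component.items()}
    # The service as a whole counts as degraded whenever any component is.
    every_span = [span for spans in by_component.values() for span in spans]
    return pct(every_span), breakdown


incidents = [("git", 0, 50), ("actions", 25, 125), ("actions", 500, 525)]
overall, breakdown = uptime_report(incidents, window_minutes=1000)
print(overall, breakdown)
```

overlapping incidents are merged first, so the overall number counts each degraded minute once instead of summing the per-component downtime.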


It’s obviously a meme website, and the meme is funnier when the number isn’t high. Anyone looking for actual accurate info would go to the real status page.


Ironically I’ve never found official status pages to be all that accurate either since companies love to exclude all kinds of outages from counting towards uptime. Anthropic is hilariously egregious about that as a recent example I can think of, but I assume GitHub does the same since it’s so common in the industry.


If S3, EC2, EKS and RDS alone had an uptime similar to that of all of GitHub right now, we'd all know.

No one cares that much if repo wikis, commit stats or gists had these issues. It's the combination of interdependent services that people use together, like PRs, actions, discussions, etc.

If one were to build a single percentage for each of these components of both systems, GitHub would still lose. Maybe it would have a few more days without outages, but there's no comparison.


From a user perspective this makes sense. But if you’re MSFT or GitHub this number is pretty embarrassing.

They would love if everyone on the platform used all of the features and had massive lock-in right? So if some part of that is always broken, it’s not a confidence booster for users to adopt more of the feature set.

Sure, the more things you use the more likely it is that one has an issue, but clearly stability isn’t a goal for these types of companies anymore.
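
To put a rough number on "the more things you use": a back-of-the-envelope sketch that assumes component failures are independent, which real outages usually are not. The 99% figures are invented, and `chance_all_up` is just an illustrative name:

```python
import math


def chance_all_up(availabilities):
    """Probability that every component is up at once, ASSUMING independent failures."""
    return math.prod(availabilities)


# Five hypothetical features, each available 99% of the time.
print(chance_all_up([0.99] * 5))
```

Under that assumption, five features at 99% each are all working together only about 95.1% of the time.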


Why embarrassing? This is normal MSFT


GitHub has far fewer services and regions than AWS.


I think the correct middle ground is a site that lets you select the parts of the platform you rely on and ignore the others. For example, GitHub is "down" for me when I can't push, process PRs, or release packages, but I don't care about Actions or AI features.
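
Something like this sketch, where the component names and the `down_for_me` helper are made up for illustration and the status map would come from wherever the site gets its data:

```python
def down_for_me(component_up: dict[str, bool], relied_on: set[str]) -> bool:
    """True if any component I rely on is currently down.

    Unknown component names raise KeyError rather than being silently ignored.
    """
    return any(not component_up[name] for name in relied_on)


# Hypothetical snapshot: Actions is broken, everything else works.
status = {"git": True, "pull_requests": True, "packages": True, "actions": False}

print(down_for_me(status, {"git", "pull_requests", "packages"}))  # my selection
print(down_for_me(status, {"git", "actions"}))                    # someone who needs Actions
```

The same snapshot gives a different answer depending on the selection, which is the whole idea.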


You’re kind of an outlier - nobody wants AI but Actions are core for tons of workflows and deployment pipelines. Everyone bought into the “only robots can deploy” mantra (correctly IMO, it’s a huge time and friction saver) only to be bit in the ass by the platform being so unreliable they can be stuck for days without deploys.


That's kind of my point, everyone has a different set of GitHub features they rely on. Some people even want the AI bits.


end user doesn't care, if it don't work it don't work


It can skip, but it has 7 more programs to go, and it can only know the program after completing the first one, so after the first one there is no advantage.


90% of gold is used in jewelry or bars, so its use value isn't that much, unless the price is prohibiting use cases.


Jewellery is a use for gold, people like it because it is pretty and shiny and easily worked not just because it is rare.

The artificial scarcity and lack of actual use of bitcoin really isn’t the same.


With banning and deboosting they need to be very accurate, but with filtering they can be more liberal in excluding.


They can do this with Certificate Transparency; otherwise a CA can sign whatever date it wants. But if they collude with the CT log, they can issue rogue certificates for targeted attacks.


Yes, that's all right, there's already a requirement that they submit to one Google CT log and one non-Google CT log. They thought about it already. The playbook I mentioned they've been rehearsing contains a specific threat against backdating certs: they say they'll distrust immediately if they detect it, and they have means of detecting backdating on a significant scale (esp. for LE, where they submit 100% of issued certs, not just the subset that is intended for consumption with Chrome).


Well, you need to stop them from getting incorporated into its training data.


Only us-east-1 gets new services immediately; others might, but it's not a guarantee. Which regions are a good alternative?


Well, letting TB evolve over time and infect everyone is a lot more dangerous.


Well microcontrollers can prevent you from repairing your own device with DRM and secure enclaves

