> The risk of ZScaler being a central point of failure is not considered. But - the risk of failing the compliance checkbox it satisfies is paramount.
You're conflating Risk and Impact, and you're not considering the target of that Risk and that Impact.
Failing an audit:
1. Risk: high (audits happen all the time)
2. Impact to business: minimal (audits are failed all the time and then rectified)
3. Impact to manager: high (manager gets dinged for a failing audit).
Compare with failing an actual threat/intrusion:
1. Risk: low (so few companies get hacked)
2. Impact to business: extremely high
3. Impact to manager: minimal, if audits were all passed.
Now, with that perspective, how do you expect a rational person to behave?
[EDIT: as some replies pointed out, I stupidly wrote "Risk" instead of "Odds" (or "Chance"). Risk is, of course, the expected value, which is probability × impact. My post would make a lot more sense if you mentally replace "Risk" with "probability".]
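To put numbers on it (entirely made-up ones, purely to show the shape of the probability × impact math), a quick sketch from each party's point of view:

```python
# Toy expected-value comparison; every probability and impact score
# below is an invented illustration, not data.
scenarios = {
    # name: (probability per year, impact to business, impact to manager)
    "fail an audit":       (0.50,  1, 8),
    "suffer an intrusion": (0.05, 10, 1),  # "minimal, if audits were all passed"
}

for name, (p, biz, mgr) in scenarios.items():
    print(f"{name}: business risk = {p * biz:.2f}, manager risk = {p * mgr:.2f}")

# fail an audit: business risk = 0.50, manager risk = 4.00
# suffer an intrusion: business risk = 0.50, manager risk = 0.05
# Same risk to the business, wildly different risk to the manager --
# hence the rational-but-perverse behavior described above.
```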
Moreover, no manager gets dinged for "internet-wide" outages, unfortunately, so the compliance department keeps calling the shots. The number of times I've had to explain that there's no added security in adding an "antivirus" to our Linux servers, since we already have proper monitoring at the eBPF level, is annoying.
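For context, "monitoring at the eBPF level" can be as simple as tracing every exec on the box. Here's a minimal sketch using the BCC Python bindings; it's illustrative only, not a claim about what this particular setup actually runs:

```python
# Minimal eBPF exec-monitoring sketch using the BCC Python bindings.
# Requires the bcc package and root; a real deployment would ship these
# events to a monitoring pipeline rather than print them.
from bcc import BPF

prog = r"""
TRACEPOINT_PROBE(syscalls, sys_enter_execve) {
    // Log the PID of every process execution on the host.
    bpf_trace_printk("execve by pid %d\n",
                     (u32)(bpf_get_current_pid_tgid() >> 32));
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing execve calls... Ctrl-C to stop.")
b.trace_print()
```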
I'd be fired if I caused enough loss in revenue to pay my own salary for a year.
I am responsible for my choices. I'm CTO, and I don't doubt that in some cases execs cover for each other, but at least I have anecdotal experience of what it would take for me to be fired, and this is clearly communicated to me.
Hope you get paid a lot! Otherwise you are either in a very young or very stupid job.
I regularly spend multiples of my salary every month on various commitments my company makes; any small mistake could easily mean it's a multiples-of-my-salary type of problem within 10 days.
A friend of mine spent half a million on a storage device that we never used. It sat in the IT area for years until we were acquired. Everyone gave him so much shit. Finance asked me about it numerous times (going around my friend the CTO) so they could properly depreciate it. He didn't get dinged by the board at all. It remained an open secret. We were making million dollar decisions once a month, though.
> I regularly spend multiples of my salary every month on various commitments my company makes.
Yeah, same here.
But if I choose a vendor and that vendor fails us so catastrophically as to make us financially insolvent, then it's my job to have run a risk analysis and to have an answer for why.
If it's more cost-effective to take an outage, that's fine. If it's not: why didn't I have a DRP in place? Why did we rely so much on one vendor? What's the exposure?
It's a pretty important part of being a serious business person.
Sure, but that's not what I said or what you said, and my commentary was about the size of your salary relative to your budget.
If you can't make a mistake the size of your salary within your budget, then your budget is small or very tight; most corporations fuck up big multiples of their CTO's salary quarterly (but that turns out to be single-digit percentage points of anything useful).
> I'd be fired if I caused enough loss in revenue to pay my own salary for a year.
I'm not so sure.
I know of a major company that had a glitch, multiple times, that cost them at least ~$15 million on one occasion (a non-prod test hit prod because of a poorly designed tool).
I was told the decision-makers chose not to fix the problem, despite the risk of losing more money again, because the "money had already been lost."
"no manager gets dinged for "internet-wide" outages"
Kind of like, nobody gets fired for hiring IBM, or using SAP. They are just so big, every manager can say, "look how many people are using them, how was I supposed to know they are crap".
But it seems like, for uptime, someone should be identifiable. If your job is uptime and there is a worldwide outage, I'd think it would roll downhill onto someone.
> Kind of like, nobody gets fired for hiring IBM, or using SAP. They are just so big, every manager can say, "look how many people are using them, how was I supposed to know they are crap".
I wouldn't necessarily say IBM or SAP are "crap". It's much more likely that orgs buying into IBM or SAP don't do the due diligence on the true cost of properly setting it up and keeping it running, and therefore cut tons of corners.
They basically want to own a Ferrari, but when it comes to maintenance, they want to run regular gas and have their local mechanic slap Ford parts on it because it's too expensive to keep going back to the dealership.
The thing is, usually this argument goes something like this:
A: Should prod be running a failover / <insert other safety mechanism>?
B: Yes!
A: This is how much it costs: <number>
B: Errm... Let me check... OK I got an answer, let's document how we'd do it, but we can't afford the overhead of an auto-failover setup.
And so then there will be two types of companies: the ones that "do it properly" will have higher costs, their margins will be lower, and over time they'll be less successful, as long as no big incident happens. When a big incident does happen, though, recent history proves that for most businesses, if everyone was down, nobody really complains. If your customers have one vendor down due to this issue, they will complain; but if your customers have ten vendors down, and are themselves down, they don't complain anymore. And so you get this tragedy-of-the-commons type dynamic where it pays off to do what most people do rather than the right thing.
And the thing is, in practice, doing the thing most people do is probably not a bad yardstick, however disappointing that is. Twenty years ago nobody had 2FA and that was acceptable; today most sites do, and it's no longer acceptable not to have it.
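To make the dialogue's cost calculus concrete, a back-of-the-envelope sketch; every figure here is hypothetical:

```python
# Back-of-the-envelope failover decision; all figures are invented.
failover_cost_per_year = 200_000    # infra + engineering overhead
outage_probability_per_year = 0.10  # chance of a serious outage this year
outage_cost = 1_000_000             # lost revenue + cleanup if it happens

expected_outage_cost = outage_probability_per_year * outage_cost  # 100,000

if failover_cost_per_year < expected_outage_cost:
    print("Auto-failover pays for itself in expectation.")
else:
    print("Cheaper, in expectation, to document the plan and eat the outage.")

# With these numbers the "rational" answer is to skip the failover --
# exactly the dynamic described above, until the incident actually lands.
```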
Parents may teach this to kids but the kids usually notice their parents don't practice what they preach. So they don't either.
The world is filled with people following everybody else off a cliff. If you're warning people or even just not playing along in a time of great hysteria, people at best ignore your warnings and direct verbal abuse at you. At worst, you can face active persecution for being right when the crowd has gone insane. So most people are cowards who go along to get along.
I think the parent was correct in the use of the word "Risk"; it's different than your definition, which appears to be closer to "likelihood".
Risk is a combination of likelihood and impact. If "risk" were just equivalent to "likelihood" then leaving without an umbrella on a cloudy day would be a "high-risk situation".
A rational person needs to weigh both the likelihood and impact of a threat in order to properly evaluate its risk. In many cases, the impact is high enough that even a low likelihood needs to be addressed.
ZScaler and similar software also have some hidden costs: performance, and all the other fun that comes with a proxy sitting between you and the server you connect to.
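That overhead is easy to measure, at least roughly. A sketch using the `requests` library, where the proxy address is a placeholder for whatever egress proxy sits on your path:

```python
# Rough latency comparison: direct vs. through a forwarding proxy.
# The proxy URL is a placeholder; swap in your actual egress proxy.
import time
import requests

URL = "https://example.com/"
PROXIES = {"https": "http://proxy.internal:8080"}  # hypothetical proxy

def avg_get_seconds(url, proxies=None, n=5):
    """Average wall-clock time of n GET requests."""
    start = time.perf_counter()
    for _ in range(n):
        requests.get(url, proxies=proxies, timeout=10)
    return (time.perf_counter() - start) / n

direct = avg_get_seconds(URL)
via_proxy = avg_get_seconds(URL, proxies=PROXIES)
print(f"direct: {direct * 1000:.0f} ms, via proxy: {via_proxy * 1000:.0f} ms")
```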
> What I'm saying is that the business's interests are not aligned with the people comprising that business.
Yep, that's the point of capitalism.
> In that regard, what "the business" wants is irrelevant.
And yet here we are. Companies get fined left and right for breaching rules, but it's OK because it earned them money. There are literal plans made to calculate whether it's profitable to cheat or not. In the current system, what the business wants always wins over individual qualms, unfortunately.
Because the punitive system in most countries doesn't affect individuals. As a manager, you're not going to jail for breaking environmental laws; a different entity (the company) pays for being caught. So it's still the rational thing to do: break the environmental laws to make your group's numbers go up and get a promo or bonus.
Almost correct, but you mean 'chance' where you write 'risk':
Risk = Chance × Impact
The chance of initially failing an audit is high (or medium; present, at least). The impact is usually low-ish: it means a bunch of people need to fix policy and set out improvement plans in a rush. It won't cost you your certification if the rectification is handled properly.
It's actually possible that both of your examples are awarded the same level of risk, but in practice the latter example will have its chance minimized to make the risk look acceptable.
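A made-up illustration of that minimization:

```python
# Same impact, two competing chance estimates -- made-up numbers.
impact = 10_000_000               # cost of a serious intrusion
honest, optimistic = 0.05, 0.005  # the assessor controls this dial

print(f"honest risk:     {honest * impact:,.0f}")      # 500,000
print(f"optimistic risk: {optimistic * impact:,.0f}")  # 50,000

# Shrink the chance estimate 10x and a scary risk becomes "acceptable"
# on paper, without anything about the actual exposure changing.
```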
> Now, with that perspective, how do you expect a rational person to behave?
They'd deploy the software on the critical path. That's exactly GP's point, isn't it? That's why GP explicitly wants us to shift some of the blame from the business to the regulators. GP advocates for different regulatory incentives so that a rational person would then do the right thing instead of the wrong thing.
At the risk of sounding like Chicken Little: the reality is that companies are getting popped all the time; you just don't hear about them very often. The bar for media reporting is constantly being raised, to the point where you only hear about the really big ones.
If you read through any of the weekly Risky Biz News posts [1] you'll often see five or more highly impactful incidents affecting government and industry, and those are just the reported ones.
I wonder how much that's still true now that ransomware has apparently become viable.
Finding an insecure target, setting up the data-hostage situation, and having the victim come to pay is scalable and could work in volume. If extracting small money from a range of small targets becomes profitable, small fish will bear similar risks to juicier targets.
But...surely you're also missing another point of consideration:
Single point of failure fails, taking down all your systems for an indeterminate length of time:
1. Risk: moderate (an auto-updating piece of software without adequate checks? yeah, that's gonna fail sooner or later)
2. Impact to business: high
3. Impact to manager: varies (depending on just how easy it is to spin the decision to go with a single point of failure rather than a more robust solution to the compliance mandate)
> 3. Impact to manager: minimal, if audits were all passed.
I don't know about you, but I'll be making sure everyone knows that the manager signed off on the spectacularly stupid idea to push through an update on a Friday without testing.