Sure, but if you're doing work in machine learning, that’s generally not the terminology used, which hints that this isn’t the area the author specializes in (which isn’t a bad thing, but take their explanations with a grain of salt).
Also, about the non-determinism issue: there was a post some time ago saying that it comes from the way the GPU does the calculations, something something floating point something.
So of course the algorithm is deterministic, but the real-life implementation isn't.
Floating point addition, for example, is not associative, so the order of taking a sum affects the result. If the summation were sequential and single threaded, it would be deterministic. But it happens in parallel, so timing variations affect the result.
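To make the non-associativity point concrete, here is a minimal sketch in plain Python (IEEE 754 doubles on the CPU, not GPU code). The variable names are just for illustration.

```python
# Same three numbers, two different groupings.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # add a and b first, then c
right = a + (b + c)  # add b and c first, then a

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

A parallel reduction effectively picks one of these groupings at run time, which is where the variation can come from.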
But there is probabilistic sampling that happens (see "temperature").
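For anyone unfamiliar with what "temperature" does, here is a rough sketch of temperature sampling over a list of scores. The function name `sample_with_temperature` and the toy `logits` are made up for illustration, and treating a temperature of 0 as "always pick the top score" is an assumption of this sketch rather than how any particular model server behaves.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Pick an index from `logits` after scaling by `temperature`."""
    if temperature <= 0:
        # Assumption for this sketch: temperature 0 means greedy (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [1.0, 3.0, 2.0]  # hypothetical scores for three tokens

greedy = sample_with_temperature(logits, 0, random.Random(0))

# Same seed, same draws: the sampling itself is reproducible.
run_a = [sample_with_temperature(logits, 1.0, random.Random(42)) for _ in range(5)]
run_b = [sample_with_temperature(logits, 1.0, random.Random(42)) for _ in range(5)]
```

With a fixed seed the sampling step is reproducible, so this source of randomness is separate from the floating point ordering issue.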
> Floating point addition, for example, is not associative, so the order of taking a sum affects the result. If the summation were sequential and single threaded, it would be deterministic. But it happens in parallel, so timing variations affect the result.
In this sense, I don't think it's fair to say floating point math is non-deterministic so much as that parallel computation is non-deterministic. FP behaves in unexpected ways, but the same order of operations always yields the same unexpected results (except on Pentium 1).
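A small sketch of that distinction, again in plain Python on the CPU: different orders of the same values can disagree, but repeating one fixed order gives the same answer every time. The helper `sum_in_order` is hypothetical, and it uses an explicit loop because the built-in `sum` may apply compensated summation to floats in newer Python versions.

```python
from itertools import permutations

def sum_in_order(values):
    """Add the values strictly left to right, one at a time."""
    total = 0.0
    for v in values:
        total += v
    return total

values = (1e16, 1.0, -1e16)

# Every possible order of the same three numbers.
results_by_order = {order: sum_in_order(order) for order in permutations(values)}
distinct_results = set(results_by_order.values())

# One fixed order, repeated many times.
repeated = {sum_in_order(values) for _ in range(1000)}

print(distinct_results)  # more than one value: the order matters
print(repeated)          # exactly one value: a fixed order is deterministic
```

The exact result is 1.0, but any order that adds 1.0 to a term of magnitude 1e16 loses it to rounding and ends up at 0.0.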
Electricity, cars, and gas were once luxuries as well, reserved for those who could afford them or had unique access, credentials, or skills. The people who were able to simplify the advanced tech and spread it to the common person became billionaires.