I like the surface dots as they are. They give me two points of reference at the poles and add intuition for how long it takes to go around the sphere.
From that Wikipedia article, delta is the ratio of y variance to x variance. If x variance is tiny compared to y variance (often the case in practice), won't we get an ill-conditioned model due to the large delta?
If you take the limit of delta -> infinity then you will get beta_1 = s_xy / s_xx which is the OLS estimator.
In the wiki page, factor out delta^2 from the sqrt and take delta to infinity and you will get a finite value. Apologies for not detailing the proof here, it's not so easy to type math...
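For reference, here is a sketch of that limit. It assumes the slope formula as given on the Wikipedia page for Deming regression, with s_xx > 0 and s_xy != 0. Multiplying by the conjugate first avoids the "infinity minus infinity" form; factoring delta^2 out of the square root then gives the finite value:

```latex
\begin{align*}
\hat\beta_1
  &= \frac{s_{yy} - \delta s_{xx}
       + \sqrt{(s_{yy} - \delta s_{xx})^2 + 4\delta s_{xy}^2}}{2 s_{xy}} \\
  &= \frac{4\delta s_{xy}^2}
          {2 s_{xy}\left[\sqrt{(s_{yy} - \delta s_{xx})^2 + 4\delta s_{xy}^2}
            + \delta s_{xx} - s_{yy}\right]}
     && \text{(multiply by the conjugate)} \\
  &= \frac{2 s_{xy}}
          {\sqrt{\left(s_{xx} - \dfrac{s_{yy}}{\delta}\right)^2
            + \dfrac{4 s_{xy}^2}{\delta}}
            + s_{xx} - \dfrac{s_{yy}}{\delta}}
     && \text{(factor } \delta^2 \text{ out of the square root)} \\
  &\xrightarrow{\;\delta \to \infty\;}
     \frac{2 s_{xy}}{s_{xx} + s_{xx}} = \frac{s_{xy}}{s_{xx}}
\end{align*}
```

The third line contains no difference of large terms, so it should also be the safer form to evaluate numerically when delta is large.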
On classic big waterfall projects, you can find actual architects. Those are the ones drafting interfaces and delineating components/teams before the first source file is even committed.
Thanks for your comment! You are right, the Barnes-Hut implementation brings UMAP down to O(N log N). I should have been more precise in the document. The main point is that even O(N log N) could be too much if you run this in a browser. Thanks for clarifying!
You are strictly correct for a single pass!
log2(9000)~13, which is indeed much smaller than k=50.
The missing variable in that comparison is Iterations.
t-SNE and UMAP are iterative optimisation algorithms. They repeat that O(N log N) step hundreds of times to converge.
My approach is a closed-form linear solution (Ax=b) that runs exactly once.
So the wall-clock comparison is effectively:
Iterations * (N log N)  vs.  1 * (N * k)
That need for convergence is where the speedup comes from, not the complexity class per se.
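A back-of-the-envelope sketch of that comparison, using N = 9000 and k = 50 from above. The iteration count of 500 is an assumed stand-in for "hundreds of times", and the function names are made up for illustration. These are rough operation counts that ignore constant factors, not measured timings:

```python
import math


def iterative_cost(n: int, iterations: int) -> float:
    """Rough operation count for `iterations` passes of an O(N log N) step."""
    return iterations * n * math.log2(n)


def closed_form_cost(n: int, k: int) -> float:
    """Rough operation count for a single O(N * k) pass."""
    return float(n * k)


n, k = 9000, 50
iterations = 500  # assumption: stands in for "hundreds of times"

single_pass = iterative_cost(n, 1)          # one N log N pass
all_passes = iterative_cost(n, iterations)  # what the optimiser actually pays
one_shot = closed_form_cost(n, k)           # the run-once linear solve

print(f"log2(N)                = {math.log2(n):.1f}")
print(f"single N log N pass    = {single_pass:,.0f}")
print(f"{iterations} N log N passes    = {all_passes:,.0f}")
print(f"one N * k pass         = {one_shot:,.0f}")
print(f"ratio (iterative/once) = {all_passes / one_shot:.0f}x")
```

A single N log N pass is cheaper than the N * k pass, as the parent comment says. The ordering only flips once the iteration count is multiplied in.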
Polars made the mistake of not maintaining row order for all operations, via the False-by-default argument of maintain_order. This is basically the billion-dollar null mistake for data frames.
Yeah, that really should have been the default. It's a very big footgun, especially when preserving order is the default in pandas, numpy, etc. And since there is no ingrained index concept in polars, people might very well forget that they need natural keys and can't rely on ordering. One needs to bring more of an SQL mindset.
The pro version comes with a "Professional-grade creative suite", but they don't tell you what you're actually getting. It's just opaque corporate-speak one-liners like "Make real progress toward your goals".