jmpeax's comments | Hacker News

I like the surface dots like it is. It gives me two points of reference at the poles, and adds intuition for how long it takes to go around the sphere.

From that Wikipedia article, delta is the ratio of the error variance in y to the error variance in x. If the x error variance is tiny compared to the y error variance (often the case in practice), won't we get an ill-conditioned model due to the large delta?

If you take the limit of delta -> infinity then you will get beta_1 = s_xy / s_xx which is the OLS estimator.

In the wiki page's slope formula, factor delta^2 out from under the square root and take delta to infinity; the numerator stays finite. It's not so easy to type math here, but a rough sketch follows.
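A rough sketch, assuming the slope formula from the Wikipedia article (delta = sigma_epsilon^2 / sigma_eta^2, with the usual sample moments s_xx, s_yy, s_xy):

    \hat\beta_1
      = \frac{s_{yy} - \delta s_{xx}
              + \sqrt{(s_{yy} - \delta s_{xx})^2 + 4\delta s_{xy}^2}}{2 s_{xy}}

    % Factor \delta^2 s_{xx}^2 out of the square root and expand for large \delta:
    \sqrt{(s_{yy} - \delta s_{xx})^2 + 4\delta s_{xy}^2}
      = \delta s_{xx}\sqrt{\Bigl(1 - \tfrac{s_{yy}}{\delta s_{xx}}\Bigr)^2
                           + \tfrac{4 s_{xy}^2}{\delta s_{xx}^2}}
      \approx \delta s_{xx} - s_{yy} + \frac{2 s_{xy}^2}{s_{xx}}

    % The numerator therefore tends to 2 s_{xy}^2 / s_{xx}, so
    \lim_{\delta \to \infty} \hat\beta_1
      = \frac{2 s_{xy}^2 / s_{xx}}{2 s_{xy}}
      = \frac{s_{xy}}{s_{xx}}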


Don't get me started on "software architect".


On classic big waterfall projects, you can find actual architects. Those are the ones drafting interfaces and delineating components/teams before the first source file is even committed.


Actual architects design buildings.


I'm sorry. My fault for engaging you, I guess.


Even "code monkey" is generous.


How is blocking ad blockers going to make them $150m?


> They typically need to compare many or all points to each other, leading to O(N²) complexity.

UMAP is not O(n^2); it is O(n log n).


Thanks for your comment! You are right, a Barnes-Hut implementation brings UMAP down to O(N log N). I should have been more precise in the document. The main point is that even O(N log N) can be too much if you run this in a browser. Thanks for clarifying!


If k=50, then I'm pretty sure O(n log n) beats O(nk).


You are strictly correct for a single pass: log2(9000) ≈ 13, which is indeed much smaller than k = 50. The missing variable in that comparison is the number of iterations. t-SNE and UMAP are iterative optimisation algorithms; they repeat that O(N log N) step hundreds of times to converge. My approach is a closed-form linear solution (Ax = b) that runs exactly once. So the wall-clock comparison is effectively iterations * (N log N) vs 1 * (N * k). That need for convergence is where the speedup comes from, not the complexity class per se.
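To put rough numbers on that, a toy back-of-the-envelope estimate (the 500-iteration count is a hypothetical stand-in for "hundreds", and it treats per-operation costs as equal, which they are not):

    import math

    N, k = 9_000, 50
    iterations = 500  # hypothetical stand-in for "hundreds" of optimisation passes

    iterative = iterations * N * math.log2(N)  # t-SNE / UMAP style: repeated O(N log N) passes
    one_shot = N * k                           # single closed-form pass touching k neighbours per point

    print(f"{iterative:,.0f} vs {one_shot:,.0f} ops -> ~{iterative / one_shot:,.0f}x")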


Polars made the mistake of not maintaining row order for all operations, via the False-by-default argument of maintain_order. This is basically the billion-dollar null mistake for data frames.
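For illustration, a minimal sketch of the footgun (toy data; group_by is one of the operations where the default applies):

    import polars as pl

    df = pl.DataFrame({"key": ["b", "a", "b", "a"], "val": [1, 2, 3, 4]})

    # Default: group order in the result is not guaranteed (maintain_order=False)
    unordered = df.group_by("key").agg(pl.col("val").sum())

    # Opt in explicitly to keep groups in order of first appearance
    ordered = df.group_by("key", maintain_order=True).agg(pl.col("val").sum())

    print(ordered)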


Yeah, that really should have been the default. It's a very big footgun, especially since preserving order is the default in pandas, NumPy, etc. And since there is no ingrained index concept in Polars, people might very well forget that they need natural keys rather than relying on ordering. One needs to bring more of an SQL mindset.


> always respect human dignity even when nasty players try to make a dirty move against you

What a gem of a quote. A great way to avoid becoming a bitter person.


> does not provide any concrete proof, but it confirms many people's suspicions

Without proof there is no confirmation.


Formally? Sure. In the current zeitgeist it’s more than enough to start pointing fingers around, etc.


The pro version comes with a "Professional-grade creative suite", but they don't tell you what you're actually getting. It's just opaque corporate-speak one-liners like "Make real progress toward your goals".


Except in figure 1 they're all at 0, making it look like the authors either didn't know how to use the models or deliberately made them do nothing.


I think it just looks that way because they used a linear x axis for comedic effect.

