The benchmark data stops before it gets to useful sizes (10^7 elements and up). Aren't the people implementing sorting algorithms routinely working at 10^9-10^12, where the workload is actually a bottleneck?
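For a rough sense of throughput at the sizes benchmarks do cover, here's a minimal timing sketch using NumPy's built-in sort (an introsort variant). The sizes are illustrative; 10^8 float64s already needs ~800 MB, and anything near 10^9+ would want an out-of-core or distributed sort instead:

    import time
    import numpy as np

    # Measure in-memory sort throughput as input size grows.
    for n in (10**5, 10**6, 10**7, 10**8):
        a = np.random.rand(n)
        t0 = time.perf_counter()
        a.sort()                      # NumPy's introsort for float64
        dt = time.perf_counter() - t0
        print(f"n={n:>11,}  {dt:8.3f}s  {n / dt / 1e6:6.1f} M elems/s")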


I run ML algorithms like boosted trees (e.g., XGBoost) on data sets with 30k-1M rows and 200-2k columns. Sorting is the bottleneck; it's what the algorithm spends its time doing. I doubt I'm special, and I'm sure these sizes are common.
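To see why tree boosting is sort-bound: exact greedy split finding sorts each feature column, then scans prefix sums of the gradients to score every candidate threshold. A minimal sketch of that idea; best_split is a hypothetical helper, not XGBoost's API, and the gain formula is simplified (unit hessians, no regularization):

    import numpy as np

    def best_split(feature, gradients):
        # The O(n log n) argsort is the hot path; everything after is O(n).
        order = np.argsort(feature)
        g = gradients[order]
        prefix = np.cumsum(g)
        total = prefix[-1]
        n = len(g)
        left_n = np.arange(1, n)          # left-child sizes per cut point
        left_sum = prefix[:-1]            # left-child gradient sums per cut point
        # Simplified gain: squared gradient sum over count, left + right.
        gain = left_sum**2 / left_n + (total - left_sum)**2 / (n - left_n)
        i = int(np.argmax(gain))
        return feature[order[i]], float(gain[i])

    # This sort repeats per column, per tree node, per boosting round:
    X = np.random.rand(100_000, 200)      # 1e5 rows x 200 columns
    grad = np.random.randn(100_000)
    threshold, gain = best_split(X[:, 0], grad)

With hundreds of columns and thousands of nodes across a boosted ensemble, those per-column sorts dominate the runtime even though each individual array is modest.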


IIRC, the average length of arrays passed to qsort is less than 20, according to Debian Code Search.
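That's why most production quicksort/introsort implementations hand small partitions off to insertion sort, which has far lower constant overhead at those lengths. A minimal sketch; the cutoff of 16 is illustrative, real libraries tune it empirically:

    def hybrid_sort(a, lo=0, hi=None, cutoff=16):
        # Quicksort that delegates tiny ranges (the common case, per the
        # Debian stats above) to insertion sort.
        if hi is None:
            hi = len(a) - 1
        if hi - lo + 1 <= cutoff:
            for i in range(lo + 1, hi + 1):   # insertion sort on a[lo..hi]
                key, j = a[i], i - 1
                while j >= lo and a[j] > key:
                    a[j + 1] = a[j]
                    j -= 1
                a[j + 1] = key
            return
        pivot = a[(lo + hi) // 2]             # Hoare-style partition
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        hybrid_sort(a, lo, j, cutoff)
        hybrid_sort(a, i, hi, cutoff)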



