The data stops before it reaches useful sizes (10^7 elements and up). How are people implementing sorting algorithms not routinely working at 10^9 to 10^12, where the workload actually becomes a bottleneck?
I run ML algorithms like boosted trees (e.g. XGBoost) on data sets with 30k-1M rows and 200-2k columns. Sorting is the bottleneck; it's what the algorithm spends its time doing. I doubt I'm special, and I'm sure these sizes are common.
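To make the "sorting is the bottleneck" point concrete: exact split finding in gradient-boosted trees scans feature values in sorted order, so every feature column gets sorted (or pre-sorted and re-indexed) before gains can be evaluated. Here's a minimal numpy sketch of that pattern, not XGBoost's actual internals; the function name, the crude gain proxy, and the sizes are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not XGBoost code): exact split finding sorts each
# feature column, then scans prefix sums of the gradients to pick a split.
rng = np.random.default_rng(0)
n_rows, n_cols = 30_000, 200            # low end of the sizes mentioned above
X = rng.standard_normal((n_rows, n_cols))
grad = rng.standard_normal(n_rows)       # per-row gradients from the loss

def best_split_per_feature(X, grad):
    """Return, per column, the split index with the largest gradient imbalance."""
    best = np.empty(X.shape[1], dtype=np.int64)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j], kind="stable")  # the sort: O(n log n) per column
        prefix = np.cumsum(grad[order])             # left-child gradient sums
        best[j] = np.argmax(np.abs(prefix[:-1]))    # crude stand-in for the real gain formula
    return best

splits = best_split_per_feature(X, grad)
```

At these shapes that is already hundreds of O(n log n) sorts per split evaluation, and trees have many nodes, so the sort dominates the runtime unless you switch to a histogram/binned method that trades the sort away.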