An understated benefit of merge sort is that all of its access patterns have excellent locality. Regardless of big-O, this is a huge advantage because it reduces cache thrashing.
Like it probably doesn't matter if you're sorting 10k entries, but when you're sorting several gigabytes it really does.
Locality is good during the merge, but there's no parallelism: there's a data dependency between the previous and next iteration, because we don't know which pointer we're supposed to advance until the comparison resolves. I think it's still unclear whether the cache locality necessarily compensates for that.
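For concreteness, here's the inner loop of a plain two-way merge (just a sketch); which element gets loaded next depends on the previous comparison, which is the serial dependency being described:

```cpp
#include <vector>

// Plain two-way merge. Which index advances in an iteration depends on the
// comparison in that iteration, so consecutive iterations are serially
// dependent, even though both input streams are read sequentially.
std::vector<int> merge(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] <= b[j])
            out.push_back(a[i++]);   // advance the left pointer...
        else
            out.push_back(b[j++]);   // ...or the right one, decided per element
    }
    // One input is exhausted; append the rest of the other.
    out.insert(out.end(), a.begin() + i, a.end());
    out.insert(out.end(), b.begin() + j, b.end());
    return out;
}
```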
Quite the opposite. Merge sort is much more susceptible to CPU cache misses. You have to go out of your way to optimize merge sort to avoid cache issues. But the devil is in the details, and it really depends on the arch, the dataset you're sorting (numbers vs. strings makes a big difference), the cores and cache levels available, and the memory available.
In practice, to avoid worst-case quadratic time, you need rather involved pivot selection in Quicksort. E.g. nine random or evenly distributed elements, which you put in three groups of three and take the median of the medians.
Collecting those nine elements looks terrible for locality.
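For reference, that "ninther" (median of medians of three) looks roughly like this; note the nine loads scattered across the whole range, which is where the locality complaint comes from (a sketch, helper names are mine):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of three values without enumerating all orderings.
int median3(int a, int b, int c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Ninther pivot for v[lo, hi): nine evenly spaced samples, grouped in
// threes, median of each group, then the median of those medians.
// Assumes hi - lo >= 9.
int ninther_pivot(const std::vector<int>& v, std::size_t lo, std::size_t hi) {
    const std::size_t step = (hi - lo - 1) / 8;
    auto s = [&](std::size_t k) { return v[lo + k * step]; };
    return median3(median3(s(0), s(1), s(2)),
                   median3(s(3), s(4), s(5)),
                   median3(s(6), s(7), s(8)));
}
```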
Funnelsort is a cache-oblivious variant of merge sort which is provably cache-optimal in some sense. I haven't tried implementing it.
To handle the quadratic edge cases you just switch to heapsort when the recursion depth goes above a threshold, i.e. introsort; no need for convoluted pivot selection methods.
That's usually called Introsort, but I think you'll find that most of these still consider at least the middle element for a pivot, so they still have a more complex cache behaviour than the naïve linear sweep from both ends of a simple Quicksort.
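Roughly the scheme being described, with a middle-element pivot, a Hoare-style sweep from both ends, and a heapsort fallback once the depth budget runs out (a sketch of the idea, not any particular library's std::sort; names are mine):

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

namespace {

// Hoare partition of v[lo, hi). The middle element is swapped to the front
// and used as the pivot value; returns j such that every element of
// v[lo..j] is <= every element of v[j+1..hi-1].
int hoare_partition(std::vector<int>& v, int lo, int hi) {
    std::swap(v[lo], v[lo + (hi - lo) / 2]);  // middle element as pivot
    const int pivot = v[lo];
    int i = lo - 1, j = hi;
    while (true) {
        do { ++i; } while (v[i] < pivot);     // sweep up from the left
        do { --j; } while (v[j] > pivot);     // sweep down from the right
        if (i >= j) return j;
        std::swap(v[i], v[j]);
    }
}

void sort_impl(std::vector<int>& v, int lo, int hi, int depth) {
    if (hi - lo <= 1) return;
    if (depth == 0) {
        // Too many bad splits: finish this range with heapsort,
        // which is O(n log n) in the worst case.
        std::make_heap(v.begin() + lo, v.begin() + hi);
        std::sort_heap(v.begin() + lo, v.begin() + hi);
        return;
    }
    const int mid = hoare_partition(v, lo, hi);
    sort_impl(v, lo, mid + 1, depth - 1);
    sort_impl(v, mid + 1, hi, depth - 1);
}

}  // namespace

void introsort(std::vector<int>& v) {
    if (v.size() < 2) return;
    const int depth = 2 * static_cast<int>(std::log2(v.size())) + 1;
    sort_impl(v, 0, static_cast<int>(v.size()), depth);
}
```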
At one point I worked at a place that had a production process that used several hundred IBM 3480 (if I remember correctly) tape cartridges. I would do an LSB radix sort afterwards to get them back in order.
I think back on it with a certain amount of nostalgia now, but it was a stupid way to do operations, even at the time. They had invested big in computers in the '70s and were still maintaining the same tech stack in the early '00s when I worked there.
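For anyone who hasn't seen it, the LSB/LSD radix sort mentioned above is the same trick in software: re-rack everything by the last digit, then the next digit, and so on, relying on each pass being stable. A rough sketch on integer keys:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// LSD ("least significant digit first") radix sort on unsigned keys:
// a stable counting sort on the lowest byte, then the next byte, and so on.
// Same idea as re-racking numbered cartridges one digit at a time.
void lsd_radix_sort(std::vector<unsigned>& keys) {
    std::vector<unsigned> buf(keys.size());
    for (unsigned shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 257> count{};  // histogram of this byte
        for (unsigned k : keys) ++count[((k >> shift) & 0xFF) + 1];
        for (std::size_t i = 1; i < count.size(); ++i) count[i] += count[i - 1];
        // Stable placement into the output buffer.
        for (unsigned k : keys) buf[count[(k >> shift) & 0xFF]++] = k;
        keys.swap(buf);
    }
}
```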
Merging stacks of papers by hand isn't the same as exchange-based computer mergesort: if the cost of inspecting and comparing items is negligible relative to actually moving them, the basic unit-cost operation is splitting a sequence into three possibly empty sequences and splicing the middle piece into an arbitrary position in another sequence. In some cases reversing the spliced sequence is free ("pancake" sorting).
This cost model is an important reason to merge bottom-up, from sorted subsequences, without arbitrary splitting that can add work but not reduce it.
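A software analogue of that cost model: merge two sorted piles by splicing maximal runs, so the cost is counted in splices (one physical move of a chunk of papers each) rather than in individual items. A sketch using linked lists, with names of my own choosing:

```cpp
#include <list>

// Merge two sorted "piles" by splicing maximal runs. std::list doesn't
// literally give constant-time range splices, but the shape of the
// algorithm is the point: a handful of splices instead of n item moves.
std::list<int> merge_by_runs(std::list<int> a, std::list<int> b) {
    std::list<int> out;
    while (!a.empty() && !b.empty()) {
        // Take from whichever pile has the smaller front element.
        std::list<int>& src = (a.front() <= b.front()) ? a : b;
        std::list<int>& other = (&src == &a) ? b : a;
        // Inspecting/comparing is the "free" part: find the end of the
        // maximal prefix of src that still belongs before other.front().
        auto it = src.begin();
        while (it != src.end() && *it <= other.front()) ++it;
        // Moving is the unit-cost part: one splice for the whole run.
        out.splice(out.end(), src, src.begin(), it);
    }
    // Whatever remains is already sorted and goes on the end in one move.
    out.splice(out.end(), a);
    out.splice(out.end(), b);
    return out;
}
```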
I like to think so, but I'm not sure that's true. At least I don't think I've ever seen anyone divide a stack in half, then divide the left one in half, then divide the left one of that in half, etc., as if they're doing recursive mergesort. I feel like most people start from the first card and build upward, either with insertion sort or with something like iterative mergesort.
What I've done is: grading data structures exams (appropriately enough), I'd grab a handful of them, grade them, and then sort the little stack. Then I'd put it aside to deal with later.
Eventually I realized I was running out of desk space, so I started merging little stacks of around equal size.
So the base case wasn't 1 element, but a larger base case is conventional in actual implementations anyway. And the splitting wasn't quite recursive, I guess. But partial credit at least.
Only for reasonably small stacks. Insertions at random indices won't stay O(1) when you're dealing with 100k pages or more – i.e., when you can't pick them up in one movement.
I think most people would shortcut the "recursive task-spawning" part by just taking all the pages and spreading them out individually side-by-side on a work surface.
That's why I said it's like the iterative version of mergesort, which goes bottom-up. People don't divide things in half like in traditional recursive mergesort, is what I was saying.
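That bottom-up version, for reference: start with runs of length 1 and repeatedly merge adjacent runs of width 1, 2, 4, and so on; there's no recursive halving anywhere. A quick sketch:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Bottom-up (iterative) mergesort: treat the input as n sorted runs of
// length 1, then merge adjacent runs of width 1, 2, 4, ... into a buffer.
void bottom_up_mergesort(std::vector<int>& v) {
    const std::size_t n = v.size();
    std::vector<int> buf(n);
    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            const std::size_t mid = std::min(lo + width, n);
            const std::size_t hi = std::min(lo + 2 * width, n);
            // Merge the adjacent sorted runs v[lo, mid) and v[mid, hi).
            std::merge(v.begin() + lo, v.begin() + mid,
                       v.begin() + mid, v.begin() + hi,
                       buf.begin() + lo);
        }
        v.swap(buf);  // merged data is now in v; buf gets overwritten next pass
    }
}
```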
When I taught, I would sort graded papers/tests using quicksort. Put all the a–l in one pile and m–z in the other, then split each pile into halves and manually sort the quarters. And since I knew what the sorted set looked like, finding pivot points was trivial.
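That's essentially a partition step with a pivot value you already know rather than one sampled from the data; something like this (illustrative sketch, names are mine):

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// One "grading" pass: everything filed under a-l goes to the front of the
// stack, m-z to the back, using the known pivot letter 'm'.
void split_roster(std::vector<std::string>& names) {
    std::partition(names.begin(), names.end(), [](const std::string& s) {
        return !s.empty() &&
               std::tolower(static_cast<unsigned char>(s[0])) < 'm';
    });
}
```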