Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The clustering can still happen with any walk. The dumb walk is the best, but it is the most prone to clustering if the hash is simplistic. But if the hash is pseudo-random, it's as likely to get clusters as any other walk, and has better locality.


A direct addressing scheme with fixed cache line sized and aligned buckets [typically 8 machine words/entries] incurs precisely 1 cache line access. I think the strawman here is the "linked-list" bucket. If we even the playing field -- a fixed sized pre-allocation of space which open addressing certainly requires -- then we can dispense with a dynamically sized bucket [and the linked-list].

So then the issue is space efficiency of direct addressing schemes. Here n-choice (2 is optimal actually) hashing provides for excellent loading at the cost of (precisely) 2 [n] cache line accesses. Your linear probing for an empty slots is likely to incur that on average if not worse.

Cuckoo hashing, imo, is a hybrid that wants the deterministic access times of direct addressing (2 locations on reads) at the cost of variable times on writes (which are not worst than pure open addressing).


Since Robin hood hashing is a thing, you can even have assurances that any given "cluster" will still have good performance.

Robin-hood hashing is to "steal from the rich" (aka: if someone is 2-away from their ideal spot, and you're at 5-away from your ideal spot, you take the slot and make the 2-away "worse").




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: