This is how it was taught to me in a CS undergrad class last week.
What we didn't look at, though, was: what happens when the hash table is full? Does everything get rehashed into a bigger table? What are the performance implications of that? I wish it had gone just a little further.
Even before it's full, an open addressing hash table (also called closed hashing - confusing names!) becomes very slow as it gets close to full, because most of the table needs to be scanned on inserts, and on queries for recently inserted values.
When it's full, insertion requires rehashing into a bigger table. (Or you can use an overflow list when the table is full or nearly full, if you don't want to rehash and prefer to treat this case as rare and ok to be slow.)
So for a general purpose open addressing / closed hashing hash table where you don't know the number of items in advance, rehashing to a bigger table is done at some threshold before it's actually full.
As long as you grow by a multiplier when rehashing, for example doubling the size at 50% full (as opposed to adding a fixed amount to the size), it stays fast on average, but the individual rehashing operations are of course slow. They just don't happen often, so the average stays at O(1) time. This is called amortised O(1) time complexity. The ideal threshold for resizing depends on the probing type.
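To make that concrete, here is a minimal sketch of the open addressing case (toy Python, all the names are my own invention, deletion left out to keep it short). It does linear probing and doubles the table once it's half full, which is what gives the amortised O(1) inserts:

    class OpenAddressingTable:
        """Toy open addressing hash table with linear probing.
        Doubles its capacity once it becomes half full, so inserts
        are amortised O(1)."""

        def __init__(self, capacity=8):
            self._slots = [None] * capacity   # each slot: None or (key, value)
            self._count = 0

        def _probe(self, key, slots):
            # Linear probing: start at hash(key) % len(slots) and walk
            # forward until we find the key or an empty slot.
            i = hash(key) % len(slots)
            while slots[i] is not None and slots[i][0] != key:
                i = (i + 1) % len(slots)
            return i

        def _resize(self):
            # Rehash everything into a table twice the size.  This pass is
            # O(n), but it halves the load factor so it happens rarely.
            old = self._slots
            self._slots = [None] * (len(old) * 2)
            for entry in old:
                if entry is not None:
                    self._slots[self._probe(entry[0], self._slots)] = entry

        def insert(self, key, value):
            if (self._count + 1) * 2 > len(self._slots):   # > 50% full: grow first
                self._resize()
            i = self._probe(key, self._slots)
            if self._slots[i] is None:
                self._count += 1
            self._slots[i] = (key, value)

        def get(self, key):
            entry = self._slots[self._probe(key, self._slots)]
            return entry[1] if entry is not None else None

    t = OpenAddressingTable()
    for i in range(100):
        t.insert(i, i * i)
    print(t.get(7))   # 49

The insert that crosses the 50% threshold pays for the full O(n) rehash; every other insert is cheap.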
Linked list hash tables (also called closed addressing or open hashing - I think linked list is a clearer name) don't suffer from the same catastrophic slowdown with size as the open addressing / closed hashing type, nor from abrupt failure when full, because they're never full. These can still be resized, and this is required if O(1) time is needed for arbitrary numbers of items, but the resizing threshold is not as critical. If not resized, they gradually degrade to O(N) time when filled too much. Many applications use fixed-size linked list hash tables safely because of this.
Array-of-arrays hash tables behave similarly to linked list hash tables.
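For contrast, a chaining table can be sketched like this (same toy-Python disclaimer; with Python list buckets it's effectively the array-of-arrays variant). Insert never fails or has to resize, the buckets just get longer, which is where the gradual O(N) degradation comes from:

    class ChainedTable:
        """Toy chaining hash table: each bucket is a Python list of
        (key, value) pairs.  It never 'fills up'; with far more items
        than buckets, operations just degrade towards O(N)."""

        def __init__(self, buckets=64):
            self._buckets = [[] for _ in range(buckets)]

        def _bucket(self, key):
            return self._buckets[hash(key) % len(self._buckets)]

        def insert(self, key, value):
            bucket = self._bucket(key)
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)   # overwrite existing key
                    return
            bucket.append((key, value))        # collisions just extend the bucket

        def get(self, key):
            for k, v in self._bucket(key):
                if k == key:
                    return v
            return None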
Thank you for your comment, it is really informative and I appreciate it a lot. We looked at the different approaches and I asked my teacher whether there is an in-between method, that is, open addressing combined with linked list hashing where e.g. after an arbitrary number of elements in the linked list the table is scanned to find the next possible place. I'm sure many things have been tried before.
What I don't get is: when is the rehashing done? After a specific insert operation? So then that insert operation will be much slower than all the others? I understand that it doesn't matter as long as the runtime has the same upper bound, but it's just not what I instinctively would have expected.
> Does everything get rehashed into a bigger table?
Usually yes, though it’s also possible to perform gradual resizing; that’s less efficient overall, so it’s only used when needed (e.g. real-time systems which can’t afford the pause of a full resize).
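If it helps to see what gradual resizing means mechanically, here is a rough sketch (toy Python, names of my own; Redis does something along these lines with its incremental rehash, if I recall correctly). When the load factor is exceeded a bigger bucket array is allocated, and every later operation drains a few buckets from the old array, so no single insert pays for the whole rehash:

    class IncrementalTable:
        """Toy chaining table with gradual resizing: entries are moved to
        the bigger bucket array a few buckets at a time on later
        operations instead of all at once."""

        MIGRATE_PER_OP = 4   # old buckets drained per operation; tunes the pause

        def __init__(self, buckets=8):
            self._new = [[] for _ in range(buckets)]
            self._old = None            # bucket array being drained, or None
            self._next = 0              # next bucket of _old to migrate
            self._count = 0

        def _start_resize(self):
            self._old, self._new = self._new, [[] for _ in range(2 * len(self._new))]
            self._next = 0

        def _migrate_some(self):
            for _ in range(self.MIGRATE_PER_OP):
                if self._old is None:
                    return
                if self._next == len(self._old):
                    self._old = None     # finished draining the old array
                    return
                for k, v in self._old[self._next]:
                    self._new[hash(k) % len(self._new)].append((k, v))
                self._next += 1

        def insert(self, key, value):
            self._migrate_some()
            if self._old is None and self._count > len(self._new):
                self._start_resize()     # load factor > 1: begin gradual rehash
            bucket = self._new[hash(key) % len(self._new)]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
            had_old = False
            if self._old is not None:
                # Drop any not-yet-migrated copy of this key so it can't
                # shadow the new value later.
                old_bucket = self._old[hash(key) % len(self._old)]
                had_old = any(k == key for k, _ in old_bucket)
                old_bucket[:] = [(k, v) for k, v in old_bucket if k != key]
            bucket.append((key, value))
            if not had_old:
                self._count += 1

        def get(self, key):
            self._migrate_some()
            tables = [self._new] if self._old is None else [self._new, self._old]
            for buckets in tables:
                for k, v in buckets[hash(key) % len(buckets)]:
                    if k == key:
                        return v
            return None

The trade-off is that lookups have to check two tables while a migration is in progress, and the total work is a bit more than a single bulk rehash.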
> What are the performance implications of that?
Same as a vector. That’s why hashmaps generally provide an amortized insertion time.
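To make the "amortized" part concrete: with doubling, after n insertions the total number of elements ever copied during resizes is at most 1 + 2 + 4 + ... + n/2 < n, so the copying adds less than one extra move per insert on average, even though the single insert that triggers a resize costs O(n).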
> What we didn't look at, though, was: what happens when the hash table is full?
Of note: a hash table never really gets full. Hash tables have a maximum load factor (which mostly depends on the collision resolution algorithm, as that determines how performance degrades as collisions increase), and the table resizes when it exceeds that load factor, to maintain a “healthy” rate of collisions.
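For a concrete number: Java’s HashMap uses a default maximum load factor of 0.75, and CPython’s dict resizes at around 2/3 full, if I remember correctly.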
It’s an open addressing hash table (https://en.wikipedia.org/wiki/Open_addressing), with linear probing for collision resolution (https://en.wikipedia.org/wiki/Linear_probing).