Yes, but actually "small" can often be much larger than 100-300 depending on the specifics. Programmers often vastly underestimate how fast caches and prefetching can be compared to complicated data structures.
... Or much smaller. I remember a benchmark for a case I had a few years ago; the cutoff I had for a map being faster than std::vector/array & linear probing was closer to N=10 than 100.
(Not std::map; at the time it must have been something like tsl::hopscotch_map).
Note also that nowadays Boost, for instance, comes with a state-of-the-art flat_map and unordered_flat_map, which give both the cache locality for small sizes and the algorithmic characteristics of various kinds of maps.
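For example, usage is essentially drop-in (header paths as in recent Boost releases; boost::unordered_flat_map appeared around Boost 1.81, so double-check your version):

    #include <boost/container/flat_map.hpp>
    #include <boost/unordered/unordered_flat_map.hpp>
    #include <cstdio>
    #include <string>

    int main() {
        // Sorted keys in one contiguous array: good locality, O(log n) lookup.
        boost::container::flat_map<int, std::string> fm;
        fm.emplace(1, "one");

        // Open-addressing hash table, also with contiguous storage: O(1) average lookup.
        boost::unordered_flat_map<int, std::string> ufm;
        ufm.emplace(2, "two");

        std::printf("%s %s\n", fm.at(1).c_str(), ufm.at(2).c_str());
        return 0;
    }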
Exactly. I measured vector vs an efficient linear probing hash map very recently and the cutoff was single digit. Even against unordered_map or plain std::map the cutoff was surprisingly low (although in this case I would trust a synthetic benchmark significantly less).
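For reference, the general shape of the microbenchmark I mean is below; the key type, sizes, and repetition counts are illustrative rather than my original setup, and real cutoffs will depend on your element types and access pattern:

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <random>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    int main() {
        std::mt19937 rng(42);
        for (std::size_t n : {4, 8, 16, 32, 64, 128, 256}) {
            std::vector<std::pair<int, int>> vec;
            std::unordered_map<int, int> map;
            for (std::size_t i = 0; i < n; ++i) {
                int key = static_cast<int>(rng());
                vec.emplace_back(key, static_cast<int>(i));
                map.emplace(key, static_cast<int>(i));
            }

            // Look up every stored key repeatedly; accumulate into 'sink' so the
            // compiler cannot optimise the searches away.
            const int reps = 1000000 / static_cast<int>(n);
            long long sink = 0;

            auto t0 = std::chrono::steady_clock::now();
            for (int r = 0; r < reps; ++r)
                for (const auto& kv : vec)
                    for (const auto& probe : vec)
                        if (probe.first == kv.first) { sink += probe.second; break; }
            auto t1 = std::chrono::steady_clock::now();
            for (int r = 0; r < reps; ++r)
                for (const auto& kv : vec)
                    sink += map.find(kv.first)->second;
            auto t2 = std::chrono::steady_clock::now();

            auto us = [](auto a, auto b) {
                return static_cast<long long>(
                    std::chrono::duration_cast<std::chrono::microseconds>(b - a).count());
            };
            std::printf("n=%3zu  vector: %lld us  unordered_map: %lld us  (sink=%lld)\n",
                        n, us(t0, t1), us(t1, t2), sink);
        }
        return 0;
    }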
Sure, the point here is that with all the context-specific details, you're most likely comparing apples to oranges. So simple complexity analysis, or a general rule without benchmarks and a good understanding of the system and how it interacts with your details, is not going to solve your problem.
So you're saying that if I had to store 100 elements in memory, I would be better off using a hash map instead of a vector/array? What type of elements did you use in your experiment, how large were they, and what was your access pattern?
A successful search in a vector will do, on average, 50 comparisons, while the hash map version would hash the key, look up the bucket, typically find a single item in that bucket (with only 100 items in the hash table, hash collisions will be highly unlikely), and do a single comparison.
For an unsuccessful search, the vector version would do 100 key comparisons, and the hash table would do a single hash, look up the bucket, and almost certainly find it empty.
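To make those comparison counts concrete, here's a purely illustrative sketch (the CountingKey wrapper, the string keys, and the sizes are invented for this example) that instruments the key's operator== and prints how many comparisons each lookup actually performs:

    #include <cstdio>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // Global counter bumped by every key comparison.
    static long long g_comparisons = 0;

    struct CountingKey {
        std::string value;
        bool operator==(const CountingKey& other) const {
            ++g_comparisons;
            return value == other.value;
        }
    };

    struct CountingKeyHash {
        std::size_t operator()(const CountingKey& k) const {
            return std::hash<std::string>{}(k.value);
        }
    };

    int main() {
        constexpr int n = 100;
        std::vector<std::pair<CountingKey, int>> vec;
        std::unordered_map<CountingKey, int, CountingKeyHash> map;
        for (int i = 0; i < n; ++i) {
            CountingKey k{"key_" + std::to_string(i)};
            vec.emplace_back(k, i);
            map.emplace(k, i);
        }

        CountingKey probe{"key_50"};  // successful search, roughly in the middle

        g_comparisons = 0;
        for (const auto& kv : vec)
            if (kv.first == probe) break;
        std::printf("vector linear search: %lld comparisons\n", g_comparisons);

        g_comparisons = 0;
        (void)map.find(probe);
        std::printf("unordered_map find:   %lld comparisons\n", g_comparisons);
        return 0;
    }

On a typical run the linear search over 100 elements reports around 50 comparisons for a key near the middle, while the unordered_map lookup reports one (barring a collision).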
So, if you make the comparison function relatively expensive, I can see the hash map being faster at search.
Even relatively short string keys might be sufficient here, if the string data isn’t stored inline. Then, the key comparisons are likely to cause more cache misses than accessing a single bucket.
Of course, the moment you start iterating over all items often, the picture will change.
Searching the vector is literally incrementing a pointer over the data. The number of instructions needed to do the search is very small, e.g. ~15. This means it can very easily fit into the CPU uOp cache and also makes it a candidate for the LSD (loop stream detector). Both of those are a major factor in hiding, or getting rid of, latencies in the CPU frontend fetch-decode pipeline, effectively leaving the for-loop iterations bound only by the CPU backend, or more specifically, by branch mispredictions (i.e. ROB flushes) and memory latencies.
Given the predictable access pattern of vectors and their contiguous layout in memory, the CPU backend will be able to take advantage of that and hide the memory latency (even within the L1+L2+L3 caches) by prefetching the data on consecutive cache lines as you go through the loop. Accessing data that resides in the L1 cache takes ~4 cycles.
The "non-branchiness" of such code will make it predictable and as such will make it a good use of BTB buffers. Predictability will prevent the CPU from having to flush the ROB and hence flushing the whole pipeline and starting all over again. The cost of this is one of the largest there are within the CPU and it is ~15 cycles.
OTOH searching the open-addressing hash map is a super-set of that, almost as if you're searching over an array of vectors. The search code alone is: (1) several times larger, (2) much more branchy, (3) less predictable and (4) less cache-friendly.
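For contrast, a simplified linear-probing lookup looks roughly like this; it's a toy sketch (sentinel empty value, no resizing, not any particular library's implementation), but it shows the extra hashing, masking, and data-dependent probing on every lookup:

    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <vector>

    // Toy open-addressing (linear probing) set of uint64_t keys.
    // Assumes a power-of-two capacity, never stores the sentinel value,
    // and never fills up completely.
    struct ToyTable {
        static constexpr std::uint64_t kEmpty = ~0ull;  // sentinel for an empty slot
        std::vector<std::uint64_t> slots;

        explicit ToyTable(std::size_t capacity_pow2) : slots(capacity_pow2, kEmpty) {}

        bool contains(std::uint64_t key) const {
            std::size_t mask = slots.size() - 1;
            std::size_t i = std::hash<std::uint64_t>{}(key) & mask;
            while (true) {
                if (slots[i] == kEmpty) return false;  // empty slot: not present
                if (slots[i] == key) return true;      // found it
                i = (i + 1) & mask;                    // probe the next slot
            }
        }

        void insert(std::uint64_t key) {
            std::size_t mask = slots.size() - 1;
            std::size_t i = std::hash<std::uint64_t>{}(key) & mask;
            while (slots[i] != kEmpty && slots[i] != key)
                i = (i + 1) & mask;
            slots[i] = key;
        }
    };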
Algorithmically speaking, yes, what you're saying makes sense, but I think the whole picture only emerges once the hardware details are also taken into account. The vector approach will literally be bound only by the number of cycles it takes to fetch the data from the L1 cache, and I don't see that happening for a hash map.
I think you're missing my point. I'm highly suspicious, or let's say intrigued, about the conditions under which one can come to such a conclusion. That's why I asked for clarification.