"When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets."
and it explicitly tells you that HashMap defaults to a 0.75 load factor, so the documentation is telling you that new HashMap(400) is not a suitable container for 400 items, only for up to 300 items (0.75 × 400).
Now, it's true that e.g. Rust's HashMap::with_capacity(400) actually is a container which promises it's suitable for 400 items‡, and that's more ergonomic, but I don't think we can call Java's choice here a mistake; it's just harder to use correctly.
‡"The hash map will be able to hold at least capacity elements without reallocating".
Based on some benchmarking details¹ for my application, it appears that Rust is willing to completely fill its hashmap. It may be that my use case is somewhat unusual in that I'm expecting most requests to not return a value,² but I found that just specifying a larger initial capacity dramatically sped up the application (I got a bigger speed boost from that than I did from switching from SipHash to FNV).
The HashMap is "completely filled" in the sense that you get to put N items in a HashMap with capacity N, because that's specifically how Rust's HashMap chooses to define capacity as I explained above, so it's a tautological explanation.
The "real" capacity and load factor of the underlying data structure is not presented to you. Swisstables (the current implementation) can't be filled entirely, they need an empty slot to function correctly. There is some trickery to ensure that very small Swisstables get to have an "empty" slot that doesn't really exist, and thus doesn't need RAM. Since you can't ever write to it the non-existence doesn't matter. A Rust HashMap::with_capacity(0) is guaranteed not to allocate any actual heap storage, just like String::new() don't need heap to store the empty string inside it.
It depends on how you're using the hash lookup though. In my case, most lookups would be expected to not return a value, so completely filling the small hashmap will cause retrievals to be slower since it has to keep going until it determines that there's no match.
Yes, of course, there are many use cases. Yet in practical terms, most (90%) of lookups do return a value. For 'small' N, you can consider everything O(1).
If you do know the use case, e.g. that most lookups fail, you can pick a proper implementation and tune it.
Then again, very few cycles are spent on small hash table lookups in general, while a lot (really, a lot) of memory is wasted on small hash tables.
If you care about performance: measure, measure, measure... and know what you measure.
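In that spirit, here's a rough sketch of the kind of measurement that settles the "mostly missing lookups" question: time probes for absent keys against a map filled right up to its requested capacity versus one given extra headroom. The sizes, key ranges, and wall-clock timing are placeholders; a real comparison should use a proper harness such as criterion.

    use std::collections::HashMap;
    use std::time::{Duration, Instant};

    // Hypothetical micro-benchmark: fill a map with N entries, then time
    // lookups that all miss. `extra` adds headroom beyond the N entries.
    fn time_misses(extra: usize) -> Duration {
        const N: u64 = 400;
        let mut map: HashMap<u64, u64> = HashMap::with_capacity(N as usize + extra);
        for i in 0..N {
            map.insert(i, i);
        }

        let start = Instant::now();
        let mut hits = 0u64;
        for k in 1_000_000..2_000_000u64 {
            if map.contains_key(&k) {
                hits += 1;
            }
        }
        assert_eq!(hits, 0); // every probe was a miss
        start.elapsed()
    }

    fn main() {
        // A few repetitions as a crude warm-up; single wall-clock samples
        // are noisy, hence "know what you measure".
        for _ in 0..3 {
            println!("tight: {:?}  roomy: {:?}", time_misses(0), time_misses(400));
        }
    }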
"When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets."
and it explicitly tells you that HashMap will default to 0.75 load factor, therefore the documentation is telling you that HashMap(400) is not a suitable container for 400 items but only up to 300 items.
Now, it's true that e.g. Rust's HashMap::with_capacity(400) is actually a container which promises it's suitable for 400 items‡ and that's more ergonomic, but I don't think we can call Java's choice here a mistake, it's just harder to use correctly.
‡"The hash map will be able to hold at least capacity elements without reallocating".