"When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets."
and it explicitly tells you that HashMap defaults to a 0.75 load factor, so the documentation is telling you that new HashMap(400) is not a suitable container for 400 items, only for up to 300 items (0.75 × 400).
Now, it's true that e.g. Rust's HashMap::with_capacity(400) actually is a container which promises it's suitable for 400 items‡, and that's more ergonomic, but I don't think we can call Java's choice here a mistake; it's just harder to use correctly.
‡"The hash map will be able to hold at least capacity elements without reallocating".
Based on some benchmarking details¹ for my application, it appears that Rust is willing to completely fill its hashmap. It may be that my use case is somewhat unusual in that I'm expecting most requests to not return a value,² but I found that just specifying a larger initial capacity dramatically sped up the application (I got a bigger speed boost from that than I did from switching from SipHash to FNV).
The HashMap is "completely filled" in the sense that you get to put N items in a HashMap with capacity N, because that's specifically how Rust's HashMap chooses to define capacity as I explained above, so it's a tautological explanation.
The "real" capacity and load factor of the underlying data structure is not presented to you. Swisstables (the current implementation) can't be filled entirely, they need an empty slot to function correctly. There is some trickery to ensure that very small Swisstables get to have an "empty" slot that doesn't really exist, and thus doesn't need RAM. Since you can't ever write to it the non-existence doesn't matter. A Rust HashMap::with_capacity(0) is guaranteed not to allocate any actual heap storage, just like String::new() don't need heap to store the empty string inside it.
It depends on how you're using the hash lookup though. In my case, most lookups would be expected to not return a value, so completely filling the small hashmap will cause retrievals to be slower since it has to keep going until it determines that there's no match.
Yes, of course, there are many use cases. Yet in practical terms, most (90%) of lookups do return a value. For 'small' N, you can consider everything O(1).
If you do know the use case, e.g. that most lookups fail, you can pick a proper implementation and tune it.
Then again, very few cycles are spent on small hash table lookups in general, while a lot (really, a lot) of memory is wasted on small hash tables.
If you care about performance: measure, measure, measure... and know what you measure.
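In that spirit, here's a rough sketch of the kind of measurement that settles the "mostly missing lookups" question: time probes for absent keys against a map filled right up to its requested capacity versus one given extra headroom. The sizes, key ranges, and wall-clock timing are placeholders; a real comparison should use a proper harness such as criterion.

    use std::collections::HashMap;
    use std::time::{Duration, Instant};

    // Hypothetical micro-benchmark: fill a map with N entries, then time
    // lookups that all miss. `extra` adds headroom beyond the N entries.
    fn time_misses(extra: usize) -> Duration {
        const N: u64 = 400;
        let mut map: HashMap<u64, u64> = HashMap::with_capacity(N as usize + extra);
        for i in 0..N {
            map.insert(i, i);
        }

        let start = Instant::now();
        let mut hits = 0u64;
        for k in 1_000_000..2_000_000u64 {
            if map.contains_key(&k) {
                hits += 1;
            }
        }
        assert_eq!(hits, 0); // every probe was a miss
        start.elapsed()
    }

    fn main() {
        // A few repetitions as a crude warm-up; single wall-clock samples
        // are noisy, hence "know what you measure".
        for _ in 0..3 {
            println!("tight: {:?}  roomy: {:?}", time_misses(0), time_misses(400));
        }
    }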
"When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets."
and it explicitly tells you that HashMap will default to 0.75 load factor, therefore the documentation is telling you that HashMap(400) is not a suitable container for 400 items but only up to 300 items.
Now, it's true that e.g. Rust's HashMap::with_capacity(400) is actually a container which promises it's suitable for 400 items‡ and that's more ergonomic, but I don't think we can call Java's choice here a mistake, it's just harder to use correctly.
‡"The hash map will be able to hold at least capacity elements without reallocating".