Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

What is the real world use case for a HAMT data structure? Tree based associative arrays (like c++'s std::map) tend to have poor real world performance


They are used to implement Clojure's immutable map and set structures. They are absolutely critical to the philosophy of the language.


std::map and red-black trees are associative & sorted (by key); HAMTs are just plain associative. HAMTs have generally excellent performance: the typical implementations use 32-way branching, which allows for the top levels to be likely in cache.

The best thing about HAMTs is passing immutable ones around, which have free update costs and multi-thread safe. Clojure uses these to represent information that flows through a program. (By free update costs I mean: they are free enough to never rise to the programmer's attention. Obviously ops have costs.)


More efficient (in both CPU and memory) immutable/persistent collections, which have the advantage that they're intrinsically thread-safe. They can also be used to implement lock-free concurrent maps (Ctrie)

> Tree based associative arrays (like c++'s std::map) tend to have poor real world performance

That's something HAMT significantly improve upon. You're not going to get linear performances out of them but basic operations are generally O(ln N) where N is usually 32.


I think it's for an array of size N:

- O(1) - standard array (regardless of N)

- O(log_b(N)) - HAMT (b is 32 in clojure)

Since b is big, log_b(N) is quite small even for big numbers, e.g. log_32(2^20) = 4


Um, isn't that "constant-time in practice"? (Something like log-star.) Am I missing some matter of convention here?


Is there any research on the tradeoff between this kind of thread safety and the penalties incurred on the memory management side (specifically the fact that now you have far more heap allocations with immutable data and they have to synchronize too)?


I don't know of any formal research, but in Clojure, where HAMTs are a fundamental part of the language, the philosophy is more oriented towards paying for things (such as thread safety) with memory and CPU first, benchmarking to find bottlenecks second, and if there are any bottlenecks, optimizing accordingly. It was written with this in mind, which is why it requires world-class VMs like the JVM, CLR, and V8/JSC/SpiderMonkey/etc. that can deal with the GC and (in many cases) make runtime optimizations.

Also, I'm fairly certain synchronizing isn't an issue because there's nothing to synchronize on since the data structures are immutable. Am I understanding you correctly?


> Also, I'm fairly certain synchronizing isn't an issue because there's nothing to synchronize on since the data structures are immutable. Am I understanding you correctly?

No -- I'm referring to synchronization on the heap itself (think new()/delete()). Multiple threads allocating memory need to synchronize in order to avoid trampling on each other. You can't just get rid of synchronization entirely.


Immutable maps that have efficient update semantics


Depending on the use case, of course. Sometimes you need a structure that's both associative and ordered, for example, in which case std::map is handy.


HAMT are not intrinsically ordered or sorted.


The comment obviously refers to std::map, which it mentions explicitly and which guarantees ordering.


But the entire discussion is about HAMT not std::map, the parent comment only talks about std::map as an example of performance issues in tree-based maps in asking how that relates to HAMT, it doesn't actually care a whit about std::map.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: