Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Uthash – C macros for hash tables and more (github.com/troydhanson)
78 points by maydemir on March 26, 2022 | hide | past | favorite | 28 comments




And this one may be even "better" :) https://github.com/tylov/STC


Am I the only one confused by the C++ referential naming / the only C-but-not-C++ developer here? I have no idea what a shared_ptr is, or how a priority_queue behaves (is that a heap?)


The C++ envy does seem a bit peculiar, yes, but more so because my dissatisfaction with doing simple data structures in C is that I’m always doing a thousand twisty variations of the damn things, all just a little different, (usually to make allocations come out like I want them,) and it doesn’t feel like an STL-alike would solve that.

A priority queue is usually implemented as a heap (binary or otherwise) and frequently called that as well, but in more precise usage the gadget (abstract data type) you do INSERT and REMOVE-MIN to is the “priority queue”, and the implementation device of a(n implicit or explicit) tree where the value in any node is smaller than the values in its children is the “heap”. Roughly, “priority queue” is to “heap” like “ordered dictionary / map” is to “search tree”.

(I can’t think of any data structures that use heaps not for a priority queue except for treaps, but then my algorithm-fu is not particularly strong.)

As for shared_ptr, it’s just a refcounted pointer, but that bit of terminology is STL-specific so there might not be much value in learning it.


Just to clarify, the intent in my comment is that a whole bunch of the terms used there aren't "fluid associations" for me as a C developer. There's no issue looking them up, but they aren't what I would choose to name these containers.

(e.g. I would just call it "refcnt_ptr" and "heap" [if it is one], respectively)


shared_ptr is more special than a simple ref count though, it performs some level of mandatory thread synchronization for example; it's a C++-specific concept.

The STL was designed by A. Stepanov, who I suspect is simply too intelligent for this world. I have no idea what he's talking about half the time, but the rest is brilliant. Long story short; it's trying to solve a much bigger problem using a tool that wasn't built for it.


if stepanov would be too intelligent for this world, he would have designed proper iterators, not iterators which can run away. safe iterators are a thing, just not in the STL.

also he could have designed proper sets and usets. nobody really needs red-black trees when you have b- trees. likewise nobody needs linked-list usets, when you have open addressing or even swisstables.

you cannot copy from one container to another type. even the simpliest self-respecting libs support that.

you still have no string library, only memory buffers. many, many misdesigns in C++. just look at it, you'll get eyeblead


Some of that is limitations in the medium, he pushed C++ pretty hard outside its comfort zone implementing the STL.

Some of it is conventions of the medium, like the iterators; it's just the C++ way.

You copy (or move, there's a need for the distinction in C++) from ranges to iterators, it's just a more flexible perspective.

C++ has a string library, you may not like it but I'm pretty sure it exists.


indeed. auto, apply and threads!


Awesome. Thanks for sharing


There is Khash too


Don’t use macros for this. The better way to do it is to have headers that are designed to be included multiple times, each time with a different type. You end up with code that is possible to step through in a debugger instead of a macro-expanded mess.


I really wish there was an _Include C preprocessor instruction for this purpose (like there is _Pragma)


See also David Hanson's book C Interfaces and Implementations [http://drh.github.io/cii/toc.html].


This is probably the slowest commonly used hash table library.


https://github.com/tezc/sc/tree/master/map

For those who are interested in faster hashmaps, I tried bunch of hashmaps and this one performs better than others. This is for C. Maybe C++ has better hashmaps.


std::unordered_map is in the similar ballpark. linked lists, oom checks, integer key support.


In a "hold my beer"-moment, I've ported GNU Trove's hash map to use memory mapped files instead of allocated memory. With good reason, granted, and it's not commonly used, but compared to almost anything else, it is fairly slow.


That sounds pretty interesting (in a sick way ;) ) Are you able to elaborate a bit more what you were doing?


I'm using it for the lexicon in my search engine.

I basically need a mapping from arbitrary short strings to unique integer identifiers, with a cardinality in the billions. Too much to keep in memory, most db solutions keel over as well.


for this I would use CHM, ordered minimal perfect hashes with external mmap'ed keys. still constant lookup, just the position needs to be stored in the value also.


Problem is I don't know the items beforehand, instead I need to build the hash as I encounter new pages. I don't know any way of reconcile that with a minimal perfect hash.


ok, with trillions keys creating a CHM MPH is risky, and needs a few secs. then your external hash is good enough.


And std::unordered_map is then the slow version in C++ compared to absl::flat_hash_map or folly::F14FastMap


Care to elaborate a bit?


I know this from my time doing Leetcode with C. I prefer https://github.com/rxi/map for its API


Why not use the linux kernel's statically-sized hashtable code?

https://pastebin.ubuntu.com/p/rvT9yKTqnP/




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: