Uthash – C macros for hash tables and more

eqvinox · on March 26, 2022

Same kind of thing, but typesafe:

https://github.com/FRRouting/frr/blob/master/lib/typesafe.h

https://github.com/FRRouting/frr/blob/master/doc/developer/l...

(disclaimer: it's my work.)

rurban · on March 26, 2022

mine is better :)

https://github.com/rurban/ctl/

florianist · on March 26, 2022

And this one may be even "better" :) https://github.com/tylov/STC

eqvinox · on March 26, 2022

Am I the only one confused by the C++ referential naming / the only C-but-not-C++ developer here? I have no idea what a shared_ptr is, or how a priority_queue behaves (is that a heap?)

mananaysiempre · on March 26, 2022

The C++ envy does seem a bit peculiar, yes, but more so because my dissatisfaction with doing simple data structures in C is that I’m always doing a thousand twisty variations of the damn things, all just a little different, (usually to make allocations come out like I want them,) and it doesn’t feel like an STL-alike would solve that.

A priority queue is usually implemented as a heap (binary or otherwise) and frequently called that as well, but in more precise usage the gadget (abstract data type) you do INSERT and REMOVE-MIN to is the “priority queue”, and the implementation device of a(n implicit or explicit) tree where the value in any node is smaller than the values in its children is the “heap”. Roughly, “priority queue” is to “heap” like “ordered dictionary / map” is to “search tree”.

(I can’t think of any data structures that use heaps not for a priority queue except for treaps, but then my algorithm-fu is not particularly strong.)

As for shared_ptr, it’s just a refcounted pointer, but that bit of terminology is STL-specific so there might not be much value in learning it.

eqvinox · on March 26, 2022

Just to clarify, the intent in my comment is that a whole bunch of the terms used there aren't "fluid associations" for me as a C developer. There's no issue looking them up, but they aren't what I would choose to name these containers.

(e.g. I would just call it "refcnt_ptr" and "heap" [if it is one], respectively)

codr7 · on March 26, 2022

shared_ptr is more special than a simple ref count though, it performs some level of mandatory thread synchronization for example; it's a C++-specific concept.

The STL was designed by A. Stepanov, who I suspect is simply too intelligent for this world. I have no idea what he's talking about half the time, but the rest is brilliant. Long story short; it's trying to solve a much bigger problem using a tool that wasn't built for it.

rurban · on March 26, 2022

if stepanov would be too intelligent for this world, he would have designed proper iterators, not iterators which can run away. safe iterators are a thing, just not in the STL.

also he could have designed proper sets and usets. nobody really needs red-black trees when you have b- trees. likewise nobody needs linked-list usets, when you have open addressing or even swisstables.

you cannot copy from one container to another type. even the simpliest self-respecting libs support that.

you still have no string library, only memory buffers. many, many misdesigns in C++. just look at it, you'll get eyeblead

codr7 · on March 27, 2022

Some of that is limitations in the medium, he pushed C++ pretty hard outside its comfort zone implementing the STL.

Some of it is conventions of the medium, like the iterators; it's just the C++ way.

You copy (or move, there's a need for the distinction in C++) from ranges to iterators, it's just a more flexible perspective.

C++ has a string library, you may not like it but I'm pretty sure it exists.

rurban · on March 26, 2022

indeed. auto, apply and threads!

amir734jj · on March 26, 2022

Awesome. Thanks for sharing

kzrdude · on March 26, 2022

There is Khash too

david2ndaccount · on March 26, 2022

Don’t use macros for this. The better way to do it is to have headers that are designed to be included multiple times, each time with a different type. You end up with code that is possible to step through in a debugger instead of a macro-expanded mess.

eqvinox · on March 26, 2022

I really wish there was an _Include C preprocessor instruction for this purpose (like there is _Pragma)

mci · on March 26, 2022

See also David Hanson's book C Interfaces and Implementations [http://drh.github.io/cii/toc.html].

macjc · on March 26, 2022

This is probably the slowest commonly used hash table library.

tirrex · on March 26, 2022

https://github.com/tezc/sc/tree/master/map

For those who are interested in faster hashmaps, I tried bunch of hashmaps and this one performs better than others. This is for C. Maybe C++ has better hashmaps.

rurban · on March 26, 2022

std::unordered_map is in the similar ballpark. linked lists, oom checks, integer key support.

marginalia_nu · on March 26, 2022

In a "hold my beer"-moment, I've ported GNU Trove's hash map to use memory mapped files instead of allocated memory. With good reason, granted, and it's not commonly used, but compared to almost anything else, it is fairly slow.

spicybright · on March 26, 2022

That sounds pretty interesting (in a sick way ;) ) Are you able to elaborate a bit more what you were doing?

marginalia_nu · on March 26, 2022

I'm using it for the lexicon in my search engine.

I basically need a mapping from arbitrary short strings to unique integer identifiers, with a cardinality in the billions. Too much to keep in memory, most db solutions keel over as well.

rurban · on March 26, 2022

for this I would use CHM, ordered minimal perfect hashes with external mmap'ed keys. still constant lookup, just the position needs to be stored in the value also.

marginalia_nu · on March 26, 2022

Problem is I don't know the items beforehand, instead I need to build the hash as I encounter new pages. I don't know any way of reconcile that with a minimal perfect hash.

rurban · on March 26, 2022

ok, with trillions keys creating a CHM MPH is risky, and needs a few secs. then your external hash is good enough.

kllrnohj · on March 26, 2022

And std::unordered_map is then the slow version in C++ compared to absl::flat_hash_map or folly::F14FastMap

hambaleeyo · on March 26, 2022

Care to elaborate a bit?

toastercat · on March 26, 2022

I know this from my time doing Leetcode with C. I prefer https://github.com/rxi/map for its API

aleden · on March 26, 2022

Why not use the linux kernel's statically-sized hashtable code?

https://pastebin.ubuntu.com/p/rvT9yKTqnP/