> The slow cases usually tend to be trying to find wider slots and skipping through many smaller slots. There are often a lot of bookings up to the first wide enough slot too though, so it's a little bit of both.
This right here is the first critical issue. It doesn't matter how fast the data structure is if you're then going to do a linear search over the available slots to find the next one wide enough.
A good data structure for doing this efficiently is a range tree (https://en.wikipedia.org/wiki/Range_tree), where in each node you store the maximum width of the available slots covered by the node. That lets you find the first available slot after some time t and wider than some duration w in O(log(n)), where n is the number of available slots. (It might not be obvious if you're not familiar with range trees; I'm happy to provide more details.)
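To make the query concrete, here's a minimal sketch of the idea (my own illustration, not code from this thread) using a max-augmented segment tree, a close cousin of the range tree: slots are kept in an array sorted by start time, each internal node stores the max width below it, and the search descends left-to-right, pruning any subtree whose max is too small. That gives the leftmost slot at index >= lo with width >= w in O(log n).

```python
class MaxSegmentTree:
    """Segment tree over slot widths (sorted by start time).

    Each node stores the maximum width in its range, so the search can
    skip entire subtrees whose max is below the requested width.
    """

    def __init__(self, widths):
        self.n = len(widths)
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        self.tree = [0] * (2 * self.size)
        self.tree[self.size:self.size + self.n] = widths
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, i, width):
        """Set slot i's width (e.g. after a booking shrinks it)."""
        i += self.size
        self.tree[i] = width
        i //= 2
        while i:
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def first_at_least(self, lo, w):
        """Index of the first slot >= lo with width >= w, or -1."""
        return self._search(1, 0, self.size, lo, w)

    def _search(self, node, left, right, lo, w):
        # Prune: range entirely before lo, or no wide-enough slot below.
        if right <= lo or self.tree[node] < w:
            return -1
        if right - left == 1:
            return left if left < self.n else -1
        mid = (left + right) // 2
        res = self._search(2 * node, left, mid, lo, w)
        if res == -1:
            res = self._search(2 * node + 1, mid, right, lo, w)
        return res
```

The pruning is the whole trick: a plain linear scan inspects every narrow slot, whereas here any subtree with max width < w is rejected with one comparison.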
For the implementation, there are a few options:
A. The simplest is to use a complete range tree, where the leaf nodes are all the possible times. You can lazily create nodes to avoid massive memory usage for sparse ranges. The advantage is that you don't need to do any balancing; the disadvantage is that the time complexity is O(log(T)) where T is the total number of possible times, so it's going to be a little slower on very sparse datasets.
B. The O(log(n)) implementation that's being called for is a self-balancing binary tree (e.g. red-black), modified to also store the maximums in each node. Unfortunately most libraries don't give you low-level control over the tree's nodes, so you'd likely need to copy the code and modify it (or implement the self-balancing tree from scratch).
C. If those are still too slow (and you're certain that your implementation is really O(log(n))), you'll need to improve the cache efficiency. That basically comes down to using larger nodes. The obvious approach is to switch to a B-tree; but you could also keep your nodes binary and just change the way they are allocated to emulate the memory layout of a B-tree (this is simpler but slower because it still uses lots of pointers). Another idea is to replace the first few layers of the tree with a hash table (or a simple array if your data is dense enough). Likewise you can replace the leaf nodes with small arrays.
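As a rough illustration of the "leaf nodes as small arrays" idea from option C (again my own sketch, not from the thread): group the slots into fixed-size contiguous chunks and cache each chunk's max width. One comparison skips a whole chunk, and each chunk is one contiguous read instead of a pointer chase per node. This flat two-level form is O(n/B + B) per query rather than O(log n); a real B-tree stacks more levels on top to get the logarithm back, but the memory-layout idea is the same.

```python
CHUNK = 64  # tune to a cache line / page size

class ChunkedSlots:
    """Slots grouped into contiguous chunks, each caching its max width."""

    def __init__(self, widths):
        self.widths = list(widths)
        self.chunk_max = [
            max(self.widths[i:i + CHUNK])
            for i in range(0, len(self.widths), CHUNK)
        ]

    def first_at_least(self, lo, w):
        """Index of the first slot >= lo with width >= w, or -1."""
        for ci in range(lo // CHUNK, len(self.chunk_max)):
            if self.chunk_max[ci] < w:
                continue  # skip the whole chunk with one comparison
            start = max(lo, ci * CHUNK)
            end = min(len(self.widths), (ci + 1) * CHUNK)
            for i in range(start, end):
                if self.widths[i] >= w:
                    return i
        return -1
```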