Tree.h in OpenBSD: dependency-free intrusive binary tree (2002) (github.com/openbsd)
112 points by attractivechaos on April 9, 2021 | 36 comments


For comparison, AVL trees are widely used in illumos. There are some nice big doc comments in these files as well.

https://github.com/illumos/illumos-gate/blob/master/usr/src/...

https://github.com/illumos/illumos-gate/blob/master/usr/src/...


It also represents a stable API and ABI, both in the kernel...

https://illumos.org/man/9F/avl

... and in user mode, via libavl:

https://illumos.org/man/3LIB/libavl

Our debugger, MDB, is able to use the offsets and pointers stored in the tree to provide generic walks through any AVL tree in a live system, core file, or crash dump.


The Linux kernel also has some nice comments in its red-black tree implementation.

https://github.com/torvalds/linux/blob/master/lib/rbtree.c


What you are showing is an implementation without long macros. It is cleaner and simpler. However, it is slower because compilers often can't inline user-provided comparison functions passed through function pointers (though modern compilers have gotten better at this). For integer keys, the extra comparison calls are costly. With tree.h, the comparison is named in the generating macro and called directly (it can even be a macro itself), so the compiler will inline it. That is the most efficient way to implement generic binary search trees.
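
For a concrete picture, here is a rough, illustrative sketch of tree.h usage with made-up type names (on OpenBSD the header is <sys/tree.h>; elsewhere you can simply vendor the file). The comparator is named in RB_GENERATE, so the generated insert/find code calls it directly rather than through a function pointer:

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/tree.h>

  struct node {
      RB_ENTRY(node) entry;   /* intrusive links live inside the node */
      int key;
  };

  /* Named in RB_GENERATE below, so it is called directly (no function
   * pointer) and can be inlined. A macro would work here too. */
  static int node_cmp(struct node *a, struct node *b)
  {
      return (a->key < b->key) ? -1 : (a->key > b->key);
  }

  RB_HEAD(node_tree, node);
  RB_GENERATE(node_tree, node, entry, node_cmp)

  int main(void)
  {
      struct node_tree head = RB_INITIALIZER(&head);
      struct node *n, query;

      for (int i = 0; i < 10; i++) {
          n = malloc(sizeof(*n));
          n->key = (i * 7) % 10;
          RB_INSERT(node_tree, &head, n);
      }
      query.key = 6;
      n = RB_FIND(node_tree, &head, &query);
      printf("found: %d\n", n ? n->key : -1);
      RB_FOREACH(n, node_tree, &head)
          printf("%d ", n->key);   /* prints keys in sorted order */
      printf("\n");
      return 0;
  }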


I don't think this matters too much. The overhead of the added call/return stays hot in the cache, and the CPU has plenty of cycles to twiddle its thumbs with while it's grabbing the next tree level from a lower cache or main memory...


You are right. There are lots of other operations in addition to comparisons. I just ran a microbenchmark: the difference between macro-based and function-based comparisons is largely indistinguishable.


I wonder if anyone has done a comprehensive benchmark comparing all of these.


For whatever it's worth, I did effectively benchmark the illumos AVL implementation against Rust's B-tree implementation[0], though that was certainly not the intent! Benchmarking the respective BSTs would be interesting, but (absent some small cycle-count differences in degenerate cases like a single node) one very much expects BST performance to be dominated by memory behavior -- the memory footprint and cache behavior of the respective implementations is much more likely to affect relative performance than other factors. Of course, systems can surprise, so hopefully someone does the experiment!

[0] http://dtrace.org/blogs/bmc/2018/09/28/the-relative-performa...


There shouldn't be much difference, particularly between the RB and AVL trees. The limiting factor is memory access latency in all cases since the CPU can't effectively prefetch tree node accesses for the next level down.


While both RB and AVL trees have O(lg N) height, the constant for AVL trees is smaller: the worst-case height is roughly 1.44 lg N for an AVL tree versus 2 lg N for a red-black tree. This means a large AVL tree may be a layer or two shorter than the equivalent RB tree, at the expense of a higher constant for insertion and deletion.

In certain contexts, saving a node traversal will have a meaningful impact on performance.


These are the most well-known tree types in CS, and benchmark results are unlikely to differ from what is widely published and written about the algorithms. The benchmark winner would depend on what usage pattern and metric you are interested in. Plain binary trees are simple; AVL is a bit faster to search and slower to update than RB.


Typesafe wrapper around the same code:

https://github.com/FRRouting/frr/blob/master/lib/typerb.h

https://github.com/FRRouting/frr/blob/master/lib/typerb.c

Typesafe wrapper with identical API around linked lists, heap & hash:

https://github.com/FRRouting/frr/blob/master/lib/typesafe.h

https://github.com/FRRouting/frr/blob/master/lib/typesafe.c

Docs:

http://docs.frrouting.org/projects/dev-guide/en/latest/lists...

Typesafe = each tree/list/... uses its own struct and the APIs directly use pointers to the actual member data structures rather than the tree implementation structs.
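
The general shape of such a wrapper, hand-rolled here with hypothetical names (not the actual FRR macros), is to hide the generic machinery behind a dedicated head struct and inline functions that only accept the item type:

  #include <sys/tree.h>

  struct item {
      RB_ENTRY(item) link;
      unsigned int id;
  };

  static int item_cmp(struct item *a, struct item *b)
  {
      return (a->id < b->id) ? -1 : (a->id > b->id);
  }

  RB_HEAD(item_tree_raw, item);
  RB_GENERATE(item_tree_raw, item, link, item_cmp)

  /* The typesafe facade: callers see only struct item_head and
   * struct item *, never the generic RB types or macros. */
  struct item_head { struct item_tree_raw t; };

  static inline void item_init(struct item_head *h) { RB_INIT(&h->t); }

  static inline struct item *item_add(struct item_head *h, struct item *i)
  {
      return RB_INSERT(item_tree_raw, &h->t, i);   /* NULL on success */
  }

  static inline struct item *item_find(struct item_head *h, unsigned int id)
  {
      struct item ref = { .id = id };
      return RB_FIND(item_tree_raw, &h->t, &ref);
  }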

Disclaimer: I wrote these (only the wrapper in case of the RB tree, the entire thing for the others)


For the curious: The author of this, Niels Provos, is now the head of security at Stripe. He wrote it when he was a PhD student at the University of Michigan.


I had the pleasure of meeting Niels during a short stint at Google. Didn’t help my imposter syndrome xD.


The URL and (2002) make it seem like it hasn't been updated since, but the latest commit to it is from October.

These macros, along with the list/queue macros in queue.h, are super useful. I've included them in several projects, and it's super simple to just drop in the header file since it's BSD licensed.


But is it cache-friendly? Linked structures are great and all, but it seems that nowadays even in-memory B-trees might be better than overly-linked structures for the same reasons B-trees were efficient for on-disk storage (less moving around = better locality).

I suppose some cache-friendliness may also come from the chosen memory allocator for allocating new nodes. If objects are all the same size, a reasonably good slab allocator might continue to provide cache-friendliness.

I think I need an adult! Can someone help me sort this out?


An in-memory B-tree is faster and uses less memory for small keys [1]. However, intrusive binary search trees (BSTs) are much more flexible and are the right choice for implementing kernels and malloc(). In addition, intrusive BSTs are often used to organize memory blocks scattered around, and an in-memory B-tree would lose its cache efficiency in that case anyway.

[1] https://attractivechaos.wordpress.com/2008/09/24/b-tree-vs-b...


Not especially cache efficient compared to trees with higher branching factor, true. But intrusive data structures can be manipulated without memory allocation, such as when holding locks. That is the main reason they are useful in place of B-trees.


Intrusive data structures generally have good cache friendliness. In this context, "intrusive" means that the data structure's bookkeeping is embedded in the data that's being stored.

In C terms, as in this case, that means the pointers each node requires aren't allocated in separate structs; they are embedded in the one struct that also holds the payload.

This means that the cache behavior is improved, as a given node is stored in a single location. Once you access a node the associated data is already in the cache, instead of having to be fetched via a separate pointer dereference.
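
In code, the contrast looks roughly like this (an illustrative sketch using the tree.h RB_ENTRY macro):

  #include <stdint.h>
  #include <sys/tree.h>

  /* Non-intrusive: the container allocates its own node objects that
   * point at the payload, so every hit costs an extra pointer chase
   * into memory that may not be in cache. */
  struct map_node {
      struct map_node *left, *right, *parent;
      void *payload;
  };

  /* Intrusive: the links are fields of the payload itself, so node and
   * data share one allocation, and touching the node already brings
   * the payload's cache line(s) in. */
  struct connection {
      RB_ENTRY(connection) by_addr;
      uint32_t addr;
      uint16_t port;
  };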


Related, maybe interesting to this crowd: In FreeBSD, the RB tree implementation has been replaced with a "weak AVL" tree implementation:

https://github.com/freebsd/freebsd-src/commit/504ba65c8c9f7a...

> Weak AVL (Wavl) trees sit between AVL and red-black trees in terms of how strictly balance is enforced. They have the stricter balance of AVL trees as the tree is built - a wavl tree is an AVL tree until the first deletion. Once removals start, wavl trees are lazier about rebalancing than AVL trees, so that removals can be fast, but the balance of the tree can decay to that of a red-black tree. Subsequent insertions can push balance back toward the stricter AVL conditions.

> ...

> Testing has shown that for the cases where red-black trees do worst, wavl trees better balance leads to faster lookups, so that if lookups outnumber insertions by a nontrivial amount, lookup time saved exceeds the extra cost of balancing.


Beware of splay trees being modified on read access.

Meaning that in a threaded environment, merely traversing or searching the tree will cause hidden performance issues due to cache invalidation. These issues don't exist with other balanced tree types.

That is, splay trees are a distinctly specialized binary tree type. Use with due care.


I have a fairly comprehensible left-leaning RB tree implementation that I've been porting along my travels.

I keep it around for situations where a binary searched array isn't doable or good enough, but I still want ordered set functionality that isn't in the stdlib.

https://github.com/codr7/whirlog/blob/main/rb.lisp

https://github.com/codr7/libcodr7/blob/master/source/codr7/t...


Why is the code written using defines? I haven’t read much C and I’ve never seen C written this way before.


Looks like they are emulating generics (what would be accomplished with templates in C++).


tree.h implements intrusive data structures using code generation. Even in C++, to support an object being in multiple containers intrusively, the templates get quite complex because you need to parameterize on the member field name. Some older implementations simply fall back to external code generation, including the use of macros.
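
With the BSD macros, for example, the same object can sit in two containers at once simply by giving each container its own link member and naming that member in the generating macro (a sketch with made-up names, using tree.h and queue.h):

  #include <sys/queue.h>
  #include <sys/tree.h>

  struct job {
      RB_ENTRY(job)    by_deadline;   /* links for the deadline-ordered tree */
      TAILQ_ENTRY(job) run_queue;     /* links for the FIFO run queue */
      long deadline;
      int  id;
  };

  static int job_cmp(struct job *a, struct job *b)
  {
      return (a->deadline < b->deadline) ? -1 : (a->deadline > b->deadline);
  }

  RB_HEAD(job_tree, job);
  RB_GENERATE(job_tree, job, by_deadline, job_cmp)
  TAILQ_HEAD(job_list, job);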

C application code using these data structures, particularly the finely honed interfaces defined by queue.h and tree.h in BSD, tends to be extremely clear and simple. And at least the implementation of intrusive lists, particularly queue.h, is also quite straightforward, even considering the backslash-escaped multiline macros. (tree.h is admittedly hairier, but it's implementing an intrusive, non-recursive, type-safe red-black tree. Multiline macros are the least of your problems.)

None of the C-like languages make this easy at the type level. I'm not familiar with any statically typed functional languages, but I doubt it's any easier there, because intrusive, inside-out struct/record object relations tend to be even more foreign. By resorting to rudimentary string interpolation (i.e. C macros), it's counterintuitively quite easy to implement this stuff in C. I suppose it's also easier in s-expression based languages, but s-expressions lend themselves to dynamic code generation.


This is indeed one of the two ways to get "generics" in C.

The other way is to set up a set of #defines before #including a file, so the #defines rename everything in the include. (The include can also be included multiple times in this pattern.) The latter seems to be rarer.
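
A sketch of how that second pattern tends to look, with hypothetical file and macro names:

  /* vec_template.h -- no include guard on purpose; the includer
   * defines VEC_TYPE and VEC_NAME before each #include. */
  #include <stdlib.h>

  #define VEC_CAT_(a, b) a##b
  #define VEC_CAT(a, b)  VEC_CAT_(a, b)
  #define VEC_FN(suffix) VEC_CAT(VEC_NAME, suffix)

  struct VEC_NAME {
      VEC_TYPE *data;
      size_t len, cap;
  };

  static inline void VEC_FN(_push)(struct VEC_NAME *v, VEC_TYPE x)
  {
      if (v->len == v->cap) {   /* error handling omitted */
          v->cap = v->cap ? v->cap * 2 : 8;
          v->data = realloc(v->data, v->cap * sizeof(*v->data));
      }
      v->data[v->len++] = x;
  }

  #undef VEC_FN
  #undef VEC_CAT
  #undef VEC_CAT_
  #undef VEC_TYPE
  #undef VEC_NAME

  /* In some .c file:
   *   #define VEC_TYPE int
   *   #define VEC_NAME int_vec
   *   #include "vec_template.h"   -- gives struct int_vec, int_vec_push()
   *   #define VEC_TYPE double
   *   #define VEC_NAME dbl_vec
   *   #include "vec_template.h"   -- gives struct dbl_vec, dbl_vec_push()
   */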


C11 added _Generic for overloading by type.


... which you use inside of a #define :)

Like,

  #define foo_generic(x) _Generic(x, ...)
[what I intend to say with this post: _Generic in no way replaces #define, in fact it is designed to work with it.]
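
A minimal, self-contained example of the pairing (nothing here is specific to tree.h):

  #include <stdio.h>
  #include <stdlib.h>
  #include <math.h>

  /* Pick the right absolute-value function for the argument's type;
   * the _Generic selection happens at compile time, and the #define
   * just gives it a usable spelling. */
  #define generic_abs(x) _Generic((x), \
      int:    abs,                     \
      long:   labs,                    \
      double: fabs)(x)

  int main(void)
  {
      printf("%d %ld %g\n",
          generic_abs(-3), generic_abs(-4L), generic_abs(-5.0));
      return 0;
  }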


This is not an uncommon technique to deal with repetitive code snippets.

Another example of it, off the top of my head, is sometimes the implementation of FFT butterflies.
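
Something along these lines, say (a hypothetical sketch, not taken from any particular FFT):

  #include <complex.h>

  /* Radix-2 decimation-in-time butterfly: combine entries a and b in
   * place using twiddle factor w. Expanding it as a macro lets the
   * inner FFT loops repeat it with no call overhead. */
  #define BUTTERFLY(a, b, w) do {      \
      double complex _t = (w) * (b);   \
      (b) = (a) - _t;                  \
      (a) = (a) + _t;                  \
  } while (0)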


My C skills aren't particularly great. I can't identify the syntax of "name##" in the macros and function definitions. Can someone tell me what this is called?
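
For reference, ## is the C preprocessor's token-pasting operator: it concatenates two tokens into one, which is how these macros build per-type function names. A tiny illustration with made-up names:

  #include <stdio.h>

  /* ## concatenates two preprocessor tokens into one, so this macro
   * defines a function whose name depends on its arguments. */
  #define DEFINE_GETTER(name, type, field)                    \
      static type name##_get_##field(const struct name *p)    \
      {                                                       \
          return p->field;                                    \
      }

  struct point { int x, y; };
  DEFINE_GETTER(point, int, x)   /* expands to point_get_x() */

  int main(void)
  {
      struct point p = { 3, 4 };
      printf("%d\n", point_get_x(&p));
      return 0;
  }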



I had forgotten how much I like C and admire its combination of simplicity and expressivity. Thanks for reminding me :)


And? RB and splay trees are not particularly fast.

Normally you would use a B-tree; only when you need pointer and iterator stability do you need these slower pointer-chasing structures. This is not even documented.


tree.h is production-proven and available under a very liberal license (ISC). tree.h is also fairly extensively documented (e.g. https://man.openbsd.org/tree.3); the only real footgun the API allows is documented under "NOTES".

You are, of course, right that pointer-heavy data structures do not perform optimally on modern hardware. But I've still reached for tree.h quite a few times, because it gives you a very flexible data structure that Just Works with little fuss and minimal dependencies.


The link needs some TLC




