Interesting. Other than interfaces to non-Rust code, the big safety issues revolve around partial allocation of arrays. That's the machinery underneath the ability to grow a vector, and comes up with some other data structures.
It's not that hard to formalize safety in that area. You need some way to talk about an array being partially initialized. This requires some simple design-by-contract machinery - object invariants, basically. With some proof machinery, it should be possible to determine at compile time that a data manipulation is safe.
Rust already has the concept of raw memory space, which is treated as write-only. That takes care of initialization for non-array types. Partially initialized arrays require some additional annotation.
Suppose we have a primitive for talking about initialization:
isvalid(T, arr, lo, hi)
This means that array arr is initialized and holds valid values of type T for subscripts in the range lo to hi inclusive.
Now there's a way to talk about partially initialized arrays:
isvalid(T, arr, 0, len-1)
This says that the array is only initialized up to len-1. The current requirement for "unsafe" comes from the lack of any way to talk about partial initialization. It's an expressiveness problem. Much of Rust's success comes from finding ways to talk about safety issues such as ownership within the language. This is an extension to that.
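As a concrete sketch of that invariant in today's Rust (hypothetical type and names, not a real stdlib API): a minimal growable array whose one `unsafe` read is justified exactly by maintaining isvalid(T, buf, 0, len-1).

```rust
use std::mem::MaybeUninit;

// Hypothetical minimal growable array. Invariant, in the notation above:
// isvalid(T, buf, 0, len-1) -- slots 0..len are initialized, the rest raw.
struct PartialArray<T> {
    buf: Box<[MaybeUninit<T>]>,
    len: usize,
}

impl<T> PartialArray<T> {
    fn with_capacity(cap: usize) -> Self {
        PartialArray {
            buf: (0..cap).map(|_| MaybeUninit::uninit()).collect(),
            len: 0,
        }
    }

    // Extend the valid range by one: initialize the raw slot first, then
    // bump len -- the "extend by adding new valid element" step.
    fn push(&mut self, value: T) {
        assert!(self.len < self.buf.len(), "out of capacity");
        self.buf[self.len].write(value);
        self.len += 1;
    }

    // Reads are only allowed inside the proven-valid range 0..len.
    fn get(&self, i: usize) -> Option<&T> {
        if i < self.len {
            // Sound precisely because of the invariant above.
            Some(unsafe { self.buf[i].assume_init_ref() })
        } else {
            None
        }
    }
}

// (A real version would also implement Drop; omitting it merely leaks,
// which is safe.)

fn main() {
    let mut a = PartialArray::with_capacity(4);
    a.push(10);
    a.push(20);
    assert_eq!(a.get(1), Some(&20));
    assert!(a.get(2).is_none());
}
```

Today the compiler can't verify this reasoning; the point of the proposal is that it could.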
Then, a minimal theorem prover is needed, one which at least knows these theorems:
j < i ==> isvalid(T, a, i, j) // the null case
isvalid(T, a, i, j) and k >= i and k <= j ==> isvalid(T, a[k]) // element valid in range
isvalid(T, a, i, j) and isvalid(T, a[j+1]) ==> isvalid(T, a, i, j+1) // extend by adding new valid element
With that info, it's possible to check operations such as push, pop, and array growth, where an array is partially initialized in a controlled way. Array references can also be checked; this implies that access to an element at index >= len is an error, and the compiler now knows this.
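pop is the mirror image of push: it shrinks the valid range before moving the element out. A standalone sketch (hypothetical helper name, using `MaybeUninit` for the raw slots):

```rust
use std::mem::MaybeUninit;

// Hypothetical pop over a raw buffer: shrink the valid range first, then
// move the value out of the slot that just left it. In the notation above:
// from isvalid(T, buf, 0, len-1) and len > 0, conclude
// isvalid(T, buf, 0, len-2) afterwards.
fn pop_last<T>(buf: &mut [MaybeUninit<T>], len: &mut usize) -> Option<T> {
    if *len == 0 {
        return None; // the null case: nothing valid to remove
    }
    *len -= 1;
    // Sound: slot *len was inside the valid range a moment ago.
    Some(unsafe { buf[*len].assume_init_read() })
}

fn main() {
    let mut buf = [MaybeUninit::<u32>::uninit(); 4];
    let mut len = 0;
    for v in [7u32, 8, 9] {
        buf[len].write(v);
        len += 1;
    }
    assert_eq!(pop_last(&mut buf, &mut len), Some(9));
    assert_eq!(len, 2);
}
```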
When the compiler knows something like that, it can optimize. If you're iterating over a vec from 0 to len-1, the subscript check required is
i < len
Within a FOR statement, the compiler already knows the range of i is within 0..len-1,
so the subscript check becomes
len-1 < len
which simplifies to true, eliminating the subscript check. That gets rid of the need for most unchecked subscripting.
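rustc/LLVM can already perform this particular elimination in simple loops; the proposal would make it systematic. The pattern in question:

```rust
// When i is the induction variable of `for i in 0..a.len()`, the optimizer
// can see that i < a.len() always holds, so the bounds check on a[i]
// compiles away.
fn sum_indexed(a: &[u32]) -> u32 {
    let mut total = 0;
    for i in 0..a.len() {
        total += a[i]; // check provably redundant: i < len by construction
    }
    total
}

fn main() {
    assert_eq!(sum_indexed(&[1, 2, 3, 4]), 10);
}
```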
This sort of proof machinery used to be exotic technology. (I was doing this kind of thing 35 years ago.[1]) Now it's well understood. It's quite possible to do this in a compiler today.
This book is the final product (and (not) master's thesis) of Alexis Beingessner (Gankro)'s internship at Mozilla this summer. He's gone a long way toward defining the guarantees that `unsafe` code must uphold, and thereby toward auditing the soundness of the various `unsafe` bits in the stdlib.
I think the claims about linked lists there are rather oversold, especially the parts about functional languages. First, linked lists are common, useful, and efficient in languages such as Racket or OCaml. I'd really want to see performance results before accepting your claims. Second, Haskell is really not limited in the way you say. You can write all the programs you'd write in an imperative language if you're willing to have IO in your type.
Regarding performance, one thing to keep in mind is that all of the languages you mention use GC, which can help with cache locality issues. In Rust, of course, the situation is different. Gankro also addresses the common and useful idioms you see in most functional languages through corresponding vector-centric functions on the Rust side. (Scala is perhaps a reasonable midpoint.)
That all said, I too would love to see some substantiation of the performance intuitions here.
I think one of the problems here is that the discussion of how bad linked lists are switches back and forth between talking about Rust specifically and linked lists generally.
Also, I think performance claims, especially ones as strong as the ones in that document, should be avoided without substantiation.
It's still a bit of a draft document (still missing the last chapter, for one). Happy to change stuff based on proof. That's one of the more dubious ones for sure.
Academic. The Mozilla supervisors (i.e. me) were very enthusiastic about the Rustonomicon being the thesis. But I think that what Gankro's working on now is even more exciting!
Am I right in assuming that the name is a reference to the Normanomicon from Fable II and Fable III? It's just priceless.
Thanks to all involved with Rust (also in light of the 1.3 release). From what I can tell (passively reading about language decisions and seeing the responsiveness of all involved on various communication channels), it's a great lesson in building a community while simultaneously creating a fun programming language.
I hope I'll come to use it at work at some point.
P.S. First comment ever on the internet anywhere, hope I didn't violate any guidelines.
The name is actually an alteration of "The Necronomicon", which is from the works of H.P. Lovecraft (the Cthulhu guy). The quote at the beginning of the book is also an altered Lovecraft quote (such quotes have historically been popular as easter eggs in the Rust community). In the literature, it is an evil and otherworldly book which eventually drives its reader insane, which should give you some idea as to the content of the OP. :)
Ah, obviously I have some reading up to do, thanks. :)
The one from Fable actually seems to be an alteration as well, giving me about the same idea about the content as knowing the actual reference would have. Fun stuff.
I think it's primarily a reference to the Necronomicon, a fictional book within H.P. Lovecraft's stories that tells about the dark magic of the Great Old Ones and how to summon them. https://en.wikipedia.org/wiki/Necronomicon
There are other references to the -nomicon name, like Neal Stephenson's Cryptonomicon.
A really interesting read. It's worth mentioning how little unsafe code you need in Rust. Even implementing strange data structures normally only requires 5-10 lines just to handle the manual memory management.
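A sketch of that point, along the lines of the book's `split_at_mut` example (hypothetical function name `split_mut`): the data-structure logic is ordinary safe Rust, and the unsafe footprint is confined to two constructor calls.

```rust
use std::slice;

// Hand-rolled slice splitting: everything is safe Rust except the two
// raw-pointer reconstructions inside the unsafe block.
fn split_mut<T>(s: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = s.len();
    assert!(mid <= len);
    let p = s.as_mut_ptr();
    unsafe {
        // Sound: the two halves are disjoint and within the allocation.
        (
            slice::from_raw_parts_mut(p, mid),
            slice::from_raw_parts_mut(p.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut v = [1, 2, 3, 4];
    let (a, b) = split_mut(&mut v, 2);
    a[0] = 10;
    b[1] = 40;
    assert_eq!(v, [10, 2, 3, 40]);
}
```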
[1] http://www.animats.com/papers/verifier/verifiermanual.pdf