Interesting. Other than interfaces to non-Rust code, the big safety issues revolve around partial allocation of arrays. That's the machinery underneath the ability to grow a vector, and comes up with some other data structures.
It's not that hard to formalize safety in that area. You need some way to talk about an array being partially initialized. This requires some simple design-by-contract machinery - object invariants, basically. With some proof machinery, it should be possible to determine at compile time that a data manipulation is safe.
Rust already has the concept of raw memory space, which is treated as write-only. That takes care of initialization for non-array types. Partially initialized arrays require some additional annotation.
Suppose we have a primitive for talking about initialization:
isvalid(T, arr, lo, hi)
This means that array arr is initialized and holds valid values of type T for subscripts in the range lo to hi inclusive.
Now there's a way to talk about partially initialized arrays:
isvalid(T, arr, 0, len-1)
This says that the array is only initialized up to len-1. The current requirement for "unsafe" comes from the lack of any way to talk about partial initialization. It's an expressiveness problem. Much of Rust's success comes from finding ways to talk about safety issues such as ownership within the language. This is an extension to that.
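As a concrete sketch of that invariant in today's Rust (hypothetical type and names, not a real stdlib API): a minimal growable array whose one `unsafe` read is justified exactly by maintaining isvalid(T, buf, 0, len-1).

```rust
use std::mem::MaybeUninit;

// Hypothetical minimal growable array. Invariant, in the notation above:
// isvalid(T, buf, 0, len-1) -- slots 0..len are initialized, the rest raw.
struct PartialArray<T> {
    buf: Box<[MaybeUninit<T>]>,
    len: usize,
}

impl<T> PartialArray<T> {
    fn with_capacity(cap: usize) -> Self {
        PartialArray {
            buf: (0..cap).map(|_| MaybeUninit::uninit()).collect(),
            len: 0,
        }
    }

    // Extend the valid range by one: initialize the raw slot first, then
    // bump len -- the "extend by adding new valid element" step.
    fn push(&mut self, value: T) {
        assert!(self.len < self.buf.len(), "out of capacity");
        self.buf[self.len].write(value);
        self.len += 1;
    }

    // Reads are only allowed inside the proven-valid range 0..len.
    fn get(&self, i: usize) -> Option<&T> {
        if i < self.len {
            // Sound precisely because of the invariant above.
            Some(unsafe { self.buf[i].assume_init_ref() })
        } else {
            None
        }
    }
}

// (A real version would also implement Drop; omitting it merely leaks,
// which is safe.)

fn main() {
    let mut a = PartialArray::with_capacity(4);
    a.push(10);
    a.push(20);
    assert_eq!(a.get(1), Some(&20));
    assert!(a.get(2).is_none());
}
```

Today the compiler can't verify this reasoning; the point of the proposal is that it could.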
Then, a minimal theorem prover is needed, one which at least knows these theorems:
j < i ==> isvalid(T, a, i, j) // the null case
isvalid(T, a, i, j) and k >= i and k <= j ==> isvalid(T, a[k]) // element valid in range
isvalid(T, a, i, j) and isvalid(T, a[j+1]) ==> isvalid(T, a, i, j+1) // extend by adding new valid element
With that info, it's possible to check operations such as push, pop, and array growth, where an array is partially initialized in a controlled way. Array references can also be checked; this implies that access to an element at index >= len is an error, and the compiler now knows this.
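pop is the mirror image of push: it shrinks the valid range before moving the element out. A standalone sketch (hypothetical helper name, using `MaybeUninit` for the raw slots):

```rust
use std::mem::MaybeUninit;

// Hypothetical pop over a raw buffer: shrink the valid range first, then
// move the value out of the slot that just left it. In the notation above:
// from isvalid(T, buf, 0, len-1) and len > 0, conclude
// isvalid(T, buf, 0, len-2) afterwards.
fn pop_last<T>(buf: &mut [MaybeUninit<T>], len: &mut usize) -> Option<T> {
    if *len == 0 {
        return None; // the null case: nothing valid to remove
    }
    *len -= 1;
    // Sound: slot *len was inside the valid range a moment ago.
    Some(unsafe { buf[*len].assume_init_read() })
}

fn main() {
    let mut buf = [MaybeUninit::<u32>::uninit(); 4];
    let mut len = 0;
    for v in [7u32, 8, 9] {
        buf[len].write(v);
        len += 1;
    }
    assert_eq!(pop_last(&mut buf, &mut len), Some(9));
    assert_eq!(len, 2);
}
```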
When the compiler knows something like that, it can optimize. If you're iterating over a vec from 0 to len-1, the subscript check required is
i < len
Within a FOR statement, the compiler already knows the range of i is within 0..len-1,
so the subscript check becomes
len-1 < len
which simplifies to true, eliminating the subscript check. That gets rid of the need for most unchecked subscripting.
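rustc/LLVM can already perform this particular elimination in simple loops; the proposal would make it systematic. The pattern in question:

```rust
// When i is the induction variable of `for i in 0..a.len()`, the optimizer
// can see that i < a.len() always holds, so the bounds check on a[i]
// compiles away.
fn sum_indexed(a: &[u32]) -> u32 {
    let mut total = 0;
    for i in 0..a.len() {
        total += a[i]; // check provably redundant: i < len by construction
    }
    total
}

fn main() {
    assert_eq!(sum_indexed(&[1, 2, 3, 4]), 10);
}
```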
This sort of proof machinery used to be exotic technology. (I was doing this kind of thing 35 years ago.[1]) Now it's well understood. It's quite possible to do this in a compiler today.
This book is the final product (and (not) master's thesis) of Alexis Beingessner (Gankro)'s internship at Mozilla this summer. He's gone a long way toward defining the guarantees that `unsafe` code must uphold, and thereby toward auditing the soundness of the various `unsafe` bits in the stdlib.
I think the claims about linked lists there are rather oversold, especially the parts about functional languages. First, linked lists are common, useful, and efficient in languages such as Racket or OCaml. I'd really want to see performance results before accepting your claims. Second, Haskell is really not limited in the way you say. You can write all the programs you'd write in an imperative language if you're willing to have IO in your type.
Regarding performance, one thing to keep in mind is that all of the languages you mention use GC, which can help with cache locality issues. In Rust, of course, the situation is different. Gankro also addresses the common and useful idioms you see in most functional languages through corresponding vector-centric functions on the Rust side. (Scala is perhaps a reasonable midpoint.)
That all said, I too would love to see some substantiation of the performance intuitions here.
I think one of the problems here is that the discussion of how bad linked lists are switches back and forth between talking about Rust specifically and linked lists generally.
Also, I think performance claims, especially ones as strong as the ones in that document, should be avoided without substantiation.
It's still a bit of a draft document (still missing the last chapter, for one). Happy to change stuff based on proof. That's one of the more dubious ones for sure.
Academic. The Mozilla supervisors (i.e. me) were very enthusiastic about the Rustonomicon being the thesis. But I think that what Gankro's working on now is even more exciting!
Am I right in assuming that the name is a reference to the Normanomicon from Fable II and Fable III? It's just priceless.
Thanks to all involved with Rust (also in light of the 1.3 release). From what I can tell (passively reading about language decisions and seeing the responsiveness of all involved on various communication channels), it's a great lesson in building a community while simultaneously creating a fun programming language.
I hope I'll come to use it at work at some point.
P.S. First comment ever on the internet anywhere, hope I didn't violate any guidelines.
The name is actually an alteration of "The Necronomicon", which is from the works of H.P. Lovecraft (the Cthulhu guy). The quote at the beginning of the book is also an altered Lovecraft quote (such quotes have historically been popular as easter eggs in the Rust community). In the literature, it is an evil and otherworldly book which eventually drives its reader insane, which should give you some idea as to the content of the OP. :)
Ah, obviously I have some reading up to do, thanks. :)
The one from Fable actually seems to be an alteration as well, giving me about the same idea about the content as knowing the actual reference would have. Fun stuff.
I think it's primarily a reference to the Necronomicon, a fictional book within H.P. Lovecraft's stories that tells about the dark magic of the Great Old Ones and how to summon them. https://en.wikipedia.org/wiki/Necronomicon
There are other references to the -nomicon name, like Neal Stephenson's Cryptonomicon.
A really interesting read. It's worth mentioning how little unsafe code you need in Rust. Even implementing strange data structures normally only requires 5-10 lines just to handle the manual memory management.
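A sketch of that point, along the lines of the book's `split_at_mut` example (hypothetical function name `split_mut`): the data-structure logic is ordinary safe Rust, and the unsafe footprint is confined to two constructor calls.

```rust
use std::slice;

// Hand-rolled slice splitting: everything is safe Rust except the two
// raw-pointer reconstructions inside the unsafe block.
fn split_mut<T>(s: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = s.len();
    assert!(mid <= len);
    let p = s.as_mut_ptr();
    unsafe {
        // Sound: the two halves are disjoint and within the allocation.
        (
            slice::from_raw_parts_mut(p, mid),
            slice::from_raw_parts_mut(p.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut v = [1, 2, 3, 4];
    let (a, b) = split_mut(&mut v, 2);
    a[0] = 10;
    b[1] = 40;
    assert_eq!(v, [10, 2, 3, 40]);
}
```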
[1] http://www.animats.com/papers/verifier/verifiermanual.pdf