> The compiler has already the concept of scope and variables existing at least in the optimizer.
My gut feeling is that while the high-level concepts might share a name I'm not actually sure if they're similar enough for useful transfer? The optimizer is working on a very different representation at a later stage of the compilation process, so I'm a bit skeptical about the level of similarity and/or transferability when you get down to details. I guess a more concrete example might be like comparing type inference with optimizer value range analysis - both will analyze a CFG, but beyond that they're working on different-enough representations that transforming work on the latter to useful work on the former seems unlikely to me (though I'm a nobody, so take that with an appropriate grain of salt).
For example, consider work on improved lifetime analysis in Clang [0], which seems to be discussing a from-scratch implementation based on concepts from Polonius and doesn't seem to reference anything from LLVM. And more generally, the fact that neither GCC nor Clang appear to have discussed reusing concepts from their optimizer passes to recreate the borrow checker or borrow checker-like functionality makes it seem more likely to me that there's some fundamental distinction and/or additional considerations that make such a project difficult.
> Ownership tracking programs for C have been existing for 20 years
Could you give some examples? Not sure I've heard of anything that would fill the same niche as the borrow checker.
> so it can't be too hard to integrate it in the compiler, once it has been implemented for one language.
Sorry, I'm getting a bit confused here. When you say "integrate it in the compiler", by "it" do you mean the above mentioned "ownership tracking programs for C", or do you mean features implemented in the GCC Rust frontend?
In any case, as far as the borrow checker goes gccrs is currently planning on reusing rustc's borrowck implementation so that's going to be a bit of a hurdle to integrating similar functionality into other frontends. I don't know whether they plan on eventually writing an independent borrow checking implementation. Not sure if you had other features in mind, either.
You're probably right, I was saying it just can be transferred, with a large "just".
> the fact that neither GCC nor Clang appear to have discussed reusing concepts from their optimizer passes to recreate the borrow checker or borrow checker-like functionality
There is however a commitment from GCC, that every UB the compiler exploits must be reported by -fanalyzer, otherwise it's a compiler bug.
Where I got to know the concept is SPlint: http://splint.org/ Last I checked, development stopped in 2010 and the version shipped in Debian was buggy, but there seems to be newer development on GitHub. The initial commit is from 2000-06-13.
I've given up on using it due to bugs, but I do use the annotation to specify, among other things, ownership semantics in C.
Any C API already documents ownership semantics, otherwise it's underspecified and can't be used. It's just specified in prose instead of code. The semantics are, however, often more complicated than a simple owning pointer. A common case is that whether ownership was transferred depends on the return value of the called function.
There are C APIs out there without the necessary documentation, but you can't actually use them, without either introducing leaks, use-after-free bugs or reading the source code.
> Sorry, I'm getting a bit confused here. When you say "integrate it in the compiler", by "it" do you mean the above mentioned "ownership tracking programs for C", or do you mean features implemented in the GCC Rust frontend?
"it" means ownership tracking implementation intended for Rust in the frontend.
> I was saying it just can be transferred, with a large "just".
I'm not really convinced it can be transferred at all, but I'm not convinced it can't be either. The "just" feels so large as effectively
> There is however a commitment from GCC, that every UB the compiler exploits must be reported by -fanalyzer, otherwise it's a compiler bug.
Huh, don't think I've heard that commitment before. Do you mean that the GCC devs intend for -fanalyzer to (eventually?) guarantee catching all exploitable UB (which would be... ambitious, to say the least), or that -fanalyzer is a best-effort analysis? The docs currently state the latter more or less ("It is neither sound nor complete: it can have false positives and false negatives.") but that doesn't necessarily rule out attempts to make it so later (though that feels like it should run into Rice's theorem and/or false positive rate issues and/or require code alterations).
The closest thing I heard of is something about Clang/LLVM aiming to catch all the UB it exploits using sanitizers, but that's done at runtime so it's a lot easier to be precise about what you catch.
Ah. I suppose that counts, though I would probably describe Frama-C as more than just an ownership tracking program given its other capabilities. I guess it technically could fill the same niche as the borrow checker, though given its capabilities and what's needed to use it I think there's probably not a lot of practical overlap in use cases.
Haven't heard of that one before. It does look like it can provide (some?) similar capabilities, though perhaps not to the same level of soundness as what the borrow checker provides. From one of the papers linked on the website [0]:
> In real programs it is sometimes necessary to use weaker assumptions about memory use. The `owned` annotation denotes a reference with an obligation to release storage. Unlike `only`, however, other external references (marked with `dependent` annotations) may share this object. It is up to the programmer to ensure that the lifetime of a `dependent `reference is contained within the lifetime of the corresponding `owned` reference.
It's also not quite clear to me whether Splint can cover more "interesting" borrow checker cases like those involving named lifetimes or view structs, but given this is the first time I've heard of it I definitely don't have the experience or knowledge to say for sure.
> There are C APIs out there without the necessary documentation, but you can't actually use them, without either introducing leaks, use-after-free bugs or reading the source code.
Sure, and that's what makes analysis so practically difficult. Whole program analysis doesn't scale well, standard C doesn't have enough information for cheap inference, etc., etc.
> "it" means ownership tracking implementation intended for Rust in the frontend.
In that case I'm not sure if gccrs would provide the implementation you hope for, since they currently plan on integrating rustc's borrow checker implementation as-is. I'm not aware of a desire to write an independent borrow checker implementation at the moment, either.
> Huh, don't think I've heard that commitment before. Do you mean that the GCC devs intend for -fanalyzer to (eventually?) guarantee catching all exploitable UB (which would be... ambitious, to say the least), or that -fanalyzer is a best-effort analysis? The docs currently state the latter more or less ("It is neither sound nor complete: it can have false positives and false negatives.")
Both actually. Any UB exploits not caught by -fanalyzer would need to be disabled. However, I can't find a reference for this, so maybe my memory is deceiving me.
When writing Frama-C what I was thinking of was actually PVS-Studio (https://pvs-studio.com/), as this can also be used by students. It's also more of a standalone linter.
>> It is up to the programmer to ensure that the lifetime of a `dependent` reference is contained within the lifetime of the corresponding `owned` reference.
Yes, this is an escape hatch, when the pointer shenanigans can't be fully described. But I heard Rust also has those. If you want your program to be described by ownership semantics, you will make use of this less and less.
> named lifetimes support
I don't know enough Rust, but from what I read at https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html, yes it does. Specifying that the 'lifetime' of the return value corresponds to a parameter happens with the `returned` annotation. The cool thing in SPlint is that you can describe the lifetime of param1.foo.bar[0..42]. It also has several types of 'lifetimes': allocated, readable and writable, which is useful to represent uninitialized memory, meaning that after a function call some stuff is newly uninitialized that wasn't before the call. You also can combine this with parameters, so you can say that param1.baz[0..param2] is writable and param1.baz[0..param3] is readable and also that readable param1.baz[0..X] and writable param1.baz[0..Y] always means that X > Y.
It doesn't use the term 'lifetime', but talks about owned, allocated, initialized, readable and writable memory. In addition, it also supports adding other properties, so much more than 'lifetimes' can be tracked. The manual shows as an example how it can be used to track variables that are tainted by user input (10.1). What I think is missing, though, are conditionals on the return value.
How much of these features can be written in Rust? (Honest question)
> view struct support
I don't really know what these are. Maybe I already described that above?
That would be quite the surprise to me. Quite unfortunate that you can't find a source given what the -fanalyzer docs currently say.
> Any UB exploits not caught by -fanalyzer would need to be disabled.
I'd be curious as to the hypothetical performance impact of this, as well as the amount of work it'd take to make -fanalyzer reliable enough.
> When writing Frama-C what I was thinking of was actually PVS-Studio (https://pvs-studio.com/), as this can also be used by students. It's also more of a standalone linter.
Ah, that's quite different.
> Yes, this is an escape hatch, when the pointer shenanigans can't be fully described.
Ah, that's fair. Bit different than what Rust offers, but what you say makes sense.
> But I heard Rust also has those.
Sort of yes, sort of no. Rust has an escape hatch in `unsafe`, but it technically doesn't disable any checks - for example, the borrow checker will check the validity of references, bounds checks will continue to be inserted, etc. regardless of whether you're in an `unsafe` block or not. What it does instead is to give you the ability to perform `unsafe` operations, which for borrow checker-related shenanigans would typically involve dealing with pointers (to be specific, dereferencing them since some other pointer operations are considered safe) since the borrow checker doesn't check pointers.
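To make that a bit more concrete, here's a minimal sketch (the function name `bump_via_raw_pointer` is made up for illustration). Creating a raw pointer is allowed in safe code, dereferencing it is what needs `unsafe`, and references inside the `unsafe` block are still borrow-checked:

```rust
// Hypothetical helper: increments an integer through a raw pointer.
fn bump_via_raw_pointer(value: &mut i32) {
    // Creating a raw pointer is safe; the borrow checker does not track it.
    let ptr: *mut i32 = value;

    // Dereferencing a raw pointer is one of the operations `unsafe` unlocks.
    unsafe {
        *ptr += 1;
    }
}

fn main() {
    let mut x = 10;
    bump_via_raw_pointer(&mut x);
    assert_eq!(x, 11);

    // References are still checked inside `unsafe`. Uncommenting the block
    // below is rejected with E0499 (two mutable borrows at the same time):
    //
    // unsafe {
    //     let a = &mut x;
    //     let b = &mut x;
    //     *a += 1;
    //     *b += 1;
    // }
}
```

So `unsafe` moves the responsibility for the pointer's validity to the programmer, but it doesn't switch the borrow checker off for ordinary references.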
> I don't know enough Rust, but from what I read at https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html, yes it does. Specifying that the 'lifetime' of the return value corresponds to a parameter happens with the `returned` annotation.
That's one of the things named lifetimes let you do, but named lifetimes are flexible enough for more than just that use case, from slightly more complex things like dealing with multiple independent lifetimes at the same time (for example, returning two references instead of just one, or structs/functions using multiple lifetimes), to more obscure stuff like dealing with higher-ranked trait bounds [1].
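As a rough illustration of the "multiple independent lifetimes" case (the function `first_words` is a made-up example, not something from the book), each returned reference here is tied only to its own input:

```rust
// Two independent lifetimes: the first result borrows only from `left`,
// the second only from `right`.
fn first_words<'a, 'b>(left: &'a str, right: &'b str) -> (&'a str, &'b str) {
    (
        left.split_whitespace().next().unwrap_or(""),
        right.split_whitespace().next().unwrap_or(""),
    )
}

fn main() {
    let long_lived = String::from("hello world");
    let kept;
    {
        let short_lived = String::from("goodbye moon");
        let (a, b) = first_words(&long_lived, &short_lived);
        assert_eq!(b, "goodbye");
        // Fine: `a` borrows only from `long_lived`.
        kept = a;
    }
    // `short_lived` is gone here, but `kept` is still valid.
    assert_eq!(kept, "hello");
}
```

If both parameters shared a single lifetime `'a`, the assignment to `kept` would be rejected, since the result would then be assumed to possibly borrow from `short_lived` as well.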
Kind of a tangent, but the manual is a bit unclear to me as to what `returned` is capable of. It states "The returned annotation denotes a parameter that may be aliased by the return value.", but that's immediately followed by "Splint checks the call assuming the result may be an alias to the returned parameter." And later in the example it states "Because of the `returned` qualifier, Splint assumes the result of `intSet_insert` is the same storage as its first parameter, in this case the storage returned by `intSet_new`."
Does `returned` require that the annotated parameter correspond exactly to the return value, or can a "subset" of the parameter be returned? In more concrete terms, would something like the following be accepted (assuming the access is in bounds, of course)?
int *get_fifth(/*@returned*/ int *slice) {
    return &slice[4];
}
What about instances where the lifetime of the output may be tied to multiple parameters? For example, consider the following Rust function:
fn max<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
    if a > b { a } else { b }
}
My naive attempt at a translation to Splint would be:
int *max(/*@returned*/ int *a, /*@returned*/ int *b) {
    if (*a > *b) { return a; } else { return b; }
}
Would that be accepted as well?
In general, though, it does seem that Splint is able to describe many common patterns the borrow checker is also able to cover. I suspect the differences (in both directions) are probably only going to emerge for more complex use cases.
> You also can combine this with parameters, so you can say that param1.baz[0..param2] is writable and param1.baz[0..param3] is readable and also that readable param1.baz[0..X] and writable param1.baz[0..Y] always means that X > Y.
To be honest I don't quite follow, but I would guess that Rust isn't capable of anything similar since there isn't a way to describe properties for a subset of a slice in signatures.
> How much of these features can be written in Rust? (Honest question)
Assuming I'm understanding these correctly:
- owned: Typically represented via Box<T> or plain non-reference types.
- allocated, initialized, readable, writable: I believe these are generally handled via MaybeUninit [0] because everything is otherwise assumed to be properly initialized. Readable/writable might need &/&mut on top of MaybeUninit.
- Other properties: Might depend on the exact properties, but I think stuff like tainted input would usually be represented directly in the type system - in this example, taintedness would be types and transitions would be functions. There are at least two ways to implement that that I can think of right now. The first is via a simple newtype (might have syntax errors, but the gist should be clear):
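A minimal sketch of that newtype approach, with made-up names (`Tainted`, `Clean`, `sanitize`, `run_query`) and a stand-in validation rule:

```rust
/// Data that came from an untrusted source (hypothetical type).
struct Tainted(String);

/// Data that has passed validation (hypothetical type).
struct Clean(String);

impl Tainted {
    fn new(raw: impl Into<String>) -> Self {
        Tainted(raw.into())
    }

    /// The only transition from `Tainted` to `Clean`.
    /// The rule here (ASCII alphanumeric only) is just a placeholder.
    fn sanitize(self) -> Option<Clean> {
        if self.0.chars().all(|c| c.is_ascii_alphanumeric()) {
            Some(Clean(self.0))
        } else {
            None
        }
    }
}

/// A sensitive sink that only accepts `Clean` values, so passing a
/// `Tainted` value is a type error at compile time.
fn run_query(input: &Clean) -> String {
    format!("SELECT * FROM users WHERE name = '{}'", input.0)
}

fn main() {
    let ok = Tainted::new("alice").sanitize().expect("passes the placeholder rule");
    assert_eq!(run_query(&ok), "SELECT * FROM users WHERE name = 'alice'");

    // Rejected by the placeholder rule, so no `Clean` value is ever produced.
    assert!(Tainted::new("x'; DROP TABLE users").sanitize().is_none());

    // Does not compile: expected `&Clean`, found `&Tainted`.
    // run_query(&Tainted::new("bob"));
}
```

In a real crate the two types would live in their own module so that the inner field is private and `sanitize` really is the only way to get a `Clean`.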
> That's one of the things named lifetimes let you do, but named lifetimes are flexible enough for more than just that use case, from slightly more complex things like dealing with multiple independent lifetimes at the same time (for example, returning two references instead of just one, or structs/functions using multiple lifetimes), to more obscure stuff like dealing with higher-ranked trait bounds [1].
In SPlint you would describe something as `special` and then describe the individual elements.
> Kind of a tangent, but the manual is a bit unclear to me as to what `returned` is capable of.
Aliasing is mutual and storage refers to the region an object lives in. It doesn't refer to exact pointer values.
> Would that be accepted as well?
$ cat test.c
int *
max (/*@returned@*/ int * a, /*@returned@*/ int * b) {
  if (*a > *b) return a; else return b;
}
int
main (void)
{
  int * a = malloc (sizeof *a);
  int * b = malloc (sizeof *b);
  int * c;
  if (!a || !b) abort ();
  *a = 4;
  *b = 6;
  c = max (a, b);
  free (a);
  free (b);
  free (c);
  return 0;
}
$ splint test.c
Splint 3.1.1 --- 05 Jan 2023
Finished checking --- 2 code warnings
test.c: (in function main)
test.c:22:8: Dead storage c passed as out parameter to free: c
Memory is used after it has been released (either by passing as an only param
or assigning to an only global). (Use -usereleased to inhibit warning)
test.c:20:8: Storage c released
test.c:2:1: Function exported but not used outside test: max
A declaration is exported, but not used outside this module. Declaration can
use static qualifier. (Use -exportlocal to inhibit warning)
test.c:4:1: Definition of max
> In SPlint you would describe something as `special` and then describe the individual elements.
Interesting. That does seem to provide more flexibility, though based on what the manual says I still feel like it's not quite to the same level as named lifetimes since `special` looks like it revolves around allocation/initialization?
At least based on a quick skim of the manual, structs with non-owning pointers still seem like a potential difference, since from what I can tell struct field checks are either for ownership (/*@only@*/ for fields by default), for initialization/allocation (`partial`, state clauses), or require overhead (/*@refs@*/). Nothing quite like "the data here will live for some arbitrary lifetime(s) dictated by the use context".
> Aliasing is mutual and storage refers to the region an object lives in. It doesn't refer to exact pointer values.
I... think that answers my question? Some quick tests seem to bear that out as well.
> $ cat test.c
So I think I'm a bit dim and for some reason I thought Splint was not free. Sorry for the bother! I could have tried things out myself this entire time!
Good to see that /*@returned@*/ works like I hoped at least.
> At least based on a quick skim of the manual, structs with non-owning pointers still seem like a potential difference, since from what I can tell struct field checks are either for ownership (/*@only@*/ for fields by default), for initialization/allocation (`partial`, state clauses), or require overhead (/*@refs@*/). Nothing quite like "the data here will live for some arbitrary lifetime(s) dictated by the use context".
Then I don't understand (your/Rust's) understanding of lifetimes. As to my understanding, the lifetime of an object is bound by the lifetime of the underlying allocation and is the time during which the storage of the allocation is initialized without interruption.
> I thought Splint was not free
It is for example in Debian, but I have recompiled it myself, since the Debian version has some bugs.
> As to my understanding, the lifetime of an object is bound by the lifetime of the underlying allocation and is the time during which the storage of the allocation is initialized without interruption.
I think that's more or less the same definition used by Rust. It's just that the borrow checker gives you some more options to use/manipulate lifetimes.
Not entirely sure this will help, but consider this type from earlier:
struct ByteSlice<'a> {
slice: &'a [u8],
}
This describes a struct containing a non-owning reference to a slice of bytes where the reference has some "placeholder" lifetime 'a (where 'a could be checked at the point of use and in a more abstract way without necessarily knowing about an actual underlying object). This could be handy if you need to refer to multiple subsets of a whole and don't want to make a copy - say, if you were writing a zero-copy parser. The borrow checker will ensure that the underlying data will always be valid for as long as this struct is used.
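Here's a rough sketch of how that might look in a zero-copy setting. The helper `split_once` is something I made up for illustration; it hands out two `ByteSlice`s that both point into the caller's buffer:

```rust
struct ByteSlice<'a> {
    slice: &'a [u8],
}

/// Hypothetical helper: splits `input` at the first `delim` without copying.
/// Both halves borrow from `input`, so they share its lifetime 'a.
fn split_once<'a>(input: &'a [u8], delim: u8) -> Option<(ByteSlice<'a>, ByteSlice<'a>)> {
    let pos = input.iter().position(|&b| b == delim)?;
    Some((
        ByteSlice { slice: &input[..pos] },
        ByteSlice { slice: &input[pos + 1..] },
    ))
}

fn main() {
    let buffer = b"key=value".to_vec();
    let (key, value) = split_once(&buffer, b'=').expect("delimiter is present");
    assert_eq!(key.slice, &b"key"[..]);
    assert_eq!(value.slice, &b"value"[..]);

    // Rejected if uncommented: `buffer` cannot be moved (and freed) while
    // `key` still borrows from it.
    // drop(buffer);
    // println!("{}", key.slice.len());
}
```

Note that `'a` never names a concrete object inside `split_once`; it only gets pinned to `buffer` at the call site.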
I didn't see an obvious analogous annotation in the Splint user manual that lets it check this kind of construct. This struct doesn't own the data it references, so /*@only@*/ doesn't apply. The reference itself as well as the data it references will be initialized since `MaybeUninit` isn't involved, so `partial` and state clauses don't apply. And no reference counting is involved, so /*@refs@*/ doesn't apply either. `dependent` seems like it might fit, but that isn't checked.
The names also let you be more specific with your lifetimes. For example, you could create a struct with references with two different but overlapping lifetimes (though I think such cases would be rare in practice):
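A minimal sketch of such a struct, with hypothetical names (`Pair`, `'long`, `'short`) chosen just for this example:

```rust
// Two references with separate lifetimes in one struct.
struct Pair<'long, 'short> {
    config: &'long str,
    scratch: &'short str,
}

impl<'long, 'short> Pair<'long, 'short> {
    // The result is tied to 'long only, so it may outlive `scratch`
    // and even the `Pair` itself.
    fn config(&self) -> &'long str {
        self.config
    }
}

fn main() {
    let config = String::from("verbose");
    let kept;
    {
        let scratch = String::from("temporary");
        let pair = Pair { config: &config, scratch: &scratch };
        assert_eq!(pair.scratch, "temporary");
        kept = pair.config();
    }
    // `scratch` and `pair` are gone, but `kept` still points into `config`.
    assert_eq!(kept, "verbose");
}
```

With a single lifetime parameter on the struct, the borrow checker would have to assume `kept` might depend on `scratch` and would reject the last assertion.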
[0]: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-a...