> That's roughly how it works in C, and I know that it's also UB there if you do it wrong, but one thing is different: It doesn't really ever occupy my mind as a problem. In Rust it does.
UB doesn’t occupy the author’s mind when writing C, when it really should. This kind of lazy attitude to memory safety is precisely why so much C code is notoriously riddled with memory bugs and security vulnerabilities.
There is an important difference for this case though. It C it’s fine to have pointers into uninitialized memory as you as you don’t read them until after initializing. You can write through those pointers the same way you always do. In Rust it’s UB as soon as you “produce” an invalid value, which includes references to uninitialized memory. Everything uses references in Rust but when dealing with uninitialized memory you have to scrupulously avoid them, and instead write through raw pointers. This means you can’t reuse any code that writes through &mut. Also, the rules change over time. At one point I had unsafe code that had a Vec of uninitialized elements, which was ok because I never produced a reference to any element until after I had written them (through raw pointers). But they later changed the Vec docs to say that’s UB, I guess because they want to reserve the right to use references even if you never call a method that returns a reference.
This stopped being much of a problem when MaybeUninit was stabilized. Now you can stick to using &MaybeUninit<T> / &mut MaybeUninit<T> instead of needing to juggle *T / *mut T and carefully track converting that to &T / &mut T only when it's known to be initialized, and you can't accidentally use a MaybeUninit<T> where you meant to use a T because the types are different.
It's not as painless as it could be though, because many of the MaybeUninit<T> -> T conversion fns are unstable. Eg the code in TFA needs `&mut [MaybeUninit<T>] -> &mut [T]` but `[T]::assume_init_mut()` is unstable. But reimplementing them is just a matter of copying the libstd impl, that in turn is usually just a straightforward reinterpret-cast one-liner.
I don’t get the difference. In both C and Rust you can have pointers to uninitialized memory. In both languages, you can’t use them except in very specific circumstances (which are AFAIK identical).
There are two actual differences in this regard: C pointers are more ergonomic than Rust pointers. And Rust has an additional feature called references, which enable a lot more aggressive compiler optimizations, but which have the restriction that you can’t have a reference to uninitialized memory.
I agree with you. My point is that the additional feature (references) creates a new potential for UB that doesn’t exist in C, and that justifies the “doesn't really ever occupy my mind as a problem” statement being criticized upthread. You can’t compare C to Rust-without-references because no one writes Rust that way. It’s not like C++-without-exceptions which is a legitimate subset that people use.
It's an open question whether creating a reference to an uninitialized value is instant UB, or only UB if that reference is misused (e.g. if copy_to_slice reads an uninitialized byte). The specific discussion is whether the language requires "recursive validity for references", which would mean constructing a reference to an invalid value is "language UB" (your program is not well specified and the compiler is allowed to "miscompile" it) rather than "library UB" (your program is well-specified, but functions you call might not expect an uninitialized buffer and trigger language UB). See the discussion here: https://github.com/rust-lang/unsafe-code-guidelines/issues/3...
Currently, the team is leaning in the direction of not requiring recursive validity for references. This would mean your code is not language UB as long as you can assume `set_len` and `copy_to_slice` never read from 'data`. However, it's still considered library UB, as this assumption is not documented or specified anywhere and is not guaranteed -- changes to safe code in your program or in the standard library can turn this into language UB, so by doing something like this you're writing fragile code that gives up a lot of Rust's safety by design.
That's right. Line 3 is undefined behaviour because you are creating mutable references to the uninit spare capacity of the vec. copy_to_slice only works with writing to initialized slices. The proper way for you example to mess with the uninitialized memory on a vec would be only use raw pointers or calling the newly added Vec::spare_capacity_mut function on the vec that returns a slice of MaybeUninit
Yes, this is the case that I ran into as well. You have to zero memory before reading and/or have some crazy combination of tracking what’s uninitialized capacity or initialized len, I think the rust stdlib write trait for &mut Vec got butchered over this concern.
It’s strictly more complicated and slower than the obvious thing to do and only exists to satisfy the abstract machine.
No. The correct way to write that code is to use .spare_capacity_mut() to get a &mut [MaybeUninit<T>], then write your Ts into that using .write_copy_of_slice(), then .set_len(). And that will not be any slower (though obviously more complicated) than the original incorrect code.
As I wrote in https://news.ycombinator.com/item?id=44048391 , you have to get used to copying the libstd impl when working with MaybeUninit. For my code I put a "TODO(rustup)" comment on such copies, to remind myself to revisit them every time I update the Rust version in toolchain.toml
Valgrind doesn’t tell you about UB, just if the code did something incorrect with memory and that depends on what the optimizer did if you did write UB code. You’ll need Miri to tell you if this kind of code is triggering UB which works by evaluating and analyzing the mid level of compiler output to check if Rust rules about safety are followed.
But that’s precisely NOT the problem that exists in OPs code. It’s a problem Valgrind will detect if and only if the optimizer does something weird to exploit the UB in the code which may or may not happen AND doesn’t even necessarily happen on that line of code which will leave you scratching your head.
UB is weird and valgrind is not a tool for detecting UB. For that you want Miri or UBSAN. Valgrind’s equivalent is ASAN and MSAN which catch UB issues incidentally in some rare cases and not necessarily where the UB actually happened.
I suspect that the main reason it doesn't really occupy the author's mind is that even though it's possible to misuse read(), it's really not that hard to actually use it safely.
It sounds like the more difficult problem here has to do with explaining to the compiler that read() is not being used unsafely.
The reason this particular UB doesn't need mindspace for C programmers is because it's not even meaningful to do anything with the parts of the buffer beyond the written length.
Most other UBs related to datums that you think you can do something with.
This function now works with both initialized and uninitialized data in practice. It also is transparent over whether the output buffer is an `u8` (a byte buffer to write it out into a `File`) or `u16` (a buffer for then using the UTF16). I've never had to think about whether this doesn't work (in this particular context; let's ignore any alignment concerns for writes into `out` in this example) and I don't recall running into any issues writing such code in a long long time.
If I write the equivalent code in Rust I may write
The problem is now obvious to me, but at least my intention is clear: "Come here! Give me your uninitialized arrays! I don't care!". But this is not the end of the problem, because writing this code is theoretically unsafe. If you have a `[u8]` slice for `out` you have to convert it to `[MaybeUninit<u8>]`, but then the function could theoretically write uninitialized data and that's UB isn't it? So now I have to think about this problem and write this instead:
...and that will also be unsafe, because now I have to convert my actual `[MaybeUninit<u8>]` buffer (for file writes) to `[u8]` for calls to this API.
Long story short, this is a problem that occupies my mind when writing in Rust, but not in C. That doesn't mean that C's many unsafeties don't worry me, it just means that this _particular_ problem type described above doesn't come up as an issue in C code that I write.
That's a fair workaround for my specific example. But I believe it's possible to contrive a different example where such a solution would not be possible. Put differently, I only tried to convey the overall idea of what I think is a shortcoming in Rust at the moment.
Edit: Also, I believe your code would fail my second section, as the `convert` function would have difficulty accepting a `[u8]` slice. Converting `[u8]` to `[MaybeUninit<u8>]` is not safe per se.
Yeah, you’d need to do something like accept an enum that is either &mut [u8] or &mut [MaybeUninit<u8>], and make a couple of impl From<>’s so callers can .into() whatever they want to pass…
But I don’t think this is really a shortcoming, so much as a simple consequence of strong typing. If you want take “whatever” as a parameter, you have to spell out the types that satisfy it, whether it’s via a trait, or an enum with specific variants, etc. You don’t get to just cast things to void and hope for the best, and still call the result safe.
UB doesn’t occupy the author’s mind when writing C, when it really should. This kind of lazy attitude to memory safety is precisely why so much C code is notoriously riddled with memory bugs and security vulnerabilities.