> That's roughly how it works in C, and I know that it's also UB there if you do...

mk12 · 2025-05-21T04:27:19 1747801639

There is an important difference for this case though. In C it’s fine to have pointers into uninitialized memory as long as you don’t read through them until after initializing. You can write through those pointers the same way you always do. In Rust it’s UB as soon as you “produce” an invalid value, which includes references to uninitialized memory. Everything uses references in Rust, but when dealing with uninitialized memory you have to scrupulously avoid them and instead write through raw pointers. This means you can’t reuse any code that writes through &mut. Also, the rules change over time. At one point I had unsafe code that had a Vec of uninitialized elements, which was ok because I never produced a reference to any element until after I had written them (through raw pointers). But they later changed the Vec docs to say that’s UB, I guess because they want to reserve the right to use references even if you never call a method that returns a reference.
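A minimal sketch of the raw-pointer pattern described here, assuming a plain `Vec<u8>` (`fill_via_raw_ptr` is a hypothetical name, not something from the article). It writes through the pointer returned by `Vec::as_mut_ptr` and only calls `set_len` once every element has been written, so no reference to an uninitialized element is ever produced:

```rust
/// Builds a Vec of `n` bytes without ever creating a reference to an
/// uninitialized element. `fill_via_raw_ptr` is a hypothetical helper name.
fn fill_via_raw_ptr(n: usize) -> Vec<u8> {
    let mut v: Vec<u8> = Vec::with_capacity(n);
    let p: *mut u8 = v.as_mut_ptr();
    for i in 0..n {
        // SAFETY: `i < n <= capacity`, so `p.add(i)` stays inside the allocation.
        // `write` stores the value without reading the old contents.
        unsafe { p.add(i).write((i % 256) as u8) };
    }
    // SAFETY: elements `0..n` were all initialized by the loop above.
    unsafe { v.set_len(n) };
    v
}
```

The point of the sketch is the ordering: raw writes first, `set_len` last.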

Arnavion · 2025-05-21T04:50:31 1747803031

This stopped being much of a problem when MaybeUninit was stabilized. Now you can stick to using &MaybeUninit<T> / &mut MaybeUninit<T> instead of needing to juggle *const T / *mut T and carefully track converting that to &T / &mut T only when it's known to be initialized, and you can't accidentally use a MaybeUninit<T> where you meant to use a T because the types are different.

It's not as painless as it could be though, because many of the MaybeUninit<T> -> T conversion fns are unstable. E.g. the code in TFA needs `&mut [MaybeUninit<T>] -> &mut [T]` but `[MaybeUninit<T>]::assume_init_mut()` is unstable. But reimplementing them is just a matter of copying the libstd impl, which in turn is usually just a straightforward reinterpret-cast one-liner.
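For illustration, a hand-rolled stand-in for that slice conversion might look like the sketch below. The function name is hypothetical, and the body assumes the reinterpret-cast shape described above rather than quoting libstd:

```rust
use std::mem::MaybeUninit;

/// Hand-rolled stand-in for the slice `assume_init_mut` conversion.
/// The name is hypothetical; the body is a plain pointer reinterpret-cast.
///
/// # Safety
/// Every element of `slice` must already be initialized.
unsafe fn slice_assume_init_mut<T>(slice: &mut [MaybeUninit<T>]) -> &mut [T] {
    // SAFETY: `MaybeUninit<T>` has the same size and alignment as `T`,
    // and the caller promises that all elements are initialized.
    unsafe { &mut *(slice as *mut [MaybeUninit<T>] as *mut [T]) }
}
```

The caller carries the whole proof obligation: the function itself checks nothing.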

codeflo · 2025-05-21T06:36:11 1747809371

I don’t get the difference. In both C and Rust you can have pointers to uninitialized memory. In both languages, you can’t use them except in very specific circumstances (which are AFAIK identical).

There are two actual differences in this regard: C pointers are more ergonomic than Rust pointers. And Rust has an additional feature called references, which enable a lot more aggressive compiler optimizations, but which have the restriction that you can’t have a reference to uninitialized memory.

mk12 · 2025-05-21T17:14:57 1747847697

I agree with you. My point is that the additional feature (references) creates a new potential for UB that doesn’t exist in C, and that justifies the “doesn't really ever occupy my mind as a problem” statement being criticized upthread. You can’t compare C to Rust-without-references because no one writes Rust that way. It’s not like C++-without-exceptions which is a legitimate subset that people use.

nemothekid · 2025-05-21T04:55:09 1747803309

Bizarre. I think I've been writing broken Rust code for a couple years. If I understand you correctly something like:

    let mut data = Vec::with_capacity(sz);
    unsafe { data.set_len(sz) };
    buf.copy_to_slice(data.as_mut_slice());

is UB?

NobodyNada · 2025-05-21T15:04:25 1747839865

It's an open question whether creating a reference to an uninitialized value is instant UB, or only UB if that reference is misused (e.g. if copy_to_slice reads an uninitialized byte). The specific discussion is whether the language requires "recursive validity for references", which would mean constructing a reference to an invalid value is "language UB" (your program is not well specified and the compiler is allowed to "miscompile" it) rather than "library UB" (your program is well-specified, but functions you call might not expect an uninitialized buffer and trigger language UB). See the discussion here: https://github.com/rust-lang/unsafe-code-guidelines/issues/3...

Currently, the team is leaning in the direction of not requiring recursive validity for references. This would mean your code is not language UB as long as you can assume `set_len` and `copy_to_slice` never read from `data`. However, it's still considered library UB, as this assumption is not documented or specified anywhere and is not guaranteed -- changes to safe code in your program or in the standard library can turn this into language UB, so by doing something like this you're writing fragile code that gives up a lot of Rust's safety by design.

ironhaven · 2025-05-21T05:13:08 1747804388

That's right. Line 3 is undefined behaviour because you are creating mutable references to the uninit spare capacity of the vec. copy_to_slice only works with writing to initialized slices. The proper way for your example to mess with the uninitialized memory of a vec would be to use only raw pointers, or to call the newly added Vec::spare_capacity_mut function, which returns a slice of MaybeUninit.

bombela · 2025-05-21T16:49:07 1747846147

Why not simply:

    let mut data = Vec::with_capacity(sz);
    data.extend(&buf[..sz]);

Vec::extend extends a container from an iterable. A Vec/slice is iterable.

And from the doc:

> This implementation is specialized for slice iterators, where it uses copy_from_slice to append the entire slice at once.

Of course this trivial example could also be written as:

    let mut data = buf.clone();
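As a sketch of the fully safe route, assuming `buf` is a byte slice and `sz <= buf.len()` (`copy_prefix` is a hypothetical helper name):

```rust
/// Copies the first `sz` bytes of `buf` into a fresh Vec using only safe code.
/// `copy_prefix` is a hypothetical helper; it panics if `sz > buf.len()`.
fn copy_prefix(buf: &[u8], sz: usize) -> Vec<u8> {
    let mut data = Vec::with_capacity(sz);
    // Appends the whole sub-slice at once; no `unsafe` and no `set_len` needed.
    data.extend_from_slice(&buf[..sz]);
    data
}
```

`buf[..sz].to_vec()` would be an equivalent one-liner.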

vgatherps · 2025-05-21T05:02:01 1747803721

Yes, this is the case that I ran into as well. You have to zero memory before reading and/or have some crazy combination of tracking what’s uninitialized capacity versus initialized len. I think the rust stdlib write trait for &mut Vec got butchered over this concern.

It’s strictly more complicated and slower than the obvious thing to do and only exists to satisfy the abstract machine.

Arnavion · 2025-05-21T05:06:29 1747803989

No. The correct way to write that code is to use .spare_capacity_mut() to get a &mut [MaybeUninit<T>], then write your Ts into that using .write_copy_of_slice(), then .set_len(). And that will not be any slower (though obviously more complicated) than the original incorrect code.
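A rough sketch of that sequence using only APIs I'm sure are stable, with a per-element `MaybeUninit::write` standing in for `.write_copy_of_slice()` (the function name is hypothetical, and this makes no claim about what the optimizer emits):

```rust
use std::mem::MaybeUninit;

/// Appends `src` to `data` by writing into the Vec's spare capacity.
/// `append_via_spare_capacity` is a hypothetical name for this sketch.
fn append_via_spare_capacity<T: Copy>(data: &mut Vec<T>, src: &[T]) {
    data.reserve(src.len());
    // May be longer than `src`; `zip` stops at the shorter of the two.
    let spare: &mut [MaybeUninit<T>] = data.spare_capacity_mut();
    for (slot, &value) in spare.iter_mut().zip(src) {
        slot.write(value);
    }
    let new_len = data.len() + src.len();
    // SAFETY: `reserve` guaranteed at least `src.len()` spare slots, and the
    // loop above initialized exactly that many.
    unsafe { data.set_len(new_len) };
}
```

The only `unsafe` left is the final `set_len`, and its justification sits right next to it.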

vgatherps · 2025-05-21T06:11:26 1747807886

Oh this is very nice, I think it was stabilized since I wrote said code.

nemothekid · 2025-05-21T06:22:05 1747808525

write_copy_of_slice doesn't look to be stable. I'll mess around with godbolt, but my hope is that whatever incantation is used compiles down to a memcpy.

Arnavion · 2025-05-21T06:33:27 1747809207

As I wrote in https://news.ycombinator.com/item?id=44048391 , you have to get used to copying the libstd impl when working with MaybeUninit. For my code I put a "TODO(rustup)" comment on such copies, to remind myself to revisit them every time I update the Rust version in toolchain.toml

nemothekid · 2025-05-21T08:11:47 1747815107

In other words the """safe""" stable code looks like this:

    let mut data = Vec::with_capacity(sz);
    // Capacity may exceed `sz`, and copy_from_slice panics on a length mismatch.
    let dst_uninit = &mut data.spare_capacity_mut()[..sz];
    let uninit_src: &[MaybeUninit<T>] = unsafe { transmute(buf) };
    dst_uninit.copy_from_slice(uninit_src);
    unsafe { data.set_len(sz) };

Arnavion · 2025-05-21T09:10:44 1747818644

That's correct.
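For anyone who wants to paste something that compiles, here is one possible self-contained version of the snippet above, specialized to `u8`. The helper names are hypothetical; it swaps the `transmute` for a pointer cast and takes exactly `sz` slots of spare capacity, since `with_capacity` only promises at least that many:

```rust
use std::mem::MaybeUninit;

/// Views an initialized slice as a slice of `MaybeUninit<T>`.
/// Safe code cannot write through the returned shared reference,
/// so it cannot be used to de-initialize `s`.
fn as_uninit<T>(s: &[T]) -> &[MaybeUninit<T>] {
    // SAFETY: `MaybeUninit<T>` has the same size and alignment as `T`.
    unsafe { &*(s as *const [T] as *const [MaybeUninit<T>]) }
}

/// Hypothetical wrapper around the snippet above for `u8` buffers.
fn copy_into_new_vec(buf: &[u8]) -> Vec<u8> {
    let sz = buf.len();
    let mut data: Vec<u8> = Vec::with_capacity(sz);
    // `copy_from_slice` panics on a length mismatch, so take exactly `sz` slots.
    data.spare_capacity_mut()[..sz].copy_from_slice(as_uninit(buf));
    // SAFETY: the first `sz` elements were initialized by the copy above.
    unsafe { data.set_len(sz) };
    data
}
```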

eptcyka · 2025-05-21T05:08:49 1747804129

Valgrind it :)

vlovich123 · 2025-05-21T14:06:26 1747836386

Valgrind doesn’t tell you about UB, only whether the compiled code did something incorrect with memory, and for UB code that depends on what the optimizer happened to do. You’ll need Miri to tell you if this kind of code is triggering UB; it works by interpreting the compiler’s mid-level IR (MIR) and checking whether Rust’s rules about safety are followed.

eptcyka · 2025-05-22T08:32:54 1747902774

Reading from uninitialised memory is a fault that valgrind will detect.

vlovich123 · 2025-05-23T14:30:52 1748010652

But that’s precisely NOT the problem that exists in OP’s code. Valgrind will detect it if and only if the optimizer does something weird to exploit the UB in the code, which may or may not happen, AND it doesn’t even necessarily happen on that line of code, which will leave you scratching your head.

UB is weird and valgrind is not a tool for detecting UB. For that you want Miri or UBSAN. Valgrind’s equivalent is ASAN and MSAN which catch UB issues incidentally in some rare cases and not necessarily where the UB actually happened.

uecker · 2025-05-21T06:21:07 1747808467

It is also not UB to read uninitialized values through a pointer in C for types that do not have non-value representations.

usefulcat · 2025-05-21T04:32:55 1747801975

I suspect that the main reason it doesn't really occupy the author's mind is that even though it's possible to misuse read(), it's really not that hard to actually use it safely.

It sounds like the more difficult problem here has to do with explaining to the compiler that read() is not being used unsafely.

o11c · 2025-05-21T04:32:14 1747801934

The reason this particular UB doesn't need mindspace for C programmers is because it's not even meaningful to do anything with the parts of the buffer beyond the written length.

Most other UBs relate to data that you think you can do something with.

lhecker · 2025-05-21T09:32:39 1747819959

What I meant is that if I write a UTF8 --> UTF16 conversion function for my editor in C I can write

  size_t convert(state_t* state, const void* inp, void* out)

This function now works with both initialized and uninitialized data in practice. It is also transparent over whether the output buffer is a `u8` buffer (a byte buffer to write out into a `File`) or a `u16` buffer (for then using the UTF16). I've never had to think about whether this doesn't work (in this particular context; let's ignore any alignment concerns for writes into `out` in this example) and I don't recall running into any issues writing such code in a long, long time.

If I write the equivalent code in Rust I may write

  fn convert(&mut self, inp: &[u8], out: &mut [MaybeUninit<u8>]) -> usize

The problem is now obvious to me, but at least my intention is clear: "Come here! Give me your uninitialized arrays! I don't care!". But this is not the end of the problem, because writing this code is theoretically unsafe. If you have a `[u8]` slice for `out` you have to convert it to `[MaybeUninit<u8>]`, but then the function could theoretically write uninitialized data and that's UB isn't it? So now I have to think about this problem and write this instead:

  fn convert(&mut self, inp: &[u8], out: &mut [u8]) -> usize

...and that will also be unsafe, because now I have to convert my actual `[MaybeUninit<u8>]` buffer (for file writes) to `[u8]` for calls to this API.

Long story short, this is a problem that occupies my mind when writing in Rust, but not in C. That doesn't mean that C's many unsafeties don't worry me, it just means that this _particular_ problem type described above doesn't come up as an issue in C code that I write.

Edit: Also, what usefulcat said.

ninkendo · 2025-05-21T12:01:14 1747828874

Why wouldn’t you accept a &mut [MaybeUninit<T>] and return a &mut [u8], hiding the unsafe bits that transmute the underlying reference?

Something like:

  fn convert<'i, 'o>(inp: &'i [u8], buf: &'o mut [MaybeUninit<u8>]) -> &'o mut [u8]

(Honest question, actually… because the above may be impossible to write and I’m on my phone and can’t try it.)

Edit: it works: https://play.rust-lang.org/?version=stable&mode=debug&editio...
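A sketch of what such a function could look like (not necessarily what the playground link contains). The body is a stand-in that just copies bytes instead of doing a real UTF8 --> UTF16 conversion, and it returns only the prefix it actually initialized:

```rust
use std::mem::MaybeUninit;

/// Stand-in for a real conversion: copies as much of `inp` as fits into `buf`
/// and returns the part of `buf` that is now initialized.
fn convert<'o>(inp: &[u8], buf: &'o mut [MaybeUninit<u8>]) -> &'o mut [u8] {
    let n = inp.len().min(buf.len());
    for (slot, &byte) in buf[..n].iter_mut().zip(inp) {
        slot.write(byte);
    }
    let written: &'o mut [MaybeUninit<u8>] = &mut buf[..n];
    // SAFETY: the loop above initialized every element of `written`, and
    // `MaybeUninit<u8>` has the same size and alignment as `u8`.
    unsafe { &mut *(written as *mut [MaybeUninit<u8>] as *mut [u8]) }
}
```

The caller never sees a `&mut [u8]` that covers unwritten bytes, which is what keeps the unsafe part contained.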

lhecker · 2025-05-21T16:14:42 1747844082

That's a fair workaround for my specific example. But I believe it's possible to contrive a different example where such a solution would not be possible. Put differently, I only tried to convey the overall idea of what I think is a shortcoming in Rust at the moment.

Edit: Also, I believe your code would fail my second section, as the `convert` function would have difficulty accepting a `[u8]` slice. Converting `[u8]` to `[MaybeUninit<u8>]` is not safe per se.

ninkendo · 2025-05-21T23:04:47 1747868687

Yeah, you’d need to do something like accept an enum that is either &mut [u8] or &mut [MaybeUninit<u8>], and make a couple of impl From<>’s so callers can .into() whatever they want to pass…

But I don’t think this is really a shortcoming, so much as a simple consequence of strong typing. If you want to take “whatever” as a parameter, you have to spell out the types that satisfy it, whether it’s via a trait, or an enum with specific variants, etc. You don’t get to just cast things to void and hope for the best, and still call the result safe.
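Roughly what I have in mind, as an untuned sketch. `OutBuf` and its methods are hypothetical names, and `convert` is again a byte-copying stand-in for the real conversion:

```rust
use std::mem::MaybeUninit;

/// Hypothetical output-buffer type that accepts either kind of slice.
enum OutBuf<'a> {
    Init(&'a mut [u8]),
    Uninit(&'a mut [MaybeUninit<u8>]),
}

impl<'a> From<&'a mut [u8]> for OutBuf<'a> {
    fn from(s: &'a mut [u8]) -> Self {
        OutBuf::Init(s)
    }
}

impl<'a> From<&'a mut [MaybeUninit<u8>]> for OutBuf<'a> {
    fn from(s: &'a mut [MaybeUninit<u8>]) -> Self {
        OutBuf::Uninit(s)
    }
}

impl OutBuf<'_> {
    fn len(&self) -> usize {
        match self {
            OutBuf::Init(s) => s.len(),
            OutBuf::Uninit(s) => s.len(),
        }
    }

    /// Stores `byte` at index `i`; panics if `i` is out of bounds.
    fn set(&mut self, i: usize, byte: u8) {
        match self {
            OutBuf::Init(s) => s[i] = byte,
            OutBuf::Uninit(s) => {
                s[i].write(byte);
            }
        }
    }
}

/// Stand-in conversion: copies `inp` into `out` and returns the byte count.
fn convert<'a>(inp: &[u8], out: impl Into<OutBuf<'a>>) -> usize {
    let mut out = out.into();
    let n = inp.len().min(out.len());
    for (i, &byte) in inp[..n].iter().enumerate() {
        out.set(i, byte);
    }
    n
}
```

Callers pass `&mut bytes[..]` or `&mut uninit[..]` and the `From` impls pick the variant. Because `OutBuf` only ever writes, neither direction needs an unsafe cast at the call site.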