...don't violate strict aliasing then?

matheusmoreira · on Oct 16, 2021

Aliasing is ubiquitous in systems programming. Reinterpreting arbitrary types as arrays of bytes, for example. Honestly, it's kind of amazing that C even has this rule. Seems to be an attempt to be competitive with Fortran.

simias · on Oct 16, 2021

It's definitely rife with footguns, but you can very much write aliasing-safe systems code (or even kernel code, for that matter). Or, at the very least, you should strive to contain the alias-infringing code in well-delimited sections of the source code that are built with special flags, for instance.
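Compiling such a section with `-fno-strict-aliasing` is one way to contain it. GCC and Clang also offer a narrower, non-standard tool: the `may_alias` type attribute, which exempts accesses through one specific type from type-based alias analysis. A minimal sketch, assuming GCC or Clang; `u32_alias` and `sum_words` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension (not standard C): a uint32_t type whose accesses
   are treated like character-type accesses, i.e. they may alias
   objects of any type. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias;

/* Sum a buffer as 32-bit words. The caller must still make sure p is
   suitably aligned for uint32_t; may_alias does not relax alignment. */
uint32_t sum_words(const void *p, size_t nwords)
{
    const u32_alias *w = p;
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i++)
        sum += w[i];
    return sum;
}
```

This keeps the exemption attached to one type instead of disabling the optimization for a whole file.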

If optimizations break code due to aliasing violations I would really recommend fixing the code, not turning the optimizations off!
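As an illustration of "fixing the code", here is a minimal sketch of the classic violation (reading a `float` through a `uint32_t *`) and the usual repair with `memcpy`. It assumes `float` is 32 bits, as on IEEE-754 platforms; `float_bits` is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* Broken: accesses a float object through a uint32_t lvalue, which
   violates strict aliasing (undefined behavior):
   uint32_t float_bits_bad(float f) { return *(uint32_t *)&f; } */

/* Fixed: copy the object representation instead of reinterpreting
   the pointer. Compilers typically turn this memcpy into a single move. */
uint32_t float_bits(float f)
{
    uint32_t u;
    _Static_assert(sizeof u == sizeof f, "float must be 32 bits");
    memcpy(&u, &f, sizeof u);
    return u;
}
```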

Also, if anything, C's aliasing rules are less strict than Fortran's by default, hence the later introduction of "restrict" to allow further optimizations.
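Type-based rules cannot help when both pointers have the same type, which is exactly the case `restrict` covers. A minimal sketch (`scale_add` is a hypothetical name):

```c
#include <stddef.h>

/* Both pointers are float *, so strict aliasing alone would force the
   compiler to assume dst and src may overlap. restrict is the caller's
   promise that they do not, which permits vectorizing and reordering
   loads and stores, much as Fortran assumes for its arguments. */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

Calling it with overlapping `dst` and `src` would be undefined behavior.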

matheusmoreira · on Oct 16, 2021

> you can very much write aliasing-safe systems code (or even kernel code, for that matter)

Yeah, sometimes it's possible. Usually by making a mess of everything with unions. If I remember correctly, type punning with unions is still illegal C code but in practice every compiler understands the idiom.

The simple and intuitive solution to many problems is to cast the data to the new pointer and work directly with it. This should always produce correct code no matter what. People think like this and they write code with these assumptions in mind. In practice, nobody really cares too much what the C standard says. What matters is whether the compilers produce the desired code.

saagarjha · on Oct 17, 2021

Punning via unions is legal in C. Casting via a pointer is not and may produce incorrect code.
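A minimal sketch of union-based punning, assuming a 32-bit IEEE-754 `float` (`float_bits_union` is a hypothetical name). In C, reading a member other than the one last written reinterprets the stored bytes (clarified in C99 TC3); in C++ the same code is undefined behavior.

```c
#include <stdint.h>

/* Write the float member, read the uint32_t member: the stored bytes
   are reinterpreted as the new type. */
uint32_t float_bits_union(float f)
{
    union {
        float f;
        uint32_t u;
    } pun = { .f = f };
    return pun.u;
}
```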

eperdew · on Oct 16, 2021

This is why the aliasing rules in C explicitly allow aliasing any pointer type with a char*.
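The char exemption is what makes byte-oriented code like hashing well defined. A minimal sketch of 32-bit FNV-1a over any object's bytes, read through `const unsigned char *` (`fnv1a32` is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over the object representation of any object.
   Reading through a character type is permitted for every type. */
uint32_t fnv1a32(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < size; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}
```

Hashing a struct this way also covers its padding bytes, whose values are unspecified, so it is safest on arrays and scalars.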

matheusmoreira · on Oct 16, 2021

Yeah, but char sucks. It's not a synonym for octet. Not guaranteed to be 8 bits. Its signedness is even implementation-defined unless you specify signed or unsigned.

The correct type is uint8_t/u8. Sadly, using that to alias other types is undefined. I've also seen hashing algorithms which used uint16_t/u16, with similar aliasing optimization bugs. Sometimes you want to reinterpret things as a struct, too.

This is the reason why Linux compiles with strict aliasing disabled.

quietbritishjim · on Oct 16, 2021

Aliasing with unsigned char* is also allowed. (Oddly, signed char is not, even if char is signed.)

uint8_t is not guaranteed to be unsigned char but in practice almost always is. GCC did originally have separate 8-bit types when stdint.h was introduced but quickly changed to a typedef for char-based types precisely to allow using it for aliasing.

Yes, technically char may not be 8 bits but in practice that is very rare (and you can statically assert it).
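The static assertion mentioned above can be a one-liner. A minimal sketch using C11 `_Static_assert`:

```c
#include <limits.h>

/* Fail the build on platforms where a byte is not an octet, instead
   of silently miscompiling code that assumes 8-bit chars. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
```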

Overall, IMO the best solution is to always use uint8_t and turn off optimisations on those rare weird platforms where, for whatever reason, it's not an alias for unsigned char.
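Spotting those platforms need not be left to chance. A minimal sketch, assuming C11 `_Generic` (`IS_UNSIGNED_CHAR` is a hypothetical macro), that breaks the build when `uint8_t` is not `unsigned char`:

```c
#include <stdint.h>

/* _Generic selects on the exact type of its operand, so a distinct
   8-bit extended integer type would fall through to the default. */
#define IS_UNSIGNED_CHAR(x) _Generic((x), unsigned char: 1, default: 0)

_Static_assert(IS_UNSIGNED_CHAR((uint8_t)0),
               "uint8_t is not unsigned char; build with -fno-strict-aliasing");
```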

simias · on Oct 16, 2021

I don't understand, if you're trying to work around aliasing restrictions why would you use `uint8_t*` in the first place?

By definition `sizeof(char) == 1`, so that's almost always what you want when messing with types in C anyway. What you want is bytes, not octets.

kevin_thibedeau · on Oct 16, 2021

chars can be two octets on some DSP platforms that lack byte addressability.

loeg · on Oct 16, 2021

sizeof(char) must always be 1, regardless of how many bits or octets that represents. On such a platform, uint8_t does not exist.
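Because `uint8_t` is optional, `<stdint.h>` defines `UINT8_MAX` if and only if the type exists, so code can detect its absence. A minimal sketch; the `octet` typedef is a hypothetical name:

```c
#include <stdint.h>

#ifdef UINT8_MAX
typedef uint8_t octet;        /* exact 8-bit type is available */
#else
typedef uint_least8_t octet;  /* at least 8 bits, e.g. 16 on some DSPs */
#endif
```

In the fallback case, code using `octet` must mask values to 8 bits itself.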

matheusmoreira · on Oct 16, 2021

> uint8_t is not guaranteed to be unsigned char but in practice almost always is.

Does this imply any unsigned char typedef is able to alias anything? Or is uint8_t a compiler special case?

simias · on Oct 16, 2021

`char` is not guaranteed to be 8 bits and in some more exotic environments (DSPs for instance) may not be.

IIRC POSIX guarantees that char is 8 bits though (but I still think that the sign is implementation-dependent).

But as I said in a parent comment, I don't understand why it's even relevant. If you want to alias any type then use `char *` and not anything else. I don't understand why one would prefer using stdint for that.

matheusmoreira · on Oct 16, 2021

Because sometimes people need to reinterpret data as an array of 8/16/32/64-bit elements. Sometimes people also need to reinterpret things as a structure.

This is independent of how many bytes the underlying platform can address. If we have 8-bit processing code but the platform can only address 16 bits at a time, it should be up to the compiler to generate code that works. Compilers already do stuff like that in other circumstances.
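One portable way to view a byte buffer as wider elements is to assemble each element from its bytes. A minimal sketch, assuming 8-bit bytes and little-endian data (`load_le32` is a hypothetical name); a 16- or 64-bit variant follows the same pattern:

```c
#include <stddef.h>
#include <stdint.h>

/* Read the i-th little-endian 32-bit element of a byte buffer. This
   avoids both the aliasing violation and the alignment requirement of
   a (uint32_t *) cast, and gives the same result on any host byte order. */
uint32_t load_le32(const unsigned char *buf, size_t i)
{
    const unsigned char *p = buf + 4 * i;
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

Optimizing compilers commonly recognize this pattern and emit a single load where the target allows it.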

jeffbee · on Oct 16, 2021

Those people need to be copying; otherwise, the reinterpretation might not work. The char data might not be correctly aligned, for example. Recently went through a big nightmare where a C++ codebase that had accreted on x86 was being ported to another platform where alignment actually matters, and there were all manner of rare low-level malfunctions stemming from the idea that you can just wantonly cast a char* to structured data.
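The copying approach can be sketched in C with `memcpy`, which has no alignment requirement and gives the data a proper struct type. The `struct record` layout and `read_record` are hypothetical, and the bytes are assumed to already match the host's struct layout (padding and byte order):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout, for illustration only. */
struct record {
    uint32_t id;
    uint16_t flags;
};

/* Copy a record out of a byte buffer at an arbitrary, possibly
   misaligned, offset instead of casting buf + offset to a pointer. */
struct record read_record(const unsigned char *buf, size_t offset)
{
    struct record r;
    memcpy(&r, buf + offset, sizeof r);
    return r;
}
```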

jcranmer · on Oct 16, 2021

The strict aliasing rules look through typedefs (and const/volatile-qualifiers). It's possible to use char, signed char, and unsigned char, or any typedef thereof, or any typedef of any typedef, etc., to access any memory whatsoever.
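For example, a typedef of a typedef of `unsigned char` is still a character type and may access any object. A minimal sketch; `byte`, `raw_byte`, and `zero_bytes` are hypothetical names:

```c
#include <stddef.h>

typedef unsigned char byte;   /* hypothetical typedef chain */
typedef byte raw_byte;        /* still just unsigned char */

/* Zero an object's storage through the typedef'd character type. */
void zero_bytes(void *obj, size_t size)
{
    raw_byte *p = obj;
    for (size_t i = 0; i < size; i++)
        p[i] = 0;
}
```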